Osgeo-library

osgeo-library is a service that extracts figures, tables, and equations from PDFs, indexes the documents, and provides semantic search. It runs on Gallery_Container.

It is used by the Matrix chat assistant to search OSGeo-related documents and retrieve extracted visual elements.

The service is started at reboot from the ominiverdi crontab:

~/github/osgeo-library/servers/start-server.sh

Verified running processes on osgeo7-gallery include:

python -m doclibrary.servers.api
python -m doclibrary.servers.mcp

What can users ask?

Examples for the Matrix chat assistant:

!oc list documents in the OSGeo library
!oc find the Snyder map projections document
!oc show information about the usgs_snyder document
!oc list equations on page 44 of usgs_snyder
!oc show metadata for page 44 of usgs_snyder
!oc find figures or equations about Mercator projection

Semantic search requires the OSGeo-Knowledge-Embedding-Server to be running.

MCP server

The MCP tools are provided by the doclibrary MCP server:

/home/ominiverdi/github/osgeo-library/.venv/bin/python -m doclibrary.servers.mcp

In the Matrix chat bridge, these tools appear with the doclibrary_ prefix.
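For testing outside the Matrix bridge, the server command above could be driven directly over stdio. The following is a minimal sketch, assuming the MCP Python SDK (the `mcp` package) is installed and that the server registers its tools without the doclibrary_ prefix that the bridge adds. Neither assumption is confirmed on this page, and the helper names are hypothetical.

```python
BRIDGE_PREFIX = "doclibrary_"  # prefix the Matrix chat bridge adds, per this page


def bare_tool_name(bridge_name: str) -> str:
    """Strip the bridge prefix. Assumes the server uses unprefixed tool names."""
    if bridge_name.startswith(BRIDGE_PREFIX):
        return bridge_name[len(BRIDGE_PREFIX):]
    return bridge_name


async def call_tool(bridge_name: str, arguments: dict):
    """Start the MCP server as a subprocess and call one tool on it."""
    # Imported here so bare_tool_name() works without the `mcp` package.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(
        command="/home/ominiverdi/github/osgeo-library/.venv/bin/python",
        args=["-m", "doclibrary.servers.mcp"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            return await session.call_tool(bare_tool_name(bridge_name), arguments)
```

With `import asyncio`, a status check would then look like `asyncio.run(call_tool("doclibrary_get_library_status", {}))`. If the server turns out to register prefixed names, pass the name through unchanged instead.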

Live status

At last check, the service reported:

Database: OK (107 documents)
Embedding server: NOT AVAILABLE
Embedding URL: http://localhost:8094/embedding

When the embedding server is unavailable, semantic search tools such as doclibrary_search_documents and doclibrary_search_visual_elements return an error.
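Before running a semantic search, a quick check of the embedding port can tell whether an error is likely. This is a rough sketch that only tests whether something accepts TCP connections at the Embedding URL shown above. It does not verify that the listener is the embedding server, and the function names are hypothetical.

```python
import socket
from urllib.parse import urlsplit

EMBEDDING_URL = "http://localhost:8094/embedding"  # from the status output above


def host_and_port(url: str) -> tuple[str, int]:
    """Extract host and port from a URL, falling back to the scheme default."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname or "localhost", parts.port or default_port


def embedding_port_open(url: str = EMBEDDING_URL, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_and_port(url), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A False result here matches the "Embedding server: NOT AVAILABLE" state, in which the semantic search tools return an error.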

MCP tools and examples

doclibrary_get_library_status

Checks document library, database, and embedding server status.

Example visible tool call:

[doclibrary_get_library_status]

Live result included:

Database: OK (107 documents)
Embedding server: NOT AVAILABLE
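For monitoring, the status lines can be turned into a small dictionary. The sketch below assumes the tool returns plain "Key: value" lines exactly as quoted on this page; the real response may be wrapped differently, and the function names are hypothetical.

```python
import re

# Status lines as quoted in the "Live status" section of this page.
STATUS_TEXT = """\
Database: OK (107 documents)
Embedding server: NOT AVAILABLE
Embedding URL: http://localhost:8094/embedding
"""


def parse_status(text: str) -> dict[str, str]:
    """Split each 'Key: value' line on the first ': ' separator."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def document_count(fields: dict[str, str]):
    """Pull the number out of 'OK (107 documents)', or return None."""
    match = re.search(r"\((\d+) documents?\)", fields.get("Database", ""))
    return int(match.group(1)) if match else None


status = parse_status(STATUS_TEXT)
print(status["Embedding server"], document_count(status))
```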

doclibrary_list_documents

Lists available documents with slugs, titles, page counts, and summaries.

Example visible tool call:

[doclibrary_list_documents]

Live result examples included:

  • aibench
  • aiseg
  • alpine_change
  • usgs_snyder

doclibrary_find_document

Finds documents by title, slug, or filename.

Example visible tool call:

query=snyder, limit=3 [doclibrary_find_document]

Live result example:

slug: usgs_snyder
title: Usgs Snyder
source_file: usgs_snyder1987.pdf
pages: 397
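The visible tool-call lines on this page follow one pattern: optional `key=value` pairs separated by a comma and a space, then the tool name in square brackets. A minimal sketch for splitting such a line, for example when reading chat logs, might look like this. It assumes values never contain a comma followed by a space, keeps every value as a string, and uses a hypothetical function name.

```python
import re

# Optional "key=value, key=value" part, then "[tool_name]" at the end of the line.
CALL_RE = re.compile(r"^(?:(?P<args>.*\S)\s+)?\[(?P<tool>\w+)\]$")


def parse_tool_call(line: str) -> tuple[str, dict[str, str]]:
    """Return (tool_name, arguments) for one visible tool-call line."""
    match = CALL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a tool-call line: {line!r}")
    args = {}
    for pair in (match.group("args") or "").split(", "):
        if pair:
            key, _, value = pair.partition("=")
            args[key.strip()] = value.strip()
    return match.group("tool"), args


print(parse_tool_call("query=snyder, limit=3 [doclibrary_find_document]"))
```

Labels such as `Equation (5-10a)` pass through intact because the split is on the comma-space separator, not on spaces or parentheses.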

doclibrary_get_document_info

Shows metadata for a document, including page count, summary, keywords, license, and indexed element counts.

Example visible tool call:

document_slug=usgs_snyder [doclibrary_get_document_info]

Live result included:

total_pages: 397
figures: 63
tables: 69
equations: 909

doclibrary_search_documents

Semantic search over document text and extracted visual elements.

Example visible tool call:

query=mercator projection, limit=3 [doclibrary_search_documents]

Requires the embedding server.

doclibrary_search_visual_elements

Semantic search over figures, tables, equations, charts, and diagrams.

Example visible tool call:

query=mercator equation, element_type=equation, document_slug=usgs_snyder, limit=3 [doclibrary_search_visual_elements]

Requires the embedding server.

doclibrary_list_elements

Lists extracted elements from a document, optionally filtered by type or page.

Example visible tool call:

document_slug=usgs_snyder, element_type=equation, page=44, limit=5 [doclibrary_list_elements]

Live result examples included:

  • Equation (5-10a)
  • Equation (5-10b)
  • Equation (5-11) and (5-12)
  • Equation (5-12a)

doclibrary_get_element_details

Gets metadata and description for a specific extracted element.

Example visible tool call:

document_slug=usgs_snyder, element_label=Equation (5-10a), page_number=44 [doclibrary_get_element_details]

doclibrary_get_element_image

Returns the cropped image for a specific figure, table, equation, chart, or diagram.

Example visible tool call:

document_slug=usgs_snyder, element_label=Equation (5-10a), page_number=44 [doclibrary_get_element_image]

doclibrary_get_page_image

Returns the full page image for a document page.

Example visible tool call:

document_slug=usgs_snyder, page_number=44 [doclibrary_get_page_image]

doclibrary_get_page_metadata

Returns page text, summary, keywords, size, and visual elements without transferring the page image.

Example visible tool call:

document_slug=usgs_snyder, page_number=44 [doclibrary_get_page_metadata]

Live result included page 44 summary and keywords for oblique/transverse map projection formulas.

doclibrary_list_documents_paginated

Lists documents with pagination, summaries, keywords, and license information.

Example visible tool call:

page=1, page_size=3 [doclibrary_list_documents_paginated]
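Walking the whole library with this tool means requesting pages until one comes back short. The sketch below shows that loop under stated assumptions: page numbers start at 1, as in the example call, and a page with fewer than `page_size` entries is the last one. The response shape of the real tool is not shown on this page, so `fetch_page` is a hypothetical callback and `fake_fetch` is a stand-in built from slugs listed above.

```python
from typing import Callable, Iterator


def iter_documents(fetch_page: Callable[[int, int], list], page_size: int = 3) -> Iterator:
    """Yield documents page by page until a short (or empty) page is returned."""
    page = 1
    while True:
        batch = fetch_page(page, page_size)
        yield from batch
        if len(batch) < page_size:
            break
        page += 1


# Stand-in data source for illustration only; not the real tool.
SLUGS = ["aibench", "aiseg", "alpine_change", "usgs_snyder"]


def fake_fetch(page: int, page_size: int) -> list:
    start = (page - 1) * page_size
    return SLUGS[start:start + page_size]


print(list(iter_documents(fake_fetch, page_size=3)))
```

In real use, `fetch_page` would wrap a call to doclibrary_list_documents_paginated and return the list of documents from its response.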

Related services

Contact: ominiverdi, Lorenzo Becchi, or SAC channel.