= osgeo-library =
Service for extracting figures, tables, and equations from PDFs, indexing documents, and running semantic search over them, hosted on [[Gallery_Container]].

It is used by the [[LLM-based_Chat_Assistant|Matrix chat assistant]] to search OSGeo-related documents and retrieve extracted visual elements.

* GitHub: https://github.com/ominiverdi/osgeo-library
* Local path: <code>/home/ominiverdi/github/osgeo-library</code>
* Database: PostgreSQL on [[Gallery_Container|Gallery]]
* MCP server name: <code>doclibrary</code>

Started at boot from the <code>ominiverdi</code> user's crontab:

<pre>
~/github/osgeo-library/servers/start-server.sh
</pre>

Processes verified as running on <code>osgeo7-gallery</code> include:

<pre>
python -m doclibrary.servers.api
python -m doclibrary.servers.mcp
</pre>
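To confirm that the two processes listed above are still up, a check like the following can be run on <code>osgeo7-gallery</code>. It is a minimal sketch, not part of the repository: it assumes a Linux <code>/proc</code> filesystem and simply looks for each module invocation in the process command lines; both function names are hypothetical. From a shell, <code>pgrep -af doclibrary.servers</code> gives a similar quick view.

```python
import os
from typing import Iterable, List, Sequence

# Module invocations listed above; the check assumes each appears verbatim
# somewhere in a running process's command line.
EXPECTED = ("-m doclibrary.servers.api", "-m doclibrary.servers.mcp")


def missing_processes(cmdlines: Iterable[str], expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected invocations that no command line contains."""
    lines = list(cmdlines)
    return [module for module in expected if not any(module in line for line in lines)]


def read_proc_cmdlines() -> List[str]:
    """Read every process command line from /proc (Linux only)."""
    cmdlines: List[str] = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as handle:
                raw = handle.read()
        except OSError:
            continue  # the process exited or is not readable by this user
        if raw:
            # Arguments are NUL-separated; join them with spaces for matching.
            cmdlines.append(raw.replace(b"\0", b" ").decode(errors="replace").strip())
    return cmdlines
```

On the host, <code>missing_processes(read_proc_cmdlines())</code> returns an empty list when both servers are running.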

== What can users ask? ==

Examples for the [[LLM-based_Chat_Assistant|Matrix chat assistant]]:

<pre>
!oc list documents in the OSGeo library
!oc find the Snyder map projections document
!oc show information about the usgs_snyder document
!oc list equations on page 44 of usgs_snyder
!oc show metadata for page 44 of usgs_snyder
!oc find figures or equations about Mercator projection
</pre>

Semantic search requires the [[OSGeo-Knowledge-Embedding-Server]] to be running.

== MCP server ==

The MCP tools are provided by the <code>doclibrary</code> MCP server:

<pre>
/home/ominiverdi/github/osgeo-library/.venv/bin/python -m doclibrary.servers.mcp
</pre>

In the [[LLM-based_Chat_Assistant|Matrix chat bridge]], these tools appear with the <code>doclibrary_</code> prefix.
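The tool names on this page follow that prefix convention. A small sketch of the mapping, assuming the bridge simply joins the server name and the MCP tool name with an underscore (so the server itself would expose, for example, <code>get_library_status</code>). This is an inference from the names on this page rather than a documented rule, and both helper names are hypothetical.

```python
SERVER_NAME = "doclibrary"


def bridge_tool_name(tool: str, server: str = SERVER_NAME) -> str:
    """Name under which the chat bridge is assumed to expose an MCP tool."""
    return f"{server}_{tool}"


def mcp_tool_name(bridged: str, server: str = SERVER_NAME) -> str:
    """Inverse of bridge_tool_name; rejects tools from other servers."""
    prefix = f"{server}_"
    if not bridged.startswith(prefix):
        raise ValueError(f"{bridged!r} does not belong to the {server} server")
    return bridged[len(prefix):]


print(bridge_tool_name("get_library_status"))                # doclibrary_get_library_status
print(mcp_tool_name("doclibrary_list_documents_paginated"))  # list_documents_paginated
```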

== Live status ==

At last check, the service reported:

<pre>
Database: OK (107 documents)
Embedding server: NOT AVAILABLE
Embedding URL: http://localhost:8094/embedding
</pre>

When the embedding server is unavailable, semantic search tools such as <code>doclibrary_search_documents</code> and <code>doclibrary_search_visual_elements</code> return an error.
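Before calling the semantic search tools, a client can check whether anything is listening at the embedding URL. The sketch below only tests whether a TCP connection to the URL's host and port can be opened. It does not send an embedding request, because the request format of the embedding server is not documented here, and an open port does not guarantee that the endpoint works. The function name is hypothetical.

```python
import socket
from urllib.parse import urlsplit

EMBEDDING_URL = "http://localhost:8094/embedding"


def embedding_server_reachable(url: str = EMBEDDING_URL, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the URL's host and port can be opened."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # Only opens and closes a connection; no embedding request is sent.
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unknown host, ...
        return False
```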

== MCP tools and examples ==

=== <code>doclibrary_get_library_status</code> ===

Checks document library, database, and embedding server status.

Example visible tool call:

<pre>
[doclibrary_get_library_status]
</pre>

Live result included:

<pre>
Database: OK (107 documents)
Embedding server: NOT AVAILABLE
</pre>
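The status text consists of plain <code>Key: value</code> lines, so a client can turn it into flags. A minimal sketch, assuming the output keeps exactly the wording shown on this page. The wording used when the embedding server is available is not shown, so the sketch treats any value other than <code>NOT AVAILABLE</code> as available. The function and derived key names are hypothetical.

```python
import re
from typing import Dict


def parse_status(text: str) -> Dict[str, object]:
    """Parse 'Key: value' status lines and derive a few convenience flags."""
    status: Dict[str, object] = {}
    for line in text.splitlines():
        # partition splits at the first colon, so URLs in values stay intact
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    db = str(status.get("Database", ""))
    match = re.search(r"\((\d+) documents\)", db)
    status["database_ok"] = db.startswith("OK")
    status["document_count"] = int(match.group(1)) if match else None
    embedding = status.get("Embedding server")
    status["semantic_search_available"] = embedding is not None and embedding != "NOT AVAILABLE"
    return status
```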

=== <code>doclibrary_list_documents</code> ===

Lists available documents with slugs, titles, page counts, and summaries.

Example visible tool call:

<pre>
[doclibrary_list_documents]
</pre>

Live result examples included:

* <code>aibench</code>
* <code>aiseg</code>
* <code>alpine_change</code>
* <code>usgs_snyder</code>

=== <code>doclibrary_find_document</code> ===

Finds documents by title, slug, or filename.

Example visible tool call:

<pre>
query=snyder, limit=3 [doclibrary_find_document]
</pre>

Live result example:

<pre>
slug: usgs_snyder
title: Usgs Snyder
source_file: usgs_snyder1987.pdf
pages: 397
</pre>
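How the service matches a query against titles, slugs, and filenames is not documented on this page. As an illustration only, here is a case-insensitive substring match with a result limit, using the fields from the live result above. The real tool may match or rank differently (for example, fuzzily), and <code>Document</code> and <code>find_documents</code> are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    slug: str
    title: str
    source_file: str
    pages: int


def find_documents(docs: List[Document], query: str, limit: int = 3) -> List[Document]:
    """Case-insensitive substring match on title, slug, or source filename."""
    needle = query.casefold()
    hits = [
        doc for doc in docs
        if any(needle in field.casefold() for field in (doc.title, doc.slug, doc.source_file))
    ]
    return hits[:limit]


# Record copied from the live result above.
snyder = Document("usgs_snyder", "Usgs Snyder", "usgs_snyder1987.pdf", 397)
print([d.slug for d in find_documents([snyder], "snyder", limit=3)])  # ['usgs_snyder']
```

With this record, both <code>snyder</code> and <code>1987</code> would match, the latter only through the filename.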

=== <code>doclibrary_get_document_info</code> ===

Shows metadata for a document, including page count, summary, keywords, license, and indexed element counts.

Example visible tool call:

<pre>
document_slug=usgs_snyder [doclibrary_get_document_info]
</pre>

Live result included:

<pre>
total_pages: 397
figures: 63
tables: 69
equations: 909
</pre>
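The live result allows a quick sanity check: the indexed figures, tables, and equations of <code>usgs_snyder</code> add up to 63 + 69 + 909 = 1,041 elements over 397 pages, about 2.6 per page. The sketch parses only integer-valued lines, so text fields such as summary, keywords, or license would be skipped if present; <code>parse_counts</code> is a hypothetical helper.

```python
from typing import Dict


def parse_counts(text: str) -> Dict[str, int]:
    """Collect 'key: integer' lines; lines with non-integer values are skipped."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counts[key.strip()] = int(value)
    return counts


# The live result for usgs_snyder shown above.
info = parse_counts("total_pages: 397\nfigures: 63\ntables: 69\nequations: 909")
indexed = info["figures"] + info["tables"] + info["equations"]
print(indexed)                                  # 1041
print(round(indexed / info["total_pages"], 2))  # 2.62
```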

=== <code>doclibrary_search_documents</code> ===

Semantic search over document text and extracted visual elements.

Example visible tool call:

<pre>
query=mercator projection, limit=3 [doclibrary_search_documents]
</pre>

Requires the embedding server.
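Semantic search of this kind generally works by embedding the query with the same model used to index the content, then ranking the stored vectors by similarity. That is presumably why these tools fail when the embedding server is down: the query itself cannot be embedded. The generic sketch below ranks items by cosine similarity and keeps the top <code>limit</code>. The similarity measure and storage that doclibrary actually uses are not described on this page, and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: Sequence[float],
    items: List[Tuple[str, Sequence[float]]],
    limit: int = 3,
) -> List[Tuple[str, float]]:
    """Score every item against the query and keep the `limit` best."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 2-d "embeddings"; real ones would come from the embedding server.
items = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
print(top_k([1.0, 0.0], items, limit=2))
```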

=== <code>doclibrary_search_visual_elements</code> ===

Semantic search over figures, tables, equations, charts, and diagrams.

Example visible tool call:

<pre>
query=mercator equation, element_type=equation, document_slug=usgs_snyder, limit=3 [doclibrary_search_visual_elements]
</pre>

Requires the embedding server.

=== <code>doclibrary_list_elements</code> ===

Lists extracted elements from a document, optionally filtered by type or page.

Example visible tool call:

<pre>
document_slug=usgs_snyder, element_type=equation, page=44, limit=5 [doclibrary_list_elements]
</pre>

Live result examples included:

* <code>Equation (5-10a)</code>
* <code>Equation (5-10b)</code>
* <code>Equation (5-11) and (5-12)</code>
* <code>Equation (5-12a)</code>
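The optional filters of <code>doclibrary_list_elements</code> can be read as "match if given, ignore if omitted", with <code>limit</code> capping the result. A minimal sketch of that logic, using the equation labels from the live result above. The <code>Element</code> fields and the <code>list_elements</code> function are hypothetical, and the real tool presumably queries the PostgreSQL database rather than an in-memory list.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, List, Optional


@dataclass
class Element:
    label: str
    element_type: str  # e.g. "figure", "table", "equation"
    page: int


def list_elements(
    elements: Iterable[Element],
    element_type: Optional[str] = None,
    page: Optional[int] = None,
    limit: int = 5,
) -> List[Element]:
    """Return up to `limit` elements matching every filter that was given."""
    matches = (
        el
        for el in elements
        if (element_type is None or el.element_type == element_type)
        and (page is None or el.page == page)
    )
    return list(islice(matches, limit))


# Labels taken from the live result for page 44 of usgs_snyder.
page_44 = [
    Element("Equation (5-10a)", "equation", 44),
    Element("Equation (5-10b)", "equation", 44),
    Element("Equation (5-11) and (5-12)", "equation", 44),
    Element("Equation (5-12a)", "equation", 44),
]
found = list_elements(page_44, element_type="equation", page=44, limit=5)
print([el.label for el in found])
```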

=== <code>doclibrary_get_element_details</code> ===

Gets the metadata and description of a specific extracted element.

Example visible tool call:

<pre>
document_slug=usgs_snyder, element_label=Equation (5-10a), page_number=44 [doclibrary_get_element_details]
</pre>

=== <code>doclibrary_get_element_image</code> ===

Returns the cropped image of a specific figure, table, equation, chart, or diagram.

Example visible tool call:

<pre>
document_slug=usgs_snyder, element_label=Equation (5-10a), page_number=44 [doclibrary_get_element_image]
</pre>

=== <code>doclibrary_get_page_image</code> ===

Returns the full image of a document page.

Example visible tool call:

<pre>
document_slug=usgs_snyder, page_number=44 [doclibrary_get_page_image]
</pre>

=== <code>doclibrary_get_page_metadata</code> ===

Returns page text, summary, keywords, size, and visual elements without transferring the page image.

Example visible tool call:

<pre>
document_slug=usgs_snyder, page_number=44 [doclibrary_get_page_metadata]
</pre>

The live result included the page 44 summary and keywords, which cover oblique and transverse map projection formulas.

=== <code>doclibrary_list_documents_paginated</code> ===

Lists documents with pagination, including summaries, keywords, and license information.

Example visible tool call:

<pre>
page=1, page_size=3 [doclibrary_list_documents_paginated]
</pre>
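With the 107 documents reported under Live status, <code>page_size=3</code> would give 36 pages, the last one holding 2 documents. The sketch below shows that arithmetic, assuming 1-based page numbers (as <code>page=1</code> in the example suggests) and pagination over the whole document list; the helper names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed for `total` items (ceiling division)."""
    return (total + page_size - 1) // page_size


def paginate(items: Sequence[T], page: int, page_size: int) -> List[T]:
    """Return the items on a 1-based `page`."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    start = (page - 1) * page_size
    return list(items[start:start + page_size])


print(page_count(107, 3))           # 36
print(paginate(range(107), 36, 3))  # [105, 106]
```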

== Related services ==

* [[LLM-based Chat Assistant]]
* [[OSGeo-Knowledge-Embedding-Server]]

'''Contact:''' ominiverdi, Lorenzo Becchi, or the SAC channel.

[[Category:Services]]
[[Category:AI-Services]]
Latest revision as of 11:09, 5 June 2026