Latest revision as of 11:09, 5 June 2026

osgeo-library

A service for extracting figures, tables, and equations from PDFs, indexing documents, and running semantic search. It runs on Gallery_Container.

It is used by the Matrix chat assistant to search OSGeo-related documents and retrieve extracted visual elements.

  • GitHub: https://github.com/ominiverdi/osgeo-library
  • CLI command: osgeo-library (as ominiverdi user)
  • Local path: /home/ominiverdi/github/osgeo-library
  • Database: PostgreSQL on Gallery
  • MCP server name: doclibrary

The service is started at reboot from the ominiverdi user's crontab:

~/github/osgeo-library/servers/start-server.sh

Verified running processes on osgeo7-gallery include:

python -m doclibrary.servers.api
python -m doclibrary.servers.mcp
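
A check like the one above could be scripted roughly as follows. This is a minimal sketch, not part of osgeo-library: the helper names `missing_modules` and `current_ps_output` are hypothetical, and the only facts taken from this page are the two module names. It assumes a POSIX `ps` is available on the host.

```python
import subprocess

# Module names taken from the process list on this page.
EXPECTED_MODULES = ("doclibrary.servers.api", "doclibrary.servers.mcp")


def missing_modules(ps_output, expected=EXPECTED_MODULES):
    """Return the expected module names that appear in no line of ps output."""
    lines = ps_output.splitlines()
    return [name for name in expected if not any(name in line for line in lines)]


def current_ps_output():
    """Capture the command line of every process (assumes a POSIX ps)."""
    result = subprocess.run(
        ["ps", "-eo", "args"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `missing_modules(current_ps_output())` on the host would return an empty list when both servers are up, and the names of any absent modules otherwise.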

What can users ask?

Examples for the Matrix chat assistant:

!oc list documents in the OSGeo library
!oc find the Snyder map projections document
!oc show information about the usgs_snyder document
!oc list equations on page 44 of usgs_snyder
!oc show metadata for page 44 of usgs_snyder
!oc find figures or equations about Mercator projection

Semantic search requires the OSGeo-Knowledge-Embedding-Server to be running.

MCP server

The MCP tools are provided by the doclibrary MCP server:

/home/ominiverdi/github/osgeo-library/.venv/bin/python -m doclibrary.servers.mcp

In the Matrix chat bridge, these tools appear with the doclibrary_ prefix.

Live status

At last check, the service reported:

Database: OK (107 documents)
Embedding server: NOT AVAILABLE
Embedding URL: http://localhost:8094/embedding

When the embedding server is unavailable, semantic search tools such as doclibrary_search_documents and doclibrary_search_visual_elements return an error.
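
A client could read a status report like the one above before attempting a semantic search. The following is a sketch under assumptions: the function names are hypothetical, the report is assumed to keep the "Key: value" line format shown here, and any embedding server value other than "NOT AVAILABLE" is assumed to mean the server is reachable.

```python
# Tools named on this page as failing when the embedding server is down.
SEMANTIC_TOOLS = (
    "doclibrary_search_documents",
    "doclibrary_search_visual_elements",
)


def parse_status(report):
    """Turn 'Key: value' lines into a dict, splitting on the first colon only."""
    status = {}
    for line in report.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def semantic_search_available(status):
    """Assumption: a missing entry or 'NOT AVAILABLE' means no semantic search."""
    value = status.get("Embedding server")
    return value is not None and value != "NOT AVAILABLE"


report = """Database: OK (107 documents)
Embedding server: NOT AVAILABLE
Embedding URL: http://localhost:8094/embedding"""

status = parse_status(report)
if not semantic_search_available(status):
    print("Skip:", ", ".join(SEMANTIC_TOOLS))
```

Splitting on the first colon only keeps the embedding URL intact, since the URL itself contains colons.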

MCP tools and examples

doclibrary_get_library_status

Checks document library, database, and embedding server status.

Example visible tool call:

[doclibrary_get_library_status]

Live result included:

Database: OK (107 documents)
Embedding server: NOT AVAILABLE

doclibrary_list_documents

Lists available documents with slugs, titles, page counts, and summaries.

Example visible tool call:

[doclibrary_list_documents]

Live result examples included:

  • aibench
  • aiseg
  • alpine_change
  • usgs_snyder

doclibrary_find_document

Finds documents by title, slug, or filename.

Example visible tool call:

query=snyder, limit=3 [doclibrary_find_document]

Live result example:

slug: usgs_snyder
title: Usgs Snyder
source_file: usgs_snyder1987.pdf
pages: 397
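
The tool-call lines on this page follow a "key=value, key=value [tool_name]" pattern. As an illustration only, a small parser for that notation might look like the sketch below. The function `parse_visible_call` is hypothetical, and it assumes argument values never contain the separator ", ", which holds for every example on this page.

```python
import re


def parse_visible_call(line):
    """Split 'k=v, k=v [tool]' into (tool, {k: v}); all values stay strings."""
    match = re.fullmatch(r"\s*(.*?)\s*\[(\w+)\]\s*", line)
    if match is None:
        raise ValueError(f"not a visible tool call: {line!r}")
    arg_text, tool = match.group(1), match.group(2)
    args = {}
    if arg_text:
        for part in arg_text.split(", "):
            key, _, value = part.partition("=")
            args[key.strip()] = value.strip()
    return tool, args


tool, args = parse_visible_call("query=snyder, limit=3 [doclibrary_find_document]")
print(tool, args)
```

Values are left as strings, so a caller that needs `limit` as a number would convert it explicitly.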

doclibrary_get_document_info

Shows metadata for a document, including page count, summary, keywords, license, and indexed element counts.

Example visible tool call:

document_slug=usgs_snyder [doclibrary_get_document_info]

Live result included:

total_pages: 397
figures: 63
tables: 69
equations: 909

doclibrary_search_documents

Semantic search over document text and extracted visual elements.

Example visible tool call:

query=mercator projection, limit=3 [doclibrary_search_documents]

Requires the embedding server.

doclibrary_search_visual_elements

Semantic search over figures, tables, equations, charts, and diagrams.

Example visible tool call:

query=mercator equation, element_type=equation, document_slug=usgs_snyder, limit=3 [doclibrary_search_visual_elements]

Requires the embedding server.

doclibrary_list_elements

Lists extracted elements from a document, optionally filtered by type or page.

Example visible tool call:

document_slug=usgs_snyder, element_type=equation, page=44, limit=5 [doclibrary_list_elements]

Live result examples included:

  • Equation (5-10a)
  • Equation (5-10b)
  • Equation (5-11) and (5-12)
  • Equation (5-12a)
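
The optional type and page filters can be pictured with an in-memory sketch. This is not the service's implementation, which presumably queries PostgreSQL. The `Element` record and the `list_elements` function are hypothetical, the equation labels are the ones from the live result above, and the figure entry is made up to show the filter excluding something.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    label: str
    element_type: str
    page: int


def list_elements(elements, element_type=None, page=None, limit=None):
    """Keep elements matching every filter that is set, then apply the limit."""
    matches = [
        e
        for e in elements
        if (element_type is None or e.element_type == element_type)
        and (page is None or e.page == page)
    ]
    return matches if limit is None else matches[:limit]


SAMPLE = [
    Element("Equation (5-10a)", "equation", 44),
    Element("Equation (5-10b)", "equation", 44),
    Element("Equation (5-11) and (5-12)", "equation", 44),
    Element("Equation (5-12a)", "equation", 44),
    Element("Hypothetical figure", "figure", 45),  # invented for illustration
]

for element in list_elements(SAMPLE, element_type="equation", page=44, limit=5):
    print(element.label)
```

Leaving a filter as `None` disables it, which mirrors the "optionally filtered" wording of the tool description.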

doclibrary_get_element_details

Gets metadata and description for a specific extracted element.

Example visible tool call:

document_slug=usgs_snyder, element_label=Equation (5-10a), page_number=44 [doclibrary_get_element_details]

doclibrary_get_element_image

Returns the cropped image for a specific figure, table, equation, chart, or diagram.

Example visible tool call:

document_slug=usgs_snyder, element_label=Equation (5-10a), page_number=44 [doclibrary_get_element_image]

doclibrary_get_page_image

Returns the full page image for a document page.

Example visible tool call:

document_slug=usgs_snyder, page_number=44 [doclibrary_get_page_image]

doclibrary_get_page_metadata

Returns page text, summary, keywords, size, and visual elements without transferring the page image.

Example visible tool call:

document_slug=usgs_snyder, page_number=44 [doclibrary_get_page_metadata]

The live result included the page 44 summary and keywords, which cover oblique and transverse map projection formulas.

doclibrary_list_documents_paginated

Lists documents with pagination, summaries, keywords, and license information.

Example visible tool call:

page=1, page_size=3 [doclibrary_list_documents_paginated]
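
The `page` and `page_size` arguments suggest conventional offset pagination. The sketch below shows that arithmetic on a plain list. It assumes pages are numbered from 1, as the example call implies; the `paginate` and `page_count` helpers are hypothetical and are not the service's code.

```python
import math


def paginate(items, page=1, page_size=3):
    """Return the slice of items for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    start = (page - 1) * page_size
    return items[start:start + page_size]


def page_count(total_items, page_size):
    """Number of pages needed to show total_items entries."""
    return math.ceil(total_items / page_size)


# Slugs from the doclibrary_list_documents example on this page.
slugs = ["aibench", "aiseg", "alpine_change", "usgs_snyder"]
print(paginate(slugs, page=1, page_size=3))
print(paginate(slugs, page=2, page_size=3))
```

A page past the end returns an empty list instead of raising, which lets a caller stop iterating when no results come back.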

Related services

  • LLM-based Chat Assistant
  • OSGeo-Knowledge-Embedding-Server

Contact: ominiverdi, Lorenzo Becchi, or the SAC channel.