Simple Catalog Interface
This page documents a conversation in the Geodata Committee about developing simple protocols for discovering and syndicating metadata that describes where geodata is available. The conversation is visible and open to join through the OSGeo geodata mailing list (archives for Aug. 2006).
- See Also: Geodata Metadata Requirements
Stefan Keller wrote an excellent overview of the background to the perceived need for a simple metadata exchange/search interface.
Requirements
- we want a simple web interface
- we want it to be as existing-standards-compliant as possible
- we want it to be as easily syndicatable and contributable as possible
- we want to register not only static resources but also web services
Web addressable interfaces
http://www.gis.hsr.ch/wiki/OAI-PMH - Stefan Keller's comparison of OAI-PMH and WFS
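OAI-PMH, one of the two interfaces compared on that page, is a plain HTTP GET protocol: every request carries a verb parameter (one of the six verbs defined by OAI-PMH 2.0) plus verb-specific arguments. The sketch below only builds request URLs and does no network access; the base URL http://example.org/oai and the helper name oai_request_url are hypothetical.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}


def oai_request_url(base_url, verb, **args):
    """Build an OAI-PMH GET request URL (hypothetical helper, no network access)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # resumptionToken is an exclusive argument: when a harvester continues
    # an incomplete list, it must be the only argument besides the verb.
    if "resumptionToken" in args and len(args) > 1:
        raise ValueError("resumptionToken must not be combined with other arguments")
    return base_url + "?" + urlencode({"verb": verb, **args})
```

A first harvest of Dublin Core records would be oai_request_url(base, "ListRecords", metadataPrefix="oai_dc"); incremental harvests add a from date, passed as **{"from": "2006-08-01"} because from is a Python keyword. WFS, by contrast, also needs a filter language, which is part of why OAI-PMH looks attractive for plain metadata exchange.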
Metadata models
see Geodata Metadata Requirements
Data exchange formats
Implementations
See Also
- log of IRC meeting about service crawlers and metadata exchange on 2006/08/10
- Geodata Committee Eighth Meeting - summary of the above meeting
- Paul Ramsey on simple web services catalogues 2005/10
- more from Paul Ramsey 2006/02
- Andrew Hallam on Lightweight Web Resource Catalogue 2006/06 (good comments)
- Tom Kralidis' owscat
- the mapdex data catalog search API
Notes from Metadata + Catalog BOF at FOSS4G2006
stefan - ISO issues - need to describe services - being merged with ebRIM - compulsory core is 300 attributes - this is overkill for user+producer requirements
- has a vision of how to overcome the problem
tom from owscat - hard time finding implementations of cat-2 that are available - simple requirement to discover service resources - is doing capabilities indexing and spitting out layer / index metadata - search clients are WFS clients
my spiel
schuyler - metacarta - interested in contributing solutions - + helping the geodata committee
adoyle - simple as possible, right as possible - search interfaces - human interfaces - how do text search indexes make it seem so easy - "how do we make it seem so hard?"
nedh - i'm not even sure what CSW is but i think that's what i'm interested in
Josh: CAT was its previous name - a semi-abstract spec - CSW was the description of the HTTP service. ebRIM - CSW - "the spec is big for a reason because it tries to address a lot of different communities" - corba binding, z39.50 binding.
OGC wants to contribute to simple discovery - interested in simple profile development - work happening in geonetwork
Raj from OGC
trying to catalogue everything - data, data services, symbols, so much process. Two parts to a simple answer: 1/ do we only want to catalogue a class of thing? 2/ what comes out of the information model, and letting the programmers develop
Rob Atkinson - can't build simple profiles - wants to catalogue all this kind of thing - too many implementations - little in the way of interop
"care about not the objects, but the relationships between them; this service implements this feature type" . ability to *query* relationships - this is contained in the ebRIM metamodel
rob: "protocol is not the issue"
stef: "profile is not the issue"
getting the information model right
metadata slots bound to *vocabularies* that are domain defined - this sounds much more like OWL - "guide the user through the semantic space"
Jeroen - geonetwork opensource - macmini
tom from owscat - extending WFS - putting metadata model through it. wms 1.1.1 capabilities + wfs 1.0.0 metadata - everyone's waiting for geonetwork to implement this. wfs interesting engine for query. josh suggests small proxy to offer a csw interface over it - a geoserver module.
SLD to WMS - i care about the features that it supports - this is what jody is saying - query via SLD so i know what to query for.
size of model. relationship issue. stef talks about the OAI-PMH approach; jeroen says geonetwork will soon support an OAI-PMH interface
then we still need to define the minimal model
jeroen - in UNEP - work on defining a metadata subset - avoiding later refactoring of metadata - concern about making new versions of existing standards.
"metadata exchange protocol"
data for internal and external use
internal model - smallest common denominator
cascading query services - lightweight interface at the indexing level.
integrating search services. google talks dublin core, talks oai-pmh
rob: flexibility of the meta model - we can't predict what we're going to receive or need to express. - no guarantee of consistency -
how do we agree to agree?
stef - it's not easy to chain services - discovery first, chaining afterwards.
lightweight protocol which doesn't include filter.
OAI for propagation - is the data useful when you propagate it - our information model
Notes from Metadata + Catalog BOF at FOSS4G2006 (ff.)
Thank you, Jo, for the notes. I hope you are all safely back home from Switzerland! I uploaded my slides at FOSS4G2006, and I again took some time to define a minimal metadata information model for a metadata exchange protocol. --SFK 11:06, 19 September 2006 (CEST)
Metadata information model (proposal)
Some design considerations:
- Based on Dublin Core (DC) and Catalogue Services Specification 2.0.1, OGC 04-021r3, p.22.
- Dublin Core needs refined semantics for some properties/attributes.
- We had a hard time with the abundant use of namespaces. This is because the DC specs and other XML 'practices' specialize property/attribute types instead of specializing whole classes.
- All properties/attributes have cardinality 1 except where really needed for automation!
- Take as much information as possible in an automated manner, e.g. from the geodata datasets.
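The last point, pulling values automatically from the dataset, could look roughly like this for the two fields the proposal marks as syncable (modification date and spatial extent). This is a minimal sketch under stated assumptions: the coordinates are passed in as (lon, lat) pairs in EPSG:4326 rather than read from a real data format, the extent does not cross the antimeridian, and the file's modification time stands in for the date of the last published change. The function names are hypothetical.

```python
import os
from datetime import date


def bbox_from_points(points):
    """Return dct:spatial-style limits for a list of (lon, lat) pairs.

    Assumes EPSG:4326 coordinates and an extent that does not cross the
    antimeridian (otherwise min/max would produce a box spanning the wrong
    way around the globe).
    """
    if not points:
        raise ValueError("cannot derive an extent from an empty dataset")
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return {"north": max(lats), "east": max(lons),
            "south": min(lats), "west": min(lons)}


def modified_from_file(path):
    """Return the file's last-modification date (local time) as YYYY-MM-DD.

    The file system timestamp is only a stand-in for the date of the last
    published change of the resource.
    """
    return date.fromtimestamp(os.path.getmtime(path)).isoformat()
```

A real harvester would read the coordinates through a GIS library; the point is only that neither value needs to be typed in by hand.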
Table:
Attr. name | Attr. type | Explanation | Status |
dc:identifier[1] | string | Unique id to identify a resource (URI); see UUID but also OAI-PMH! | tbd. |
dc:title[1] | string | Title of the resource. | Ok |
dc:description[1] | string | A description of the resource (why dct:abstract?) | Ok |
dc:subject[1] | string | Could be an ISO 19115 classification or keywords, comma-separated? | tbd. |
dc:relation[unbounded] | URI | Reference to other data providers or to ‘friends’ as indicated here. | tbd. |
dc:type[1] | string | Type of the original resource, like vector, raster or grid geodata. | tbd. |
dc:format[unbounded] | URI | enum of ‘http, ftp, WMS, WFS’ (= well-known data access services), ‘Filter Service’, else ‘WSDL’. | tbd. |
dct:modified[1] | date | Date of last (published) change of resource. (Automated sync. from dataset) | Ok |
dct:spatial[1] | dcmiBox:Box with CRS | (Automated sync. from dataset) | Ok? |
dc:language[1] | enum | RFC 1766 (ISO 639, followed optionally by country ISO 3166) | Ok |
dc:source[1] | URL (preferred) or string | Lineage information about the resource | Ok? |
dc:rights[1] | URL (preferred) or string | License information about the resource | Ok? |
dc:publisher[1] | structure (refinement of string) | Civic Address or URI to point to (xAL/KML?) | tbd. |
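The table can be read as a record type: fields with cardinality [1] are single values, and the two [unbounded] fields (relation and format) are lists. The dataclass below is a hypothetical Python rendering of that reading, not a schema from the proposal; its checks are deliberately simplified (for example, the language test accepts only a two-letter ISO 639 code with an optional ISO 3166 country, which is narrower than RFC 1766 allows).

```python
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BBox:
    """Value of the spatial field: limits in degrees plus the CRS they use."""
    north: float
    east: float
    south: float
    west: float
    crs: str = "EPSG:4326"


@dataclass
class GeoMetaRecord:
    """Hypothetical in-memory form of one record of the proposed model.

    Fields with cardinality [1] are single values; relation and format,
    the two [unbounded] fields, are lists.
    """
    identifier: str
    title: str
    description: str
    subject: str
    type: str
    modified: date
    spatial: BBox
    language: str
    source: str
    rights: str
    publisher: str
    relation: list[str] = field(default_factory=list)
    format: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return simplified consistency problems; an empty list means none found."""
        found = []
        for name in ("identifier", "title", "description"):
            if not getattr(self, name).strip():
                found.append(f"{name} is empty")
        # Narrower than RFC 1766: two-letter ISO 639 code, optional ISO 3166 country.
        if not re.fullmatch(r"[a-z]{2}(-[A-Z]{2})?", self.language):
            found.append(f"unexpected language tag {self.language!r}")
        if self.spatial.south > self.spatial.north:
            found.append("south limit lies north of the north limit")
        return found
```

Whether subject should stay a comma-separated string or become a list is one of the open "tbd." questions in the table; the sketch keeps it as a single string to match the [1] cardinality.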
Remarks:
- DC attributes/properties left aside: Audience, Contributor, Creator.
- No additional DC attributes/properties are required; only two of them need to be specialized (see dct:...), although some attributes/properties still need a specialized recommended meaning (see tbd.).
- dct:modified and dct:spatial can be synced from the dataset.
- Attribute 'relation': this hasn't been discussed yet. It simply helps harvesters discover more (meta)data providers.
- For some general explanations of dc/dct, see: http://cicharvest.grainger.uiuc.edu/qualifieddc.asp
- Note that OAI-PMH puts an XML envelope around this metadata and adds a header containing two fields: 'identifier', which identifies the metadata record, and 'datestamp', the date of the last (published) change of the metadata record.
- Assume metadata (as opposed to geodata) is always free and open information.
- An encoding still has to be discussed (see the following example). We need a schemaLocation at OSGeo!?
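The OAI-PMH envelope mentioned in the remarks can be sketched with Python's standard xml.etree.ElementTree module. What it illustrates is that the header datestamp records the last change of the metadata record, while dct:modified inside the payload records the last change of the resource itself. The function name and the identifier value used with it are hypothetical; the record/header/metadata nesting follows the OAI-PMH 2.0 record layout.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def wrap_in_oai_record(payload, oai_identifier, datestamp):
    """Wrap a metadata payload element in an OAI-PMH <record>.

    The <header> carries the identifier of the metadata record and its
    datestamp (last change of the metadata record, not of the dataset);
    the payload itself goes unchanged into <metadata>.
    """
    record = ET.Element(f"{{{OAI_NS}}}record")
    header = ET.SubElement(record, f"{{{OAI_NS}}}header")
    ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = oai_identifier
    ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = datestamp
    metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
    metadata.append(payload)
    return record
```

Keeping the two dates separate lets a harvester ask "what metadata changed since my last visit?" without caring whether the underlying geodata changed.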
Example
Notes:
- Example values are purely fictitious.
- XML Schema (= geometadc.xsd) still tbd.
- This record is not yet validated!
- Used 'geometadc' as the envelope name.
<geometadc:qualifieddc xmlns:geometadc="http://www.osgeo.org/schemas/geometa/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.osgeo.org/schemas/geometa/ geometadc.xsd">
  <dc:identifier>f264-77d2-09ce-aa39-f0f0</dc:identifier>
  <dc:title>National Elevation Mapping Service for Texas</dc:title>
  <dc:description>Elevation data collected for the National Elevation Dataset (NED) based on 30m horizontal and 15m vertical accuracy.</dc:description>
  <dc:subject>Elevation, Hypsography, and Contours</dc:subject>
  <dc:relation>f264-77d2-09ce-aa39-f0f0</dc:relation>
  <dc:type>grid geodata</dc:type>
  <dc:format>uri:http://www.gis.hsr.ch/wms</dc:format>
  <dc:format>uri:http://www.gis.hsr.ch/data/poi_data_rapperswil.shp</dc:format>
  <dct:modified>2004-03-01</dct:modified>
  <dct:spatial>
    <Box projection="EPSG:4326" name="Geographic">
      <northlimit>34.353</northlimit>
      <eastlimit>-96.223</eastlimit>
      <southlimit>28.229</southlimit>
      <westlimit>-108.44</westlimit>
    </Box>
  </dct:spatial>
  <dc:language>en</dc:language>
  <dc:source>lineage: ...</dc:source>
  <dc:rights>uri:http://www.usgs.gov/pubprod/</dc:rights>
  <dc:publisher>U.S. Geological Survey</dc:publisher>
</geometadc:qualifieddc>
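A harvester consuming records like this only needs a namespace-aware XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of the example record to pull out the title, the access formats, the modification date and the bounding box. The helper name summarize is hypothetical, and it assumes the unprefixed Box element shown in the example (no namespace), which the still-to-be-written geometadc.xsd may define differently.

```python
import xml.etree.ElementTree as ET

NS = {
    "geometadc": "http://www.osgeo.org/schemas/geometa/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}

# Trimmed copy of the example record (values are fictitious).
RECORD = """<geometadc:qualifieddc
    xmlns:geometadc="http://www.osgeo.org/schemas/geometa/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dct="http://purl.org/dc/terms/">
  <dc:title>National Elevation Mapping Service for Texas</dc:title>
  <dc:format>uri:http://www.gis.hsr.ch/wms</dc:format>
  <dc:format>uri:http://www.gis.hsr.ch/data/poi_data_rapperswil.shp</dc:format>
  <dct:modified>2004-03-01</dct:modified>
  <dct:spatial>
    <Box projection="EPSG:4326" name="Geographic">
      <northlimit>34.353</northlimit>
      <eastlimit>-96.223</eastlimit>
      <southlimit>28.229</southlimit>
      <westlimit>-108.44</westlimit>
    </Box>
  </dct:spatial>
</geometadc:qualifieddc>"""


def summarize(xml_text):
    """Extract the fields a simple catalogue index would need from one record."""
    root = ET.fromstring(xml_text)
    # Box and its limit elements carry no namespace in the example.
    box = root.find("dct:spatial/Box", NS)
    limits = {side: float(box.findtext(f"{side}limit"))
              for side in ("north", "east", "south", "west")}
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "formats": [f.text for f in root.findall("dc:format", NS)],
        "modified": root.findtext("dct:modified", namespaces=NS),
        "crs": box.get("projection"),
        "bbox": limits,
    }
```

Parsing the record this way also doubles as a cheap well-formedness check until the schema exists and the record can be properly validated.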