Talk:DCLite4G
OAI-PMH - stefan, why don't we want to require GetRecord ??
- Just thought, ListRecords (incl. temporal query parameters) would be enough for harvesting. But I don't mind. -- Stefan 09:30, 15 March 2007 (CET)
Interfaces
My current thinking, after talking to pramsey about Simple Catalog Interfaces, is to separate out interfaces for different classes of data search / retrieval tasks. According to this narrative, data, services (including presentation) and "relationships" are three different classes of thing which CSW/ebRIM is trying to treat all at once, and that is why it is so slow and overcomplex.
Service metadata
Many people are more interested in repositories of information about web services. They want real-time "find-bind" access.
Package metadata
Right now this is the domain of things like shape files, GML files etc. Theoretically it could be any resource you visit where the resource doesn't change between visits. This also applies to local filesystems and data management.
Publishing metadata
Other people are more interested in the publish side - real data syndication between repositories - the kind of thing that OAI-PRE will one day support. In the meantime we have to make something out of what we have got. Jeroen writes,
For the harvesting of the catalog itself, GeoNetwork has a custom interface/process. This process will go out to some XML service (provided by another GeoNetwork node at this stage) and will request a very brief result set from the other catalog that contains file identifier (UUID), time stamp and catalog ID. By comparing those with its internally cached records it will decide to request new or updated records and it will remove those that are no longer found.
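The comparison step Jeroen describes could be sketched roughly as follows. This is not GeoNetwork's actual code: the names `HarvestPlan` and `plan_harvest` are hypothetical, and it assumes both sides report timestamps as ISO 8601 strings in the same format, restricted to a single remote catalog ID.

```python
from typing import Dict, List, NamedTuple


class HarvestPlan(NamedTuple):
    """Hypothetical result type: what to request and what to drop."""
    fetch: List[str]   # UUIDs to request in full (new or updated remotely)
    remove: List[str]  # UUIDs cached locally but no longer in the remote catalog


def plan_harvest(remote: Dict[str, str], cached: Dict[str, str]) -> HarvestPlan:
    """Compare a brief remote result set with the local cache.

    Both arguments map a record UUID to its timestamp. Assumption: timestamps
    are ISO 8601 strings in one consistent format, so plain string comparison
    orders them chronologically.
    """
    fetch = sorted(
        uuid for uuid, stamp in remote.items()
        if uuid not in cached or stamp > cached[uuid]
    )
    remove = sorted(uuid for uuid in cached if uuid not in remote)
    return HarvestPlan(fetch=fetch, remove=remove)


if __name__ == "__main__":
    remote = {"a": "2007-03-15T09:30:00", "b": "2007-03-10T00:00:00"}
    cached = {"b": "2007-03-01T00:00:00", "d": "2007-02-01T00:00:00"}
    print(plan_harvest(remote, cached))
```

Only the UUIDs in `fetch` need a full record request, which keeps the brief result set cheap to exchange.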
Implementations
Implementations of OAI-PMH
Each entry gives the name of the implementation, whether it is software or a web application (or an instance of both), the programming language and an example web link. Some are using oai_dc 'DClite4G style'.
Harvesters:
- Google (Lang. n/a; Ex. spec. http://www.google.com/sitemaps )
- OAI Home - Tools (Lang. many; Home http://www.openarchives.org/tools/tools.html)
- geometa.info (Lang. Java; Ex.: http://geometa.info )
Metadata Providers:
- GeoNetwork? (Provider; Webapplication/Open source; Lang. Java+XSLT; Ex. tbd.)
- GeoShop.com (Lang. C; Ex. http://urlx.org/ibbinfoshop.ch/1d987 )
- Geometa-Ed. HSR (Lang. PHP; Ex. http://urlx.org/geometa.info/0363d )
- Geometa-Ed. Chur (Lang. PHP; Ex. http://urlx.org/geometa.info/67333 )
- List of OAI tools: http://www.openarchives.org/tools/tools.html
- List of OAI data providers: http://www.openarchives.org/
Candidate Projects (harvesters and metadata providers):
- uDig (Provider+Harvester; Desktop Software/Open source; Lang. Java; Ex. tbd.)
- OSGEO (Provider; Instance/Open Source; Lang. C+Scripts; Ex. tbd.)
- Canton of Aargau? (Provider; Instance/Closed source; Lang. .NET; Ex. tbd.)
DCLite4G is an effort to
- establish a common information model with mappings to well known formats for geospatial metadata
- provide a vocabulary/recommendations for spatial extensions to Dublin Core
- support simple interfaces for collecting and querying geospatial metadata
Getting involved
To get involved, first subscribe to the Mailing List (see below). Then read the corresponding pages of this Wiki. If you find that something is missing or incorrect, correct or add it as required. To do this you need to create an account for this Wiki. Sorry, still working on Single Sign On...
Once you get the hang of it and want to get more involved, you can join any interest group or project and contribute to the process. This can happen from many perspectives: as a user, portal operator, developer, decision maker, you name it.
Mailing List
Feel welcome to join the discussion and development mailing list:
TODO
- extract more of the specifics from Geodata Metadata Requirements. (This has changed a lot between the original version in the history and the current version, which differentiates data sets from data sources, which may be files, databases or services.)
- update docs on Geodata Metadata Model
- draw some UML to appease people?
Information Model
Core Model
Dublin Core
The Dublin Core Metadata Initiative is an open organization engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. An introduction to the topic can be read in the Usage Guide.
ISO19115
FGDC
Namespace
The dclite4g namespace provides a common naming scheme for properties specific to geographic data sets and services which are not covered by the existing Dublin Core or GeoRSS standards.
The namespace will live at http://xmlns.com/2006/dclite4g/ .
- extents
- accuracy
- scale
fill in here from Geodata Metadata Requirements
References
- Dublin Core lite for Geo
- OGC Catalog Services 2 Specification 6.3.3, Core returnable properties
- OGC ebRIM profile of CSW specification, Appendix B.5, Table B.3 - Slots defined in the Basic package
- iGeoResourceInfo class in uDig
- GeodataCommons Metadata Whitepaper
Query Interface
OAI-PMH
The Harvesting Protocol (version 2.0) specification together with Implementation Guidelines
The following are specific guidelines for a minimal OAI-PMH implementation of a so-called 'data provider' using only the mandatory 'unqualified' Dublin Core (DC):
- Only three operations (verbs) are needed: Identify, ListMetadataFormats and ListRecords.
- The following operations are not required (initially): ListIdentifiers, ListSets, GetRecord.
- No resumptionToken flow control (the resumption process that splits ListXxx responses with more than 1000 records into partial lists)
- No compression as defined in the OAI-PMH spec (compression at the lower HTTP level is still possible)
- Date granularity may be 'day' not seconds (YYYY-MM-DD)
- Keeping track of deleted records may not be supported (deletedRecord=no)
- The mandatory DC data model is sufficient for a start, but with specific semantics (e.g. for coverage, relation) (see also example below):
- dc:description contains dct:abstract
- dc:coverage contains bounding box encoding as defined in http://georss.org/simple.html#Box
- dc:date means in fact dct:modified
- dc:relation is filled in with dclite4g:onLineSrc. If dc:type='service' dct:hasPart can be derived from GetCapabilities.
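As an illustration of the field semantics listed above, here is a minimal sketch that builds one oai_dc record with Python's standard library. The function names `georss_box` and `build_oai_dc` and all sample values are hypothetical; only the two namespace URIs (oai_dc and Dublin Core elements 1.1) and the GeoRSS Simple box order (lower corner lat/lon, then upper corner lat/lon) are taken from the respective specifications.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def georss_box(south, west, north, east):
    """Encode a bounding box as in GeoRSS Simple: 'lat lon lat lon'."""
    return f"{south} {west} {north} {east}"


def build_oai_dc(title, abstract, bbox, modified, online_src):
    """Build an <oai_dc:dc> element using the 'DClite4G style' semantics.

    bbox is (south, west, north, east); modified is a YYYY-MM-DD string;
    online_src is the value of dclite4g:onLineSrc, carried in dc:relation.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    fields = [
        ("title", title),
        ("description", abstract),       # carries dct:abstract
        ("coverage", georss_box(*bbox)),  # bounding box, GeoRSS Simple order
        ("date", modified),               # means dct:modified
        ("relation", online_src),         # carries dclite4g:onLineSrc
    ]
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return root


if __name__ == "__main__":
    record = build_oai_dc(
        title="Sample data set",                  # sample value only
        abstract="A made-up abstract.",           # sample value only
        bbox=(46.9, 8.5, 47.4, 9.0),
        modified="2007-03-15",
        online_src="http://example.org/data.gml",  # placeholder URL
    )
    print(ET.tostring(record, encoding="unicode"))
```

A harvester that knows these conventions can read the bounding box back out of dc:coverage by splitting the string on whitespace.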
Additional bounding box query property for ListRecords.
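The requests a harvester would send to such a minimal data provider can be sketched as follows. The base URL is a placeholder and the function names are hypothetical. The `verb`, `metadataPrefix`, `from` and `until` arguments are standard OAI-PMH 2.0; the `bbox` argument name is purely an assumption for the bounding box extension mentioned above, since this page does not name it and OAI-PMH 2.0 itself defines no such argument.

```python
from datetime import date
from typing import Optional, Tuple
from urllib.parse import urlencode


def verb_url(base_url: str, verb: str) -> str:
    """URL for the argument-free verbs Identify and ListMetadataFormats."""
    return f"{base_url}?{urlencode({'verb': verb})}"


def list_records_url(
    base_url: str,
    from_day: Optional[date] = None,
    until_day: Optional[date] = None,
    bbox: Optional[Tuple[float, float, float, float]] = None,
) -> str:
    """URL for ListRecords with oai_dc and day-granularity date limits."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_day is not None:
        params["from"] = from_day.isoformat()    # YYYY-MM-DD
    if until_day is not None:
        params["until"] = until_day.isoformat()  # YYYY-MM-DD
    if bbox is not None:
        # HYPOTHETICAL argument name; not part of OAI-PMH 2.0.
        # Values follow the GeoRSS Simple order: south west north east.
        params["bbox"] = " ".join(str(v) for v in bbox)
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://example.org/oai"  # placeholder endpoint
    print(verb_url(base, "Identify"))
    print(verb_url(base, "ListMetadataFormats"))
    print(list_records_url(base, from_day=date(2007, 3, 15)))
```

Passing the date of the last successful harvest as `from_day` restricts a harvest to records modified since that day, which is what makes ListRecords sufficient without GetRecord.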
Others
References
- OWSCat
- Simple Catalog Interface - links to articles, etc
See Also
- http://www.geometa.info/ - German geospatial data search service using dclite4g + OAI-PMH.