Geodata Metadata Requirements
- 1 Why this document exists
- 2 Draft Metadata model
- 2.1 Data Set
- 2.2 Person
- 2.3 Organization
- 2.4 Spatial Data Organization
- 2.5 License information
- 2.6 Publication date
- 2.7 timespan
- 2.8 Time Period
- 2.9 Spatial Domain
- 2.10 Taxonomy/Ontology
- 3 References
- 4 Comments
Why this document exists
One goal of the Public Geospatial Data Project is to offer, in the future, a repository of reusable public geographic data that can support open source geospatial software projects, both inside and outside the foundation.
One big requirement for a potential Geodata Repository is that there be a well-defined baseline for metadata. This can be seen as a quality assurance effort - data won't be accepted without a certain amount of metadata.
The Federal Geographic Data Committee (FGDC) metadata standard emphasises conformance, but doesn't emphasise exchangeability or reusability. FGDC is the standard for "Spatial Data Infrastructure" efforts, but doesn't have much of a "geospatial web" orientation.
There are some properties beyond those in FGDC that would be really useful to have, such as distribution channels like WFS and BitTorrent, which have come into existence since FGDC was originally defined. For many elements, FGDC asks for full-text descriptions. More structure in descriptions would help with automating discovery or re-use.
This is a "straw-person" set of suggestions; comments and additional references would be gratefully received.
Draft Metadata model
This picks an arbitrary namespace for an OWL schema that maps to most, if not all, of the FGDC mandatory properties and provides some extra ones.
Data Set
Title of the data set. Corresponds to the Dublin Core title element.
Text description of the data set. Corresponds to Dublin Core description element.
Person
A person responsible for publication of the data set - name and contact email address. These properties are well-defined in the FOAF vocabulary.
Organization
An organization responsible for publication of the data set - name and contact email address. These properties are well-defined in the FOAF vocabulary.
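As a rough sketch of how the descriptive fields so far could be held and checked against a baseline, here is a small Python example. The class names `Agent` and `DataSetRecord` and the function `missing_fields` are hypothetical names made up for illustration. The idea that exactly these four fields form the baseline is also an assumption, not something this draft has settled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """A person or organization, loosely following FOAF (name and mailbox)."""
    name: str
    mbox: str  # contact email address


@dataclass
class DataSetRecord:
    """Hypothetical record for the descriptive fields listed above."""
    title: str = ""        # Dublin Core title
    description: str = ""  # Dublin Core description
    publisher: Optional[Agent] = None


def missing_fields(record: DataSetRecord) -> List[str]:
    """Return the names of the assumed baseline fields that are empty or absent."""
    missing = []
    if not record.title.strip():
        missing.append("title")
    if not record.description.strip():
        missing.append("description")
    if record.publisher is None or not record.publisher.name.strip():
        missing.append("publisher.name")
    if record.publisher is None or "@" not in record.publisher.mbox:
        missing.append("publisher.mbox")
    return missing


record = DataSetRecord(
    title="Example coastline",
    publisher=Agent("Example Org", "data@example.org"),
)
print(missing_fields(record))  # ['description']
```

A repository could refuse a submission while this list is non-empty, which is one way to read "data won't be accepted without a certain amount of metadata".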
Spatial Data Organization
Vector, Raster or Point data, as described in FGDC. (cf http://biology.usgs.gov/fgdc.metadata/version2/sdorg.htm )
URL from which the data can be downloaded via different protocols
For Vector data in GML
For Raster data described in GML
File at HTTP URL
URL of BitTorrent .torrent tracker file.
Other Web API
For example, the OpenStreetMap API ( http://wiki.openstreetmap.org/index.php/REST )
License information
Emphasis on public geographic data licenses: PGL, possible LPGL, Public Domain, Creative Commons-type licenses. These can be represented by URLs.
Publication date
Corresponds to Dublin Core date: ISO 8601-compliant date of publication.
Time Period
Start date and end date.
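A minimal sketch of checking these date fields in Python, assuming plain ISO 8601 calendar dates in the form YYYY-MM-DD. The function name `parse_period` is hypothetical, and the rule that the end date may not precede the start date is an assumption rather than part of this draft.

```python
from datetime import date
from typing import Tuple


def parse_period(start: str, end: str) -> Tuple[date, date]:
    """Parse two ISO 8601 calendar dates (YYYY-MM-DD) and check their order."""
    start_date = date.fromisoformat(start)  # raises ValueError on malformed input
    end_date = date.fromisoformat(end)
    if end_date < start_date:
        raise ValueError(f"end date {end} precedes start date {start}")
    return start_date, end_date


period = parse_period("2005-01-01", "2005-12-31")
print((period[1] - period[0]).days)  # 364
```

A publication date could be checked the same way with a single call to `date.fromisoformat`.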
Spatial Domain
A lot of this can be inferred, either using GDAL/OGR or from a WMS/WFS GetCapabilities response. It would be nice to avoid human error in collecting this kind of metadata.
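To illustrate inferring an extent rather than typing it in, the sketch below computes bounds from a GDAL-style affine geotransform in plain Python. With the GDAL Python bindings, the six coefficients would typically come from a data set's `GetGeoTransform()`, and the size from `RasterXSize` and `RasterYSize`. The function name `raster_bounds` and the sample values are made up for illustration.

```python
from typing import Sequence, Tuple


def raster_bounds(
    geotransform: Sequence[float], width: int, height: int
) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) from a GDAL-style affine geotransform."""
    x0, dx, rx, y0, ry, dy = geotransform
    # Pixel/line positions of the four outer corners of the raster.
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs = [x0 + col * dx + row * rx for col, row in corners]
    ys = [y0 + col * ry + row * dy for col, row in corners]
    return min(xs), min(ys), max(xs), max(ys)


# A hypothetical half-degree global grid with a north-up orientation.
print(raster_bounds((-180.0, 0.5, 0.0, 90.0, 0.0, -0.5), 720, 360))
# (-180.0, -90.0, 180.0, 90.0)
```

The result is in whatever co-ordinate system the raster itself uses, so it may still need reprojecting before it is stored as bounding co-ordinates.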
FGDC specifies north, east, west and south bounding co-ordinates, but it doesn't specify the projection in which they should be described. For simplicity, it could make sense to require these to be in WGS84 (EPSG:4326), for the same reasons GeoRSS decided to mandate WGS84 rather than complicate matters by requiring that people also specify an SRS.
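If bounding co-ordinates were required to be WGS84 longitude and latitude, a repository could run a basic sanity check on them. The following is only a sketch under that assumption. The class name `BoundingBox` and its `problems` method are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BoundingBox:
    """Bounding co-ordinates in decimal degrees, assumed to be WGS84 lon/lat."""
    west: float
    south: float
    east: float
    north: float

    def problems(self) -> List[str]:
        """Return a list of problems; an empty list means the box looks plausible."""
        found = []
        for name in ("west", "east"):
            if not -180.0 <= getattr(self, name) <= 180.0:
                found.append(f"{name} is outside -180..180")
        for name in ("south", "north"):
            if not -90.0 <= getattr(self, name) <= 90.0:
                found.append(f"{name} is outside -90..90")
        if self.south > self.north:
            found.append("south is greater than north")
        # west > east is not flagged: it may describe a box crossing the antimeridian.
        return found


print(BoundingBox(west=-10.5, south=49.9, east=1.8, north=60.9).problems())  # []
print(BoundingBox(west=-10.5, south=95.0, east=1.8, north=60.9).problems())
# ['south is outside -90..90', 'south is greater than north']
```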
Original projection of the data (reference to an
Taxonomy/Ontology
Currently undecided; would be good to refer this to current well-known thesauri for data themes.
References
- Semantic Web for Earth and Environmental Terminology OWL Ontologies at NASA
- Dublin Core metadata model for documents
- FOAF metadata model for people and organisations
- DOAP metadata model for open source software projects and code repositories
Comments
Metadata isn't an easy task. The balance is between completeness and people simply neglecting to generate it...
I wish I had had a preexisting plan of how to index and search for the data sets on extent and 'type' that we were adding.
- Specifications of a data set
- Data Type
- Spatial Extent
- Coordinate System/Projection
- Target Scale/Precision
- Attribute Data
- I would like to propose an additional element for the metadata model--data source (or lineage). If the data is derived from some other data, we should be able to backtrack and look at its parents. "Lineage" is a conditional element in FGDC but I think it's important enough that we should include it in our model.
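To make the backtracking idea concrete, here is a small sketch that walks from a data set to everything it was derived from. It assumes lineage is recorded as a simple mapping from a data set identifier to the identifiers of its direct sources. That representation, the function name `ancestors` and the sample identifiers are all hypothetical.

```python
from typing import Dict, List


def ancestors(dataset_id: str, sources: Dict[str, List[str]]) -> List[str]:
    """List every data set that `dataset_id` was derived from, nearest first."""
    seen: List[str] = []
    queue = list(sources.get(dataset_id, []))
    while queue:
        parent = queue.pop(0)
        if parent in seen or parent == dataset_id:
            continue  # guard against duplicates and cycles
        seen.append(parent)
        queue.extend(sources.get(parent, []))
    return seen


sources = {
    "clipped-roads": ["national-roads"],
    "national-roads": ["survey-2004", "survey-2005"],
}
print(ancestors("clipped-roads", sources))
# ['national-roads', 'survey-2004', 'survey-2005']
```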