Geodata Metadata Requirements

= Why this document exists =

One goal of the Public Geospatial Data Project is to offer, in the future, a repository of reusable public geographic data that can support open source geospatial software projects, both inside and outside the foundation.

One big requirement for a potential Geodata Repository is that there be a well-defined baseline for metadata. This can be seen as a quality assurance effort - data won't be accepted without a certain amount of metadata.

The US Federal Geographic Data Committee (FGDC) metadata standard emphasises conformance, but doesn't emphasise exchangeability or reusability. FGDC is standard for "Spatial Data Infrastructure" efforts, but doesn't have much of a "geospatial web" orientation.

There are some properties beyond those in FGDC which it would be really useful to have, for example distribution channels such as WFS and BitTorrent, which have come into existence since FGDC was originally defined. For many elements, FGDC asks for full-text descriptions; more structure in descriptions would help with automating discovery or re-use.

This is a "straw-person" set of suggestions; comments and additional references would be gratefully received.

= Draft Metadata model =

Graph illustrating a basic metadata model generated from an RDF model of what OSGeo Geodata Committee participants have identified as their core needs for metadata.

This picks an arbitrary namespace for an OWL schema that maps to most, if not all, of the FGDC mandatory properties and provides some extra ones.



title
Title of the data set. Corresponds to the Dublin Core title element.

description
Text description of the data set. Corresponds to Dublin Core description element.

Person
A person responsible for publication of the data set - name and contact email address. These properties are well-defined in the FOAF vocabulary.

Organization
An organization responsible for publication of the data set - name and contact email address. These properties are well-defined in the FOAF vocabulary.

Spatial Data Organization
Vector, Raster or Point data, as described in FGDC. (cf http://biology.usgs.gov/fgdc.metadata/version2/sdorg.htm )

datasource
URL from which the data can be downloaded via different protocols

WFS
For Vector data in GML

WMS
For Raster data, served as rendered map images

BitTorrent
URL of a BitTorrent .torrent file.

Other Web API
For example, OpenStreetMap API ( http://wiki.openstreetmap.org/index.php/REST )

License information
Emphasis on public geographic data licenses: PGL, possible LPGL, Public Domain, Creative Commons-type licenses. These can be represented by URLs.

Publication date
Corresponds to Dublin Core date: ISO compliant date of publication.

Spatial Domain
A lot of this can be inferred either using GDAL/OGR or collected from a WMS/WFS GetCapabilities. It would be nice to bypass human error on collecting this kind of metadata.
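
As a rough sketch of collecting an extent from a service rather than from a person, the following reads each named layer's `LatLonBoundingBox` from a WMS 1.1.1 capabilities document. The sample XML is a made-up fragment and `layer_extents` is a hypothetical helper; a real harvester would fetch the document over HTTP and would also need to handle WMS 1.3.0, which uses `EX_GeographicBoundingBox` instead.

```python
import xml.etree.ElementTree as ET

# Made-up fragment shaped like a WMS 1.1.1 GetCapabilities response.
SAMPLE_CAPABILITIES = """
<WMT_MS_Capabilities version="1.1.1">
  <Capability>
    <Layer>
      <Title>Example server</Title>
      <Layer>
        <Name>roads</Name>
        <LatLonBoundingBox minx="-5.5" miny="49.9" maxx="1.8" maxy="55.8"/>
      </Layer>
      <Layer>
        <Name>rivers</Name>
        <LatLonBoundingBox minx="-8.2" miny="50.0" maxx="2.0" maxy="60.9"/>
      </Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>
"""


def layer_extents(capabilities_xml):
    """Return {layer name: (west, south, east, north)} for named layers."""
    root = ET.fromstring(capabilities_xml)
    extents = {}
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")
        box = layer.find("LatLonBoundingBox")
        if name is None or box is None:
            continue  # container layers may carry neither element
        extents[name] = tuple(
            float(box.get(key)) for key in ("minx", "miny", "maxx", "maxy")
        )
    return extents


extents = layer_extents(SAMPLE_CAPABILITIES)
```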

bounding coordinates
FGDC specifies north, east, west and south bounding co-ordinates, but doesn't specify a projection in which these should be described. For simplicity it could make sense to require that these be in WGS84 (EPSG:4326), for the same reason GeoRSS decided to mandate WGS84 rather than complicate matters by requiring that people also specify an SRS.
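
If bounding co-ordinates were required to be WGS84 latitude and longitude, a repository could reject obviously projected values on submission. The sketch below is one possible check, not an agreed rule; the `BoundingBox` class and its `problems` method are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """North/east/south/west bounds in decimal degrees, assumed to be WGS84."""
    north: float
    east: float
    south: float
    west: float

    def problems(self):
        """Return a list of human-readable problems; empty means acceptable."""
        found = []
        for label, value in (("north", self.north), ("south", self.south)):
            if not -90.0 <= value <= 90.0:
                found.append(f"{label} latitude {value} is outside -90..90")
        for label, value in (("east", self.east), ("west", self.west)):
            if not -180.0 <= value <= 180.0:
                found.append(f"{label} longitude {value} is outside -180..180")
        if self.south > self.north:
            found.append("south bound is north of north bound")
        # west > east is not flagged: the box may cross the antimeridian.
        return found


ok = BoundingBox(north=55.8, east=1.8, south=49.9, west=-5.5)
projected = BoundingBox(north=1220000.0, east=660000.0, south=0.0, west=0.0)
```

Here `ok.problems()` is empty, while `projected.problems()` reports the out-of-range north and east values, which is the typical symptom of co-ordinates supplied in a projected system.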

Projection (Raster, Vector, Coverage)
Original projection of the data (reference to an ?)

Horizontal and vertical datum;

Horizontal and vertical units.

Resolution (Raster,Coverage)
(property of DataSet)

e.g. map units per pixel where map units are defined by SRS

can be different in horizontal / vertical axes e.g. non square pixels
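
To make the per-axis point concrete, resolution can be derived from a raster's extent and its pixel dimensions. This is a minimal sketch under the assumption that the extent is given as (min x, min y, max x, max y) in the units of the data set's SRS; the function name is hypothetical.

```python
def resolution(extent, width_px, height_px):
    """Return (x_res, y_res) in map units per pixel.

    extent is (min_x, min_y, max_x, max_y) in the units of the data set's SRS.
    """
    min_x, min_y, max_x, max_y = extent
    if width_px <= 0 or height_px <= 0:
        raise ValueError("raster dimensions must be positive")
    return ((max_x - min_x) / width_px, (max_y - min_y) / height_px)


# A 1000 x 1000 map-unit extent stored as a 500 x 250 pixel raster
# has non-square pixels: the two axes give different resolutions.
x_res, y_res = resolution((0.0, 0.0, 1000.0, 1000.0), 500, 250)
```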

Colour Depth (Raster)
8/16/24 bit etc - this is useful rather than required

Scale (Vector)
Map scale at which vectors are considered accurate

Quantified either as a fractional, dimensionless number ('inches per inch') between 0 and 1, or as a ratio such as 1:50000. We would want to store this in a consistent way.
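
One way to store scale consistently would be to normalise every accepted spelling to the integer denominator (50000 for 1:50000). The sketch below assumes that choice, which this page has not settled; `scale_denominator` is a hypothetical helper.

```python
from fractions import Fraction


def scale_denominator(value):
    """Normalise '1:50000', '1/50000' or 0.00002 to the integer 50000."""
    if isinstance(value, str):
        text = value.strip().replace(",", "")
        if ":" in text:
            numerator, denominator = text.split(":", 1)
            ratio = Fraction(numerator.strip()) / Fraction(denominator.strip())
        else:
            ratio = Fraction(text)
    else:
        # Go through str() so 0.00002 becomes exactly 1/50000.
        ratio = Fraction(str(value))
    if not 0 < ratio <= 1:
        raise ValueError(f"scale {value!r} is not between 0 and 1")
    return round(1 / ratio)
```

With this, `scale_denominator("1:50000")`, `scale_denominator("1/50000")` and `scale_denominator(0.00002)` all give the same stored value.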

Layers
DataSet has multiple Layers

Extent
can be non-rectangular

Scale Hinting
minscale / maxscale (cf. resolution and scale). Are these actually properties of layers rather than of data sets? For example, if a data set contains multiple layers, are they likely to have different scale properties?

Taxonomy/Ontology
Currently undecided; would be good to refer this to current well-known thesauri for data themes.
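
To illustrate the idea that data won't be accepted without a certain amount of metadata, here is a minimal sketch of a record and a baseline check. The field names and, in particular, the choice of which fields are required are assumptions made for the example; this page does not yet fix either, and the spatial properties are left out for brevity.

```python
from dataclasses import dataclass, field

# Assumption: which fields form the baseline is not settled by this page;
# this set is chosen only to make the example concrete.
REQUIRED_FIELDS = (
    "title",
    "description",
    "publisher_email",
    "license_url",
    "publication_date",
    "datasources",
)


@dataclass
class DatasetMetadata:
    """Hypothetical record covering some of the non-spatial properties."""
    title: str = ""
    description: str = ""
    publisher_name: str = ""
    publisher_email: str = ""
    spatial_data_organization: str = ""  # "vector", "raster" or "point"
    datasources: dict = field(default_factory=dict)  # protocol -> URL
    license_url: str = ""
    publication_date: str = ""  # ISO date, e.g. "2006-05-01"


def missing_fields(record):
    """Return the names of required fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


record = DatasetMetadata(
    title="Example roads",
    description="Made-up road centrelines for illustration.",
    datasources={"WFS": "http://example.org/wfs"},
)
missing = missing_fields(record)
```

A repository could refuse the submission, or flag it for follow-up, whenever `missing` is not empty.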

= Discovery =

Requirements
A discovery resource is essential for exposing the metadata described in this document. Its requirements are listed below:

Publish

 * ability to publish/register a web service
 * ability to publish/register a static resource
 * ability to harvest and classify public and private resources
 * ability to establish and maintain user/group/role based authentication
 * ability to provide a RESTful authentication mechanism

Find

 * ability to discover the existence of a web service
 * ability to discover the existence of a resource which is available via web services (e.g. WMS layer, WFS feature type)
 * ability to discover the existence of a static resource (dataset, document, etc.)

Bind

 * ability to perform discovery operations with spatial, aspatial and temporal predicates
 * ability to provide a RESTful request API
 * ability to provide responses in XML
 * ability to expose resource/service metadata in a manner which facilitates dynamic connection to a resource/service
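
The first requirement above, discovery with spatial, aspatial and temporal predicates, could look roughly like the following in-memory sketch. The record layout, the sample records and the `find` function are all hypothetical; a real catalogue would sit behind a request API and an index, and this simple box test ignores extents that cross the antimeridian.

```python
from datetime import date

# Made-up records; bbox is (west, south, east, north) in degrees.
RECORDS = [
    {"title": "Example roads", "bbox": (-5.5, 49.9, 1.8, 55.8),
     "published": date(2005, 3, 1)},
    {"title": "Example rivers", "bbox": (100.0, -10.0, 120.0, 5.0),
     "published": date(2006, 6, 15)},
]


def intersects(a, b):
    """True if two (west, south, east, north) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find(records, bbox=None, text=None, published_after=None):
    """Filter records by optional spatial, text and date predicates."""
    results = []
    for record in records:
        if bbox is not None and not intersects(record["bbox"], bbox):
            continue  # spatial predicate
        if text is not None and text.lower() not in record["title"].lower():
            continue  # aspatial predicate
        if published_after is not None and record["published"] <= published_after:
            continue  # temporal predicate
        results.append(record)
    return results


hits = find(RECORDS, bbox=(-10.0, 45.0, 5.0, 60.0), text="roads")
recent = find(RECORDS, published_after=date(2006, 1, 1))
```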

Other Relevant Info

 * Simple_Catalog_Interface
 * OSGeodata on GISpunkt Wiki - These pages document the search for an open, lean and mean "protocol for the incremental exchange of metadata about geographic resources between systems". Profiled specifications like WFS or OAI-PMH are currently on the short list. Delving into the 'Open Archives Initiative Protocol for Metadata Harvesting' (OAI-PMH) is strongly encouraged. It is a low-barrier interoperability specification based on a metadata harvesting model, it is stable (subsequent revisions are backwards compatible), and it uses unqualified Dublin Core as its default metadata information model. Open source tools such as OAICat exist, and it has been adopted by Google and Yahoo! among others. It is not, however, a search protocol.
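
For a sense of how low the barrier is, an incremental OAI-PMH harvest is a plain HTTP GET with a few query parameters. The sketch below only builds the request URL: `verb`, `metadataPrefix` and `from` are defined by OAI-PMH and `oai_dc` is its unqualified Dublin Core format, while the base URL and the `list_records_url` helper are made up for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, from_date=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# http://example.org/oai is a placeholder, not a real repository.
url = list_records_url("http://example.org/oai", from_date="2006-01-01")
```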

= References =

Geospatial

 * FGDC geospatial metadata model
 * GEON geospatial metadata model
 * DIF geospatial metadata model
 * GeoRSS
 * WFS

RDF

 * Resource Description Framework
 * RDF Primer
 * OWL Web Ontology Language Guide
 * Semantic Web for Earth and Environmental Terminology OWL Ontologies at NASA
 * Dublin Core metadata model for documents
 * FOAF metadata model for people and organisations
 * DOAP metadata model for open source software projects and code repositories

From Geodata Packaging Working Group:

 * Specifications of a data set
 * Creator
 * Date
 * License
 * Data Type
 * Topic
 * Spatial Extent
 * Coordinate System/Projection
 * Target Scale/Precision
 * Attribute Data