Why DCLite4G

From OSGeo

The following points are set out to help answer the questions:

  • Several geospatial metadata standards already exist, right? Is DCLite4G a new one?
  • On what basis are properties of geographic data included in DCLite4G?
  • How will the use of DCLite4G benefit me as a data provider?

Background

The best known metadata standards are the international ISO19115:2003 standard describing Metadata for Geographic Information and the Federal Geographic Data Committee Common Core (hereafter FGDC) guidelines for metadata. At the time of writing (April/May 2007) there is an effort to replace the FGDC standard with a North American (US, Canada) specific profile of ISO19115.

However, ISO19115 makes no attempt to describe geospatial web services, or to provide protocol and access specifics for data published online. Web services are described by a separate standard, ISO19119. Nor does ISO19115 offer any recommended format for serialising metadata in order to share it; this is covered by another separate standard, ISO19139. That format is verbose and not in common use, and the full ISO19115 model mandates the inclusion of many properties which are of little use in finding data or in managing it.

Many national Spatial Data Infrastructure initiatives take the approach of recommending, for publication of metadata about geospatial information resources, a profile or subset of ISO19115.

Within the US the FGDC metadata standard is in common use. FGDC is a profile, not a format, for metadata: there is an XML schema associated with it, but the most common expression of FGDC metadata is in a tab-delimited format.

On the other hand, there is Dublin Core - a generic vocabulary for metadata about all kinds of (electronic) documents and data, common in library science and in a broader information retrieval community. Specifications and standards documents often offer a mapping of geodata properties to Dublin Core.

Given all this, why then seek to define a "new" model for metadata? For those who have worked on it, DCLite4G represents the "simplest useful thing": a common core that is found within all the existing metadata standards and inside many different software packages.

The minimal model represents an overlap of the FGDC and ISO19115 standards. It corresponds closely to the minimal model proposed in the draft Implementing Rules for Metadata describing spatial data infrastructure capacity as mandated by the INSPIRE Directive.

The intention is not to prescribe an XML format but to illustrate a model which can be represented using any of the common metadata standard representations. Samples are offered of DCLite4G being used as a 'mix-in' vocabulary in Atom and RDF feeds. One could also use DCLite4G to define properties used in an ebRIM description.
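As a rough illustration of the 'mix-in' idea, the sketch below adds a few Dublin Core elements to an Atom entry using Python's standard library. The choice of properties (identifier, rights, subject), the record keys and the sample values are assumptions made for this example; they are not taken from the DCLite4G model description, and the function name record_to_atom_entry is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for Atom (RFC 4287) and the Dublin Core element set.
ATOM_NS = "http://www.w3.org/2005/Atom"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("dc", DC_NS)


def record_to_atom_entry(record):
    """Build an Atom entry carrying Dublin Core elements as a mix-in.

    `record` is a plain dict; its keys are illustrative assumptions,
    not the official DCLite4G property list.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # Elements Atom itself expects on an entry.
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = record["title"]
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = record["identifier"]
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = record["updated"]
    # Mixed-in Dublin Core elements describing the data set.
    ET.SubElement(entry, f"{{{DC_NS}}}identifier").text = record["identifier"]
    ET.SubElement(entry, f"{{{DC_NS}}}rights").text = record["rights"]
    for keyword in record.get("subjects", []):
        ET.SubElement(entry, f"{{{DC_NS}}}subject").text = keyword
    return entry


# Hypothetical sample record; every value here is made up.
sample = {
    "title": "Example land cover data set",
    "identifier": "http://example.org/data/landcover-1",
    "updated": "2007-05-01T00:00:00Z",
    "rights": "http://example.org/licenses/open",
    "subjects": ["http://example.org/keywords/landcover",
                 "http://example.org/keywords/raster"],
}

entry = record_to_atom_entry(sample)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The same dict could equally be fed to a function that emits RDF or an ISO19139 fragment; the point of the sketch is only that the model is independent of any single serialisation.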

The aim is to integrate systems rather than to require certain standards. A common subset of well-known standards for geographic metadata is identified, building on common work in the open source geospatial community. This subset corresponds closely to the draft abstract model described by the Implementing Rules for Metadata in the INSPIRE European SDI. However, more emphasis is placed on machine-readable and repeatable properties of data: unique identifiers which can be used to annotate the provenance and processing history of data sets; URLs to represent license constraints; contact details for people responsible for data sets. The minimal model can be rendered as ISO19115/19139, FGDC, RDF, a Dublin Core Application Profile, GeoRSS/Atom, or KML metadata, presenting as flexible a face to third-party search services as possible.

1. Making the minimum mandatory

It is better to get some metadata than none. There is a baseline necessary for a repository to do its work, manage rights and publish data useful to clients; this is what DCLite4G was originally designed to provide. The emphasis is on what is needed to rediscover and reuse the data.

2. Focusing on online resources

One reason for complexity in the ISO19115 and FGDC standards is the need to describe legacy, non-computer-based systems of spatial information and maps using the same standard. Separate standards have appeared to describe information available via web services or modern internet protocols. Meanwhile the documenting and cataloguing of simple use cases, of files and OGC standard web services publishing geographic information, is neglected because of a complexity bar to casual implementation of the standard. DCLite4G instead focuses on online resources, and on data and describing it rather than on describing metadata and filling it out.

3. Machine Reusability

The emphasis is on: unique identifiers which can be used to annotate the provenance and processing history of data sets; URLs to represent license constraints; URLs to represent classification schemes where provided, not references to code lists; contact details for people responsible for data sets.
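One way to picture this emphasis is a small check that reports which fields of a record hold free text where a URI would make them machine-reusable. This is a minimal sketch only: the field names (identifier, license, classification) and the sample values are assumptions for illustration, not DCLite4G property names, and the URI test is deliberately loose.

```python
from urllib.parse import urlparse


def is_uri(value):
    """Loose test: the value has a scheme plus a host or a path."""
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


def non_reusable_fields(record,
                        uri_fields=("identifier", "license", "classification")):
    """Return the names of fields that are missing or are not URIs.

    The default field names are hypothetical and chosen for this example.
    """
    return [name for name in uri_fields
            if not is_uri(str(record.get(name, "")))]


# Hypothetical record: the license is given as free text, not a URL.
record = {
    "identifier": "urn:uuid:123e4567-e89b-12d3-a456-426614174000",
    "license": "Open license, see readme",
    "classification": "http://example.org/schemes/landcover",
}

problems = non_reusable_fields(record)
print(problems)
```

A repository could run a check like this on submission and ask the provider to replace the flagged values with URLs, without rejecting the record outright.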

4. Adaptable Ontology

Taxonomy and ontology are left very open. The use of URI schemes for keywords is recommended. Data annotation is distributed. The emphasis is not on how the data is described, but on how it is used.

History of DCLite4G

DCLite4G first came about from a discussion of the minimal model necessary to manage a public Geodata Repository run by OSGeo. The original model was based on the FGDC Common Core standard for geospatial metadata and now overlaps with ISO19115. Stefan Keller came across the model through online discussion and formalised the abstract model approach and the idea of presenting a Dublin Core Application Profile. Jo Walsh collated the different wiki versions into the current DCLite4G model description.

References

  • [UNSDI]
  • [GSDI]
  • Czech / Swiss / other national refs
  • [INSPIRE] Draft Implementing Rules on Metadata
  • [CGD] Commons of Geographic Data, White Paper on Metadata, Onsrud et al