Difference between revisions of "DCLite4G"

From OSGeo
(noted changes to model, not sure what status of Layer is currently)

Revision as of 00:43, 20 July 2008

DCLite4G is short for "Dublin Core Lightweight Profile for Geospatial". It is a minimal information model for metadata about geospatial data. DCLite4G consists of:

  • an abstract information model
  • an implementation reference exploiting common standard vocabularies - Dublin Core, GeoRSS.
  • a namespace used to define extra properties needed to usefully specify the properties of geospatial data.

The URL at which the namespace lives is intended to contain an OWL ontology showing the structure and providing, where possible, mappings to various metadata schemas.


Information Model

Data Set

A Data Set is an abstract object. It corresponds to the ideal of a data set, independent of the physical form or encoding in which it is distributed. For example, "TIGER 2005", "OpenStreetMap from 2007-04-20", or "SRTM" would be considered data sets.

A Data Set can be associated with one or many Data Sources, which correspond to concrete resources that can be accessed in order to explore the data. All the properties that are shared across multiple Data Sources are properties of the Data Set. If the properties of a Source are different (e.g. a smaller spatial extent, or a generalised subset of features in the same extent), then it is an instance of a different Data Set.

One goal here is to make it easier to trace and annotate the recombination of different sources of data into one package, keeping track of what is known as the "lineage" of data packages.

Data Set

An item of geographic information. Should have at least a title and a URI; ideally the URI is that of a resource where it is possible to access the information directly.

Properties

Title: A name given to a data set. Does not have to be unique or "canonical".
Abstract: A text description of the contents of a data set, with remarks on anything interesting contained therein.
Keywords: A data set may have zero or more keywords associated with it.
Series: Indicates a Series to which a data set may belong (see below).
Publisher: Contact information for the Agent publishing the data set.
Originator: Contact information for the original creator of the data set.
Publication Date: Date on which the data was made available.
License: URL of a data license whose terms cover the distribution of this data set, OR a text string describing the terms on which the data is made available (e.g. "Public Domain").
Extents: A bounding box or polygon specified in WGS84, regardless of the original projection and datum of the data.
Projection: The original spatial projection of the data.
Format: If the data set is a service, this would be, for example, OGC:WMS or OGC:WCS. For files, this is a text identifier which follows, as a guideline, the output of gdalinfo.
URI: A URI giving the location of the data. If this is an OGC web service publishing geographic data, the URI would be that of an HTTP-based GetCapabilities request to it.
Scale
Resolution
Colour Depth
Transparency
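
The Extents property can be illustrated with a minimal sketch. The BoundingBox class and its field names below are hypothetical and not part of DCLite4G; the sketch only assumes that the extent is a WGS84 box in decimal degrees and that it may be written out in the GeoRSS Simple "box" form (south, west, north, east).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A WGS84 bounding box in decimal degrees (hypothetical helper)."""
    west: float
    south: float
    east: float
    north: float

    def __post_init__(self):
        # Latitudes must lie in [-90, 90] and longitudes in [-180, 180].
        for lat in (self.south, self.north):
            if not -90.0 <= lat <= 90.0:
                raise ValueError(f"latitude out of range: {lat}")
        for lon in (self.west, self.east):
            if not -180.0 <= lon <= 180.0:
                raise ValueError(f"longitude out of range: {lon}")
        if self.south > self.north:
            raise ValueError("south must not exceed north")

    def as_georss_box(self) -> str:
        # GeoRSS Simple writes a box as "south west north east".
        return f"{self.south} {self.west} {self.north} {self.east}"


# Example: a rough extent, with invented coordinates.
extent = BoundingBox(west=-5.0, south=50.0, east=2.0, north=56.0)
georss_text = extent.as_georss_box()
```

The west/east order is deliberately not checked, since a box crossing the antimeridian may have west greater than east.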

Relations

  • A Data Set can be part of a Series of associated items
  • A Data Set can optionally have many Layers
  • A Data Set is made available by an Agent
  • A Data Set is originally compiled by an Agent
  • A Data Set may be a source of another Data Set
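
The relations above can be sketched as plain Python classes. This is an illustration only, under stated assumptions: the class and attribute names are hypothetical, only a few of the properties are shown, and the "source of" relation is assumed to form no cycles.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """A person, organisation, or machine agent (hypothetical sketch)."""
    name: str
    email: Optional[str] = None  # e.g. "mailto:foo@example.com"


@dataclass
class DataSet:
    """An item of geographic information with at least a title and a URI."""
    title: str
    uri: str
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)
    publisher: Optional[Agent] = None   # "made available by an Agent"
    originator: Optional[Agent] = None  # "originally compiled by an Agent"
    sources: List["DataSet"] = field(default_factory=list)

    def lineage(self) -> List[str]:
        """Titles of every Data Set this one derives from, depth-first."""
        titles: List[str] = []
        for source in self.sources:
            titles.append(source.title)
            titles.extend(source.lineage())
        return titles


# Example with invented URIs: a derived product that records its source.
srtm = DataSet(title="SRTM", uri="http://example.org/data/srtm")
hillshade = DataSet(
    title="Hillshade derived from SRTM",
    uri="http://example.org/data/hillshade",
    publisher=Agent(name="Example Publisher", email="mailto:foo@example.com"),
    sources=[srtm],
)
```

Walking the sources list is one simple way to recover the "lineage" of a recombined data package.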

Series

A Series is a collection of Data Sets. It provides a flexible container for a number of Data Sets which have common properties. Thus if all the Data Sets in a Series share a given property (e.g. they all have the same title, or the same date range or the same spatial extents), then that property can be attached to the Series and implicitly inherited by any Data Set belonging to the Series.

A Series is therefore most useful for grouping:

  • A collection of data sets which are adjacent in space, yet represent one temporal snapshot (such as a collection of tiles comprising one larger image or model)
  • A collection of data sets which are adjacent in time and may represent one spatial snapshot (such as a sequence of observations collected by an earth imaging satellite)
  • A collection of data sets which are "the same" underlying data being made available in different formats, projections, or resolutions.

A Series can carry any of the properties that can be attached to a Data Set.
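
The implicit inheritance of properties from a Series could be sketched as follows. The class names, the dictionary-based storage, and the rule that a Data Set's own value takes precedence over the Series value are all assumptions made for illustration; the model itself does not specify how conflicts are resolved.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Series:
    """A container for properties shared by its member Data Sets."""
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataSet:
    properties: Dict[str, str] = field(default_factory=dict)
    series: Optional[Series] = None

    def get(self, name: str) -> Optional[str]:
        # Assumption: a value set directly on the Data Set wins;
        # otherwise fall back to the Series, if there is one.
        if name in self.properties:
            return self.properties[name]
        if self.series is not None:
            return self.series.properties.get(name)
        return None


# Example: two tiles of one image share a title and licence via the Series.
tiles = Series(properties={"title": "Example mosaic", "license": "Public Domain"})
tile_a = DataSet(properties={"uri": "http://example.org/tiles/a.tif"}, series=tiles)
tile_b = DataSet(
    properties={"uri": "http://example.org/tiles/b.tif", "title": "Tile B"},
    series=tiles,
)
```

Here tile_a reports the Series title, while tile_b keeps the title set on it directly.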

Layer

Vector data formats often provide multiple Layers of data as part of a data set. OGC Web Services also often provide multiple Layers through one service. This is useful metadata which it would be a shame to throw away.

Properties

Name: The layer's name, free text
Abstract: A description of the contents of the layer, with any remarks of interest
Keywords: A layer may have none or many keywords associated with it.

Relations

  • A Data Set can have none or many Layers

(Note: although we get information about Layers by looking at a Data Source, they are related to the abstract Data Set, because the Layers on all the different Sources will be the same.)

Agent

An Agent may be a person, an organisation, or a machine agent. This term derives from foaf:Agent, and is used in the same way as 'Principal' is used in security terms.

Properties

Name: The name of the agent
Email: A contact email address given as mailto:foo@example.com
Description: More random details about the Agent
Location: A WGS84 lat-lon pair, or a URI indicating a near location

Relations

  • An Agent is the original provider of such-and-such a Data Set
  • An Agent is the publisher of such-and-such a Data Set
  • An Agent is the maintainer of such-and-such a Data Source


Namespace / Ontology

It is possible or advisable to extend many XML-based metadata carrier formats with namespaces which can provide semantics for different properties, taking a "mix-in" approach with the use of small vocabularies for different domains.

Thus DCLite4G does not attempt to provide a full metadata model for geographic information. Instead it reuses properties from other well-known namespaces or ontologies (GeoRSS, Dublin Core, FOAF) and provides a subset of Dublin Core with extra rigour of expression, suitable for better machine readability and reuse. This is also what is known as a Dublin Core Application Profile.

http://xmlns.com/2008/dclite4g/ is the namespace reserved for DCLite4G. It should and will come to contain an OWL ontology showing the structure, providing where possible mappings to various metadata schemas.
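
The "mix-in" approach might look like the following minimal sketch, which builds a small XML description with Python's standard library. The Dublin Core and GeoRSS namespace URIs and their title, identifier, and box elements are the published ones; the DataSet and projection element names placed in the DCLite4G namespace are hypothetical, since the ontology is not yet published at that URL.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
GEORSS = "http://www.georss.org/georss"
DCLITE4G = "http://xmlns.com/2008/dclite4g/"

# Register readable prefixes so the output does not use ns0, ns1, ...
ET.register_namespace("dc", DC)
ET.register_namespace("georss", GEORSS)
ET.register_namespace("dclite4g", DCLITE4G)


def describe(title: str, uri: str, box: str, projection: str) -> str:
    """Serialise a minimal data set description as an XML string."""
    # "DataSet" and "projection" are hypothetical names in the DCLite4G namespace.
    root = ET.Element(f"{{{DCLITE4G}}}DataSet")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    ET.SubElement(root, f"{{{DC}}}identifier").text = uri
    # GeoRSS Simple box: "south west north east" in WGS84.
    ET.SubElement(root, f"{{{GEORSS}}}box").text = box
    ET.SubElement(root, f"{{{DCLITE4G}}}projection").text = projection
    return ET.tostring(root, encoding="unicode")


xml_text = describe(
    title="SRTM",
    uri="http://example.org/data/srtm",  # invented URI
    box="50.0 -5.0 56.0 2.0",
    projection="EPSG:4326",
)
```

Well-known properties come from Dublin Core and GeoRSS, and only what they lack is defined in the DCLite4G namespace.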

Examples

Example serialisations of a DCLite4G minimal model in different common formats:

DCLite4G Python libraries with genshi templates used to produce them.

References

History

  • July 20th 2008

The first version of the DCLite4G model dates to late 2006/early 2007. In mid 2008 the vocabulary was significantly updated and somewhat simplified, after several iterations of a corresponding data registry and search service. The distinction between an abstract "Data Set", which has properties common to potentially many different "Data Sources", and the Data Sources themselves has been dropped. A "Series" has been introduced, which takes over most of the functions of the old "Data Set". "Data Source" is now renamed "Data Set", resolving the semantic ambiguity in the model. Please see the history of this page.