Difference between revisions of "DCLite4G"

From OSGeo
m (Reverted edit of OzaIai, changed back to last version by Arnulf Christl)
(replacing with minimal rewrite + material from the older 'requirements' pages)
Line 1: Line 1:
'''DCLite4G''' is an effort to
+
= DCLite4G =
* establish a common information model with mappings to well known formats for geospatial metadata
 
* provide a vocabulary/recommendations for spatial extensions to Dublin Core
 
* support simple interfaces for collecting and querying geospatial metadata
 
  
== Getting involved ==
+
DCLite4G is short for "Dublin Core Lightweight Profile for Geospatial". It is a ''minimal information model for metadata about geospatial data''. DCLite4G consists of:
To get involved first of all subscribe to the Mailing List (see below). Then read all corresponding pages of this Wiki. If you find that something is missing or incorrect, correct or add it as required. To do this you need to create an account for this Wiki. Sorry, still working on Single Sign On...
 
  
Once you get the hang of it and feel like needing more involvement you can join any interest group or project and contribute to the process. This can happen from many perspectives - as a user, portal operator, developer, decision maker or name it.  
+
* an abstract information model
 +
* an implementation reference exploiting common standard vocabularies - [http://dublincore.org/ Dublin Core], [http://georss.org GeoRSS].
 +
* a namespace used to define extra properties needed to usefully specify the properties of geospatial data.
 +
The URL at which the namespace lives, contains an OWL ontology showing the structure, providing where possible mappings to various metadata schemas.
  
=== Mailing List ===
+
[[TableOfContents]]
Feel welcome to join the discussion and development [http://lists.eogeo.org/mailman/listinfo/dclite4g mailing list]:
 
* http://lists.eogeo.org/mailman/listinfo/dclite4g
 
  
== TODO ==
+
= Information Model =
 +
 
 +
== Data Set ==
 +
 
 +
A Data Set is an abstract object. It corresponds to the ideal of a data set, independent of a physical form or an encoding in which it is being distributed. For example, "TIGER 2005" or "OpenStreetmap from 2007-04-20", or "SRTM" would be considered data sets.
 +
 
 +
A Data Set can be associated with one or many Data Sources which correspond to concrete resources which can be got at in order to explore the data. All the properties which are shared across multiple Data Sources, are properties of the Data Set. If the properties of a Source are different - e.g. a smaller spatial extent, or a generalised subset of features in the same extent - then it is an instance of a different Data Set.
  
* extract more of the specifics from [[Geodata Metadata Requirements]]. (This has changed a lot from the original version in the history & current version which has data sets differentiated from data sources, which may be files or databases or services)
+
One goal in this is to make it more possible to trace and annotate the recombination of different sources of data into one package - keeping track of what is known as "lineage" of data packages.
* update docs on [[Geodata Metadata Model]]
 
* draw some UML to appease people?
 
  
= Information Model =
+
=== Properties ===
 +
{|
 +
|-
 +
| '''Name''' || A name given to a data set. Does not have to be unique or "canonical".
 +
|-
 +
| '''Abstract''' || A text description of the contents of a data set, remarks on anything interesting contained therein.
 +
|-
 +
| '''Keywords''' || A layer may have none or many keywords associated with it.
 +
|-
 +
| '''Publisher''' || Contact information for the Agent publishing the data set
 +
|-
 +
| '''Originator''' || Contact information for the original creator of the data set
 +
|-
 +
| '''Publication Date''' || Date on which the data was made available 
 +
|-
 +
| '''License''' || URL of a data license whose terms cover the distribution of this data set.
 +
|-
 +
| '''Object type''' || Currently, one of 'Vector','Raster','Coverage'
 +
|-
 +
| '''Extents''' || A bounding box or polygon specified in WGS84, regardless of the original projection and datum of the data.
 +
|-
 +
| '''Projection''' || The original spatial projection of the data
 +
|-
 +
| '''Resolution''' || (for Raster, Coverage types)
 +
|-
 +
| '''Colour Depth''' || (for Raster, Coverage types) e.g. 8, 16, 24 bit
 +
|- 
 +
| '''Transparency''' || (for Raster type)
 +
|}
 +
 +
=== Relations ===
 +
 
 +
* A Data Set can have optionally many Layers
 +
* A Data Set can have optionally many Data Sources
 +
* A Data Set is made available by an Agent
 +
* A Data Set is originally compiled by an Agent
 +
* A Data Set may be a source of another Data Set
 +
 
 +
 
 +
== Data Source ==
  
== Core Model ==
+
=== Properties ===
  
[[Image:Metadata.png]]
+
{|
=== Dublin Core ===
+
|-
[http://dublincore.org/ The Dublin Core Metadata Initiative] is an open organization engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. An introduction to the topic can be read in the [http://dublincore.org/documents/usageguide/ Usage Guide].
+
| '''Name''' || If desired, a text name indicating a specific data source (otherwise one can infer that the name is shared with the data set)
 +
|-
 +
| '''URI''' || A URI giving the location of the data. If this is a W*S resource, the URI would be that of a GetCapabilities request to it.
 +
|-
 +
| '''Abstract''' || Any notes or other marginalia of interest regarding this data source
 +
|-
 +
| '''Format''' || If the Source is a service this would be, for example OGC:WMS or OGC:WCS. For files, this is a text identifier which follows as a guideline the output of ''gdalinfo''.
 +
|}
  
=== ISO19115 ===
+
=== Relations ===
  
=== FGDC ===
+
* A Data Source is an instance of exactly one Data Set
 +
* A Data Source can be a component of another Data Source (e.g. a shapefile may also be one tile within a tile index
  
== Namespace ==
+
== Layer ==
  
The '''dclite4g''' namespace provides a common naming scheme for properties specific to geographic data sets and services which are not covered by the existing Dublin Core or GeoRSS standards.
+
Vector data formats often provide multiple Layers of data as part of a data set. OGC Web Services also often provide multiple Layers through one service. This is useful metadata which it would be a shame to throw away.
  
The namespace will come to live at http://xmlns.com/2006/dclite4g/ .
+
=== Properties ===
  
* extents
+
{|
* accuracy
+
|-
* scale
+
| '''Name''' || The layer's name, free text
 +
|-
 +
| '''Abstract''' || A description of the contents of the layer, with any remarks of interest
 +
|-
 +
| '''Keywords''' || A layer may have none or many keywords associated with it.
 +
|}
  
 +
=== Relations ===
  
''' fill in here from [[Geodata Metadata Requirements]] '''
+
* A Data Set can have none or many Layers
 +
(note - though we get information about layers through looking at a Data Source, they are related to the abstract Data Set, because the Layers on all the different Sources will be the same.
  
== References ==
+
== Agent ==
  
* [http://geometa.info/rappiinfo/wiki/index.php/DClite4G Dublin Core lite for Geo]
+
An Agent may be a person, an organisation, or a machine agent. This term derives from foaf:Agent, and is used in the same way as 'Principal' is used in security terms.
* [http://portal.opengeospatial.org/files/?artifact_id=5929&version=2 OGC Catalog Services 2 Specification] '''6.3.3''', ''Core returnable properties''
 
* [http://portal.opengeospatial.org/files/?artifact_id=12604&version=1&format=pdf OGC ebRIM profile of CSW specification], '''Appendix B.5, Table B.3''' - ''Slots defined in the Basic package''
 
* [http://udig.refractions.net/docs/api-udig/net.refractions.udig.catalog/net/refractions/udig/catalog/IGeoResourceInfo.html  iGeoResourceInfo class in uDig]
 
* [http://www.geodatacommons.umaine.edu/wpapers/CGD%20Metadata%20White%20Paper%20v2.pdf GeodataCommons Metadata Whitepaper]
 
  
= Query Interface =
+
=== Properties ===
  
== OAI-PMH ==
+
{|
 +
|-  
 +
| '''Name''' || The name of the agent
 +
|-
 +
| '''Email''' || A contact email address given as mailto:foo@example.com
 +
|-
 +
| '''Description''' || More random details about the Agent
 +
|-
 +
| '''Location''' || A WGS84 lat-lon pair, or a URI indicating a near location
 +
|}
  
[http://www.openarchives.org/OAI/openarchivesprotocol.html The Harvesting Protocol (version 2.0) specification] together with [http://www.openarchives.org/OAI/2.0/guidelines.htm Implementation Guidelines]
+
=== Relations ===
  
Following are specific guidelines for a minimal OAI-PMH implementation of a so called 'data provider' using only the mandatory 'unqualified' Dublin Core (DC):
+
* An Agent is the original provider of such-and-such a Data Set
 +
* An Agent is the publisher of such-and-such a Data Set
 +
* An Agent is the maintainer of such-and-such a Data Source
  
* Only '''three''' operations (verbs) are needed: Identify, ListMetadataFormats and ListRecords.
 
* Following operations are not required (initially): ListIdentifiers, ListSets, GetRecord.
 
* No incremental harvesting (resumption process for ListXxx operations with more than 1000 records)
 
* No compression as defined in the OAI-PMH spec. (compression at lower http level still possible)
 
* Date granularity may be 'day' not seconds (YYYY-MM-DD)
 
* Keeping track of deleted record may not be supported (deletedRecord=no)
 
* Mandatory DC supported as data model is sufficient for a start but with specific semantics (e.g. coverage, relation) (see also example below):
 
** dc:description contains dct:abstract
 
** dc:coverage contains bounding box encoding as defined in http://georss.org/simple.html#Box
 
** dc:date means in fact dct:modified
 
** dc:relation is filled in with dclite4g:onLineSrc. If dc:type='service' dct:hasPart can be derived from GetCapabilities.
 
  
Additional bounding box query property for ListRecords.
+
= Namespace / Ontology =
  
== Others ==
+
It is possible or advisable to extend many XML-based metadata carrier formats with ''namespaces'' which can provide semantics for different properties, taking a "mix-in" approach with the use of small vocabularies for different domains.
  
 +
Thus DCLite4G attempts not to provide a full model for metadata for geographic information but to reuse properties from other well-known namespaces or ontologies - GeoRSS, Dublin Core, FOAF - and provide a subset of Dublin Core with extra rigour of expression suitable for better machine readability and reuse. This is also what is known as a ''Dublin Core Application Profile''.
  
== References ==
+
http://xmlns.com/2007/dclite4g/ is the namespace reserved for DCLite4G. It should and will come to contain an OWL ontology showing the structure, providing where possible mappings to various metadata schemas.
  
* [http://devgeo.cciw.ca/owscat/docs/index.html OWSCat]
 
* [[Simple Catalog Interface]] - links to articles, etc
 
  
= See Also =
+
== References ==
  
* http://www.geometa.info/ - German geospatial data search service using dclite4g + OAI-PMH.
+
[]

Revision as of 05:40, 29 May 2007

DCLite4G

DCLite4G is short for "Dublin Core Lightweight Profile for Geospatial". It is a minimal information model for metadata about geospatial data. DCLite4G consists of:

  • an abstract information model
  • an implementation reference exploiting common standard vocabularies - Dublin Core, GeoRSS.
  • a namespace used to define extra properties needed to usefully specify the properties of geospatial data.
The namespace URL is intended to host an OWL ontology that shows the structure and provides, where possible, mappings to various metadata schemas.


Information Model

Data Set

A Data Set is an abstract object. It corresponds to the ideal of a data set, independent of the physical form or encoding in which it is distributed. For example, "TIGER 2005", "OpenStreetMap from 2007-04-20" or "SRTM" would be considered data sets.

A Data Set can be associated with one or many Data Sources, which correspond to concrete resources that can be accessed in order to explore the data. All the properties which are shared across multiple Data Sources are properties of the Data Set. If the properties of a Source are different - e.g. a smaller spatial extent, or a generalised subset of features in the same extent - then it is an instance of a different Data Set.

One goal of this distinction is to make it easier to trace and annotate the recombination of different sources of data into one package, keeping track of what is known as the "lineage" of data packages.

Properties

Name: A name given to a data set. It does not have to be unique or "canonical".
Abstract: A text description of the contents of a data set, with remarks on anything interesting contained therein.
Keywords: A data set may have none or many keywords associated with it.
Publisher: Contact information for the Agent publishing the data set.
Originator: Contact information for the original creator of the data set.
Publication Date: Date on which the data was made available.
License: URL of a data license whose terms cover the distribution of this data set.
Object type: Currently one of 'Vector', 'Raster', 'Coverage'.
Extents: A bounding box or polygon specified in WGS84, regardless of the original projection and datum of the data.
Projection: The original spatial projection of the data.
Resolution: (for Raster and Coverage types)
Colour Depth: (for Raster and Coverage types) e.g. 8, 16 or 24 bit.
Transparency: (for Raster type)
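
The Extents property is always expressed in WGS84, so a catalogue can sanity-check a bounding box before storing it. The following is a minimal sketch, not part of DCLite4G itself: the function name, the (west, south, east, north) ordering and the treatment of boxes that cross the antimeridian are all assumptions made for illustration.

```python
from typing import Tuple

# Assumed ordering for this sketch: (west, south, east, north) in decimal degrees.
BBox = Tuple[float, float, float, float]


def validate_wgs84_bbox(west: float, south: float, east: float, north: float) -> BBox:
    """Return the box as a tuple, or raise ValueError if it is not plausible WGS84."""
    for lon in (west, east):
        if not -180.0 <= lon <= 180.0:
            raise ValueError(f"longitude out of range: {lon}")
    for lat in (south, north):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude out of range: {lat}")
    if south > north:
        raise ValueError("south edge lies north of the north edge")
    # west > east is accepted: this sketch assumes it means the box
    # crosses the antimeridian rather than being an error.
    return (west, south, east, north)
```

A box that was originally recorded in another projection would need to be reprojected to WGS84 before this check is meaningful.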

Relations

  • A Data Set can optionally have many Layers
  • A Data Set can optionally have many Data Sources
  • A Data Set is made available by an Agent
  • A Data Set is originally compiled by an Agent
  • A Data Set may be a source of another Data Set
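
As a rough illustration of how the Data Set properties and the "source of another Data Set" relation might be held in memory, here is a minimal Python sketch. The class, field and function names are hypothetical; DCLite4G defines an abstract model and does not prescribe any programming interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# The three object types currently listed in the Properties table.
OBJECT_TYPES = ("Vector", "Raster", "Coverage")


@dataclass
class DataSet:
    """Hypothetical container for a subset of the Data Set properties."""
    name: str
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)
    license: Optional[str] = None          # URL of the data license
    object_type: Optional[str] = None      # one of OBJECT_TYPES
    # Assumed ordering: (west, south, east, north) in WGS84 degrees.
    extents: Optional[Tuple[float, float, float, float]] = None
    projection: Optional[str] = None
    # Data Sets that this one was compiled from (the lineage relation).
    derived_from: List["DataSet"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.object_type is not None and self.object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {self.object_type!r}")


def lineage(data_set: DataSet) -> List[str]:
    """List the names of all upstream Data Sets, depth-first."""
    names: List[str] = []
    for parent in data_set.derived_from:
        names.append(parent.name)
        names.extend(lineage(parent))
    return names


tiger = DataSet(name="TIGER 2005", object_type="Vector")
osm = DataSet(name="OpenStreetMap from 2007-04-20", object_type="Vector")
merged = DataSet(name="Merged roads", object_type="Vector", derived_from=[tiger, osm])
```

Calling `lineage(merged)` walks the relation upwards, which is the kind of tracing the model is meant to make possible.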


Data Source

Properties

Name: If desired, a text name identifying a specific data source (otherwise the name can be assumed to be shared with the data set).
URI: A URI giving the location of the data. If this is a W*S resource (an OGC web service such as WMS or WCS), the URI would be that of a GetCapabilities request to it.
Abstract: Any notes or other marginalia of interest regarding this data source.
Format: If the Source is a service, this would be, for example, OGC:WMS or OGC:WCS. For files, this is a text identifier which follows, as a guideline, the output of gdalinfo.

Relations

  • A Data Source is an instance of exactly one Data Set
  • A Data Source can be a component of another Data Source (e.g. a shapefile may also be one tile within a tile index)
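
For service-backed Data Sources, the URI property is the address of a GetCapabilities request. The sketch below shows one way such a URI might be derived from a service endpoint and a Format value such as OGC:WMS. The function name and the example endpoint are hypothetical; SERVICE and REQUEST are the standard OGC key-value parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def get_capabilities_uri(endpoint: str, fmt: str) -> str:
    """Build a GetCapabilities URI for an endpoint whose Format is 'OGC:<service>'."""
    if not fmt.startswith("OGC:"):
        raise ValueError(f"not an OGC service format: {fmt!r}")
    service = fmt.split(":", 1)[1]          # "OGC:WMS" -> "WMS"
    parts = urlsplit(endpoint)
    # Keep vendor parameters, but drop any existing SERVICE/REQUEST pair.
    query = [(key, value) for key, value in parse_qsl(parts.query)
             if key.lower() not in ("service", "request")]
    query += [("SERVICE", service), ("REQUEST", "GetCapabilities")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


# Hypothetical endpoint, used only for illustration.
uri = get_capabilities_uri("http://example.org/ows?map=roads", "OGC:WMS")
```

File-based sources would skip this step and record the file location directly.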

Layer

Vector data formats often provide multiple Layers of data as part of a data set. OGC Web Services also often provide multiple Layers through one service. This is useful metadata which it would be a shame to throw away.

Properties

Name: The layer's name, free text.
Abstract: A description of the contents of the layer, with any remarks of interest.
Keywords: A layer may have none or many keywords associated with it.

Relations

  • A Data Set can have none or many Layers

(Note: although information about Layers is obtained by looking at a Data Source, Layers are related to the abstract Data Set, because the Layers on all the different Sources will be the same.)
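
Because Layers belong to the Data Set and not to an individual Data Source, a harvester could check that every Source reports the same Layers before attaching them. The following is only a sketch under that reading of the note above; the function name and the example source identifiers are hypothetical.

```python
from typing import Dict, List


def layers_for_data_set(layers_by_source: Dict[str, List[str]]) -> List[str]:
    """Return the shared layer names, or raise if the sources disagree."""
    layer_sets = {frozenset(names) for names in layers_by_source.values()}
    if len(layer_sets) > 1:
        # Differing layers suggest the sources are instances of different Data Sets.
        raise ValueError("sources report different layers")
    if not layer_sets:
        return []
    return sorted(next(iter(layer_sets)))


# Hypothetical sources: a file bundle and a web service exposing the same layers.
shared = layers_for_data_set({
    "file:///data/region.gml": ["roads", "rivers"],
    "http://example.org/wms": ["rivers", "roads"],
})
```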

Agent

An Agent may be a person, an organisation, or a machine agent. This term derives from foaf:Agent, and is used in the same way as 'Principal' is used in security terms.

Properties

Name: The name of the agent.
Email: A contact email address given as mailto:foo@example.com
Description: Any further details about the Agent.
Location: A WGS84 lat-lon pair, or a URI indicating a nearby location.

Relations

  • An Agent is the original provider of such-and-such a Data Set
  • An Agent is the publisher of such-and-such a Data Set
  • An Agent is the maintainer of such-and-such a Data Source
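
A minimal sketch of the Agent properties follows, assuming a plain Python dataclass. The class and field names are hypothetical, and normalising a bare address to the mailto: form is a design choice of this sketch, not something the model requires.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Location is either a (lat, lon) pair in WGS84 or a URI naming a nearby place.
Location = Union[Tuple[float, float], str]


@dataclass
class Agent:
    """Hypothetical container for the Agent properties."""
    name: str
    email: Optional[str] = None
    description: str = ""
    location: Optional[Location] = None

    def __post_init__(self) -> None:
        # Store the address in the mailto: form used by the Email property.
        if self.email is not None and not self.email.startswith("mailto:"):
            self.email = "mailto:" + self.email


publisher = Agent(name="Example Mapping Agency", email="foo@example.com",
                  location=(51.5, -0.1))
```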


Namespace / Ontology

It is possible, and often advisable, to extend XML-based metadata carrier formats with namespaces that provide semantics for different properties, taking a "mix-in" approach in which small vocabularies cover different domains.

DCLite4G therefore does not attempt to provide a full model for metadata about geographic information. Instead it reuses properties from other well-known namespaces or ontologies - GeoRSS, Dublin Core, FOAF - and provides a subset of Dublin Core with extra rigour of expression, suitable for better machine readability and reuse. This is also known as a Dublin Core Application Profile.

http://xmlns.com/2007/dclite4g/ is the namespace reserved for DCLite4G. It is intended to contain an OWL ontology showing the structure and providing, where possible, mappings to various metadata schemas.
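
To make the "mix-in" idea concrete, the sketch below builds one XML record that combines a Dublin Core element, a GeoRSS box and a property from the DCLite4G namespace. The record element, the function name and the objectType property name are assumptions made for illustration, since this page does not list the property names defined in the namespace. The georss:box content follows the GeoRSS Simple ordering of lower corner then upper corner, each as latitude then longitude.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
GEORSS = "http://www.georss.org/georss"
DCLITE4G = "http://xmlns.com/2007/dclite4g/"

# Prefer readable prefixes over the auto-generated ns0, ns1, ... when serialising.
ET.register_namespace("dc", DC)
ET.register_namespace("georss", GEORSS)
ET.register_namespace("dclite4g", DCLITE4G)


def build_record(title, bbox, object_type):
    """Build a record element; bbox is assumed to be (west, south, east, north)."""
    west, south, east, north = bbox
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, f"{{{DC}}}title").text = title
    # GeoRSS Simple box: "south west north east".
    ET.SubElement(record, f"{{{GEORSS}}}box").text = f"{south} {west} {north} {east}"
    # "objectType" is a hypothetical property name in the DCLite4G namespace.
    ET.SubElement(record, f"{{{DCLITE4G}}}objectType").text = object_type
    return record


# Illustrative values only.
record = build_record("SRTM", (-180.0, -56.0, 180.0, 60.0), "Raster")
xml_text = ET.tostring(record, encoding="unicode")
```

Each vocabulary keeps its own namespace, so a consumer that only understands Dublin Core can still read the title and ignore the rest.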

