Data Quality

For now, this page discusses a proposal for a short project (4-7 months) looking at data quality approaches to collaborative online sources of information. The project could be an interesting fit for the geospatial strand of JISC funding call 15/10 on infrastructures for education and research.

Overview

  • INSPIRE does not mandate quality standards, but the Joint Research Centre recognises that failing to consider quality is an oversight.
  • The ISO standards regarding quality of geographic information are oriented towards quality assurance in the data production process
  • This means a lack of focus on the value of data quality information from the end-user's perspective - what problems are we helping to solve by publishing data quality information?
  • For example, OS Research has done extensive work on a "vernacular gazetteer" of shapes for social names, but data quality concerns prohibit its release, even for research.
  • The geodata world has its own domain-specific problems, but can benefit from looking at lighter-weight / differently conceived quality approaches from other domains.

  • The aim should be to encourage and support the publication of more data of variable, knowably unknown quality.
  • Quality currently looks like a niche issue, but new developments in data sharing over the internet will raise the priority of machine-reusable descriptions of data quality: distributed databases; multiple copies of the same resource left unsynchronised or variably edited; more collaborative mapping projects along the lines of OSM and OpenAddresses; lossy or transient datastores; linked data pollution. A sketch of what such a description might contain follows this list.
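
As a rough illustration only (nothing in this outline prescribes a format), the sketch below shows what a minimal machine-reusable quality description for one copy of a dataset might contain: which quality elements are reported, how they were measured, and when. The field names and structure are assumptions, not any existing standard's schema.

    from dataclasses import dataclass, field
    from datetime import date
    from typing import List

    @dataclass
    class QualityMeasure:
        """One reported quality element for a dataset (illustrative fields only)."""
        element: str       # e.g. "completeness", "positional accuracy"
        measure: str       # how it was assessed
        value: float
        unit: str
        evaluated_on: date

    @dataclass
    class QualityDescription:
        """Minimal machine-readable quality record for one copy of a dataset."""
        dataset: str
        revision: str      # which copy / edit state of the resource this describes
        measures: List[QualityMeasure] = field(default_factory=list)

    # Hypothetical example: an unsynchronised copy of a collaboratively edited dataset.
    report = QualityDescription(
        dataset="openaddresses-uk-extract",
        revision="2010-10-27",
        measures=[QualityMeasure("completeness", "addresses present vs. expected count",
                                 0.62, "ratio", date(2010, 10, 27))],
    )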

Fit for 15/10?

See the briefing paper on the JISC geospatial strand for more context - up to 9 months duration between Feb and Dec 2011.

  • The briefing emphasises infrastructure development and the re-use of tools and services, both those directly supported by JISC and others popular on the web.

Themes

Starting with Nothing

The traditional ISO data quality model assumes theoretically perfect data. Many measures and tests can only be run in comparison with a known higher-quality, more "authoritative" dataset. Perfect data does not look like a reasonable assumption.
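
To make the dependency concrete, the sketch below (not part of the proposal) shows one such comparison-based measure: positional accuracy reported as root-mean-square error of collected points against matched reference points. Without an "authoritative" set of coordinates to match against, the measure simply cannot be computed.

    import math

    def positional_rmse(test_points, reference_points):
        """Root-mean-square positional error of test coordinates against a
        matched, higher-quality reference set (lists of (x, y) tuples).
        Only computable when an 'authoritative' dataset exists to compare with."""
        assert len(test_points) == len(reference_points) and test_points
        squared = [(tx - rx) ** 2 + (ty - ry) ** 2
                   for (tx, ty), (rx, ry) in zip(test_points, reference_points)]
        return math.sqrt(sum(squared) / len(squared))

    # Illustrative numbers only: surveyed reference positions vs. crowd-sourced ones.
    reference = [(0.0, 0.0), (100.0, 50.0)]
    collected = [(1.2, -0.8), (99.1, 51.5)]
    print(positional_rmse(collected, reference))   # in metres, if the inputs are metres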

Attestation

Peer review for data quality / the social aspect of data sources.
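
One way to read this theme is as a record of who has looked at a data source and what they are prepared to vouch for. The structure below is purely hypothetical (names, fields and the reviewer-count heuristic are all assumptions), intended only to show how attestations could be made machine-readable.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Attestation:
        """A single peer-review statement about a dataset (hypothetical structure)."""
        dataset: str     # what is being vouched for
        reviewer: str    # who is attesting - the social aspect
        scope: str       # what was actually checked, e.g. "road names in Edinburgh"
        verdict: str     # e.g. "fit for routing", "gaps in rural coverage"
        made_at: datetime

    def reviewer_count(attestations, dataset):
        """Crude signal: how many distinct reviewers have attested to a dataset."""
        return len({a.reviewer for a in attestations if a.dataset == dataset})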

Edit-time quality reporting

The JOSM Validator model, looking at logical consistency of edits to OSM before commit. Again there's a data production bias here - how many research users of OSM, for example, are active editors?
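
For illustration, the sketch below has the general shape of an edit-time consistency check - in the spirit of the JOSM Validator, but not its actual code or rules: proposed edits are tested for simple logical problems before commit. The element structure and the example rules are assumptions.

    def validate_edit(elements):
        """Run simple logical-consistency checks on a proposed edit before commit.
        Each element is a dict like {"type": "way", "nodes": [...], "tags": {...}};
        returns a list of human-readable warnings (rules here are illustrative)."""
        warnings = []
        for el in elements:
            if not el.get("tags"):
                warnings.append("%s has no tags" % el["type"])
            if el["type"] == "way":
                nodes = el.get("nodes", [])
                if len(nodes) < 2:
                    warnings.append("way has fewer than two nodes")
                if len(nodes) != len(set(nodes)):
                    warnings.append("way repeats a node")
        return warnings

    # Example edit: one way that repeats a node, plus an untagged node.
    edit = [
        {"type": "way", "nodes": [1, 2, 2, 3], "tags": {"highway": "residential"}},
        {"type": "node", "nodes": [], "tags": {}},
    ]
    print(validate_edit(edit))   # ['way repeats a node', 'node has no tags']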

Interviews

Participants in this document / proposal

Jo Walsh