Data Quality

From OSGeo
Revision as of 06:44, 27 October 2010

For now, this page discusses a proposal for a short project (4-7 months) looking at data quality approaches for collaborative online sources of information. This could be an interesting fit for the geospatial strand of JISC funding call 15/10 on infrastructures for education and research.

Overview

  • INSPIRE does not mandate quality standards, but the Joint Research Centre recognises that failing to consider quality is an oversight.
  • The ISO standards regarding quality of geographic information are oriented towards quality assurance in the data production process. They also assume that the end user will accept the product as homogeneous, where a generic quality statement applies to all objects and areas.
  • This means a lack of focus on the value of data quality information from the end-user's perspective - what problems are we helping to solve by publishing data quality information?
  • Potentially, this will allow us to move from the question 'Is dataset X useful for task Y?' to 'Is dataset X useful for task Y at location Z?' - after all, each researcher works at a specific scale and for a specific purpose, so introducing scale and location explicitly into the decision making should assist in data selection and fitness-for-purpose analysis.
  • For example, OS Research has done extensive work on a "vernacular gazetteer" of shapes for social names, but data quality concerns prohibit its release, even for research.
  • In addition, emerging OS Research on the usability of geographical information is exposing the producer-centric nature of the datasets, and the need to develop novel, user-centric approaches to data production and delivery.
  • The geodata world has its domain-specific problems, but can benefit from looking at lighter-weight / differently conceived quality approaches from other domains.

  • The aim should be to encourage and support the publication of more data of variable, knowably unknown quality.
  • Quality currently looks like a niche issue, but new developments in data sharing over the internet will raise the priority of machine-reusable descriptions of data quality (distributed databases; multiple unsynchronised or variably edited copies of the same resource; more collaborative mapping projects along the lines of OSM and OpenAddresses; lossy or transient datastores; linked data pollution).
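The contrast drawn above - a generic, dataset-wide quality statement versus quality that varies by object and location - can be sketched in a few lines. This is an illustrative data shape only: the field names and values are hypothetical, not drawn from ISO 19157 or any product specification.

```python
# Sketch: dataset-level quality statement vs per-feature quality records.
# Field names ("positional_accuracy_m", "lineage", etc.) are illustrative.

# The ISO-style assumption: one quality figure claimed for every object.
dataset_quality = {
    "positional_accuracy_m": 2.5,
    "lineage": "national survey, 2008",
}

# A heterogeneous reality: each feature carries its own quality record.
features = [
    {"id": 1, "geom": (325000, 674000),
     "quality": {"positional_accuracy_m": 0.5, "source": "GPS survey"}},
    {"id": 2, "geom": (325400, 674200),
     "quality": {"positional_accuracy_m": 15.0, "source": "georeferenced sketch"}},
]

def accuracy_for(feature, dataset_quality):
    """Prefer a feature's own quality record; fall back to the dataset statement."""
    q = feature.get("quality") or {}
    return q.get("positional_accuracy_m", dataset_quality["positional_accuracy_m"])

for f in features:
    print(f["id"], accuracy_for(f, dataset_quality))
```

A machine-reusable encoding along these lines is what would let a client answer 'useful for task Y at location Z?' rather than accepting one homogeneous claim.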

Fit for 15/10?

See the briefing paper on the JISC geospatial strand for more context - up to 9 months duration between Feb and Dec 2011.

  • Briefing emphasises infrastructure development, re-use of tools and services, both those directly supported by JISC and others popular on the web
  • We'd be looking at a mixture of service/tool re-use and structured interviews with academic geodata users regarding their concerns around quality.
  • Exploring the concept of fitness-for-use from a user-centred perspective (in contrast to the producer-centred view). This should guide the development of user-centred metadata discovery.

Themes

Starting with Nothing

The traditional ISO data quality model assumes theoretically perfect data. Many measures and tests can only be run in comparison with a known higher-quality, more "authoritative" dataset. Perfect data does not look like a reasonable assumption.
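To make the dependence on a reference dataset concrete: a typical positional accuracy measure is the RMSE of offsets between features and their matches in the authoritative source, which simply cannot be computed when no such source exists. The coordinates and the id-based matching below are illustrative assumptions, not a prescribed ISO procedure.

```python
# Sketch: a reference-based positional accuracy measure (planar RMSE).
# Without a match in the "authoritative" dataset, the measure cannot run.
import math

def positional_rmse(test_points, reference_points):
    """RMSE of planar offsets between point pairs matched by id."""
    ref = {p["id"]: p["xy"] for p in reference_points}
    sq_errors = []
    for p in test_points:
        if p["id"] not in ref:
            continue  # no reference match: this point contributes nothing
        (x1, y1), (x2, y2) = p["xy"], ref[p["id"]]
        sq_errors.append((x1 - x2) ** 2 + (y1 - y2) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

test = [{"id": "a", "xy": (0.0, 3.0)}, {"id": "b", "xy": (4.0, 0.0)}]
reference = [{"id": "a", "xy": (0.0, 0.0)}, {"id": "b", "xy": (0.0, 0.0)}]
print(positional_rmse(test, reference))  # sqrt((9 + 16) / 2) ≈ 3.536
```

"Starting with nothing" means measures of this shape are unavailable, which is exactly why attestation and edit-time checks below become interesting.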

Attestation

Peer review for data quality / the social aspect of data sources.

Edit-time quality reporting

The JOSM Validator model, looking at logical consistency of edits to OSM before commit. Again there's a data production bias here - how many research users of OSM, for example, are active editors?
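The edit-time model can be sketched as a set of simple logical-consistency rules run over an edit before it is committed. The rules and the way structure below are illustrative of the approach, not JOSM's actual API or rule set.

```python
# Sketch: edit-time logical-consistency checks in the spirit of the
# JOSM Validator. The edit structure and rules here are hypothetical.

def validate_way(way):
    """Return a list of warnings for one OSM-style way dict."""
    warnings = []
    if not way.get("tags"):
        warnings.append("untagged way")
    nodes = way.get("nodes", [])
    if len(nodes) < 2:
        warnings.append("way with fewer than two nodes")
    for a, b in zip(nodes, nodes[1:]):
        if a == b:
            warnings.append("consecutive duplicate node %s" % a)
    return warnings

edit = {"nodes": [1, 2, 2, 3], "tags": {}}
print(validate_way(edit))
# → ['untagged way', 'consecutive duplicate node 2']
```

Note that checks like these only ever reach contributors who edit; the question raised above stands - they say nothing to the research user who only consumes the data.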

Interviews

References


Participants in this document / proposal

Jo Walsh - EDINA, University of Edinburgh