Data Quality

From OSGeo

For now, this page discusses a proposal for a short project (4-7 months) looking at data quality approaches for collaborative online sources of information. It could be an interesting fit for the geospatial strand of JISC funding call 15/10 on infrastructures for education and research.

Overview

  • INSPIRE does not mandate quality standards, but the Joint Research Centre recognises that failing to consider quality is an oversight.
  • The ISO standards regarding quality of geographic information are oriented towards quality assurance in the data production process.
  • This means a lack of focus on the value of data quality information from the end-user's perspective - what problems are we helping to solve by publishing data quality information?
  • For example, OS Research has done extensive work on a "vernacular gazetteer" of shapes for social names, but data quality concerns prohibit its release, even for research.
  • The geodata world has its domain-specific problems, but can benefit from looking at lighter-weight or differently conceived quality approaches from other domains.

  • The aim should be to encourage and support the publication of more data of variable, knowably unknown quality.
  • Quality currently looks like a niche issue. New developments in data sharing over the internet will raise the priority of machine-reusable descriptions of data quality (distributed databases; multiple copies of the same resource, unsynchronised or variably edited; more collaborative mapping projects along the lines of OSM and OpenAddresses; lossy or transient datastores; linked data pollution).

Fit for 15/10?

See the briefing paper on the JISC geospatial strand for more context - up to 9 months duration between Feb and Dec 2011.

  • Briefing emphasises infrastructure development, re-use of tools and services, both those directly supported by JISC and others popular on the web
  • We'd be looking at a mixture of service/tool re-use and structured interviews with academic geodata users regarding their concerns around quality.
  • We'd explore the concept of fit-for-use from a user-centred perspective (in contrast to a producer-centred view). This should guide the development of user-centred metadata discovery.

Themes

Starting with Nothing

The traditional ISO data quality model assumes theoretically perfect data. Many measures and tests can only be run in comparison with a known higher quality, more "authoritative" dataset. Perfect data does not look like a reasonable assumption.
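
To make the dependence on a reference dataset concrete, here is a minimal sketch of one common comparison-based measure, the root-mean-square positional error. The function name, the feature ids and the coordinates are all hypothetical, and the sketch assumes points already matched by id in a projected coordinate system (metres). It is an illustration, not a rendering of any particular ISO test.

```python
import math

def positional_rmse(candidate, reference):
    """Root-mean-square distance between points matched by feature id.

    Both arguments map a feature id to an (x, y) tuple. Only features
    present in BOTH datasets can be scored at all.
    """
    shared = sorted(set(candidate) & set(reference))
    if not shared:
        # Without a reference there is nothing to measure against.
        raise ValueError("no features in common with the reference dataset")
    squared = [
        (candidate[k][0] - reference[k][0]) ** 2
        + (candidate[k][1] - reference[k][1]) ** 2
        for k in shared
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical data: feature "c" has no counterpart in the reference,
# so the measure is silent about it.
reference = {"a": (0.0, 0.0), "b": (10.0, 0.0)}
candidate = {"a": (3.0, 4.0), "b": (10.0, 0.0), "c": (50.0, 50.0)}
rmse = positional_rmse(candidate, reference)
```

Note that the measure says nothing about features missing from the reference, and fails outright when there is no reference at all, which is the situation this theme is concerned with.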

Attestation

Peer review for data quality / social aspect to data sources.

Edit-time quality reporting

The JOSM Validator model, looking at logical consistency of edits to OSM before commit. Again there's a data production bias here - how many research users of OSM, for example, are active editors?
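
As a rough illustration of what edit-time logical consistency checking can look like, here is a small sketch that inspects a pending edit to a single way before it is committed. It does not reproduce the JOSM Validator or its API. The function name, the warning strings and the data layout are assumptions made for the example.

```python
def validate_way(node_ids, tags):
    """Return a list of warnings for one way; an empty list means no issue found."""
    warnings = []
    if len(node_ids) < 2:
        warnings.append("way has fewer than two nodes")
    # The same node listed twice in a row adds no geometry.
    if any(a == b for a, b in zip(node_ids, node_ids[1:])):
        warnings.append("way repeats a node consecutively")
    # A way tagged as an area should form a closed ring.
    if tags.get("area") == "yes" and (len(node_ids) < 4 or node_ids[0] != node_ids[-1]):
        warnings.append("area is not a closed ring")
    return warnings

# Hypothetical pending edit: a way tagged as an area that is not closed.
pending_edit = {"node_ids": [1, 2, 2, 3], "tags": {"area": "yes"}}
problems = validate_way(pending_edit["node_ids"], pending_edit["tags"])
ok_to_commit = not problems
```

Checks like these need no reference dataset, but they only reach people who edit. A research user downloading the data never sees the warnings unless they are recorded and published alongside it.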

Interviews

References


Participants in this document / proposal

Jo Walsh - EDINA, University of Edinburgh