Data Quality
This page discusses a proposal for a short project (4-7 months) looking at data quality approaches for collaborative online sources of information. It could be an interesting fit for the geospatial strand of JISC funding call 15/10 on infrastructures for education and research.
Overview
- INSPIRE does not mandate quality standards, but the Joint Research Centre recognises that not considering quality is an oversight.
- The ISO standards regarding the quality of geographic information are oriented towards quality assurance in the data production process.
- This means there is a lack of focus on the value of data quality information from the end-user's perspective: what problems are we helping to solve by publishing data quality information?
- For example, OS Research has done extensive work on a "vernacular gazetteer" of shapes for social names, but data quality concerns prohibit its release, even for research.
- The geodata world has its own domain-specific problems, but it can benefit from looking at lighter-weight, differently conceived quality approaches from other domains.
- The aim should be to encourage and support the publication of more data of variable, knowably unknown quality.
- Quality currently looks like a niche issue. New developments in data sharing over the internet will raise the priority of machine-reusable descriptions of data quality: distributed databases; multiple unsynchronised or variably edited copies of the same resource; more collaborative mapping projects along the lines of OSM and OpenAddresses; lossy or transient datastores; linked data pollution. (A sketch of what such a description might look like follows this list.)
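As an illustration only, here is a minimal Python sketch of a machine-reusable quality record that could travel alongside a dataset. The field names and values are invented for this page, loosely inspired by ISO-style quality elements (completeness, positional accuracy, logical consistency, lineage); this is not a proposed schema.

```python
# Illustrative only: a hypothetical, machine-reusable quality record.
# Field names are assumptions loosely inspired by ISO-style quality elements.
import json
from datetime import date

quality_record = {
    "dataset": "example-vernacular-gazetteer",      # hypothetical dataset identifier
    "assessed_on": date(2010, 11, 1).isoformat(),
    "completeness": {
        "commission": 0.03,    # proportion of features judged to be excess
        "omission": 0.12,      # proportion of expected features judged missing
    },
    "positional_accuracy": {
        "rmse_metres": 45.0,                           # error against a reference sample
        "reference": "sample of 200 surveyed points",  # how the figure was derived
    },
    "logical_consistency": {
        "invalid_geometries": 7,                       # features failing validity checks
        "checked_with": "edit-time validator",
    },
    "lineage": "collaboratively edited; individual contributions unattributed",
    "fitness_statements": [
        # free-text, user-centred statements of what the data has proved usable for
        "suitable for regional visualisation; not suitable for routing",
    ],
}

print(json.dumps(quality_record, indent=2))
```

The point of the sketch is that a record like this can be published alongside data of "knowably unknown" quality and consumed by software, rather than sitting in a producer-side report.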
Fit for 15/10?
See the briefing paper on the JISC geospatial strand for more context; projects run for up to 9 months between February and December 2011.
- The briefing emphasises infrastructure development and the re-use of tools and services, both those directly supported by JISC and others popular on the web.
- We'd be looking at a mixture of service/tool re-use and structured interviews with academic geodata users regarding their concerns around quality.
- Exploring the concept of fitness for use from a user-centred perspective (in contrast to the producer-centred view). This should guide the development of user-centred metadata discovery.
Themes
Starting with Nothing
The traditional ISO data quality model assumes theoretically perfect data. Many measures and tests can only be run in comparison with a known higher-quality, more "authoritative" dataset. For collaboratively produced sources, perfect reference data is not a reasonable assumption.
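A minimal sketch of the kind of measure that depends on such a reference: positional accuracy computed as root mean square error between matched features in a collaboratively collected set and an "authoritative" one. The feature IDs and coordinates below are invented; without the reference set, the measure is simply undefined, which is the problem this theme names.

```python
# Sketch: positional accuracy as RMSE over features matched by a shared ID.
# All identifiers and coordinates are invented for illustration.
import math

reference = {  # "authoritative" positions, keyed by feature ID
    "pub_001": (325014.0, 673812.0),
    "pub_002": (325220.0, 673540.0),
}
candidate = {  # collaboratively collected positions for the same features
    "pub_001": (325020.5, 673805.2),
    "pub_002": (325231.0, 673548.9),
}

def positional_rmse(candidate, reference):
    """Root mean square positional error over features present in both sets."""
    shared = set(candidate) & set(reference)
    if not shared:
        raise ValueError("no matching features to compare")
    squared = [
        (candidate[fid][0] - reference[fid][0]) ** 2
        + (candidate[fid][1] - reference[fid][1]) ** 2
        for fid in shared
    ]
    return math.sqrt(sum(squared) / len(shared))

print(f"RMSE: {positional_rmse(candidate, reference):.1f} map units")
```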
Attestation
Peer review for data quality, and the social aspect of trust in data sources.
Edit-time quality reporting
The JOSM Validator model: checking the logical consistency of edits to OSM before they are committed. Again there is a data production bias here - how many research users of OSM, for example, are active editors?
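For illustration, a toy sketch of the edit-time idea, assuming a simplified way structure and a hand-picked rule set rather than the JOSM Validator's actual checks: a few logical-consistency rules run over a proposed edit, with warnings reported before commit.

```python
# Not the JOSM Validator itself: a toy edit-time consistency check.
# The "way" structure and rules are simplified assumptions for illustration.

def validate_way(way):
    """Return a list of human-readable warnings for a single proposed way."""
    warnings = []
    nodes = way.get("nodes", [])
    tags = way.get("tags", {})

    if len(nodes) < 2:
        warnings.append("way has fewer than two nodes")
    for a, b in zip(nodes, nodes[1:]):
        if a == b:
            warnings.append(f"consecutive duplicate node {a}")
    if not tags:
        warnings.append("way has no tags")
    if "building" in tags and nodes and nodes[0] != nodes[-1]:
        warnings.append("building outline is not a closed way")
    return warnings

proposed_edit = {
    "nodes": [101, 102, 102, 103],   # invented node IDs
    "tags": {"building": "yes"},
}
for message in validate_way(proposed_edit):
    print("warning:", message)
```

Checks like these catch problems at the point of data production; the question above stands, since most research users never see an editor.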
Interviews
References
- http://povesham.wordpress.com/2010/01/29/haiti-%E2%80%93-further-comparisons-and-the-usability-of-geographic-information-in-emergency-situations/
- http://povesham.wordpress.com/2010/10/05/geographical-citizen-science/
- http://unlock.blogs.edina.ac.uk/2010/03/15/notes-on-linked-data-and-geodata-quality/
- http://www.scribd.com/doc/39738777/Perspectives-on-the-re-use-of-data-quality-metadata
Participants in this document / proposal
Jo Walsh - EDINA, University of Edinburgh