Location in CKAN

From OSGeo

The Comprehensive Knowledge Archive Network (CKAN) is a project of the Open Knowledge Foundation. It is a free and open source software package, a web-based service and an API, but more importantly it is a network of package contributors and maintainers. CKAN.net lists many packages of geodata - see the geo and geodata tags. There is no dedicated location metadata, though there is an option to add new key-value-pair "tags" to the minimal default metadata.

CKAN aims to encourage and support the emergence of a culture where knowledge packages can be easily discovered and plugged together as is currently possible with software.

This page talks through the CKAN vision and considers different ways in which location information could make it easier to find and download data.

[Figure: Ckan-vision.png]

CKAN is a registry, not a repository - it stores only minimal metadata about data sources: essentially a title, a project URL, a download URL, and author/maintainer contact information. Entries can be extended with "tags".
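To make the "registry, not repository" idea concrete, here is a minimal sketch of what one entry might hold. The class and field names are assumptions made for illustration; they are not CKAN's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class PackageRecord:
    """Hypothetical registry entry; field names are illustrative, not CKAN's schema."""
    title: str
    project_url: str
    download_url: str
    author: str
    maintainer: str
    tags: list = field(default_factory=list)    # free-text labels such as "geo"
    extras: dict = field(default_factory=dict)  # extra key-value pairs


# Made-up example entry: only pointers to the data are stored, never the data itself.
record = PackageRecord(
    title="Example boundary data",
    project_url="http://example.org/project",
    download_url="http://example.org/project/data.zip",
    author="A. Author",
    maintainer="M. Maintainer",
    tags=["geo", "geodata"],
)

# Location could be bolted on as one more key-value pair.
record.extras["place"] = "London"
```

Anything location-related would have to live in the tags or in the extra key-value pairs, which is the starting point for the rest of this page.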

A design goal of CKAN is to convert download links into packages of data that can be installed automatically by an application. Links may point to pages listing individual files, to a big dump of XML, or to the interface of a web service.

This is being worked on in the separate datapkg project. Many entries in CKAN will require a bit of custom glue to turn them into packages. Interfaces and formats may change over versions, so everything is versioned.


Location in CKAN

Some packages have definite locations - the data is about Spain, or about Boston, etc.

Other packages have global scope.

Some packages are "geographic information" or "GIS data", that is, primarily points, lines and polygons or 3D objects, annotated with bits of text or links to other data sets.

But many more packages have a location component within their data.

Simplest useful thing?

Registries/Repositories that store GIS data tend to have "bounding box" metadata for each dataset or series of datasets.

Without a description of data scale/resolution, a bbox alone is not that helpful for search. Adding a bbox also requires data entry that is time-consuming and already quite "specialist", even if the user is given a UI to draw a box on a map.
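A small sketch of why a bbox on its own is a weak search key. The package names and coordinates below are rough, made-up values for illustration only; the point is that a global-scope package overlaps every query box, so it shows up in every result.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float


def intersects(a: BBox, b: BBox) -> bool:
    """True if the two boxes overlap (ignores boxes crossing the antimeridian)."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


# Hypothetical packages with approximate bounding boxes.
packages = {
    "global-climate-grid": BBox(-180.0, -90.0, 180.0, 90.0),
    "boston-street-trees": BBox(-71.2, 42.2, -70.9, 42.4),
    "spain-municipalities": BBox(-9.4, 35.9, 4.4, 43.8),
}


def search(query: BBox) -> list:
    """Return names of all packages whose bbox overlaps the query box."""
    return sorted(name for name, box in packages.items() if intersects(box, query))


boston_query = BBox(-71.1, 42.3, -71.0, 42.4)
print(search(boston_query))
```

A query box drawn around Boston returns both the Boston package and the global grid. Nothing in the bbox says whether the global data is detailed enough to be useful at city scale.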

We could get similar value by creating links from CKAN packages to entries in a gazetteer of place names and approximate locations.

These links could then be used to help create collections of packages: "apt-get install London" or "apt-get install London 1973".
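One way this could work, as a sketch only: each package links to one or more gazetteer entries, and a collection is every package linked to a place or to anything inside it. The gazetteer identifiers, the parent/child hierarchy, the approximate coordinates and the package names are all assumptions invented for this example.

```python
# Hypothetical gazetteer: place id -> name, approximate location, containing place.
GAZETTEER = {
    "place:uk": {"name": "United Kingdom", "lat": 54.0, "lon": -2.0, "parent": None},
    "place:london": {"name": "London", "lat": 51.5, "lon": -0.1, "parent": "place:uk"},
}

# Hypothetical links from package names to gazetteer entries.
PACKAGE_PLACES = {
    "london-bus-stops": ["place:london"],
    "london-census-1973": ["place:london"],
    "uk-postcodes": ["place:uk"],
}


def is_within(place_id, ancestor_id):
    """Walk up the parent chain to see if place_id is, or lies inside, ancestor_id."""
    while place_id is not None:
        if place_id == ancestor_id:
            return True
        place_id = GAZETTEER[place_id]["parent"]
    return False


def collection(place_id):
    """All packages linked to the place or to any place contained in it."""
    return sorted(
        pkg for pkg, places in PACKAGE_PLACES.items()
        if any(is_within(p, place_id) for p in places)
    )


print(collection("place:london"))
print(collection("place:uk"))
```

Asking for London gives only the London packages, while asking for the UK also pulls in everything linked to places inside it. Nobody has to draw a box on a map; the only data entry is picking a place name.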

Shouldn't we just do this with tags?

User-created 'tags' partly serve this purpose already: for example, UK-related packages can be found by looking at the 'uk' tag (http://www.ckan.net/tag/read/uk). Not all of these are data describing the UK, however - some are global-scope environmental data sets that were produced by UK researchers.

There are also tag-spaces that need gardening, or connecting together. Looking at tag/read/london finds only 2 packages, whereas a fulltext search reveals 3 different tags used for London data: [city-london greater-london-authority london]
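Tag gardening of this kind could be sketched as a mapping from tag variants onto one canonical place. The three tag names come from the search above; the mapping table, the canonical key and the package names are hypothetical.

```python
# Map each tag variant onto one canonical place key (hypothetical mapping).
TAG_TO_PLACE = {
    "city-london": "london",
    "greater-london-authority": "london",
    "london": "london",
}

# Made-up packages and their user-supplied tags.
PACKAGE_TAGS = {
    "package-a": ["london", "transport"],
    "package-b": ["city-london"],
    "package-c": ["greater-london-authority", "budget"],
    "package-d": ["uk"],
}


def packages_about(place):
    """Find packages carrying any tag that maps to the given canonical place."""
    return sorted(
        pkg for pkg, tags in PACKAGE_TAGS.items()
        if any(TAG_TO_PLACE.get(tag) == place for tag in tags)
    )


print(packages_about("london"))
```

Looking only at the literal 'london' tag would find one of these packages; going through the mapping finds all three that are tagged with some variant of London.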

It would be nice to have URLs that provide extra metadata about places and the connections between them - see the geonames.org semantic web service.

Shouldn't we just do this with an OGC Catalogue Service?

Other services already handle metadata for GIS data at a great level of detail - GeoNetwork, Go-Geo!, INSPIRE metadata, AGMAP profile, national geoportals, etc.

This is not about turning CKAN into a catalogue service for geodata, because there is already a lot of investment in that area through INSPIRE.

Rather, the focus here is on using location data to help make more sense of other, related datasets.


Notes: I should probably have put this on the OKF wiki, but I have forgotten how to work MoinMoin.