Open Data

From OSGeo

This page is a work in progress and has not been confirmed as an official white paper. Please join forces and help to shape the understanding of Open Data.

This page is the live version of a white paper collating definitions of the term "Open Data" applied to the geospatial domain.

The definitions proposed by several different organizations and individuals are used to describe aspects of openness. To have a civilized conversation about the implications of opening up data, it is necessary to understand the different positions. This white paper offers some fixed reference points in the ongoing process of learning what openness means.

Comments by jmckenna:

  • Who is the intended audience of this paper? What assumptions are therefore made? (this 'intended audience' paragraph with assumptions could be near the beginning of the paper)
  • "openness" is used throughout this page, maybe we should define this term, or use another term

Introduction

The term "Open Data" is currently used in many different contexts. Without additional information the term is very unspecific. In any discussion of the topic it is therefore indispensable to first clarify which aspects of Open Data are in focus:

  • Data access
  • Ownership
  • Copyright
  • Licensing
  • Usage
  • Maintenance
  • Update
  • Authority
  • Cost (of production, dissemination and use)
  • Documentation / Metadata
  • Format (technical, semantic)
  • please add

This page aims to create a white paper that collates and comments on the definitions proposed by different organizations by applying them to real-world cases from the geospatial domain.


Definition of Terms

Some terms used in this white paper require a more precise definition.

Geodata

Geodata, or geospatial data, is defined by some common properties that make it special and different from other kinds of data:

  • Every datum in a geo-dataset is inherently connected and linked to a larger body through location. A geo-dataset is not readily divisible into distinct and independent datasets.
  • Geodata is inherently linked to other data by location.
  • Location is always fuzzy and impermanent over time. The earth is moving and things on the earth are moving.
  • Precision in geodata is peculiar. The more precision technology achieves, the more of a problem this becomes, because subtle movement can result in altered relations.
  • Geodata typically has a history, sometimes also in movement. Geodata that "worked" yesterday may fail to produce the same results after it has been updated over time.
  • ...and so on (please add)

Dataset

A dataset (maybe also datum) is typically part of a collection of data. In the geospatial context each datum is typically inherently linked to others through location (i.e. neighbouring, connected, in a grid with content "bleeding" between borders, etc.).

Service

See also Open Service definition.

Authenticity

Authenticity is the degree to which information can be trusted to be original (authentic). Authenticity is typically not defined per se for data.

Authority

Authority is a concept put in place by a governing entity. Authoritative data does not have to be more true or valid or have higher quality than non-authoritative data but it is used as the source for decisions made by the governing authority.

Public Good

A public good is a thing, infrastructure or entity that is typically commonly available to the public. Because the public is a collective of individuals with different and potentially even opposing needs, a public good is hard to define. In general everything provided by a (democratic?) government is considered a public good for its citizens.

Infrastructure

An infrastructure is a set of tools or technologies ... (check back with WP)


Open Data Definitions

One central resource for the definition of Open Data is maintained by the Open Knowledge Foundation (http://opendefinition.org). It states that:

  "A piece of content or data is open if anyone is free to use,
  reuse, and redistribute it — subject only, at most, to the
  requirement to attribute and/or share-alike."
  ...the Open Software Service Definition (OSSD) defines ‘openness’ in relation 
  to online (software) services. It can be summed up in the statement that:
  "A service is open if its source code is Free/Open Source Software
  and non-personal data is open as in the Open Definition."

Volunteered Geographic Information

Michael Goodchild (PDF 620kB) coined the term to describe any geospatial information that is volunteered by a user to some data collection. This definition does not differentiate by the openness of data collections; they may well be proprietarily owned.

Crowdsourcing

The term crowdsourcing refers to the business process of outsourcing some task to a crowd. A crowd in this context is specifically not under the control of the outsourcing entity. The crowd can but does not have to be rewarded in any way. Sometimes intrinsic values like recognition or fun are compensation enough for the volunteers.

Public Geodata

Public geodata typically refers to data that has been collected by a public institution, for example a mapping agency or cadastral administration. To date there are ongoing discussions about the ownership, access and use of such data, including whether or not it should be open for the public to use.

Open Geodata

Open Geodata can refer to any geospatial dataset that is publicly (openly) available, regardless of its origin. In some jurisdictions (for example the USA) data produced by the government is automatically put into the Public Domain. In other jurisdictions, which do not have a concept of a common good, data may be published under a specific open license (for example the Open Government Licence in the UK). Other data has been collected in a collaborative and volunteered effort and is made openly available to the public, sometimes under a license which implements the legal concept of Copyleft well known from the Free Software movement (for example OpenStreetMap). Copyleft stipulates that the data can be used by anybody for any purpose, provided that all modifications are contributed openly back to the public.

Caveats

Some of these concepts conflict. The Open Government Licence currently in operation in the UK allows anybody to do anything with the data. This includes modifying the data and redistributing the changed data, which contradicts the authoritative character that the governing organization might need the data to have. Accessibility to the public can also pose a problem if the data is published through a medium that is not commonly available. This can be due to a lack of IT and connectivity or to access restrictions implemented by governments (for example China).

Proposed Terminology

This paper proposes to use specific prefixes to differentiate the Openness levels of data.

Openly Maintained Data

"Openly Maintained" describes data that is commonly and publicly collected, maintained and used. Data maintained by individuals who do not necessarily belong to a common organization or region, or work under a common directive (that is, they are volunteers), can be termed crowdsourced.

Commonly Maintained Data

It is not yet clear whether "commonly maintained" entails copyleft-style restrictions on commercial use. Maybe it is perfectly allowable for a copy of the data to drift into individual property as long as it is made clear that this is the case (be honest).


It is therefore proposed that the term "Open Data" always be accompanied by an explicit license. Such a license can have a stronger copyleft effect (ODbL) or be permissive like CC0. As of the writing of this paper there is no Open Data compatible license which addresses the need to protect the data from modifications.

DRM

Some industries have implemented so-called Digital Restrictions Management (or DRM, also known as Digital Rights Management) to protect their products from being used without permission. Most types of DRM currently in use either modify the data in a way that can later be traced, or encrypt the data and only allow access with a specific key. In the first case the data is changed, which cannot be in the interest of the authenticity (quality) of the data. In the latter case access is restricted by technology, which typically involves proprietary software.

Ensuring and Verifying Authenticity

It may be possible to calculate checksums of the data, for example with a CRC, which would allow its authenticity to be verified.
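
As a minimal sketch of this idea, the following Python snippet computes a CRC-32 value and a SHA-256 digest for a dataset held as bytes, and compares a copy against a digest published with the original. The function names and the sample content are hypothetical. Note that a CRC only detects accidental corruption; SHA-256 is swapped in here as an assumption, because a cryptographic hash is also hard to forge deliberately.

```python
import hashlib
import zlib


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 of the data (detects accidental corruption only)."""
    return zlib.crc32(data) & 0xFFFFFFFF


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_published_digest(data: bytes, published_sha256: str) -> bool:
    """Check a copy of the data against the digest published by its source."""
    return sha256_of(data) == published_sha256.lower()


# Hypothetical sample content standing in for a geodata file.
original = b"POINT (7.0982 50.7374)\n"
published = sha256_of(original)

# A copy in which one coordinate digit has been altered.
tampered = b"POINT (7.0983 50.7374)\n"

print(matches_published_digest(original, published))
print(matches_published_digest(tampered, published))
```

A digest only proves that a copy equals the data the digest was computed from. It says nothing about who published the digest, so it would have to be obtained from a trusted channel.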

An alternative option would be the provision of the data through a service which can use traditional security measures to provide authenticity. All copies of this data shall be linked to the original source (typically via a URL link) which allows querying the original for consistency checks.
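
Linking copies back to their original could look roughly like the sketch below: a copy carries the URL of its source, and a consistency check fetches the original and compares digests. The record fields and function names are hypothetical, and the `fetch` parameter is an assumption that allows any transport to be plugged in.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable
from urllib.request import urlopen


@dataclass(frozen=True)
class DataCopy:
    """A copy of a dataset that remembers where its original lives."""
    content: bytes
    source_url: str  # hypothetical link back to the authoritative service


def is_consistent_with_source(data_copy: DataCopy,
                              fetch: Callable[[str], bytes]) -> bool:
    """Fetch the original via `fetch` and compare SHA-256 digests."""
    original = fetch(data_copy.source_url)
    return (hashlib.sha256(original).digest()
            == hashlib.sha256(data_copy.content).digest())


def fetch_over_http(url: str) -> bytes:
    """One possible `fetch`: download the original with the standard library."""
    with urlopen(url) as response:
        return response.read()
```

Passing `fetch_over_http` as `fetch` would query the live service. In practice a service might rather publish a digest next to the data, so that a check does not need to download the full original each time.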



How can Public Data be Open under the above definition when at the same time there is a need for some authority, as in "control"? How can the data be made available while making sure that it is not misused, as in changed for one's private profit? Imagine borders (of pretty much anything) changed at will, be it by the dictator who wants to enlarge his territory or by the company which wants to expand into the environmental protection reserve.

Then the whole openness issue tips to a side that we do not want to see happen either.

Litmus Tests

There are a few "Litmus Tests" that can be applied to Open Data to find out how Open they are. The following links give an overview which could serve as a basis to create a regular test by OSGeo.

Summary

As of the time of writing this white paper there is no comprehensive or exact definition of Open Data. Whenever the term is used it should therefore reference the level of openness which it provides.


Links