Geodata Repository

The notes on the Talk page for this page (Talk:Geodata_Repository) describe the background to this effort.

A full list of suggestions for public domain data sets that are nice-to-haves is maintained at Geodata Discovery Working Group.

= Getting involved =

The geodata repository has a dedicated blade server at telascience (cf. SAC Service Status). Right now this is geodata.telascience.org; we need to get an osgeo.org domain pointed to it. The plan is to have something very much up, running and demoable by FOSS4G.

How you can help

 * Offer feedback on the Geodata Metadata Requirements - if you have a dataset you would like to contribute, does this model express it adequately?
 * Write or contribute an example Simple Catalog Interface
 * If you would like to install software and/or data on the repository machine, please talk to the good people at SAC or visit #osgeo and #telascience on irc.freenode.net to find out how to get an account and get started

Who is involved now

 * Jo Walsh is working on the backend data store and machine level metadata interfaces (also presenting this material at FOSS4G 2006)
 * Chris Schmidt is helping with an OpenLayers-based browsing interface
 * Schuyler Erle is helping with web service / application glue
 * Sean Gillies' OWSLib is, half-beknownst to him, a big part of all these applications - see the OWSLib API user stories at http://trac.gispython.org/projects/PCL/wiki/OwsCapabilitiesUserStory
 * Markus Neteler is contributing data processing code and resources as part of the GRASS data packaging effort
 * Norm Vine is providing a point of contact for administering the system and general reality-checking to all involved.
 * Martin Spott maintains the contents of the PostGIS-repository on 'zuluviz'.

= Interface Design =

In order to be useful to people, a geodata repository needs a way for users to become quickly and easily informed as to what kind of data they're getting: metadata about the data, and the quality/quantity of data involved.

For raster images, one way to do this is to include an extent-wide image of the data. http://openlayers.org/gallery/ has screenshots which show some of what is meant here, although they are more directed towards applications. By having a whole-extent screenshot overview, users can quickly and easily see what they're getting. Additionally, assuming this data is to be made available as WMS, setting up an OpenLayers instance to allow users to browse the rasters at a more detailed level would be beneficial.

Vector data/attribute data would also need to be described, in text or some other way. http://freemap.in/world/ is an example of a browsable map which displays attribute data -- clicking on a country fills the sidebar with data. This interface was set up in about 20 minutes, and given decent map files, this kind of setup could be largely automated, allowing users to (again) get an overview of the data they're looking at before they download it or set it up on their own servers.

= Data sources =

Sources of public geodata for initial setup in a repository.

PostGIS serving vector data
There's a PostGIS server on 'zuluviz.sdsu.edu' (thanks to John Graham for hosting this site) that stores different flavours of vector data. Current datasets include: VMap0, VMap1, AptNav, GSHHS, SWBD, MGRS, TIGER, OSM, GeoNames, CountryCodes and FGSODB - see below for further explanation. A graphical overview is accessible via MapServer.

On Offer

 * Short explanation of available datasets (to be extended - the datasets as well as their explanation ;-):
 * VMap0: Selected subsets of VMap0 polygons, lines and points, starting with a selection that has proven to be useful for creating FlightGear Scenery from it. Added a 'geonameid' column for joining urban areas with GeoNames (see below). Current details explained at the World Custom Scenery Project, will get merged some time.
 * VMap1: First attempt at a selection that would be "nice to have" for FlightGear from VMap1 - and certainly for other purposes as well. Added a 'geonameid' column for joining urban areas with GeoNames. Details similar to VMap0.
 * AptNav: Geometric average of runway center locations as used by the FlightGear and X-Plane flight simulators; data taken from Robin Peel's Airport Database. Locations converted to OGC-style POINT geometries. Use 'icao' column for searching.
 * GSHHS: GSHHS 1.3 shorelines.
 * SWBD: SRTMv2 Water Body Data.
 * MGRS: Military Grid Reference System, alias UTMREF.
 * TIGER: TIGER line data - currently an overview that shows only major roads; to be completed (thanks to Chris Holmes at The Open Planning Project for providing pre-processed data).
 * OSM: OpenStreetMap; data taken from MetaCarta Labs. Added a 'waykey' column for searchable classification of way types [1-14]; schema taken from Map Features.
 * GeoNames: Complete content of the "allCountries" export table from the Geonames.org geographical database. Locations converted to OGC-style POINT geometries. Added a 'pplkey' column for searchable classification of size for populated places [1-6]; schema proposed by Markus Neteler.
 * CountryCodes: Translation table for country codes as proposed here (thanks to Silke Reimer for preparing the table).
 * FGSODB: Complete dump of the FlightGear Scenery Objects Database; models consist of AC3D geometries, RGB textures and in some cases animations that are defined in XML files. Locations converted to OGC-style POINT geometries.

Many thanks to Norman Vine for running the HCRA, the "Human Communications Relay Agent" :-)

Procedures

 * Data is imported in several different ways.
 * Many datasets are readable with the GDAL/OGR(/OGDI) toolbox. Notably these are the VMap datasets, GSHHS, SWBD and MGRS. The 'ogr2ogr' command is used here, hidden behind a somewhat complex construct of shell scripts which automates the whole thing. Writing to the DB is accomplished by the PostGIS driver that OGR provides to 'ogr2ogr'.
 * Some other datasets are automagically or manually (for those sets which are not expected to change often) transformed into OGC-compliant SQL scripts and run through the respective SQL monitor.
 * A few datasets are imported directly into the DB with Perl/DBI. AptNav, for example, is parsed from a text file with a home-grown parser in Perl and written to the database table with DBI.
 * Everything that is meant to represent a geometry is stored in the DB using the OGC-compliant geometry types POLYGON, LINESTRING and POINT. A POINT, for example, is written into the DB as "INSERT INTO  ([...]) VALUES ([...] GeomFromEWKT('SRID=4326;POINT( )') [...])" and stored internally in the respective, geospatially searchable geometry data type.
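
The POINT import described above can be illustrated with a small Python sketch. It is not the Perl/DBI importer used on the server; it only shows how an EWKT string and a parameterised INSERT might be put together. The table name 'apt_nav' and the sample coordinates are made up for illustration, the 'icao' and 'wkb_geometry' column names are taken from this page, and the '%s' placeholders assume a driver that uses that parameter style (psycopg2, for instance). Newer PostGIS releases spell the function ST_GeomFromEWKT.

```python
def point_ewkt(lon, lat, srid=4326):
    """Return an EWKT string for a point; X (longitude) comes before Y (latitude)."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError(f"coordinates out of range: lon={lon}, lat={lat}")
    return f"SRID={srid};POINT({lon:.6f} {lat:.6f})"


def airport_insert(icao, lon, lat, table="apt_nav"):
    """Build an INSERT statement plus its parameters.

    'table' is a hypothetical name and must come from trusted code,
    because it is interpolated into the SQL text. The values themselves
    are handed to the database driver separately.
    """
    sql = (
        f"INSERT INTO {table} (icao, wkb_geometry) "
        "VALUES (%s, GeomFromEWKT(%s))"
    )
    return sql, (icao, point_ewkt(lon, lat))


# Illustrative values only, not taken from the actual database.
sql, params = airport_insert("KSAN", -117.189667, 32.733556)
print(sql)
print(params)
```

With a real connection, the pair would be passed to the driver's execute call, e.g. `cursor.execute(sql, params)`.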


 * Different methods for retrieval.
 * Direct read access to the database is available to users on the 'geodata.telascience.org' system. Retrieval uses SQL syntax such as "SELECT asText(wkb_geometry) FROM  WHERE  LIKE ''".
 * MapServer as WMS is available for the purpose of simple mapping (see above).
 * GeoServer as WFS-T is being prepared for collaborative editing.
 * Dump into Shapefiles provided occasionally or on request; regular schedule possible if required.
 * Other data access on request, depending on the purpose and use.
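
For the direct SQL access mentioned above, a client-side sketch in Python might look as follows. It builds a LIKE query on the 'icao' column and parses the text that asText() returns for a POINT. The table name 'apt_nav' is hypothetical, and the '%s' placeholder again assumes a psycopg2-style driver; no connection is opened here.

```python
import re

# A decimal number, optionally signed, optionally with an exponent.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
_POINT_RE = re.compile(rf"^\s*POINT\s*\(\s*({_NUM})\s+({_NUM})\s*\)\s*$", re.IGNORECASE)


def icao_query(prefix, table="apt_nav"):
    """Return SQL and parameters selecting airports whose ICAO code starts with 'prefix'.

    'table' is a hypothetical, trusted name; the search pattern is passed
    as a parameter rather than pasted into the SQL text.
    """
    sql = f"SELECT icao, asText(wkb_geometry) FROM {table} WHERE icao LIKE %s"
    return sql, (prefix + "%",)


def parse_point(wkt):
    """Turn 'POINT(lon lat)' as produced by asText() into a (lon, lat) tuple of floats."""
    match = _POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


sql, params = icao_query("KS")
print(sql, params)
print(parse_point("POINT(-117.189667 32.733556)"))
```

Only two-dimensional points are handled; LINESTRING and POLYGON results would need a proper WKT parser.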

TODO

 * Generation of per-country and per-region shapefiles; distribution via HTTP and via geotorrents.org
 * Finish GeoServer WFS-T configuration.
 * Build joins between VMap0/1 urban areas and GeoNames populated places (via names and geographic location).
 * Merge landcover-, stream- and road-layers from VMap1, TIGER and maybe OSM as well into the foundation of VMap0.
 * Limit the "viewing angle" of WMS/WFS-services in order to save the DB-servers' life ....
 * Retrieve exact locations of major river dams and waterfalls (keywords: St. Lawrence, Niagara, Bosporus, Gibraltar, ....).
 * Wait for agreement on OSGeo metadata recommendation/standard; create metadata sets for different types of data sources and join geospatial data with respective metadata.
 * Help is very much appreciated to add a reasonable colour scheme to the different types of roads (VMap0/VMap1/TIGER/OSM) as well as populated places (GeoNames) -> please contact Martin.
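
For the TODO item on joining VMap0/1 urban areas with GeoNames populated places via names and geographic location, one possible approach can be sketched in Python. This is only an illustration: the record layout, the exact (case-insensitive) name comparison and the 10 km distance threshold are all assumptions, and the real join might well be done in SQL against the PostGIS tables instead.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def match_place(area, places, max_km=10.0):
    """Return the geonameid of the nearest same-named place within max_km, or None.

    area   -- dict with 'name', 'lon', 'lat' (e.g. the centroid of an urban area)
    places -- iterable of dicts with 'geonameid', 'name', 'lon', 'lat'
    """
    wanted = area["name"].strip().casefold()
    best = None  # (distance, geonameid) of the best candidate so far
    for place in places:
        if place["name"].strip().casefold() != wanted:
            continue
        dist = haversine_km(area["lon"], area["lat"], place["lon"], place["lat"])
        if dist <= max_km and (best is None or dist < best[0]):
            best = (dist, place["geonameid"])
    return None if best is None else best[1]


# Made-up records; the ids are placeholders, not real GeoNames ids.
area = {"name": "Springfield", "lon": 10.0, "lat": 50.0}
places = [
    {"geonameid": 1, "name": "springfield", "lon": 10.0, "lat": 50.0},
    {"geonameid": 2, "name": "Springfield", "lon": 15.0, "lat": 50.0},
]
print(match_place(area, places))
```

The result would be written into the 'geonameid' column that has already been added to the VMap tables.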

Old status

 * see Geodata processing at telascience.org

Blue Marble NG

 * MapServer as WMS

Landsat-7

 * MapServer as WMS
 * Telascience already serving the raw data - http://onearth.telascience.org/

Status

 * Waiting for disk space to finish unpacking the raw data on zuluviz.sdsu.edu
 * Interim plan is to write a simple WMS wrapper script that generates a GDAL VRT to assemble composites on the fly
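
The interim plan above, a wrapper that generates a GDAL VRT to assemble composites, could start from something like the following Python sketch. It writes a minimal VRT that places a set of tiles into one mosaic. The file names, tile sizes and geotransform are made-up values, every tile is assumed to be 8-bit with the same band layout, and no SRS element is written, so this is a starting point rather than a finished wrapper.

```python
import xml.etree.ElementTree as ET


def build_vrt(width, height, geotransform, tiles, bands=3):
    """Return VRT XML for a mosaic of 'width' x 'height' pixels.

    geotransform -- six numbers: origin x, pixel width, 0, origin y, 0, -pixel height
    tiles        -- list of (filename, x_off, y_off, x_size, y_size), where the
                    offsets give the tile's position in mosaic pixel coordinates
    """
    root = ET.Element("VRTDataset", rasterXSize=str(width), rasterYSize=str(height))
    ET.SubElement(root, "GeoTransform").text = ", ".join(repr(float(v)) for v in geotransform)
    for band in range(1, bands + 1):
        band_el = ET.SubElement(root, "VRTRasterBand", dataType="Byte", band=str(band))
        for filename, x_off, y_off, x_size, y_size in tiles:
            src = ET.SubElement(band_el, "SimpleSource")
            ET.SubElement(src, "SourceFilename", relativeToVRT="0").text = filename
            ET.SubElement(src, "SourceBand").text = str(band)
            # Read the whole tile ...
            ET.SubElement(src, "SrcRect", xOff="0", yOff="0",
                          xSize=str(x_size), ySize=str(y_size))
            # ... and paste it at its position in the mosaic, unscaled.
            ET.SubElement(src, "DstRect", xOff=str(x_off), yOff=str(y_off),
                          xSize=str(x_size), ySize=str(y_size))
    return ET.tostring(root, encoding="unicode")


# Two hypothetical 1024x1024 tiles side by side, covering the whole globe.
tiles = [
    ("/data/example/west.tif", 0, 0, 1024, 1024),
    ("/data/example/east.tif", 1024, 0, 1024, 1024),
]
print(build_vrt(2048, 1024, (-180.0, 0.17578125, 0.0, 90.0, 0.0, -0.17578125), tiles))
```

A wrapper script would pick only the tiles intersecting the requested bounding box, write the VRT to a temporary file and let GDAL or MapServer read from it.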

= Metadata =


 * Geodata Metadata Requirements - a baseline for what would be asked for / expected of data sets in an OSGeo repository
 * A simple metadata-in-GeoRSS webservice that is also as FGDC compliant as possible/desirable.
 * Simple Catalog Interface
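
As a rough idea of what the metadata-in-GeoRSS output listed above might contain, the following Python sketch builds one Atom entry carrying a GeoRSS-Simple bounding box. The choice of fields, the function name and the example link are assumptions; a valid Atom entry would also need 'id' and 'updated' elements, and no mapping to FGDC fields is attempted.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
ET.register_namespace("atom", ATOM)
ET.register_namespace("georss", GEORSS)


def georss_entry(title, link, summary, bbox):
    """Return an Atom entry fragment describing one dataset.

    bbox -- (west, south, east, north) in WGS84 degrees
    """
    west, south, east, north = bbox
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}link", href=link)
    ET.SubElement(entry, f"{{{ATOM}}}summary").text = summary
    # GeoRSS-Simple box: lower corner then upper corner, latitude before longitude.
    ET.SubElement(entry, f"{{{GEORSS}}}box").text = f"{south} {west} {north} {east}"
    return ET.tostring(entry, encoding="unicode")


# The link is a placeholder, not a real service address.
print(georss_entry(
    "Blue Marble NG",
    "http://example.org/datasets/bluemarble-ng",
    "Global raster served via WMS",
    (-180.0, -90.0, 180.0, 90.0),
))
```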

= See Also =


 * Geodata Discovery Working Group - half-dormant
 * MapDex offers a metadata search facility
 * GeoNames is a gazetteer and naming collection
 * http://www.fao.org/geonetwork/srv/en/main.search
 * http://geoportail-geoportal.ainc-inac.gc.ca/metaindex_e.asp
 * http://en.wikipedia.org/wiki/List_of_towns