Geodata Repository

The notes on the Talk page for this page (Talk:Geodata_Repository) describe the background to this effort.

A full list of suggestions for public domain data sets that are nice-to-haves is maintained at Geodata Discovery Working Group.

= Getting involved =

The geodata repository has a dedicated blade server at telascience (cf. SAC Service Status). Right now this is geodata.telascience.org; we need to get an osgeo.org domain pointed to it. The plan is to have something up, running and demoable by FOSS4G.

How you can help

 * Offer feedback on the Geodata Metadata Requirements - if you have a dataset you would like to contribute, does this model express it adequately?
 * Write or contribute an example Simple Catalog Interface
 * If you would like to install software and/or data on the repository machine, please talk to the good people at SAC or visit #osgeo and #telascience on irc.freenode.net to find out how to get an account and get started

Who is involved now

 * Jo Walsh is working on the backend data store and machine level metadata interfaces (also presenting this material at FOSS4G 2006)
 * Chris Schmidt is helping with an OpenLayers-based browsing interface
 * Schuyler Erle is helping with web service / application glue
 * Sean Gillies' OWSLib is, half-beknownst to him, a big part of all these applications - see the OWSLib API user stories at http://trac.gispython.org/projects/PCL/wiki/OwsCapabilitiesUserStory
 * Markus Neteler is contributing data processing code and resources as part of the GRASS data packaging effort
 * Norm Vine is providing a point of contact for administering the system and general reality-checking to all involved.
 * Martin Spott maintains the content of the PostGIS-repository on 'zuluviz'.

= Interface Design =

In order to be useful to people, a geodata repository needs a way for users to learn quickly and easily what kind of data they're getting: metadata about the data, and the quality/quantity of data involved.

For raster images, one way to do this is to include an extent-wide image of the data -- http://openlayers.org/gallery/ has screenshots that show some of what is meant here, although they are directed more towards applications. With a whole-extent overview image, users can quickly and easily see what they're getting. Additionally, assuming this data is to be made available as WMS, setting up an OpenLayers instance that lets users browse the rasters at a more detailed level would be beneficial.

Vector data/attribute data would also need to be described, in text or some other way. http://freemap.in/world/ is an example of a browsable map which displays attribute data -- clicking on a country fills the sidebar with data. This interface was set up in about 20 minutes, and given decent map files, this kind of setup could be largely automated, allowing users to (again) get an overview of the data they're looking at before they download it or set it up on their own servers.

= Data sources =

Sources of public geodata for initial setup in a repository.

PostGIS serving vector data
There's a PostGIS server on 'zuluviz.sdsu.edu' (thanks to John Graham for hosting this site) that stores different flavours of vector data; also known as "Landcover-DB". Current datasets include: VMap0, VMap1, AptNav, GSHHS, PGS, SWBD, MGRS, TIGER, StatsCan, OSM, GeoNames, CountryCodes and FGSODB - see below for further explanation.

Access - how to get to it !?

 * Graphical MapServer interface
 * WMS GetCapabilities
 * WFS GetCapabilities
 * Enjoy TileCache !
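
The WMS and WFS GetCapabilities links above are ordinary OGC key-value requests. As a minimal sketch, the following builds such a request URL; the endpoint `BASE_URL` is a made-up placeholder (the real service address is not given on this page), and the version numbers are only common defaults, not confirmed settings of this server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the actual service URL is not stated on this page.
BASE_URL = "http://example.org/cgi-bin/landcover"


def capabilities_url(base_url, service, version):
    """Build an OGC GetCapabilities request URL for a WMS or WFS endpoint."""
    if service not in ("WMS", "WFS"):
        raise ValueError("service must be 'WMS' or 'WFS'")
    params = {
        "SERVICE": service,
        "VERSION": version,
        "REQUEST": "GetCapabilities",
    }
    # MapServer URLs often already carry a '?map=...' query string.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


if __name__ == "__main__":
    print(capabilities_url(BASE_URL, "WMS", "1.1.1"))
    print(capabilities_url(BASE_URL, "WFS", "1.0.0"))
```

The response to either request is an XML document listing the available layers or feature types, which a client such as OWSLib can parse.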

The web mapping colour schema has been applied in accordance with the Corine Land Cover project.


 * Direct read access to the database is available to users of the 'geodata.telascience.org' system. Retrieval in SQL syntax as "SELECT asText(wkb_geometry) FROM  WHERE  LIKE ''".
 * GeoServer as WFS-T is being prepared for collaborative editing.
 * Dump into Shapefiles provided occasionally or upon request; regular schedule possible if required.
 * Other data access upon request, depending on the purpose and use.

CAVEAT We regret that, due to technical reasons, direct PostGIS database access is currently unavailable.

On Offer !

 * Short explanation of available datasets (to be extended - the number of datasets as well as their explanation ;-):
 * VMap0 (DETAIL): Selected subsets of Vector Smart Map Level 0 polygons, lines and points, starting with a selection that has proven to be useful for creating FlightGear Scenery from it. Added a 'geonameid' column for joining urban areas with GeoNames (see below). Current details explained at the World Custom Scenery Project, will get synced some time.
 * VMap1 (DETAIL): First attempt of a selection that would be "nice to have" for FlightGear from Vector Smart Map Level 1 - and certainly for other purposes as well. Added a 'geonameid' column for joining urban areas with GeoNames. Details similar to VMap0.
 * AptNav (DETAIL): Geometric average of runway center locations as used by the FlightGear and X-Plane flight simulators; data taken from Robin Peel's Airport Database. Locations converted to OGC-style POINT geometries. Use 'icao' column for searching.
 * GSHHS (DETAIL): Global Self-consistent, Hierarchical, High-resolution Shoreline Database 1.3 shorelines.
 * PGS (DETAIL): NGA Prototype Global Shoreline.
 * SWBD (DETAIL): SRTMv2 Water Body Data.
 * MGRS (DETAIL): Military Grid Reference System, alias UTMREF.
 * TIGER (DETAIL): Topologically Integrated Geographic Encoding and Referencing system line data. Roads, railroads and water/stream line data from the 2006se release, water body and landmark polygons from 2005fe (thanks to Chris Holmes at The Open Planning Project for providing pre-processed data).
 * StatsCan (DETAIL): Line data of the Statistics Canada 2006 Road Network File.
 * OSM (DETAIL): OpenStreetMap Import of the "070919" planet dump. Added a 'waykey' column for searchable classification of way types [1-14]; schema taken from Highway Map Features.
 * GeoNames (DETAIL): Complete content of the "allCountries" export table from the Geonames.org geographical database (as of 2007-06-24 23:32 UTC). Locations converted to OGC-style POINT geometries. Added a 'pplkey' column for searchable classification of size for populated places [1-7]; schema proposed by Markus Neteler:
    * continental scale (>= 1:50 million): >= 1 million inhab.
    * multi-national scale (>= 1:10 million): 500000-1 million inhab.
    * country scale (>= 1:1 million): 100000-499999 inhab.
    * regional scale (>= 1:500000): 50000-99999 inhab.
    * city scale (>= 1:50000): 10000-49999 inhab.
    * local scale: < 10000 inhab.
 * CountryCodes: Translation table for country codes as proposed here (thanks to Silke Reimer for preparing the table).
 * FGSODB (DETAIL): Complete dump of the FlightGear Scenery Objects Database; models consist of AC3D geometries, RGB textures and in some cases animations that are defined in XML files. Locations converted to OGC-style POINT geometries.
 * SPECIAL Landsat7 (DETAIL): Landuse data at selected areas (example) has been auto-classified from Landsat7-images and converted into suitable polygons at the World Custom Scenery Project. More to come.
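
The population classes listed under GeoNames for the 'pplkey' column can be expressed as a simple threshold lookup. This is only a sketch of the published thresholds: the function and label names are invented for illustration, and because the page does not say which numeric 'pplkey' value belongs to which class, the sketch returns a class label rather than a key.

```python
# Inclusive lower population bound and class label, largest class first.
# Thresholds are taken from the list above; the labels are illustrative.
PPL_CLASSES = [
    (1_000_000, "continental"),
    (500_000, "multi-national"),
    (100_000, "country"),
    (50_000, "regional"),
    (10_000, "city"),
    (0, "local"),
]


def classify_populated_place(population):
    """Return the scale class for a populated place with the given population."""
    if population < 0:
        raise ValueError("population must not be negative")
    for lower_bound, label in PPL_CLASSES:
        if population >= lower_bound:
            return label


if __name__ == "__main__":
    for inhabitants in (2_500_000, 750_000, 120_000, 60_000, 15_000, 800):
        print(inhabitants, classify_populated_place(inhabitants))
```

A place with exactly 1 million inhabitants falls into the continental class here, following the ">= 1 million" wording of the first entry.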

Many thanks to Norman Vine for running the HCRA, the "Human Communications Relay Agent" :-)

How the data gets in

 * Many datasets are readable with the GDAL/OGR(/OGDI) toolbox. Notably these are the VMap datasets, GSHHS, PGS, SWBD and MGRS. The 'ogr2ogr' command is used here, hidden behind a somewhat complex construct of shell scripts which automates the whole thing. Writing to the DB is handled by the OGR PostgreSQL/PostGIS driver that 'ogr2ogr' uses.
 * Some other datasets are automagically or manually (for those sets which are not expected to change often) transformed into OGC-compliant SQL scripts and run through the respective SQL monitor.
 * A few datasets are imported right into the DB with Perl/DBI. AptNav for example is parsed from a text file with a home-grown parser in Perl and written to the database table with DBI.
 * Everything that is meant to represent a geometry is stored in the DB using the OGC-compliant geometry types POLYGON, LINESTRING and POINT. A POINT for example is written into the DB as "INSERT INTO  ([...]) VALUES ([...] GeomFromEWKT('SRID=4326;POINT( )') [...])" and stored internally in the respective, geospatially searchable geometry data type.
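
The POINT insert described in the last item can be sketched in Python as follows. The table name `example_airports` is hypothetical, the column names 'icao' and 'wkb_geometry' are borrowed from elsewhere on this page, and the `%s` placeholders assume a DB-API driver with that parameter style (psycopg2 is one). This is not the Perl/DBI importer itself, only an illustration of the EWKT form.

```python
# Hypothetical table; GeomFromEWKT is the function name used on this page
# (newer PostGIS releases spell it ST_GeomFromEWKT).
INSERT_SQL = (
    "INSERT INTO example_airports (icao, wkb_geometry) "
    "VALUES (%s, GeomFromEWKT(%s))"
)


def point_ewkt(lon, lat, srid=4326):
    """Format a point as EWKT; WKT order is x y, i.e. longitude then latitude."""
    lon, lat = float(lon), float(lat)
    if srid == 4326 and not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates out of range for SRID 4326")
    return f"SRID={srid};POINT({lon!r} {lat!r})"


def insert_params(icao, lon, lat):
    """Return the parameter tuple to pass alongside INSERT_SQL."""
    return (icao, point_ewkt(lon, lat))


if __name__ == "__main__":
    # With a live connection this would be: cursor.execute(INSERT_SQL, params)
    params = insert_params("KSFO", -122.375, 37.619)
    print(INSERT_SQL)
    print(params)
```

Passing the EWKT string as a bound parameter, rather than pasting it into the SQL text, leaves quoting to the database driver.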

Phase 1

 * Generation of shapefiles on-the-fly for user-defined region and layer(s). Use 'pgsql2shp' and consider the 2 GByte-per-file limit.
 * Import data from many different sources and shape it into a unified format - first results available (see above).
 * Retrieve exact locations of major river dams and waterfalls (keywords: St. Lawrence, Niagara, Bosporus, Gibraltar, ....).
 * Build joins between VMap0/1 urban areas and GeoNames populated places (via names and geographic vicinity).
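
One possible way to approach the join between urban areas and GeoNames places "via names and geographic vicinity" is sketched below. It is an assumption-laden illustration, not the procedure actually used: each urban area is reduced to one representative point, names are compared after simple case and whitespace normalisation, and the 25 km search radius, the function names and the sample geonameid values are all invented.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def normalize(name):
    """Case-fold and collapse whitespace so that names compare loosely."""
    return " ".join(name.casefold().split())


def match_geonameid(area_name, area_lon, area_lat, places, max_km=25.0):
    """Return the geonameid of the nearest same-named place, or None.

    places is an iterable of (geonameid, name, lon, lat) tuples.
    """
    wanted = normalize(area_name)
    best = None  # (distance_km, geonameid)
    for geonameid, name, lon, lat in places:
        if normalize(name) != wanted:
            continue
        distance = haversine_km(area_lon, area_lat, lon, lat)
        if distance <= max_km and (best is None or distance < best[0]):
            best = (distance, geonameid)
    return best[1] if best else None


if __name__ == "__main__":
    # Made-up sample rows; the ids are not real GeoNames identifiers.
    sample = [
        (1, "Springfield", -89.65, 39.80),
        (2, "Springfield", -72.59, 42.10),
        (3, "Boston", -71.06, 42.36),
    ]
    print(match_geonameid("springfield", -72.60, 42.10, sample))
```

In the database itself the same idea would more likely be a spatial SQL join; the Python form only makes the matching rule explicit.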

Phase 2

 * Rebuild TIGER layers from 2006se release according to these instructions - mostly finished.
 * Generation of static per-country and per-region shapefiles; distribution via HTTP and via geotorrent.org.

Phase 3

 * Design and implement a storage-/data-model for road data that is capable of serving the needs of OpenStreetMap while remaining conformant to OGC-standards. Merge the ideas explained in the OSM New Data Model paper as well as Schuyler's OSM on PostGIS initiative - in other words: Try "squaring the circle" ;-)
 * Merge road-layers from VMap1, TIGER (if not already included in OSM) and OSM into the foundation of VMap0 to create a global road network of maximum detail and accuracy.
 * Merge landcover- and stream-layers from VMap1, TIGER and Landsat7-classification into the foundation of VMap0 to create a global landcover dataset of maximum detail and accuracy.

Sequence undefined

 * Wait for agreement on OSGeo metadata recommendation/standard; create metadata sets for different types of data sources and join geospatial data with respective metadata.
 * Limit the "viewing angle" of WMS/WFS-services in order to save the DB-servers' life ....
 * Finish GeoServer WFS-T configuration.
 * Add polygon data that's been automagically retrieved from Landsat7 images at the World Custom Scenery Project - procedure available.

What you can do

 * Help with adding a reasonable colour schema is very much appreciated. Work is currently under way to convert the map display over to the CORINE colour values, but this schema doesn't cover all our needs.

Old status

 * see Geodata processing at telascience.org

Blue Marble NG

 * Mapserver as WMS

Status

 * By bittorrent: http://data.freemap.in/view/DataSet/9

Status

 * SRTM30 data is available now by WMS/WCS: http://data.freemap.in/view/DataSet/11

Landsat-7

 * Mapserver as WMS
 * Telascience already serving the raw data - http://onearth.telascience.org/

Status

 * Waiting for disk space to finish unpacking the raw data on zuluviz.sdsu.edu
 * Interim plan is to write a simple WMS wrapper script that generates a GDAL VRT to assemble composites on the fly
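
The interim wrapper idea could look roughly like the sketch below, which turns the BBOX, WIDTH and HEIGHT of a WMS request into a `gdalbuildvrt` command line. It assumes WMS 1.1.1 axis order (minx,miny,maxx,maxy), leaves the choice of source files to the caller, and uses invented function names and an invented output path; it only assembles the command and does not run GDAL.

```python
def parse_bbox(bbox_param):
    """Parse a WMS 1.1.1 BBOX string 'minx,miny,maxx,maxy' into four floats."""
    parts = [float(value) for value in bbox_param.split(",")]
    if len(parts) != 4:
        raise ValueError("BBOX needs exactly four comma-separated numbers")
    minx, miny, maxx, maxy = parts
    if minx >= maxx or miny >= maxy:
        raise ValueError("BBOX is empty or inverted")
    return minx, miny, maxx, maxy


def vrt_command(bbox_param, width, height, sources, out_path="composite.vrt"):
    """Assemble a gdalbuildvrt call covering the requested extent.

    -te sets the target extent and -tr the target pixel size, so the
    resulting VRT matches the requested image dimensions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    minx, miny, maxx, maxy = parse_bbox(bbox_param)
    xres = (maxx - minx) / width
    yres = (maxy - miny) / height
    return [
        "gdalbuildvrt",
        "-te", str(minx), str(miny), str(maxx), str(maxy),
        "-tr", str(xres), str(yres),
        out_path,
        *sources,
    ]


if __name__ == "__main__":
    # Hypothetical tile names; a real wrapper would select the tiles that
    # intersect the bounding box and pass the list to subprocess.run().
    print(vrt_command("0,0,10,10", 10, 10, ["tile_a.tif", "tile_b.tif"]))
```

The VRT produced this way is only a small XML description of the mosaic, so building one per request stays cheap even when the raw scenes are large.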

Status

 * Create a mechanism for a multi-resolution, seamless "best of" imagery server (see Multi-resolution seamless imagery)

= Metadata =


 * Geodata Metadata Requirements - a baseline for what would be asked for / expected of data sets in an OSGeo repository
 * A simple metadata-in-GeoRSS webservice that is also as FGDC compliant as possible/desirable.
 * Simple Catalog Interface

= See Also =


 * Geodata Discovery Working Group - half-dormant
 * MapDex offers a metadata search facility
 * GeoNames is a gazetteer and naming collection
 * http://www.fao.org/geonetwork/srv/en/main.search
 * http://geoportail-geoportal.ainc-inac.gc.ca/metaindex_e.asp
 * http://en.wikipedia.org/wiki/List_of_towns
 * http://mapas.mma.gov.br/i3geo/menutemas/servicoswms.php WMS servers collection in RSS format
 * http://mapas.mma.gov.br/i3geo/wscliente.htm Web service client