Difference between revisions of "Geodata formats"

From OSGeo

Revision as of 12:25, 13 November 2007

Placeholder for talking about geodata formats. Please create content.

New Open Geodata Format

Why

Add details of why we need a new format

Scope

  • Describe the scope of the new format

Willing Developers/Supporters

  • List developers willing to add interfaces to the new format, or people willing to fund this development

Preliminary Thoughts

A dump of my email text to get things started. Please feel free to edit.

The new format --

  • Should be fast. SQLite is plenty fast, and anything that simply "extends" the Shapefile format to add relational capabilities should be pretty fast too. It should definitely be faster than a geodatabase format (such as PostGIS/ArcSDE), and perhaps even faster than Shapefiles, especially when accessing attribute data: DBF is sequential, so searching for textual information is particularly expensive. SQLite has been tuned to excellence. I have been working with it for a few years now, and it really is amazing as a product, and in its development community, support, and capabilities. That it is in the public domain makes for a trans-fat-free icing on the cake.
  • Should be unencumbered by licenses and copyrights. Ideally, the new format would also be placed in the public domain. We want to remove all encumbrances to encourage rapid and wide adoption.
  • Should be a single file. Well, some like multiple files and some like single files. We can achieve both objectives by using tar-gzipped packaging, similar to what Apple tends to use for much of its stuff (for example, its Pages word processor stores a gzipped XML file along with other resources such as icons and pictures). Or, if gzipping and gunzipping would hurt speed, just use a package format (I have no idea if this is a Unix thing or a Mac OS thing -- we, in the Mac world, call them packages... they appear as files in the Finder, and as directories in the shell).
  • Should be easy to transition to. By building the new format on the structure of the Shapefile format and, in fact, calling it "open shapefiles" or some such thing, we indicate from its name that the transition is evolutionary rather than revolutionary. This, hopefully, will bring some name familiarity and make the transition less scary.
  • SQLite's lack of datatypes as an issue -- I guess that is a matter of preference. I personally quite like that freedom, as it gives me, the application developer, complete control over what goes where. SQLite actually does now have a few datatypes that it respects, but it doesn't complain when a value of a different type is stored in a column. Since all users will be accessing the data via an application, as long as the application is well defined, it should be fine.
  • SQLite excels at the one thing it has been entrusted to do -- retrieve the data it holds at extremely fast speeds, and maintain ACID data integrity in case of a programmatic catastrophe. The transactions alone are worth the price of admission, which, happily, happens to be zero.
  • Java support -- well, yes, use/work on SQLite JDBC. I have been using it for a few days now and find it to be a pretty competent conduit. Extend it, spatialize it. ANSI standard C is still that magic common denominator that compiles and works predictably on the largest number of systems. I have a lot against Java, but those who love Java should definitely work on tools for accessing and working with this new format, as that would only make the format more widely used and adopted.
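
To make a few of the points above concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is only an illustration, not a proposed schema: the table name attrs, its columns, the index name attrs_name, and the sample rows are all made up for this example. It shows three things mentioned in the list: an index on a text attribute (so a lookup need not scan sequentially the way a DBF search does), SQLite's relaxed handling of datatypes, and a transaction that rolls back cleanly when something fails midway.

```python
import sqlite3


def build_demo(conn):
    """Create a hypothetical attribute table, roughly what a .dbf would hold."""
    conn.execute(
        "CREATE TABLE attrs (fid INTEGER PRIMARY KEY, name TEXT, population INTEGER)"
    )
    # Index the text attribute so lookups by name do not need a full scan.
    conn.execute("CREATE INDEX attrs_name ON attrs (name)")
    rows = [
        (1, "Alpha", 100),
        (2, "Beta", "250"),       # numeric-looking text is converted to an integer
        (3, "Gamma", "unknown"),  # non-numeric text is stored as text, without error
    ]
    conn.executemany("INSERT INTO attrs VALUES (?, ?, ?)", rows)
    conn.commit()


def stored_types(conn):
    """Report the storage type SQLite actually used for each population value."""
    cur = conn.execute("SELECT typeof(population) FROM attrs ORDER BY fid")
    return [row[0] for row in cur]


def name_lookup_plan(conn):
    """Return the query plan text for a lookup by name."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT fid FROM attrs WHERE name = ?", ("Beta",)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in plan)


def failed_update_is_rolled_back(conn):
    """Simulate a crash inside a transaction and return the surviving value."""
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute("UPDATE attrs SET population = 0")
            raise RuntimeError("simulated crash mid-update")
    except RuntimeError:
        pass
    return conn.execute("SELECT population FROM attrs WHERE fid = 1").fetchone()[0]


conn = sqlite3.connect(":memory:")
build_demo(conn)
print(stored_types(conn))
print(name_lookup_plan(conn))
print(failed_update_is_rolled_back(conn))
```

The exact wording of the query plan varies between SQLite versions, but it should mention the attrs_name index. A real format would of course also need geometry storage and a spatial index, which this sketch does not attempt.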