Benchmarking 2009

Participants

 * GeoServer
   * Andrea Aime
   * Simone Giannecchini
 * MapServer
   * Daniel Morissette
   * Jeff McKenna
   * Paul Ramsey
   * Frank Warmerdam (raster)
 * PostGIS
   * Paul Ramsey
 * Oracle
   * Mike Smith
 * SDE
   * Mike Smith
 * ArcGIS Server
   * Satish Sankaran
 * ERDAS Apollo (unconfirmed)
   * Shawn Owston
   * Mladen Stojic

Basic Premise
The performance shoot-out presentation at FOSS4G 2009 will test how long each web mapping server takes to generate a map image from a common set of spatial data on a common platform. Each server will publish exactly the same set of LAYERS through the WMS standard. A JMeter load test will be run from the testing box to measure how each server performs on those layers.
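As a rough illustration of what a single request in such a test looks like, the sketch below builds a WMS GetMap URL in Python. The parameter names are the standard GetMap ones; the WMS version (1.1.1), the base URL and the helper name `getmap_url` are assumptions made for the example, not something agreed on this page.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layers, bbox, width, height,
               fmt="image/png", srs="EPSG:4326"):
    """Build a WMS 1.1.1 GetMap URL (illustrative helper, not project code)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",          # assumed version
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",                # empty means the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    # urlencode percent-encodes commas, slashes and colons in the values.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the URL of the server under test.
    print(getmap_url("http://localhost/wms", ["texas_roads_styled"],
                     (-100.0, 30.0, -99.0, 31.0), 800, 600))
```

Because every server receives the same query string, differences in response time can be attributed to the server rather than to the request.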

Rules of Engagement

 * 1) All parties must contribute any changes they make to their software for this exercise back to their community.
 * 2) Comparisons will be made of the latest released software version (i.e. "stable") and also development versions of the software (i.e. "open"). For example, MapServer 5.4.2 will be tested as well as MapServer SVN trunk. Similarly, GeoServer 1.7.6 will be tested as well as GeoServer SVN trunk.
 * 3) Data formats used must be shapefiles for vectors, GeoTIFF files for rasters, and PostgreSQL/PostGIS for database. If you wish to test other formats/backends (such as SDE or ECW) you may do so, but that is optional for all parties. See below for access to the required data.
 * 4) Output formats are JPEG for imagery, GIF for non-antialiased vector output (if we test any), and PNG24 for antialiased output.
 * 5) Nearest-neighbor resampling should be used for raster image display.

Coordination
Coordination is primarily via the Benchmarking mailing list: http://lists.osgeo.org/mailman/listinfo/benchmarking

Weekly meetings will occur through IRC chat in the #foss4g channel on irc.freenode.net

Next IRC meeting

 * Wed Sept 23rd: http://timeanddate.com/worldclock/fixedtime.html?year=2009&month=09&day=23&hour=15&min=0&sec=0
   * Agenda:
     * moving scripts/files into SVN on BenchmarkingA server
     * FYI data download available
     * new vector data available on BenchmarkingA server (/opt/data/TIGER-2008)
     * what layers to use
     * what scales should be styled

Previous Meetings

 * Wed Sept 16th: http://timeanddate.com/worldclock/fixedtime.html?year=2009&month=09&day=16&hour=15&min=0&sec=0
   * Agenda:
     * Directory structure in /opt/ ?
     * Status of ERDAS
     * status of home for the benchmarking files: http://trac.osgeo.org/osgeo/ticket/435
     * feedback on vector styling
   * Minutes:
     * msmith: will setup SDE+Oracle on RedHat by Tues
     * augusttown: will draft gnis_pop & tiger_trac styling
     * jmckenna: possibly have maptools.org host the benchmarking files?
     * frankw: press release should for now include GeoServer, MapServer, ESRI, and 'other servers' (until ERDAS confirms)


 * Wed Sept 9th: http://timeanddate.com/worldclock/fixedtime.html?year=2009&month=09&day=09&hour=15&min=0&sec=0
   * Minutes:
     * jmckenna: Need confirmation from ERDAS so press release can move forward
     * msmith: can setup rhel 5 64 bit with sde 9.3.1 and oracle 11g 64 bit for SDE (need confirmation from ESRI that this setup is OK)
     * aaime: need feedback on styling (icon for point styling, for example)
     * aaime: all should use mailing list more often, for feedback
     * aaime: proposed roads style on mailing list
     * augusttown: are we ok with having a separate machine for database?
     * aaime: postgres should be on that same machine (jmckenna agrees)
     * augusttown: please post possible operating system options for database machine and we will get back to you
     * msmith: pramsey do you want to install postgres/postgis on new db machine?
     * jmckenna: setup an OSGeo server for downloading benchmarking data (ticket)
     * msmith: local port forwarding is ok for doing the testing or do I need to set up the NAT to handle some of the web traffic out?


 * Wed Sept 2nd: http://timeanddate.com/worldclock/fixedtime.html?year=2009&month=09&day=02&hour=14&min=0&sec=0
   * Agenda:
     * update on ERDAS involvement
     * what other web server teams have been contacted? Should more be contacted?
     * work on a press release: http://lists.osgeo.org/pipermail/benchmarking/2009-August/000107.html
     * output format (jpeg, png 24 bit, png 8 bit, gif?)
     * antialiasing (enabled, or not?)
     * interpolation (nearest neighbour, bilinear, bicubic?)
     * specifics on raster formats tests, especially the TIFF part (multilayers, mosaicked, ...)


 * Wed Aug 26
   * discussion logs: http://logs.qgis.org/foss4g/%23foss4g.2009-08-26.log
   * summary:
     * Frank: to confirm with ESRI about Operating System (RHEL or CentOS)
     * Mike: to update wiki with specs when OS switch is done (if required)
     * discussion on how to test (OGC versus native access)...WMS was agreed upon
     * Daniel: volunteers to look at MapServer's WMS vs CGI performance (which is optional)
     * Data: we'll use same data as previous years (but plan to update data to next TIGER or OSM release next year)
     * all agreed to use shapefiles, GeoTIFFs, and PostGIS; to be verified with ESRI.
     * Jeff: to review Andrea's progress re: layer styling
     * foss4g2009: Andrea and Jeff will co-present. (plus an ESRI representative??)
     * press release needs to be drafted: http://lists.osgeo.org/pipermail/benchmarking/2009-August/000107.html

Hardware
Machine A (server)
 * System Type:  Dell PowerEdge 2850
 * Ship Date:   10/21/2004
 * 2 X Processor, 80546K,  3.4G, 1M, Xeon Nocona, 800
 * 4 X DIMM, 2G 400M, 128X72, 8, 240, 2RX4
 * 6 X Hard Drive, 73GB, SCSI, U320, 15K
 * Service Tag:   16GRV51
 * OS: CentOS 5.3 + updates

Machine B (testing box)
 * System Type: Dell PowerEdge 1750
 * Ship Date:   9/23/2003
 * 1 x Processor, 80532K, 3.06GHz, 512K 533
 * 2 x DIMM, 512, 266M, 64X72, 8K, 184, 1U
 * 2 x Hard Drive, 300GB, SCSI, U320, 10K
 * OS: CentOS 5.3 + updates

Layers

 * texas_roads_unstyled: This test exercises raw data access speed. The lines are black, the background is white, and there are a hundred or so lines in each output image. The output is a two-color PNG8 image.
 * texas_roads_styled: This test exercises the renderer a bit more. The lines will be "pipe-styled" and different road types will have different styles.
 * texas_roads_styled_scale: This test exercises scale-dependent rendering. The layer should be a single table/shapefile; the styling draws only the highways up to a certain scale, and all the roads beyond it. The stress test shall jump between zoom levels to assess whether the renderer is smart enough to read only the necessary data.
 * texas_roads_styled_labeled: This test exercises the renderer and the labeling algorithms. Combine the above style with road labels. Labels will follow curves. TODO: define the expected labeling look. We know that exact matches between engines will be impossible, so this will be a "best efforts" test.
 * point_layer_styled: This test exercises the ability of the renderer to build a map in which points are symbolized with a certain number of externally provided icons (png/svg), chosen according to some point attribute.
 * polygon layers: Include one or two test layers to exercise polygon rendering, with solid fill or hatch patterns. For this test the type of classification matters less than the number and complexity of the polygons (number of vertices and rings), which is the aspect we should work on, e.g. render maps with a few complex polygons vs. other maps with lots of small polygons.
 * single big ECW file (actual data TBD): This test exercises the ability to read a single wavelet-compressed file and return it in JPEG format.
 * file system mosaic of TIFF tiles: Generated from the ECW file by splitting it into a sizeable number of tiles, this test checks the ability of the server to efficiently retrieve data from a large collection of images and deal with the associated file handling issues (ulimit and the like). It can also serve as a comparison with the ECW results.
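For the scale-dependent roads test, it helps to know which map scale a given request corresponds to. The following is a minimal sketch, assuming the OGC SLD convention of a 0.28 mm rendering pixel and EPSG:4326 bounding boxes converted to metres at the equator. Individual servers may compute scale differently (for example from a configured DPI), and the 1:500000 switch-over value and the class names are placeholders, since the actual scale thresholds have not been decided on this page.

```python
import math

PIXEL_SIZE_M = 0.00028  # OGC "standardized rendering pixel size" (0.28 mm)
METERS_PER_DEGREE = 2 * math.pi * 6378137.0 / 360.0  # about 111319.49 m


def scale_denominator(bbox, width_px):
    """Approximate scale denominator for an EPSG:4326 bbox rendered width_px wide."""
    minx, _miny, maxx, _maxy = bbox
    ground_width_m = (maxx - minx) * METERS_PER_DEGREE
    return ground_width_m / (width_px * PIXEL_SIZE_M)


def road_classes_to_draw(scale_denom, highways_only_above=500000):
    """Pick road classes for a scale; threshold and names are hypothetical."""
    if scale_denom > highways_only_above:
        return ["highways"]               # zoomed out: highways only
    return ["highways", "other_roads"]    # zoomed in: everything


if __name__ == "__main__":
    denom = scale_denominator((-100.0, 30.0, -99.0, 31.0), 1000)
    print(round(denom), road_classes_to_draw(denom))
```

A load script can use the same calculation to make sure its random requests land on both sides of whatever threshold is eventually chosen.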

Data
Data will reside in /opt/data on the testing server. Do not add any data to this directory without also describing it fully on this page.


 * Vector data (original):
   * gnis_names.shp (POINT, EPSG:4326) US named feature points.
   * states.shp (POLYGON, EPSG:4326) US states and demographics.
   * tiger_shp.shp (LINESTRING, EPSG:4326) TIGER roads for Texas.
   * tiger_tracts.shp (POLYGON, EPSG:4326) TIGER census tracts for USA.


 * Vector data (TIGER 2008 of Texas)
   * separated into individual counties (like the TIGER release)
     * /48_TEXAS/County/arealm.shp (POLYGON, EPSG:4269) area landmarks (e.g. parks) for counties.
     * /48_TEXAS/County/areawater.shp (POLYGON, EPSG:4269) area water (e.g. lakes) for counties.
     * /48_TEXAS/County/edges.shp (LINE, EPSG:4269) linework (e.g. roads, rivers) for counties.
     * /48_TEXAS/County/pointlm.shp (POINT, EPSG:4269) point landmarks (e.g. hospital, airport) for counties.
     * /48_TEXAS/county_tileindex.shp (POLYGON, EPSG:4269) county index file used by MapServer.
   * merged into single files for the entire state (required for servers that cannot load many .shp files as a single entity)
     * arealm_merge.shp (POLYGON, EPSG:4269) area landmarks (e.g. parks) for entire state.
     * areawater_merge.shp (POLYGON, EPSG:4269) area water (e.g. lakes) for entire state.
     * edges_merge.shp (LINE, EPSG:4269) linework (e.g. roads, rivers) for entire state.
     * pointlm_merge.shp (POINT, EPSG:4269) point landmarks (e.g. hospital, airport) for entire state.
   * general
     * /48_TEXAS/tl_2008_48_place.shp (POLYGON, EPSG:4269) places (populated areas) for entire state.
     * /tl_2008_us_county.shp (POLYGON, EPSG:4269) county outlines for entire state.
     * /tl_2008_us_state.shp (POLYGON, EPSG:4269) US state outlines.
 * Raster data
   * world-topo-bathy-200409-3x86400x43200.ecw (ECW, EPSG:4326) Medium size ECW file (Bluemarble Next Generation, whole world)
   * world-topo-bathy-200409-3x86400x43200.tif The same as a tiled BigTIFF, with overviews.
   * world-topo-bathy-*-*.tif The same split into eight non-BigTIFF GeoTIFFs, with overviews. world-topo-bathy.{shp,dbf,shx} is a tileindex.
 * PostGIS data
   * dbname=benchmark user=postgres password=postgres port=5432 host=localhost
   * tables: public.gnis_names public.states public.tiger_shp public.tiger_tracts
   * geometry column: "the_geom" in all cases
   * srid: 4326 in all cases
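To make concrete how a map request turns into a database query, here is a hedged sketch that builds a bounding-box query against the PostGIS tables listed above. It only assembles the SQL text and its parameters and does not connect to anything. `bbox_query` is a hypothetical helper, and the `%s` placeholders assume a DB-API driver such as psycopg2, which is not part of the setup described on this page. Each server generates its own SQL, which may look quite different.

```python
# Table names, geometry column and SRID as listed under "PostGIS data".
KNOWN_TABLES = {"gnis_names", "states", "tiger_shp", "tiger_tracts"}
GEOM_COLUMN = "the_geom"
SRID = 4326


def bbox_query(table, bbox):
    """Return (sql, params) selecting geometries whose bbox overlaps `bbox`."""
    if table not in KNOWN_TABLES:
        # Table names cannot be bound as parameters, so whitelist them.
        raise ValueError("unknown table: %r" % (table,))
    minx, miny, maxx, maxy = bbox
    ring = [(minx, miny), (minx, maxy), (maxx, maxy), (maxx, miny), (minx, miny)]
    wkt = "POLYGON((%s))" % ", ".join("%s %s" % pt for pt in ring)
    # "&&" is the PostGIS bounding-box overlap operator; it can use a GiST index.
    sql = (
        "SELECT ST_AsBinary({geom}) FROM public.{table} "
        "WHERE {geom} && ST_GeomFromText(%s, %s)"
    ).format(geom=GEOM_COLUMN, table=table)
    return sql, (wkt, SRID)


if __name__ == "__main__":
    sql, params = bbox_query("tiger_shp", (-100.0, 30.0, -99.0, 31.0))
    print(sql)
    print(params)
```

Running a query like this by hand (for example through psql) is one way to check that the spatial index on each table is actually being used before blaming a server for slow responses.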

Download

 * http://www.maptools.org/foss4g/ (user/pass: foss4g/foss4g)
   * raster-data.zip (0.4 GB)
   * vector-data-original.zip (0.5 GB) (old tiger data, merged, for Texas)
   * vector-data-tiger08-tx-counties.zip (1.1 GB) (TIGER 08, stored by county, for Texas)
   * vector-data-tiger08-tx-merged.zip (1.1 GB) (TIGER 08, merged, for Texas)

Software
For those of us who have to build our software, the builds will be staged in /opt/build

Libraries

 * GEOS 3.1.1 installed in /usr/local (pramsey)
 * Proj4 SVN trunk installed in /usr/local (pramsey)
 * libecw installed in /usr/local (pramsey)
 * GDAL 1.6.1 installed in /usr/local (pramsey)
 * AGG 2.5 installed in /usr/local (pramsey)
 * FastCGI installed in /usr/local (pramsey)

Databases

 * PostgreSQL installed in /usr/local/pgsql (pramsey)
   * PostgreSQL 8.3.7
   * PostGIS 1.4.0
   * PGDATA in /home/postgres/data

GeoServer related

 * Sun JDK 1.6.0_16 with JAI, JAI Image-IO and ImageIO-ext native extensions, installed in /opt/jdk1.6.0_16 (aaime)
 * GeoServer binaries and configurations (/opt/geoserver) (still incomplete at the moment) (aaime)

Testing machine

 * Plain Sun JDK 1.6.0_16 (/opt/jdk1.6.0_16) (aaime)
 * JMeter (/opt/jakarta-jmeter-2.3.4) (aaime)
 * symbolic link to start jmeter in /usr/local/bin/jmeter (instructions on how to use it will follow) (aaime)
 * /usr/local/bin/wms_requests.py, simple tool to generate random bounding boxes and sizes to drive JMeter (aaime, tool developed by Frank W.)
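
The idea behind such a request generator can be sketched as follows. This is not the wms_requests.py installed on the testing machine, only a minimal stand-in written for illustration: the extent is an approximate bounding box for Texas, and the size and resolution ranges, the function names and the semicolon-delimited output layout are all assumptions.

```python
import math
import random

# Approximate lon/lat extent of Texas (assumption, for illustration only).
TEXAS_EXTENT = (-106.7, 25.8, -93.5, 36.5)


def random_requests(count, extent=TEXAS_EXTENT, min_res=0.0001, max_res=0.01,
                    min_size=(64, 64), max_size=(1024, 768), seed=None):
    """Return `count` tuples (width, height, bbox) that fit inside `extent`.

    Resolution is in degrees per pixel and is drawn log-uniformly so that
    small and large scales are sampled about equally often.
    """
    rng = random.Random(seed)  # a seed makes the run reproducible
    minx, miny, maxx, maxy = extent
    if min_size[0] * min_res > maxx - minx or min_size[1] * min_res > maxy - miny:
        raise ValueError("extent too small for the smallest window")
    requests = []
    while len(requests) < count:
        width = rng.randint(min_size[0], max_size[0])
        height = rng.randint(min_size[1], max_size[1])
        res = math.exp(rng.uniform(math.log(min_res), math.log(max_res)))
        geo_w, geo_h = width * res, height * res
        if geo_w > maxx - minx or geo_h > maxy - miny:
            continue  # window would not fit; draw again
        x0 = rng.uniform(minx, maxx - geo_w)
        y0 = rng.uniform(miny, maxy - geo_h)
        requests.append((width, height, (x0, y0, x0 + geo_w, y0 + geo_h)))
    return requests


def format_line(request):
    """Format one request as 'width;height;minx,miny,maxx,maxy'."""
    width, height, bbox = request
    return "%d;%d;%s" % (width, height, ",".join("%.6f" % v for v in bbox))


if __name__ == "__main__":
    for req in random_requests(5, seed=42):
        print(format_line(req))
```

Each output line can be read by a JMeter CSV Data Set Config element with `;` as the delimiter, so the third field can be used directly as the BBOX parameter. Because the bounding box keeps the same aspect ratio as the image size, the servers are not asked to stretch the map.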

SVN
The project files (minus data) will be stored in Subversion (http://svn.osgeo.org/osgeo/foss4g/benchmarking/).

Press Release
FOSS4G_2009_Press_Release_32