Benchmarking 2010/Constellation-SDI

This page describes the experience of the Constellation-SDI team during the FOSS4G 2010 Benchmarking effort.

Benchmark design

We expended a great deal of effort attempting to understand how a benchmarking effort could be designed properly.

The 2009 and 2010 efforts were undertaken in the naive belief that one simply sets up different servers on the same data and makes requests of the servers to compare response times. Belatedly, the 2010 effort is demonstrating that a proper benchmark is more complex: the data might be in a format suited to one class of use cases but meaningless at another scale of usage; the testing can easily be hardware bound, which limits useful comparison between servers; servers can be set up to do very different work, especially in 'best effort'/anything goes configurations; and results can be compared only in superficial ways or for very narrow types of requests. In order to tackle these issues rather than merely pretend they were not serious, we examined what it would take to develop a useful benchmarking protocol, either to stress all the functionality of one particular server or to compare the performance and abilities of various, arbitrary WMS servers.

Developing a WMS benchmarking design which provides useful, comparative metrics of server performance is exceedingly hard. In a recent talk at the Java Language Summit entitled Performance Anxiety, Joshua Bloch described the impossibility of developing performant software from first principles in any language because of compilation enhancements and machine instruction re-ordering, the necessity of testing to obtain concrete results, and the difficulty of developing proper, statistically rigorous testing metrics.

Since these issues were apparent to us even before this effort, we have been developing tools, benchmark designs and analytic methodologies to test the Constellation-SDI server. This work has been greatly extended during the FOSS4G 2010 benchmarking effort and expanded to consider how to test different WMS servers, possibly built for different uses.

Unfortunately, there is still much distance to go before achieving a solid benchmarking suite. This work will undoubtedly be continued in the future, most likely within the framework of Open Geospatial Consortium (OGC) testing.

Enhancements

This section describes enhancements resulting from the work done during the 2010 benchmarking effort, including improved understanding and workflow within the Constellation-SDI team and improvements to the code bases of the Geotoolkit.org library and of the Constellation-SDI server itself.

Benchmarking

Investigate numerous issues with jmeter.
Design simpler scripts.
Examine different configurations to stress different aspects of the WMS server experience.

Geotoolkit

Referencing: fix inverse projection for fake spherical mercator.
Referencing: accelerate raster reprojection.
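
For readers unfamiliar with the projection involved, the forward and inverse formulas of spherical mercator can be sketched as below. This is an illustration only, not the Geotoolkit.org code; the function names are ours, and the sketch assumes the usual sphere whose radius is the WGS84 semi-major axis.

```python
import math

# Assumption: sphere radius equal to the WGS84 semi-major axis, in metres.
R = 6378137.0

def forward(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to spherical mercator metres."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

def inverse(x, y):
    """Inverse projection: spherical mercator metres back to degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat

# Round trip on an arbitrary point: inverse(forward(p)) should return p.
x, y = forward(2.17, 41.38)
lon, lat = inverse(x, y)
```

A round trip such as the one at the end of the sketch is a cheap regression test for this kind of inverse projection fix.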

Coverage: Create a reader for GeoTiff images.

Shapefile: reduce memory usage when leveraging a quad-tree index.
Shapefile: reduce styling to one single pass when painting by symbol rather than by feature.
Shapefile: reduce the reading of non-necessary parts of the files.

DataSource: Enable startup from a coverage mosaic, either a folder or a manager.
DataSource: Create a GeoTiff reader.

Renderer: bypass rendering engine for single raster requests.
Renderer: improve decimation algorithm for vector layers.
Renderer: switch to the OGC conformant 96dpi assumption rather than the industry standard 72dpi.
Renderer: fix SLD parsing errors to handle <Literal> or its absence.
Renderer: optimize the colour model selected for multiple inputs.
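
The dpi assumption matters because the renderer must assume a physical pixel size to turn a request into a map scale, and SLD rules can be switched on or off by scale denominator. A minimal sketch of the arithmetic follows; the function name and the example request are ours, and the extent is assumed to be expressed in metres.

```python
METRES_PER_INCH = 0.0254

def scale_denominator(ground_width_m, image_width_px, dpi):
    """Scale denominator of an image covering ground_width_m metres."""
    pixel_size_m = METRES_PER_INCH / dpi
    return ground_width_m / (image_width_px * pixel_size_m)

# The same hypothetical request (10 km across, 1000 pixels wide)
# evaluated under both dpi assumptions.
at_72 = scale_denominator(10_000, 1000, 72)
at_96 = scale_denominator(10_000, 1000, 96)
print(f"72dpi: 1:{at_72:.0f}   96dpi: 1:{at_96:.0f}")
```

The two denominators differ by a factor of 96/72, so a style rule with a scale threshold can apply under one assumption and not under the other; servers being compared need to agree on the value.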

Constellation-SDI

Configuration: greatly enhance configurability of the server, with hot reload of data, styles and rendering configuration.

Server: cache ServiceMetadata document for insanely slow data sources.
Server: fix envelopes for data sources in ServiceMetadata document.

Backend: fix multi-threading bug to use classes in a thread-safe manner.

JEE output: enable direct writing of images into output stream.

GUI: build a prototype interface.

Performance Results

This section details the results of running the jmeter scripts against the Constellation-SDI WMS server.

Note: All values reported are in responses per second, taken from the "Throughput" column of the output of the 'summarizer.py' script. The values are those of the third pass: the jmeter scripts first loop through two warmup passes over the various thread counts, from one to sixty-four, and only the last pass is measured.
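
To make the metric concrete, here is a minimal sketch of how a responses-per-second figure can be derived from the samples of one pass. It is not the 'summarizer.py' script; the function name, the input layout and the timings are hypothetical, and throughput is taken as the number of responses divided by the time from the start of the first sample to the end of the last one.

```python
def throughput(samples):
    """Responses per second for one pass.

    samples: list of (start_ms, elapsed_ms) pairs, one per response.
    """
    first_start = min(start for start, _ in samples)
    last_end = max(start + elapsed for start, elapsed in samples)
    return len(samples) / ((last_end - first_start) / 1000.0)

# Hypothetical timings for three passes; the first two only warm up
# the server and are discarded, the third one is reported.
passes = [
    [(0, 400), (400, 350), (750, 300)],
    [(0, 250), (250, 250), (500, 250)],
    [(0, 200), (200, 200), (400, 200), (600, 200)],
]
reported = throughput(passes[-1])
```

Discarding the warmup passes keeps one-time start-up costs out of the reported figure.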

Session 2010.09.03

This was the first full run of Constellation-SDI using the jmeter scripts.

The runs were performed with the more recent jmeter design, in which all three runs use different requests so that the server always has to read new files from disk.
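
The idea of giving each run its own set of requests can be sketched as follows. This is an illustration under our own assumptions, not the actual request generator used for the benchmark: the server URL, layer name and extent are hypothetical, and the request uses the standard WMS 1.3.0 GetMap parameters.

```python
import random
from urllib.parse import urlencode

def getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build a WMS 1.3.0 GetMap URL for one bounding box in EPSG:3857."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",
        "CRS": "EPSG:3857",
        "BBOX": ",".join(f"{v:.2f}" for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)

def random_bboxes(extent, size, count, seed):
    """Draw `count` square boxes of side `size` inside `extent`."""
    rng = random.Random(seed)  # one seed per run gives one request set per run
    minx, miny, maxx, maxy = extent
    boxes = []
    for _ in range(count):
        x = rng.uniform(minx, maxx - size)
        y = rng.uniform(miny, maxy - size)
        boxes.append((x, y, x + size, y + size))
    return boxes

# Hypothetical data extent, in metres.
EXTENT = (0.0, 5000000.0, 400000.0, 5400000.0)
runs = {seed: random_bboxes(EXTENT, 10000.0, 5, seed) for seed in (1, 2, 3)}
urls = [getmap_url("http://localhost:8080/wms", "hypothetical_layer", box)
        for box in runs[1]]
```

Different seeds give different request sets, but random boxes alone do not guarantee that every request touches a file the server has not read before; that depends on how the data is tiled on disk.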

For lack of time, a second run was only done for the two raster request sets; nonetheless, the numbers give us a ballpark estimate of variability between runs.

Raster Results for 2010.09.03
EPSG code	25831				3857
Threads	Run 1	Run 2	Run 3	Run 4	Run 1	Run 2	Run 3	Run 4
1	4.5	5.7	---	---	5.4	5.4	---	---
2	6.3	6.1	---	---	6.0	5.7	---	---
4	5.1	4.9	---	---	4.7	4.8	---	---
8	4.8	4.6	---	---	4.6	4.5	---	---
16	4.5	4.7	---	---	4.6	4.6	---	---
32	5.2	5.0	---	---	4.8	4.8	---	---
64	4.9	4.8	---	---	4.8	4.6	---	---

Vector Results for 2010.09.03
EPSG code	4326				3857
Threads	Run 1	Run 2	Run 3	Run 4	Run 1	Run 2	Run 3	Run 4
1	1.5	---	---	---	1.5	---	---	---
2	2.1	---	---	---	2.1	---	---	---
4	2.1	---	---	---	2.3	---	---	---
8	2.2	---	---	---	2.3	---	---	---
16	2.1	---	---	---	2.2	---	---	---
32	2.2	---	---	---	2.3	---	---	---
64	1.8	---	---	---	1.9	---	---	---

The processing power (8 CPUs) of the machine does not seem to have been stressed at any point in the test runs.

These numbers differ from those generated on local servers, where the separation between vector and raster results was twice as large. Variability also seems high enough that several runs would be needed to discriminate between the various configurations, so we probably need to tighten up the testing to get anything meaningful from numbers such as these.
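
The run-to-run variability can be quantified directly from the two raster runs. The sketch below uses the 25831 columns of the raster table above; the choice of metric (absolute difference divided by the mean of the two runs) is ours, and two runs are of course too few for any proper statistic.

```python
THREADS = [1, 2, 4, 8, 16, 32, 64]

# Raster throughput in responses per second, 25831 columns of the table above.
RUN_1 = [4.5, 6.3, 5.1, 4.8, 4.5, 5.2, 4.9]
RUN_2 = [5.7, 6.1, 4.9, 4.6, 4.7, 5.0, 4.8]

def relative_difference(a, b):
    """Absolute difference between two runs, relative to their mean."""
    return abs(a - b) / ((a + b) / 2)

spread = {t: relative_difference(a, b)
          for t, a, b in zip(THREADS, RUN_1, RUN_2)}
worst_threads = max(spread, key=spread.get)

for t in THREADS:
    print(f"{t:>2} threads: {spread[t]:.1%}")
```

On these numbers the single-thread row differs by about 24% between the two runs, while every other row stays under 5%.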

Session NEXT

This is a placeholder and template for future runs.

Raster Results
EPSG code	25831				3857
Threads	Run 1	Run 2	Run 3	Run 4	Run 1	Run 2	Run 3	Run 4
1	---	---	---	---	---	---	---	---
2	---	---	---	---	---	---	---	---
4	---	---	---	---	---	---	---	---
8	---	---	---	---	---	---	---	---
16	---	---	---	---	---	---	---	---
32	---	---	---	---	---	---	---	---
64	---	---	---	---	---	---	---	---

Vector Results
EPSG code	4326				3857
Threads	Run 1	Run 2	Run 3	Run 4	Run 1	Run 2	Run 3	Run 4
1	---	---	---	---	---	---	---	---
2	---	---	---	---	---	---	---	---
4	---	---	---	---	---	---	---	---
8	---	---	---	---	---	---	---	---
16	---	---	---	---	---	---	---	---
32	---	---	---	---	---	---	---	---
64	---	---	---	---	---	---	---	---