Database Output Storage for PyWPS

Introduction
My name is Jan Pisl and I study at the Czech Technical University in Prague. I will be participating in this year's GSoC. My project’s aim is to develop database storage support for PyWPS that would allow output data to be stored in a database in a way that is compliant with the OGC WPS Standard. I will be using Python 3.6.

Project
Title: Database Output Storage for PyWPS

Student: Jan Pišl

Mentors: Jáchym Čepický, Martin Landa.

Project's repositories: GitHub repository, Project's page on GSoC website

PyWPS
PyWPS is a server side implementation of the OGC Web Processing Service (OGC WPS) standards 1.0.0. It is written in the Python programming language, it runs on Python 2.7, 3.3 or higher and it is tested and developed on Linux. It uses a ConfigParser format for configuration files. It supports a variety of geospatial software and tools such as GRASS GIS, R Project or the GDAL library. Synchronous and asynchronuous invocations are supported. As for request encoding, two options are available - key-value pairs (using HTTP-GET) or XML payload (using HTTPPOST). Every process that is to be deployed on the server is defined as a class and has several mandatory parameters. The key parameter called "handler" gets invoked every time there is an incoming request, it accepts the request and returns a response.

In 2016, it upgraded from PyWPS 3 to PyWPS 4. Some of the more significant changes include every input being considered a list of inputs and all inputs having file, data and stream attributes. These attributes allow better manipulation with data.

Python
Python is a high-level programming language that fully supports object-oriented and structured programming. Developed in the late 1980s, the first version 0.9.0 was released in 1991. In 2008, Python 3.0 was released. Currently, the most up-to-date version available is 3.6. It was designed as a syntactically simple language, using whitespace intendantion instead of brackets and English words rather than punctuation. It is a dynamicallytyped language, which means it is not neccessary to specify a data-type when defining a variable. For its simplicity and readability, Python is often considered a good first programming language to learn. One of the key advantages of Python is its high extensibility. It provides large standard libraries and also an extensive number of other modules, packages and libraries, so most of the common programming tasks are already solved, scripted and made available.

Bonding Period
Bonding Period Report

Week 1
'''What did you get done this period? '''

I defined the first version of “DbStorage”, a class that will allow output data to be stored in one of RDBMSs supported (PostGIS, MS Server, Oracle spatial, SpatiaLite) [1] I downloaded PostgreSQL and PostGIS and run it locally on my computer. I tested it functions correctly by using code I had written previously to GSoC [2] that stores output data in PostGIS. I run some of PyWPS test processes and checked output data does get stored in PostGIS.

'''What do you plan on doing next week (period) ? '''

Implement PGStorage, a class within the DbStorage class that will store output data to PostGIS and returns database name, schema and table as a reference to the client. Start working on implementing MSServerStorage, OciStorage and SpatiaLiteStorage. Download, run and test SpatiaLite.

'''Are you blocked on anything? '''

No.

[1] https://github.com/janpisl/pywps/commit/45b98e04c428c2e62643a42bbebfedf8efc0603e

[2] https://github.com/ctu-geoforall-lab-projects/bp-pisl-2018-pywps/commit/c364be4888d74b2285e5a348afc6c3fe184fbc64

Week 2
'''What did you get done this period? '''

Develop and implement PGStorage, a class within the DbStorage class that will store output data to PostGIS and returns database name, schema and table as a reference to the client.

Study materials on Factory design pattern [2] recommended by my mentor.

Download, run and test SpatiaLite.

'''What do you plan on doing next week (period) ? '''

Develop, implement and test _MSSQLStorage and _OciStorage _MySQLStorage classes

'''Are you blocked on anything? '''

I have some questions related to work I will be doing next week. I will consult with my mentor.

[1] https://github.com/janpisl/pywps/commit/254b1e5697c0b4781544a6329b52eb4688e60a22

[2] http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Factory.html

Week 3
'''What did you get done this period? '''

- Following my mentor's advice, I made a decision that _MSSQLStorage, _OciStorage _MySQLStorage classes will not be implemented at this stage. There will only be PgStorage and SQLiteStorage that store data in a PostgreSQL and SQLite databases, respectively

- Define DbStorageAbstract, an abstract class from which PgStorage and SQLiteStorage will inherit [1]

- Develop and implement SQLiteStorage [2]

- Fix/Update DbStorage class [3]

- Update outputs.py so complex output data is either stored as files or in a database depending on configuration [4]

'''What do you plan on doing next week (period) ? '''

- Develop a mechanism for handling raster output data and implement it.

- Decide how other types of data (such as string, table, CSV, etc.) will be handled and implement it.

'''Are you blocked on anything? '''

There is an error occuring when I run PyWPS and I haven't been able to find out what is causing it. I will discuss this with my mentor on a call on Monday morning.

[1] https://github.com/janpisl/pywps/commit/89ab12a2fd883a0df6c070a4ca573f89b6c31a5a

[2] https://github.com/janpisl/pywps/commit/864d275fe8c47e41c07f8278b8037e80a0b0b838

[3] https://github.com/janpisl/pywps/commit/c0d698cefde9a5085cd7d76bb0e32156a6391880

[4] https://github.com/janpisl/pywps/commit/7f00fa06ac5fd7823d79a363f6163dcc38e7bbb0

Week 4
'''What did you get done this period? '''

-I changed the structure of the code. Classes PgStorage and SQLiteStorage are now stored in separate files in the db folder. FileStorage is in a separate file, too. Rest of the code is in the __init__ files. [1]

- I added the data_type attribute to inout/formats/__init__.py. At the moment, allowed values are either vector or raster using the enum module. This was done so when storing output data it can be differed between raster and vector data and appropriate method can be used. [2]

- The Dbstorage class has been simplified. Instead of creating an instance of the class in outputs.py and calling the get_db_type method in the constructor, get_db_type is now a static method and gets called directly in outputs.py [3]

- PgStorage accesses the data_type value and calls appropriate method (store_vector_output or store_raster_output) [4]

- I implemented the store_raster_output method in the PgStorage class [5]

'''What do you plan on doing next week (period) ? '''

- Implement store_raster_output for SQLiteStorage

- Implement another method for storing other types of data (string, CSV, etc.)

- Write a script that tests whether different data types get stored correctly

- Submit the Phase 1 evaluation.

'''Are you blocked on anything? '''

No.

[1] https://github.com/janpisl/pywps/commit/18c46864ff75fb01e3c7d76f3f4a1b6d2457a22f[[2] https://github.com/janpisl/pywps/commit/4bb079bfcde0b4fcc42393f531e86c47a91507d4

[3] https://github.com/janpisl/pywps/commit/80c61c92e183c06af741fe94d0b172f9a5940609

[4] https://github.com/janpisl/pywps/commit/e024b1e5d1eb8df1a79989a59b14276b2757d98d

[5] https://github.com/janpisl/pywps/commit/a09dfc61729b36eb81ddd0d9adb5bda2effe3029

Week 5
'''What did you get done this period? '''

- I updated the store_vector_output method in the SQLite class [1]

- I implemented a method for storing raster outputs in SQLite [2]

- I implemented a method for storing output data of other datatypes in PostgreSQL [3]

'''What do you plan on doing next week (period) ? '''

- Write test_storage, a unit test that would test all storage options and their correct functionality

- Handle metadata

'''Are you blocked on anything? '''

No.

[1] https://github.com/janpisl/pywps/commit/44e9130c8bbb73841c51a09266a9db62ff2809f9

[2] https://github.com/janpisl/pywps/commit/5bf8eb883343f0caacd47470bf79152ac751d480

[3] https://github.com/janpisl/pywps/commit/50cd11ffa2486ee3de7cac3340874179d81d4e2e

Week 6
'''What did you get done this period? '''

- Database schema gets created automatically when the DbStorage class is initialized. [1]

- I developed "test_storage" module for testing storing output data [2]

- I implemented unittest classes for testing file storage [3], PostgreSQL storage [4] [5] [6] and SQLite [7] [8]

- Some other minor changes and updates to the code and fixing bugs [9]

'''What do you plan on doing next week (period) ? '''

- Address issues revealed by testing

- Handle metadata

'''Are you blocked on anything? '''

No.

[1] https://github.com/janpisl/pywps/commit/524d65f2cb5cdfe5f0bbdf769a9850c32886952e

[2] https://github.com/janpisl/pywps/commit/1c36dd1ee3b1c61d819b92f5a195792f488b1193

[3] https://github.com/janpisl/pywps/commit/2896d7670ed2f38470eab54790ed2e822c42583d

[4] https://github.com/janpisl/pywps/commit/97c8255a9ea3ad52a022d84315eb4e725afe4fc6

[5] https://github.com/janpisl/pywps/commit/16afbea95f1fea31dd3ac9e7a30d63b5d309ad12

[6] https://github.com/janpisl/pywps/commit/b08dee2644888379d08c25919d112b0e29f2423d

[7] https://github.com/janpisl/pywps/commit/c406e21d8ee844e468fe672d236285664a8fde76

[8] https://github.com/janpisl/pywps/commit/a1493e70024d06b3d29d1522edca64af4574de0b

[9] https://github.com/janpisl/pywps/commits/master