Database Output Storage for PyWPS

From OSGeo
Revision as of 12:23, 15 July 2018 by Wiki-Janpisl (talk | contribs)
Jump to navigation Jump to search

Introduction

My name is Jan Pisl and I study at the Czech Technical University in Prague. I will be participating in this year's GSoC. My project’s aim is to develop database storage support for PyWPS that would allow output data to be stored in a database in a way that is compliant with the OGC WPS Standard. I will be using Python 3.6.

Project

Title: Database Output Storage for PyWPS

Student: Jan Pišl

Mentors: Jáchym Čepický, Martin Landa.

Project's repositories: GitHub repository, Project's page on GSoC website

Technology

PyWPS

PyWPS is a server side implementation of the OGC Web Processing Service (OGC WPS) standards 1.0.0. It is written in the Python programming language, it runs on Python 2.7, 3.3 or higher and it is tested and developed on Linux. It uses a ConfigParser format for configuration files. It supports a variety of geospatial software and tools such as GRASS GIS, R Project or the GDAL library. Synchronous and asynchronuous invocations are supported. As for request encoding, two options are available - key-value pairs (using HTTP-GET) or XML payload (using HTTPPOST). Every process that is to be deployed on the server is defined as a class and has several mandatory parameters. The key parameter called "handler" gets invoked every time there is an incoming request, it accepts the request and returns a response.

In 2016, it upgraded from PyWPS 3 to PyWPS 4. Some of the more significant changes include every input being considered a list of inputs and all inputs having file, data and stream attributes. These attributes allow better manipulation with data.

Python

Python is a high-level programming language that fully supports object-oriented and structured programming. Developed in the late 1980s, the first version 0.9.0 was released in 1991. In 2008, Python 3.0 was released. Currently, the most up-to-date version available is 3.6. It was designed as a syntactically simple language, using whitespace intendantion instead of brackets and English words rather than punctuation. It is a dynamicallytyped language, which means it is not neccessary to specify a data-type when defining a variable. For its simplicity and readability, Python is often considered a good first programming language to learn. One of the key advantages of Python is its high extensibility. It provides large standard libraries and also an extensive number of other modules, packages and libraries, so most of the common programming tasks are already solved, scripted and made available.


Schedule

Dates Task Deliverables
Bonding Period (April, 23 - May, 14)

Initial research on GDAL, MapServer, SQLAlchemy, GeoAlchemy, relational databases (PostgreSQL, Oracle, Microsoft SQL Server, MySQL, SQLite,...)

Select which databases will be supported in this project.

Set up environment for work - virtual environment with a distribution of Linux, instances of selected database management systems for testing.

Bonding Period Report
Week 1 (May, 14 - May, 20)

Write a test to ensure newly implemented functionality will work correctly.

Define “DbStorage”, an abstract base class that will ensure derived classes implement particular methods from this class. Define abstract methods that are necessary for writing output data to a remote database.

Week 2 (May, 21 - May, 27)

Write test script that checks “PgStorage” will work correctly and write output vector data to a PostgreSQL database.

Develop “PgStorage“ class that inherits from “DbStorage” and implements proposed functionality.

Week 3 (May, 28 - June, 3)

Write a test (or update test written in the previous week) to ensure correct functionality of newly developed classes.

Develop several classes based on selection in week 1 – e.g. “MSSQLStorage”, “OCIStorage”, “MySQLStorage”, etc.

Week 4 (June, 4 - June, 10)

Write a test to ensure newly implemented functionality will work correctly.

Develop a mechanism for handling raster output data and implement it in the above-mentioned classes. If GDAL does not support writing raster data to a specific database by default, appropriate tool or library must be identified. For example, for writing raster data into PostgreSQL, ”raster2pgsql“ tool can be used.

Decide how other types of data (such as string, table, CSV, etc.) will be handled and implement it.

Week 5 (June, 11 - June, 15) Phase 1 Evaluation

Write a test that checks all data types will get correctly stored in database.

Write and submit weekly report.

Write and submit Phase 1 evaluation.

Deliverables for Phase 1: Correctly functioning and tested module that enables output data (vector, raster or other) to be stored in one of several database management systems.

Week 6 (June, 16 - June, 24)

Write a test to ensure newly implemented functionality will work correctly.

Handle metadata. Currently, logs with information about run processes (such as when a process started and finished) are stored in a SQLite database. Implement a functionality that accesses logs, extracts desired metadata from them and saves them in the same database as output data. Use unique identifier of every process to join metadata (using primary key) with the corresponding output data (foreign key). For each run process, include expiration date to its output data.

Develop a trigger that removes output data when it expires.

Week 7 (June, 25 - July, 1)

Write a test to ensure newly implemented functionality will work correctly.

Establish a connection between database and MapServer and allow MapServer to access output data stored in database.

Week 8 (July, 2 - July, 8)

Write a test to ensure newly implemented functionality will work correctly.

Enable output data to be viewed as WMS services through MapServer.

Edit current code so the reference embodied in the responding document is a URL link to a running MapServer service.

Week 9 (July, 9 - July, 13) Phase 2 Evaluation

Write and submit Phase 2 evaluation.

Deliverables for Phase 2: Correctly functioning and tested module that enables output data to be viewed as a Web Map Service using MapServer.

Week 10 (July, 14 - July, 22)

Update previously developed test to ensure newly implemented functionality will work correctly.

Enable raster output data to be accessed and/or edited as OGC Web Coverage Service.

Week 11 (July, 23 - July, 29)

Update previously developed test to ensure newly implemented functionality will work correctly.

Enable vector output data to be accessed and/or edited as OGC Web Feature Service.

Week 12 (July, 30 - August, 5)

Write a test that ensures newly implemented functionality will work correctly.

Develop a demo that demonstrates implemented functionality.

Week 13 (August, 6 - August, 14) Create documentation that can be added to the current PyWPS documentation.

Finalize work, clean code and prepare a pull request(s) to PyWPS GitHub repository.

Write and submit final report.


Reports

Bonding Period

Bonding Period Report

Week 1

What did you get done this period?

I defined the first version of “DbStorage”, a class that will allow output data to be stored in one of RDBMSs supported (PostGIS, MS Server, Oracle spatial, SpatiaLite) [1] I downloaded PostgreSQL and PostGIS and run it locally on my computer. I tested it functions correctly by using code I had written previously to GSoC [2] that stores output data in PostGIS. I run some of PyWPS test processes and checked output data does get stored in PostGIS.



What do you plan on doing next week (period) ?

Implement PGStorage, a class within the DbStorage class that will store output data to PostGIS and returns database name, schema and table as a reference to the client. Start working on implementing MSServerStorage, OciStorage and SpatiaLiteStorage. Download, run and test SpatiaLite.



Are you blocked on anything?

No.


[1] https://github.com/janpisl/pywps/commit/45b98e04c428c2e62643a42bbebfedf8efc0603e

[2] https://github.com/ctu-geoforall-lab-projects/bp-pisl-2018-pywps/commit/c364be4888d74b2285e5a348afc6c3fe184fbc64

Week 2

What did you get done this period?

Develop and implement PGStorage, a class within the DbStorage class that will store output data to PostGIS and returns database name, schema and table as a reference to the client.


Study materials on Factory design pattern [2] recommended by my mentor.


Download, run and test SpatiaLite.



What do you plan on doing next week (period) ?

Develop, implement and test _MSSQLStorage and _OciStorage _MySQLStorage classes



Are you blocked on anything?

I have some questions related to work I will be doing next week. I will consult with my mentor.



[1] https://github.com/janpisl/pywps/commit/254b1e5697c0b4781544a6329b52eb4688e60a22

[2] http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Factory.html


Week 3

What did you get done this period?

- Following my mentor's advice, I made a decision that _MSSQLStorage, _OciStorage _MySQLStorage classes will not be implemented at this stage. There will only be PgStorage and SQLiteStorage that store data in a PostgreSQL and SQLite databases, respectively

- Define DbStorageAbstract, an abstract class from which PgStorage and SQLiteStorage will inherit [1]

- Develop and implement SQLiteStorage [2]

- Fix/Update DbStorage class [3]

- Update outputs.py so complex output data is either stored as files or in a database depending on configuration [4]


What do you plan on doing next week (period) ?

- Develop a mechanism for handling raster output data and implement it.

- Decide how other types of data (such as string, table, CSV, etc.) will be handled and implement it.



Are you blocked on anything?

There is an error occuring when I run PyWPS and I haven't been able to find out what is causing it. I will discuss this with my mentor on a call on Monday morning.



[1] https://github.com/janpisl/pywps/commit/89ab12a2fd883a0df6c070a4ca573f89b6c31a5a

[2] https://github.com/janpisl/pywps/commit/864d275fe8c47e41c07f8278b8037e80a0b0b838

[3] https://github.com/janpisl/pywps/commit/c0d698cefde9a5085cd7d76bb0e32156a6391880

[4] https://github.com/janpisl/pywps/commit/7f00fa06ac5fd7823d79a363f6163dcc38e7bbb0



Week 4

What did you get done this period?

-I changed the structure of the code. Classes PgStorage and SQLiteStorage are now stored in separate files in the db folder. FileStorage is in a separate file, too. Rest of the code is in the __init__ files. [1]

- I added the data_type attribute to inout/formats/__init__.py. At the moment, allowed values are either vector or raster using the enum module. This was done so when storing output data it can be differed between raster and vector data and appropriate method can be used. [2]

- The Dbstorage class has been simplified. Instead of creating an instance of the class in outputs.py and calling the get_db_type method in the constructor, get_db_type is now a static method and gets called directly in outputs.py [3]

- PgStorage accesses the data_type value and calls appropriate method (store_vector_output or store_raster_output) [4]

- I implemented the store_raster_output method in the PgStorage class [5]


What do you plan on doing next week (period) ?

- Implement store_raster_output for SQLiteStorage

- Implement another method for storing other types of data (string, CSV, etc.)

- Write a script that tests whether different data types get stored correctly

- Submit the Phase 1 evaluation.


Are you blocked on anything?

No.


[1] https://github.com/janpisl/pywps/commit/18c46864ff75fb01e3c7d76f3f4a1b6d2457a22f[[2] https://github.com/janpisl/pywps/commit/4bb079bfcde0b4fcc42393f531e86c47a91507d4

[3] https://github.com/janpisl/pywps/commit/80c61c92e183c06af741fe94d0b172f9a5940609

[4] https://github.com/janpisl/pywps/commit/e024b1e5d1eb8df1a79989a59b14276b2757d98d

[5] https://github.com/janpisl/pywps/commit/a09dfc61729b36eb81ddd0d9adb5bda2effe3029



Week 5

What did you get done this period?


- I updated the store_vector_output method in the SQLite class [1]

- I implemented a method for storing raster outputs in SQLite [2]

- I implemented a method for storing output data of other datatypes in PostgreSQL [3]



What do you plan on doing next week (period) ?

- Write test_storage, a unit test that would test all storage options and their correct functionality

- Handle metadata


Are you blocked on anything?

No.


[1] https://github.com/janpisl/pywps/commit/44e9130c8bbb73841c51a09266a9db62ff2809f9

[2] https://github.com/janpisl/pywps/commit/5bf8eb883343f0caacd47470bf79152ac751d480

[3] https://github.com/janpisl/pywps/commit/50cd11ffa2486ee3de7cac3340874179d81d4e2e



Week 6

What did you get done this period?

- Database schema gets created automatically when the DbStorage class is initialized. [1]

- I developed "test_storage" module for testing storing output data [2]

- I implemented unittest classes for testing file storage [3], PostgreSQL storage [4] [5] [6] and SQLite [7] [8]

- Some other minor changes and updates to the code and fixing bugs [9]



What do you plan on doing next week (period) ?

- Address issues revealed by testing

- Handle metadata


Are you blocked on anything?

No.


[1] https://github.com/janpisl/pywps/commit/524d65f2cb5cdfe5f0bbdf769a9850c32886952e

[2] https://github.com/janpisl/pywps/commit/1c36dd1ee3b1c61d819b92f5a195792f488b1193

[3] https://github.com/janpisl/pywps/commit/2896d7670ed2f38470eab54790ed2e822c42583d

[4] https://github.com/janpisl/pywps/commit/97c8255a9ea3ad52a022d84315eb4e725afe4fc6

[5] https://github.com/janpisl/pywps/commit/16afbea95f1fea31dd3ac9e7a30d63b5d309ad12

[6] https://github.com/janpisl/pywps/commit/b08dee2644888379d08c25919d112b0e29f2423d

[7] https://github.com/janpisl/pywps/commit/c406e21d8ee844e468fe672d236285664a8fde76

[8] https://github.com/janpisl/pywps/commit/a1493e70024d06b3d29d1522edca64af4574de0b

[9] https://github.com/janpisl/pywps/commits/master



Week 8 reportú

What did you get done this period?


- PgStorage and SQLiteStorage classes inherit from DbStorage [1]

- The "store" method is now implemented in the parent class DbStorage

- Method for storing vector outputs is now implemented in the parent class DbStorage [2]

- Method for storing outputs such as CSV, plain text, etc. is now implemented in the parent class DbStorage [3]

- Each class inheriting from DbStorage now implements a method for initializating database (create schema, etc.) [4]

- Fixing bugs and some other minor changes and updates to the code [5]


What do you plan on doing next week (period) ?


- Download, compile and run MapServer

- Establish a connection between MapServer and PostgreSQL


Are you blocked on anything?

No.



[1] https://github.com/janpisl/pywps/commit/889ce577747815554970c3c39cebed15fe1d1ca6

[2] https://github.com/janpisl/pywps/commit/68d193aed029e4c6c4d6e8ec73ac44de8c183b71

[3] https://github.com/janpisl/pywps/commit/341f8a8eb8d99663e15796e373c5bf78fc9364a6

[4] https://github.com/janpisl/pywps/commit/321d6d8d6caf28131062134a5188eaa61c3e4b38

[5] https://github.com/janpisl/pywps/commits/master


Week 9 report

What did you get done this period?


- Together with my mentor we decided not to implement a functionality for handling metadata at this time.

- I established a connection between PostgreSQL and SQLite and Mapserver [1]. This allows sample output data to be served as a WMS service and viewed in e.g. QGIS.

- I created sample requests for both PostgreSQL and SQLite [2]


What do you plan on doing next week (period) ?

- Edit current code and automatize connection with MapServer so any output data can be viewed as a WMS service.


Are you blocked on anything?

No.




[1] https://github.com/janpisl/pywps/commit/d88106eecfa08c30b945a6105f2eeaf978bfda8a

[2] https://github.com/janpisl/pywps/commit/b50360d42f48ac8c99df1af005cbdf3951692465