Database Output Storage for PyWPS

From OSGeo
Revision as of 06:03, 20 May 2018 by Wiki-Janpisl (week 1 report added)

Introduction

My name is Jan Pišl and I study at the Czech Technical University in Prague. I am participating in this year's Google Summer of Code (GSoC). The aim of my project is to add database storage support to PyWPS, allowing output data to be stored in a database in a way that complies with the OGC WPS standard. I will be using Python 3.6.

Project

Title: Database Output Storage for PyWPS

Student: Jan Pišl

Mentors: Jáchym Čepický, Martin Landa.

Project's repositories: GitHub repository, project's page on the GSoC website

Technology

PyWPS

PyWPS is a server-side implementation of the OGC Web Processing Service (OGC WPS) standard, version 1.0.0. It is written in Python, runs on Python 2.7 and 3.3 or higher, and is developed and tested on Linux. Its configuration files use the ConfigParser format. It supports a variety of geospatial software and tools, such as GRASS GIS, the R Project and the GDAL library. Both synchronous and asynchronous invocations are supported. Requests can be encoded in two ways: as key-value pairs (using HTTP GET) or as an XML payload (using HTTP POST). Every process deployed on the server is defined as a class with several mandatory parameters. The key parameter, called "handler", is invoked on every incoming request; it accepts the request and returns a response.
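The handler pattern can be illustrated without installing PyWPS. The classes below (`SketchProcess`, `SketchRequest`, `SketchResponse`) are hypothetical, simplified stand-ins that only mimic the shape of a PyWPS process: a class constructed with a handler that receives a request and returns a response. The real `pywps.Process` takes more parameters (inputs, outputs, metadata) and wraps each input value in an input object, whereas plain strings are used here.

```python
class SketchRequest:
    """Hypothetical stand-in for an incoming WPS request."""

    def __init__(self, inputs):
        # As in PyWPS 4, every input identifier maps to a *list* of values.
        self.inputs = inputs


class SketchResponse:
    """Hypothetical stand-in for the response object passed to the handler."""

    def __init__(self):
        self.outputs = {}


class SketchProcess:
    """Hypothetical, simplified stand-in for a PyWPS process class."""

    def __init__(self, handler, identifier, title):
        self.handler = handler
        self.identifier = identifier
        self.title = title

    def execute(self, request):
        # The server invokes the handler once for every incoming request.
        return self.handler(request, SketchResponse())


class SayHello(SketchProcess):
    def __init__(self):
        super().__init__(self._handler, identifier="say_hello", title="Say Hello")

    @staticmethod
    def _handler(request, response):
        # Inputs are lists, so take the first (here the only) value.
        response.outputs["response"] = "Hello " + request.inputs["name"][0]
        return response


result = SayHello().execute(SketchRequest({"name": ["World"]}))
print(result.outputs["response"])  # prints "Hello World"
```

Passing the handler to the base class constructor keeps the request dispatching generic: the server only needs to know how to call `handler(request, response)`.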

In 2016, PyWPS was upgraded from version 3 to version 4. Among the more significant changes, every input is now treated as a list of inputs, and all inputs have file, data and stream attributes. These attributes make input data easier to work with.
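To show what "file, data and stream attributes" means in practice, here is a hypothetical stand-in (`SketchComplexInput`, not PyWPS's actual class or implementation) that exposes the same content three ways: as an in-memory value, as the path of a file on disk, and as a file-like stream. PyWPS handles this in more detail (e.g. encodings and formats).

```python
import io
import os
import tempfile


class SketchComplexInput:
    """Hypothetical stand-in exposing one piece of content three ways."""

    def __init__(self, data):
        self._data = data
        self._file = None

    @property
    def data(self):
        # The content as an in-memory Python object.
        return self._data

    @property
    def file(self):
        # The content written (lazily, once) to a temporary file; returns its path.
        if self._file is None:
            fd, path = tempfile.mkstemp(suffix=".txt")
            with os.fdopen(fd, "w") as fh:
                fh.write(self._data)
            self._file = path
        return self._file

    @property
    def stream(self):
        # The content as a fresh file-like object.
        return io.StringIO(self._data)


# Each input identifier maps to a list of inputs, even if only one was sent.
inputs = {"layer": [SketchComplexInput("id,name\n1,Prague\n")]}
first = inputs["layer"][0]
with open(first.file) as fh:
    from_file = fh.read()
print(from_file == first.data == first.stream.read())  # prints True
os.remove(first.file)
```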

Python

Python is a high-level programming language that fully supports object-oriented and structured programming. It was developed in the late 1980s, and its first version, 0.9.0, was released in 1991. Python 3.0 was released in 2008, and at the time of writing the most recent version is 3.6. Python was designed to be syntactically simple: it uses whitespace indentation instead of curly braces to delimit blocks, and English keywords rather than punctuation. It is dynamically typed, which means it is not necessary to declare a variable's data type. Thanks to its simplicity and readability, Python is often considered a good first programming language. One of its key advantages is its extensibility: it comes with a large standard library and an extensive ecosystem of third-party modules, packages and libraries, so solutions to most common programming tasks are already available.


Schedule

Dates Task Deliverables
Bonding Period (April 23 - May 14)

Initial research on GDAL, MapServer, SQLAlchemy, GeoAlchemy and relational databases (PostgreSQL, Oracle, Microsoft SQL Server, MySQL, SQLite, ...).

Select which databases will be supported in this project.

Set up the working environment: a virtual environment running a Linux distribution, plus instances of the selected database management systems for testing.

Bonding Period Report
Week 1 (May 14 - May 20)

Write a test to ensure newly implemented functionality will work correctly.

Define “DbStorage”, an abstract base class that will ensure derived classes implement particular methods from this class. Define abstract methods that are necessary for writing output data to a remote database.
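A minimal sketch of such an abstract base class using Python's `abc` module. The method names `store` and `get_reference` and their arguments are hypothetical placeholders, not the final PyWPS interface; the point is that Python refuses to instantiate any subclass that leaves an abstract method unimplemented.

```python
from abc import ABC, abstractmethod


class DbStorage(ABC):
    """Hypothetical sketch of the abstract database storage class."""

    @abstractmethod
    def store(self, output_path, table):
        """Write one output file to a database table; return a reference to it."""

    @abstractmethod
    def get_reference(self, table):
        """Build the reference (e.g. database, schema, table) sent to the client."""


class IncompleteStorage(DbStorage):
    # Implements only one of the two abstract methods.
    def store(self, output_path, table):
        return self.get_reference(table)


class CompleteStorage(IncompleteStorage):
    def get_reference(self, table):
        return "example_db.public." + table


def can_instantiate(cls):
    try:
        cls()
    except TypeError:  # raised while any abstract method is left unimplemented
        return False
    return True


print(can_instantiate(DbStorage))          # prints False
print(can_instantiate(IncompleteStorage))  # prints False
print(can_instantiate(CompleteStorage))    # prints True
```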

Week 2 (May 21 - May 27)

Write a test script that checks that “PgStorage” works correctly and writes output vector data to a PostgreSQL database.

Develop the “PgStorage” class, which inherits from “DbStorage” and implements the proposed functionality.
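One possible shape for `PgStorage`, sketched under assumptions: it delegates the import to GDAL's `ogr2ogr` command-line tool with the PostgreSQL driver (`-f PostgreSQL`, a `PG:` connection string, and `-nln schema.table` to name the target layer) and returns the database name, schema and table as the reference. The constructor arguments, default user and the `pywps` schema (which would have to exist) are hypothetical. Only the command is built in the example, because running it requires GDAL and a PostGIS database.

```python
import subprocess
from abc import ABC, abstractmethod


class DbStorage(ABC):
    """Hypothetical abstract base class, reduced to what PgStorage needs."""

    @abstractmethod
    def store(self, output_path, table):
        """Write one output file to a database table; return a reference to it."""

    @abstractmethod
    def get_reference(self, table):
        """Build the reference sent to the client."""


class PgStorage(DbStorage):
    """Hypothetical PostGIS storage delegating the import to GDAL's ogr2ogr."""

    def __init__(self, dbname, host="localhost", user="pywps", schema="pywps"):
        self.dbname = dbname
        self.host = host
        self.user = user
        self.schema = schema

    def build_command(self, output_path, table):
        # ogr2ogr [options] <destination datasource> <source datasource>
        destination = "PG:dbname={} host={} user={}".format(
            self.dbname, self.host, self.user)
        return ["ogr2ogr", "-f", "PostgreSQL",
                "-nln", "{}.{}".format(self.schema, table),
                destination, output_path]

    def get_reference(self, table):
        # Database name, schema and table joined into one reference string.
        return "{}.{}.{}".format(self.dbname, self.schema, table)

    def store(self, output_path, table):
        # Needs GDAL and a reachable PostGIS database; raises CalledProcessError on failure.
        subprocess.run(self.build_command(output_path, table), check=True)
        return self.get_reference(table)


storage = PgStorage("gisdb")
print(storage.build_command("buffer.gml", "buffer_output"))
print(storage.get_reference("buffer_output"))  # prints gisdb.pywps.buffer_output
```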

Week 3 (May 28 - June 3)

Write a test (or update the test written in the previous week) to ensure that the newly developed classes function correctly.

Develop several classes based on the database selection made earlier, e.g. “MSSQLStorage”, “OCIStorage”, “MySQLStorage”, etc.
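Since PyWPS reads its configuration in the ConfigParser format, one hypothetical way to choose among several storage classes is a small registry keyed by a configuration value. The `[db]` section, its `dbtype` option and the registry keys are assumptions for illustration, not existing PyWPS settings; the classes are empty placeholders.

```python
import configparser


class PgStorage:
    """Placeholder for a PostgreSQL/PostGIS storage class."""


class MSSQLStorage:
    """Placeholder for a Microsoft SQL Server storage class."""


class OCIStorage:
    """Placeholder for an Oracle Spatial storage class."""


# Hypothetical mapping from a config value to the storage class to use.
STORAGE_CLASSES = {
    "pg": PgStorage,
    "mssql": MSSQLStorage,
    "oracle": OCIStorage,
}


def storage_from_config(text):
    config = configparser.ConfigParser()
    config.read_string(text)
    dbtype = config.get("db", "dbtype")
    try:
        return STORAGE_CLASSES[dbtype]()
    except KeyError:
        raise ValueError("unsupported dbtype: {}".format(dbtype)) from None


storage = storage_from_config("[db]\ndbtype = oracle\n")
print(type(storage).__name__)  # prints "OCIStorage"
```

Adding support for another database then only requires a new class and one registry entry.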

Week 4 (June 4 - June 10)

Write a test to ensure newly implemented functionality will work correctly.

Develop a mechanism for handling raster output data and implement it in the above-mentioned classes. If GDAL does not support writing raster data to a specific database by default, an appropriate tool or library must be identified. For example, the “raster2pgsql” tool can be used to write raster data into PostgreSQL.
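A common raster2pgsql workflow generates SQL from a raster file and pipes it into `psql`. The sketch below builds the two commands and shows how they could be chained with `subprocess`; the SRID, table and database names are example values. The flags used (`-s` SRID, `-I` create a spatial index, `-C` apply raster constraints, `-M` vacuum analyze the table) should be checked against the raster2pgsql documentation of the installed PostGIS version.

```python
import subprocess


def raster2pgsql_commands(raster_path, table, dbname, srid=4326):
    # raster2pgsql writes SQL to stdout; psql reads it from stdin.
    loader = ["raster2pgsql", "-s", str(srid), "-I", "-C", "-M", raster_path, table]
    client = ["psql", "-d", dbname]
    return loader, client


def load_raster(raster_path, table, dbname, srid=4326):
    """Run `raster2pgsql ... | psql -d dbname` (requires the PostGIS tools)."""
    loader, client = raster2pgsql_commands(raster_path, table, dbname, srid)
    producer = subprocess.Popen(loader, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(client, stdin=producer.stdout)
    producer.stdout.close()  # lets raster2pgsql get SIGPIPE if psql exits early
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if consumer_rc != 0 or producer_rc != 0:
        raise RuntimeError("loading the raster into PostGIS failed")


loader, client = raster2pgsql_commands("dem.tif", "pywps.dem_output", "gisdb")
print(" ".join(loader), "|", " ".join(client))
```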

Decide how other types of data (such as strings, tables, CSV files, etc.) will be handled, and implement it.
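For non-spatial outputs, one option (an assumption, not a decision recorded in this plan) is to store a CSV output as an ordinary table. The sketch uses Python's built-in `sqlite3` with an in-memory database only so that it runs anywhere; the same idea would apply to PostgreSQL through a driver such as psycopg2. All columns are stored as text for simplicity, and the table name is hypothetical.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    # Identifiers cannot be bound as query parameters, so quote them explicitly.
    return '"' + name.replace('"', '""') + '"'


def store_csv(conn, table, csv_text):
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    columns = ", ".join(quote_ident(col) + " TEXT" for col in header)
    conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders), records)
    conn.commit()
    return table


conn = sqlite3.connect(":memory:")
store_csv(conn, "stats_output", "name,value\na,1\nb,2\n")
print(conn.execute("SELECT * FROM stats_output").fetchall())  # prints [('a', '1'), ('b', '2')]
```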

Week 5 (June 11 - June 15): Phase 1 Evaluation

Write a test that checks that all data types are correctly stored in the database.

Write and submit weekly report.

Write and submit Phase 1 evaluation.

Deliverables for Phase 1: Correctly functioning and tested module that enables output data (vector, raster or other) to be stored in one of several database management systems.

Week 6 (June 16 - June 24)

Write a test to ensure newly implemented functionality will work correctly.

Handle metadata. Currently, logs with information about executed processes (such as when a process started and finished) are stored in a SQLite database. Implement functionality that accesses these logs, extracts the desired metadata and saves it in the same database as the output data. Use the unique identifier of each process to join the metadata (primary key) with the corresponding output data (foreign key). For each executed process, add an expiration date to its output data.
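The primary key / foreign key join described above can be sketched with two tables, using `sqlite3` so the example is self-contained. The table and column names (`process_metadata`, `outputs`, `uuid`, `expires`) and the one-day lifetime are hypothetical; the actual PyWPS log table has its own schema.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE process_metadata (
    uuid       TEXT PRIMARY KEY,  -- unique identifier of the process run
    identifier TEXT NOT NULL,
    time_start TEXT,
    time_end   TEXT
);
CREATE TABLE outputs (
    id           INTEGER PRIMARY KEY,
    process_uuid TEXT NOT NULL REFERENCES process_metadata(uuid),
    table_name   TEXT NOT NULL,
    expires      TEXT NOT NULL
);
""")

start = datetime(2018, 6, 18, 10, 0, 0)
end = start + timedelta(minutes=2)
conn.execute("INSERT INTO process_metadata VALUES (?, ?, ?, ?)",
             ("a1b2", "buffer", start.isoformat(), end.isoformat()))
# Hypothetical policy: outputs expire one day after the process finished.
conn.execute("INSERT INTO outputs (process_uuid, table_name, expires) VALUES (?, ?, ?)",
             ("a1b2", "buffer_output", (end + timedelta(days=1)).isoformat()))

row = conn.execute("""
    SELECT m.identifier, m.time_end, o.table_name, o.expires
    FROM outputs AS o JOIN process_metadata AS m ON o.process_uuid = m.uuid
""").fetchone()
print(row)  # prints ('buffer', '2018-06-18T10:02:00', 'buffer_output', '2018-06-19T10:02:00')
```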

Develop a trigger that removes output data when it expires.
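Database triggers fire on events such as INSERT, not on a clock, so a trigger can only approximate "removes output data when it expires". One hypothetical approach, sketched with `sqlite3`: whenever a new output row is inserted, a trigger deletes rows whose expiration time is earlier than the new row's creation time. A strictly time-based alternative would be an external scheduler (e.g. cron) running a periodic DELETE. Table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outputs (
    id         INTEGER PRIMARY KEY,
    table_name TEXT NOT NULL,
    created    TEXT NOT NULL,  -- ISO 8601 strings compare correctly as text
    expires    TEXT NOT NULL
);
CREATE TRIGGER purge_expired_outputs
AFTER INSERT ON outputs
BEGIN
    DELETE FROM outputs WHERE expires < NEW.created;
END;
""")

insert = "INSERT INTO outputs (table_name, created, expires) VALUES (?, ?, ?)"
conn.execute(insert, ("old_output", "2018-06-01T09:00:00", "2018-06-02T09:00:00"))
conn.execute(insert, ("new_output", "2018-06-20T09:00:00", "2018-06-21T09:00:00"))
remaining = [name for (name,) in conn.execute("SELECT table_name FROM outputs")]
print(remaining)  # prints ['new_output']
```

In a real deployment, the table holding the expired output data would also have to be dropped; a plain SQLite trigger cannot drop a table whose name is only known at run time, so that step would likely live in application code.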

Week 7 (June 25 - July 1)

Write a test to ensure newly implemented functionality will work correctly.

Connect MapServer to the database so that MapServer can access the output data stored there.
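MapServer reads PostGIS data through a mapfile LAYER with `CONNECTIONTYPE POSTGIS`. The sketch below generates such a LAYER block for a stored output; the connection parameters, the geometry column `geom`, the unique id column `gid` and the SRID are hypothetical and would have to match how the output table was actually created.

```python
LAYER_TEMPLATE = """LAYER
  NAME "{name}"
  TYPE {geom_type}
  STATUS ON
  CONNECTIONTYPE POSTGIS
  CONNECTION "host={host} dbname={dbname} user={user}"
  DATA "geom FROM {schema}.{table} USING UNIQUE gid USING SRID={srid}"
END
"""


def postgis_layer(name, geom_type, dbname, schema, table,
                  host="localhost", user="pywps", srid=4326):
    # geom_type is a MapServer layer type such as POINT, LINE or POLYGON.
    return LAYER_TEMPLATE.format(name=name, geom_type=geom_type, host=host,
                                 dbname=dbname, user=user, schema=schema,
                                 table=table, srid=srid)


print(postgis_layer("buffer_output", "POLYGON", "gisdb", "pywps", "buffer_output"))
```

To be published over WMS, such a layer would typically also need a PROJECTION block and METADATA entries such as `wms_title`; they are omitted to keep the sketch short.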

Week 8 (July 2 - July 8)

Write a test to ensure newly implemented functionality will work correctly.

Enable output data to be served as Web Map Service (WMS) layers through MapServer.

Modify the current code so that the reference embedded in the response document is a URL pointing to a running MapServer service.
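A hedged sketch of building such a reference: a WMS 1.3.0 GetMap URL for a MapServer CGI endpoint. The host, the `/cgi-bin/mapserv` path and the mapfile location are hypothetical; MapServer's CGI takes the mapfile through its `map` parameter, and the remaining parameters are standard WMS key-value pairs. In WMS 1.3.0, the BBOX axis order for EPSG:4326 is latitude, longitude.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, mapfile, layer, bbox, width=800, height=600,
                   crs="EPSG:4326"):
    params = [
        ("map", mapfile),  # MapServer-specific: which mapfile to use
        ("SERVICE", "WMS"),
        ("VERSION", "1.3.0"),
        ("REQUEST", "GetMap"),
        ("LAYERS", layer),
        ("STYLES", ""),
        ("CRS", crs),
        ("BBOX", ",".join(str(v) for v in bbox)),
        ("WIDTH", width),
        ("HEIGHT", height),
        ("FORMAT", "image/png"),
    ]
    # urlencode percent-encodes reserved characters such as "/", ":" and ",".
    return base_url + "?" + urlencode(params)


url = wms_getmap_url("http://localhost/cgi-bin/mapserv", "/srv/pywps/outputs.map",
                     "buffer_output", (49.9, 14.2, 50.2, 14.7))
print(url)
```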

Week 9 (July 9 - July 13): Phase 2 Evaluation

Write and submit Phase 2 evaluation.

Deliverables for Phase 2: Correctly functioning and tested module that enables output data to be viewed as a Web Map Service using MapServer.

Week 10 (July 14 - July 22)

Update the previously developed tests to ensure the newly implemented functionality works correctly.

Enable raster output data to be accessed and/or edited as an OGC Web Coverage Service (WCS).

Week 11 (July 23 - July 29)

Update the previously developed tests to ensure the newly implemented functionality works correctly.

Enable vector output data to be accessed and/or edited as an OGC Web Feature Service (WFS).
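Once MapServer publishes the outputs, references for vector and raster data could be built in the same way as a WMS reference. The sketch below uses WFS 2.0.0 (`TYPENAMES`) and WCS 2.0.1 (`COVERAGEID`) key-value parameters; the endpoint, mapfile path and layer names are hypothetical, and the versions actually offered depend on the MapServer build and mapfile configuration.

```python
from urllib.parse import urlencode


def ows_url(base_url, mapfile, service, version, request, **extra):
    # Common OGC key-value parameters, then service-specific ones in sorted order.
    params = [("map", mapfile), ("SERVICE", service),
              ("VERSION", version), ("REQUEST", request)]
    params.extend(sorted(extra.items()))
    return base_url + "?" + urlencode(params)


base, mapfile = "http://localhost/cgi-bin/mapserv", "/srv/pywps/outputs.map"
wfs = ows_url(base, mapfile, "WFS", "2.0.0", "GetFeature", TYPENAMES="buffer_output")
wcs = ows_url(base, mapfile, "WCS", "2.0.1", "GetCoverage", COVERAGEID="dem_output",
              FORMAT="image/tiff")
print(wfs)
print(wcs)
```

These are read-only requests; editing vector data (WFS-T) would go through WFS Transaction requests, which this sketch does not cover.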

Week 12 (July 30 - August 5)

Write a test that ensures newly implemented functionality will work correctly.

Develop a demo showcasing the implemented functionality.

Week 13 (August 6 - August 14)

Create documentation that can be added to the current PyWPS documentation.

Finalize the work, clean up the code and prepare one or more pull requests to the PyWPS GitHub repository.

Write and submit final report.


Reports

Bonding Period

Bonding Period Report

Week 1

What did you get done this period?

I defined the first version of “DbStorage”, a class that will allow output data to be stored in one of the supported RDBMSs (PostGIS, MS SQL Server, Oracle Spatial, SpatiaLite) [1]. I downloaded PostgreSQL and PostGIS and ran them locally on my computer. I verified that they work correctly using code I had written before GSoC that stores output data in PostGIS [2]. I ran some of the PyWPS test processes and checked that the output data does get stored in PostGIS.



What do you plan on doing next week (period) ?

Implement PGStorage, a class derived from DbStorage that will store output data in PostGIS and return the database name, schema and table to the client as a reference. Start implementing MSServerStorage, OciStorage and SpatiaLiteStorage. Download, run and test SpatiaLite.



Are you blocked on anything?

No.


[1] https://github.com/janpisl/pywps/commit/45b98e04c428c2e62643a42bbebfedf8efc0603e

[2] https://github.com/ctu-geoforall-lab-projects/bp-pisl-2018-pywps/commit/c364be4888d74b2285e5a348afc6c3fe184fbc64