Database Output Storage for PyWPS

From OSGeo

Revision as of 11:39, 3 June 2018

Introduction

My name is Jan Pisl and I study at the Czech Technical University in Prague. I will be participating in this year's Google Summer of Code (GSoC). The aim of my project is to add database storage support to PyWPS, so that output data can be stored in a database in a way that is compliant with the OGC WPS standard. I will be using Python 3.6.

Project

Title: Database Output Storage for PyWPS

Student: Jan Pišl

Mentors: Jáchym Čepický, Martin Landa.

Project's repositories: GitHub repository, Project's page on GSoC website

Technology

PyWPS

PyWPS is a server-side implementation of the OGC Web Processing Service (OGC WPS) standard, version 1.0.0. It is written in Python, runs on Python 2.7 and 3.3 or higher, and is developed and tested on Linux. Its configuration files use the ConfigParser (INI) format. It supports a variety of geospatial software and tools, such as GRASS GIS, the R Project and the GDAL library. Both synchronous and asynchronous invocations are supported. Requests can be encoded in two ways: as key-value pairs (using HTTP GET) or as an XML payload (using HTTP POST). Every process deployed on the server is defined as a class with several mandatory parameters. The key parameter, called "handler", is invoked for every incoming request; it accepts the request and returns a response.
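Because PyWPS reads its settings from INI-style files, database output settings would most naturally live in the same file. The sketch below parses such settings with Python's standard configparser module. The `[db]` section and every key in it are hypothetical, chosen for illustration only; they are not options PyWPS is known to define.

```python
import configparser

# Hypothetical configuration: the [db] section and its keys are illustrative
# assumptions, not documented PyWPS options.
CONFIG_TEXT = """
[server]
outputpath = /tmp/pywps/outputs

[db]
storage = postgresql
dbname = pywps
host = localhost
port = 5432
schema = outputs
"""


def read_storage_settings(text):
    """Parse INI-style text and return the (hypothetical) database settings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    db = parser["db"]
    return {
        "storage": db.get("storage", fallback="file"),  # default: plain files
        "dbname": db.get("dbname"),
        "host": db.get("host", fallback="localhost"),
        "port": db.getint("port", fallback=5432),
        "schema": db.get("schema", fallback="public"),
    }


settings = read_storage_settings(CONFIG_TEXT)
print(settings["storage"], settings["port"])  # postgresql 5432
```

Falling back to `"file"` keeps the current behaviour (outputs written as files) whenever no database is configured.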

In 2016, PyWPS was upgraded from version 3 to version 4. Among the more significant changes, every input is now treated as a list of inputs, and all inputs have file, data and stream attributes, which make it easier to work with the data.

Python

Python is a high-level programming language that fully supports object-oriented and structured programming. It was developed in the late 1980s, and its first version, 0.9.0, was released in 1991. Python 3.0 was released in 2008; at the time of writing, the most recent version is 3.6. Python was designed as a syntactically simple language that uses whitespace indentation instead of curly braces and English words rather than punctuation. It is dynamically typed, which means it is not necessary to specify a data type when defining a variable. Thanks to its simplicity and readability, Python is often considered a good first programming language to learn. One of its key advantages is its extensibility: besides a large standard library, an extensive number of other modules, packages and libraries are available, so most common programming tasks have already been solved, scripted and made available.


Schedule

Dates Task Deliverables
Bonding Period (April 23 - May 14)

Initial research on GDAL, MapServer, SQLAlchemy, GeoAlchemy and relational databases (PostgreSQL, Oracle, Microsoft SQL Server, MySQL, SQLite, etc.)

Select which databases will be supported in this project.

Set up the working environment: a virtual environment running a Linux distribution, plus instances of the selected database management systems for testing.

Bonding Period Report
Week 1 (May 14 - May 20)

Write a test to ensure newly implemented functionality will work correctly.

Define “DbStorage”, an abstract base class that forces derived classes to implement particular methods. Define the abstract methods necessary for writing output data to a remote database.
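Python's abc module provides exactly this enforcement: a class that derives from `ABC` and marks methods with `@abstractmethod` cannot be instantiated, and neither can any subclass that leaves one of those methods unimplemented. A minimal sketch follows; the method names `connect` and `store` and the toy `InMemoryStorage` subclass are hypothetical, not the project's actual interface.

```python
from abc import ABC, abstractmethod


class DbStorage(ABC):
    """Abstract base class for storing process outputs in a database.

    The method names are hypothetical; the point is that any subclass
    that does not implement them cannot be instantiated.
    """

    @abstractmethod
    def connect(self):
        """Open a connection to the target database."""

    @abstractmethod
    def store(self, name, data):
        """Write one output and return a reference to it for the client."""


class InMemoryStorage(DbStorage):
    """Toy subclass that keeps outputs in a dict, for demonstration only."""

    def __init__(self):
        self.tables = {}
        self.connected = False

    def connect(self):
        self.connected = True

    def store(self, name, data):
        self.tables[name] = data
        return "memory:" + name


try:
    DbStorage()  # fails: the abstract methods are not implemented
except TypeError as err:
    print("cannot instantiate DbStorage:", err)

storage = InMemoryStorage()
storage.connect()
print(storage.store("buffer_out", b"<gml/>"))  # memory:buffer_out
```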

Week 2 (May 21 - May 27)

Write a test script that checks that “PgStorage” works correctly and writes output vector data to a PostgreSQL database.

Develop the “PgStorage” class, which inherits from “DbStorage” and implements the proposed functionality.
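One way to write vector outputs into PostGIS is GDAL's ogr2ogr command-line tool with its PostgreSQL driver. The sketch below assumes that approach. It only builds the command (running it needs GDAL and a database) and returns the database, schema and table as the client reference. To stay self-contained it does not inherit from a DbStorage class, and the class layout, default user and example paths are hypothetical.

```python
import shlex


class PgStorage:
    """Sketch of a PostGIS storage class (in the project it would derive from
    DbStorage). Assumes vector outputs are loaded with GDAL's ogr2ogr tool."""

    def __init__(self, dbname, host="localhost", port=5432, user="pywps",
                 schema="public"):
        self.dbname = dbname
        self.host = host
        self.port = port
        self.user = user
        self.schema = schema

    def ogr2ogr_command(self, src_file, table):
        """Build (but do not run) the ogr2ogr call that loads src_file."""
        pg_conn = "PG:dbname={} host={} port={} user={}".format(
            self.dbname, self.host, self.port, self.user)
        return [
            "ogr2ogr",
            "-f", "PostgreSQL",               # output driver
            pg_conn,                          # destination datasource
            src_file,                         # source file, e.g. GML or GeoJSON
            "-nln", table,                    # name of the new layer/table
            "-lco", "SCHEMA=" + self.schema,  # PostgreSQL driver layer option
            "-overwrite",                     # replace the table if it exists
        ]

    def reference(self, table):
        """Reference returned to the client: database, schema and table."""
        return "{}.{}.{}".format(self.dbname, self.schema, table)


pg = PgStorage("pywps", schema="outputs")
cmd = pg.ogr2ogr_command("/tmp/buffer.gml", "buffer_out")
print(" ".join(shlex.quote(part) for part in cmd))
print(pg.reference("buffer_out"))  # pywps.outputs.buffer_out
```

With GDAL installed, `subprocess.run(cmd, check=True)` would execute the load. A password is better supplied through libpq's `PGPASSWORD` environment variable or a `~/.pgpass` file than placed on the command line.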

Week 3 (May 28 - June 3)

Write a test (or update the test written in the previous week) to ensure the newly developed classes function correctly.

Develop several classes based on the selection made in week 1, e.g. “MSSQLStorage”, “OCIStorage”, “MySQLStorage”, etc.

Week 4 (June 4 - June 10)

Write a test to ensure newly implemented functionality will work correctly.

Develop a mechanism for handling raster output data and implement it in the above-mentioned classes. If GDAL does not support writing raster data to a specific database by default, an appropriate tool or library must be identified. For example, the “raster2pgsql” tool can be used to write raster data into PostgreSQL.
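raster2pgsql does not connect to the database itself: it prints SQL to standard output, which is normally piped into psql. The sketch below assumes that workflow and reproduces the shell pipeline `raster2pgsql ... | psql ...` with subprocess. The function names, the default SRID of 4326 and the example paths are hypothetical. The flags used are standard raster2pgsql and psql options.

```python
import subprocess


def raster2pgsql_commands(raster, table, dbname, srid=4326, schema="public"):
    """Return the two commands of: raster2pgsql ... | psql ..."""
    loader = [
        "raster2pgsql",
        "-s", str(srid),   # SRID of the raster
        "-I",              # create a GiST spatial index on the raster column
        "-C",              # apply raster constraints
        "-M",              # VACUUM ANALYZE the table afterwards
        raster,
        "{}.{}".format(schema, table),
    ]
    # ON_ERROR_STOP makes psql exit with a non-zero status on the first error.
    psql = ["psql", "-v", "ON_ERROR_STOP=1", "-d", dbname]
    return loader, psql


def load_raster(raster, table, dbname, srid=4326, schema="public"):
    """Pipe the SQL generated by raster2pgsql into psql.

    Requires both tools on PATH and a reachable PostgreSQL/PostGIS server.
    """
    loader, psql = raster2pgsql_commands(raster, table, dbname, srid, schema)
    producer = subprocess.Popen(loader, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(psql, stdin=producer.stdout)
    producer.stdout.close()  # so raster2pgsql gets SIGPIPE if psql exits early
    consumer.wait()
    producer.wait()
    if producer.returncode != 0 or consumer.returncode != 0:
        raise RuntimeError(
            "loading {} into {}.{} failed".format(raster, schema, table))


loader, psql = raster2pgsql_commands("/tmp/dem.tif", "dem_out", "pywps")
print(" ".join(loader), "|", " ".join(psql))
```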

Decide how other types of data (such as string, table, CSV, etc.) will be handled and implement it.

Week 5 (June 11 - June 15): Phase 1 Evaluation

Write a test that checks that all data types are correctly stored in the database.

Write and submit weekly report.

Write and submit Phase 1 evaluation.

Deliverables for Phase 1: a correctly functioning and tested module that enables output data (vector, raster or other) to be stored in one of several database management systems.

Week 6 (June 16 - June 24)

Write a test to ensure newly implemented functionality will work correctly.

Handle metadata. Currently, logs with information about executed processes (such as when a process started and finished) are stored in an SQLite database. Implement functionality that reads these logs, extracts the desired metadata and saves it in the same database as the output data. Use the unique identifier of every process run to join the metadata (primary key) with the corresponding output data (foreign key). For each process run, attach an expiration date to its output data.
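The primary key/foreign key relationship described above can be sketched with two tables: one row of metadata per process run, keyed by its unique identifier, and any number of outputs referencing it. The sketch uses Python's built-in sqlite3 module. All table and column names, the example identifier and the seven-day retention period are hypothetical; they are not taken from PyWPS's own log schema.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE process_metadata (
    uuid       TEXT PRIMARY KEY,  -- unique identifier of one process run
    identifier TEXT NOT NULL,     -- process name
    time_start TEXT NOT NULL,
    time_end   TEXT,
    expires    TEXT NOT NULL      -- after this moment the outputs may be deleted
);
CREATE TABLE outputs (
    id   INTEGER PRIMARY KEY,
    uuid TEXT NOT NULL REFERENCES process_metadata(uuid) ON DELETE CASCADE,
    name TEXT NOT NULL,
    data BLOB
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

run_id = "hypothetical-run-uuid"
start = datetime(2018, 6, 3, 11, 0)
end = start + timedelta(minutes=2)
expires = end + timedelta(days=7)  # assumed retention period

with conn:  # commit both inserts together
    conn.execute(
        "INSERT INTO process_metadata VALUES (?, ?, ?, ?, ?)",
        (run_id, "buffer", start.isoformat(), end.isoformat(), expires.isoformat()))
    conn.execute(
        "INSERT INTO outputs (uuid, name, data) VALUES (?, ?, ?)",
        (run_id, "buffer_out", b"<gml/>"))

# Join each output with the metadata of the run that produced it.
row = conn.execute("""
    SELECT o.name, m.identifier, m.expires
    FROM outputs AS o JOIN process_metadata AS m ON o.uuid = m.uuid
""").fetchone()
print(row)  # ('buffer_out', 'buffer', '2018-06-10T11:02:00')
```

Because the foreign key is enforced, an output cannot reference a run that was never logged, and deleting a run's metadata removes its outputs as well.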

Develop a trigger that removes output data when it expires.
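Database triggers fire on data changes, not on a clock, so one possible design is to purge expired runs whenever a new run is registered. The sketch below assumes that design and implements it in SQLite: an `AFTER INSERT` trigger deletes every run that expired before the new one started, and `ON DELETE CASCADE` removes the corresponding outputs. Table, column and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
conn.executescript("""
CREATE TABLE process_metadata (
    uuid       TEXT PRIMARY KEY,
    time_start TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM:SS', compared as text
    expires    TEXT NOT NULL
);
CREATE TABLE outputs (
    id   INTEGER PRIMARY KEY,
    uuid TEXT NOT NULL REFERENCES process_metadata(uuid) ON DELETE CASCADE,
    name TEXT NOT NULL
);
-- Each time a new run is registered, drop every run that expired before it
-- started; the foreign key's ON DELETE CASCADE removes its outputs as well.
CREATE TRIGGER purge_expired AFTER INSERT ON process_metadata
BEGIN
    DELETE FROM process_metadata WHERE expires < NEW.time_start;
END;
""")


def register_run(uuid, time_start, expires, output_name):
    """Insert one run and its output in a single transaction."""
    with conn:
        conn.execute("INSERT INTO process_metadata VALUES (?, ?, ?)",
                     (uuid, time_start, expires))
        conn.execute("INSERT INTO outputs (uuid, name) VALUES (?, ?)",
                     (uuid, output_name))


register_run("run-a", "2018-06-01 10:00:00", "2018-06-08 10:00:00", "a_out")
register_run("run-b", "2018-06-20 09:00:00", "2018-06-27 09:00:00", "b_out")

print(conn.execute("SELECT uuid FROM process_metadata").fetchall())  # [('run-b',)]
print(conn.execute("SELECT name FROM outputs").fetchall())           # [('b_out',)]
```

Deleting outputs at their exact expiration time, even when no new process runs, would need a scheduled job rather than a trigger. In PostgreSQL the trigger itself would be written as a PL/pgSQL trigger function.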

Week 7 (June 25 - July 1)

Write a test to ensure newly implemented functionality will work correctly.

Establish a connection between the database and MapServer so that MapServer can access output data stored in the database.
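MapServer reads PostGIS tables through a LAYER block in its mapfile, using `CONNECTIONTYPE POSTGIS`, a libpq connection string and a `DATA` statement naming the geometry column, table, unique key and SRID. The sketch below generates such a block from Python. The connection string and layer names are hypothetical. The defaults `wkb_geometry` and `ogc_fid` are the column names GDAL's PostgreSQL driver creates when ogr2ogr is used with no overrides; adjust them to the real table. A complete mapfile also needs an enclosing MAP block with projection and web settings.

```python
def postgis_layer(name, table, schema="public", geom_type="POLYGON",
                  geom_column="wkb_geometry", key="ogc_fid", srid=4326,
                  connection="host=localhost dbname=pywps user=pywps"):
    """Return a MapServer LAYER block that reads a PostGIS table.

    geom_column and key default to the names ogr2ogr's PostgreSQL driver
    creates; the connection string is a hypothetical example.
    """
    data = "{} FROM {}.{} USING UNIQUE {} USING SRID={}".format(
        geom_column, schema, table, key, srid)
    return "\n".join([
        "LAYER",
        '  NAME "{}"'.format(name),
        "  TYPE {}".format(geom_type),
        "  STATUS ON",
        "  CONNECTIONTYPE POSTGIS",
        '  CONNECTION "{}"'.format(connection),
        '  DATA "{}"'.format(data),
        "  METADATA",
        '    "wms_title" "{}"'.format(name),
        "  END",
        "END",
    ])


print(postgis_layer("buffer_out", "buffer_out", schema="outputs"))
```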

Week 8 (July 2 - July 8)

Write a test to ensure newly implemented functionality will work correctly.

Enable output data to be viewed as a Web Map Service (WMS) through MapServer.

Modify the current code so that the reference embedded in the response document is a URL pointing to a running MapServer service.
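With the MapServer CGI, the mapfile is selected through the `map` query parameter, so a reference to the published outputs can be a standard WMS GetCapabilities URL. The sketch below builds one with urllib. The server URL and mapfile path are hypothetical examples, and embedding the URL in the WPS response document is left out.

```python
from urllib.parse import urlencode


def wms_reference(mapserv_url, mapfile, version="1.3.0"):
    """Return a WMS GetCapabilities URL for a mapfile served by MapServer CGI."""
    params = [
        ("map", mapfile),            # MapServer CGI selects the mapfile via "map"
        ("SERVICE", "WMS"),
        ("VERSION", version),
        ("REQUEST", "GetCapabilities"),
    ]
    return mapserv_url + "?" + urlencode(params)


url = wms_reference("http://localhost/cgi-bin/mapserv", "/srv/pywps/outputs.map")
print(url)
# http://localhost/cgi-bin/mapserv?map=%2Fsrv%2Fpywps%2Foutputs.map&SERVICE=WMS&VERSION=1.3.0&REQUEST=GetCapabilities
```

Pointing the reference at GetCapabilities lets the client discover layer names, extents and supported formats before it issues a GetMap request.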

Week 9 (July 9 - July 13): Phase 2 Evaluation

Write and submit Phase 2 evaluation.

Deliverables for Phase 2: a correctly functioning and tested module that enables output data to be viewed as a Web Map Service using MapServer.

Week 10 (July 14 - July 22)

Update the previously developed test to ensure the newly implemented functionality works correctly.

Enable raster output data to be accessed and/or edited as an OGC Web Coverage Service (WCS).

Week 11 (July 23 - July 29)

Update the previously developed test to ensure the newly implemented functionality works correctly.

Enable vector output data to be accessed and/or edited as an OGC Web Feature Service (WFS).
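Once the same mapfile also publishes WCS and WFS, the references to raster and vector outputs differ mainly in the request name and the parameter that identifies the layer. WFS 2.0.0 uses `TYPENAMES`, and WCS 2.0.1 uses `COVERAGEID`. The sketch below assumes those service versions and builds a WFS GetFeature URL and a WCS DescribeCoverage URL. The server URL, mapfile path and layer names are hypothetical. Depending on the MapServer version, the mapfile must also explicitly enable these requests, for example through the `ows_enable_request` metadata.

```python
from urllib.parse import urlencode

# Assumed service versions; WFS 2.0.0 names layers with TYPENAMES and
# WCS 2.0.1 names coverages with COVERAGEID.
OWS_REQUESTS = {
    "WFS": {"VERSION": "2.0.0", "REQUEST": "GetFeature", "layer_param": "TYPENAMES"},
    "WCS": {"VERSION": "2.0.1", "REQUEST": "DescribeCoverage", "layer_param": "COVERAGEID"},
}


def ows_url(mapserv_url, mapfile, service, layer):
    """Build a WFS GetFeature or WCS DescribeCoverage URL for one output layer."""
    spec = OWS_REQUESTS[service]
    params = [
        ("map", mapfile),                # mapfile selected by MapServer CGI
        ("SERVICE", service),
        ("VERSION", spec["VERSION"]),
        ("REQUEST", spec["REQUEST"]),
        (spec["layer_param"], layer),    # vector layer or raster coverage name
    ]
    return mapserv_url + "?" + urlencode(params)


base = "http://localhost/cgi-bin/mapserv"
print(ows_url(base, "/srv/pywps/outputs.map", "WFS", "buffer_out"))
print(ows_url(base, "/srv/pywps/outputs.map", "WCS", "dem_out"))
```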

Week 12 (July 30 - August 5)

Write a test that ensures newly implemented functionality will work correctly.

Develop a demo of the implemented functionality.

Week 13 (August 6 - August 14)

Create documentation that can be added to the current PyWPS documentation.

Finalize the work, clean up the code and prepare pull request(s) to the PyWPS GitHub repository.

Write and submit final report.


Reports

Bonding Period

Bonding Period Report

Week 1

What did you get done this period?

I defined the first version of “DbStorage”, a class that will allow output data to be stored in one of the supported RDBMSs (PostGIS, MS SQL Server, Oracle Spatial, SpatiaLite) [1]. I installed PostgreSQL and PostGIS and ran them locally on my computer. I tested that they work correctly using code I had written prior to GSoC [2], which stores output data in PostGIS. I ran some of the PyWPS test processes and checked that their output data does get stored in PostGIS.



What do you plan on doing next week (period) ?

Implement PGStorage, a class within DbStorage that will store output data in PostGIS and return the database name, schema and table as a reference to the client. Start implementing MSServerStorage, OciStorage and SpatiaLiteStorage. Download, run and test SpatiaLite.



Are you blocked on anything?

No.


[1] https://github.com/janpisl/pywps/commit/45b98e04c428c2e62643a42bbebfedf8efc0603e

[2] https://github.com/ctu-geoforall-lab-projects/bp-pisl-2018-pywps/commit/c364be4888d74b2285e5a348afc6c3fe184fbc64

Week 2

What did you get done this period?

Developed and implemented PGStorage, a class within DbStorage that stores output data in PostGIS and returns the database name, schema and table as a reference to the client.


Studied materials on the Factory design pattern [2], as recommended by my mentor.


Downloaded, ran and tested SpatiaLite.
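The Factory pattern fits this project well. The code that writes outputs asks a factory for "a storage" by a configured name and never refers to the concrete class, so adding a backend only means registering one more class. The sketch below shows a registry-based factory function, one common Python variant of the pattern. The class bodies, the registry keys and the reference strings they return are hypothetical placeholders.

```python
class FileStorage:
    """Default behaviour: outputs are written as files (placeholder)."""

    def store(self, name, data):
        return "file:///tmp/pywps/outputs/" + name


class PgStorage:
    """Placeholder for a PostgreSQL/PostGIS backend."""

    def store(self, name, data):
        return "pg:outputs." + name


class SQLiteStorage:
    """Placeholder for an SQLite backend."""

    def store(self, name, data):
        return "sqlite:outputs." + name


# Registry used by the factory; the keys are hypothetical config values.
STORAGE_TYPES = {
    "file": FileStorage,
    "pg": PgStorage,
    "sqlite": SQLiteStorage,
}


def create_storage(storage_type):
    """Factory: return a storage object without the caller naming its class."""
    try:
        storage_class = STORAGE_TYPES[storage_type]
    except KeyError:
        raise ValueError(
            "unknown storage type: {!r}".format(storage_type)) from None
    return storage_class()


storage = create_storage("sqlite")
print(type(storage).__name__, storage.store("buffer_out", b""))
# SQLiteStorage sqlite:outputs.buffer_out
```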



What do you plan on doing next week (period) ?

Develop, implement and test the _MSSQLStorage, _OciStorage and _MySQLStorage classes.



Are you blocked on anything?

I have some questions related to the work I will be doing next week, which I will discuss with my mentor.



[1] https://github.com/janpisl/pywps/commit/254b1e5697c0b4781544a6329b52eb4688e60a22

[2] http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Factory.html


Week 3

What did you get done this period?

- Following my mentor's advice, I decided that the _MSSQLStorage, _OciStorage and _MySQLStorage classes will not be implemented at this stage. There will only be PgStorage and SQLiteStorage, which store data in PostgreSQL and SQLite databases, respectively.

- Defined DbStorageAbstract, an abstract class from which PgStorage and SQLiteStorage will inherit [1]

- Developed and implemented SQLiteStorage [2]

- Fixed/updated the DbStorage class [3]

- Updated outputs.py so that complex output data is stored either as files or in a database, depending on the configuration [4]
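For SQLite, Python's standard sqlite3 module is sufficient. A minimal way to keep complex outputs in the database is one table holding each output's name, MIME type and raw bytes, which returns a reference the response can point to. The sketch below assumes that layout. The single-table schema, the `path#outputs/id` reference format and the method names are hypothetical, not the actual SQLiteStorage from the commits above.

```python
import sqlite3


class SQLiteStorage:
    """Hypothetical sketch: keep complex outputs as BLOBs in one SQLite table."""

    def __init__(self, path=":memory:"):
        self.path = path
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS outputs ("
                " id INTEGER PRIMARY KEY,"
                " name TEXT NOT NULL,"
                " mimetype TEXT NOT NULL,"
                " data BLOB NOT NULL)")

    def store(self, name, data, mimetype="application/octet-stream"):
        """Insert one output and return a reference of the form path#outputs/id."""
        with self.conn:  # commit on success, roll back on error
            cur = self.conn.execute(
                "INSERT INTO outputs (name, mimetype, data) VALUES (?, ?, ?)",
                (name, mimetype, data))
        return "{}#outputs/{}".format(self.path, cur.lastrowid)

    def load(self, row_id):
        """Read an output back as (name, mimetype, bytes)."""
        return self.conn.execute(
            "SELECT name, mimetype, data FROM outputs WHERE id = ?",
            (row_id,)).fetchone()


db = SQLiteStorage()
ref = db.store("buffer_out", b"<gml/>", "application/gml+xml")
print(ref)         # :memory:#outputs/1
print(db.load(1))  # ('buffer_out', 'application/gml+xml', b'<gml/>')
```

Storing vector data as a BLOB keeps the class simple. Loading features into proper spatial tables (for example with SpatiaLite) would make them queryable, at the cost of a format-specific import step.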


What do you plan on doing next week (period) ?

- Develop a mechanism for handling raster output data and implement it.

- Decide how other types of data (such as string, table, CSV, etc.) will be handled and implement it.
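One option for tabular outputs such as CSV is to turn them into an ordinary table whose columns come from the CSV header, rather than storing them as opaque files. The sketch below assumes that option and implements it with the csv and sqlite3 modules. Every column is created as TEXT, because type inference is left out. The function names and the example data are hypothetical. Identifiers are quoted because the column names come from the output itself.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def csv_to_table(conn, table, csv_text):
    """Store a CSV output as a new table whose columns come from the header.

    Every column is created as TEXT; inferring numeric types is left out.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(quote_ident(col) + " TEXT" for col in header)
    placeholders = ", ".join("?" for _ in header)
    with conn:
        conn.execute("CREATE TABLE {} ({})".format(quote_ident(table), columns))
        conn.executemany(
            "INSERT INTO {} VALUES ({})".format(quote_ident(table), placeholders),
            reader)  # the remaining rows after the header


conn = sqlite3.connect(":memory:")
csv_to_table(conn, "zonal_stats", "zone,area\nA,1.5\nB,2\n")
print(conn.execute('SELECT * FROM "zonal_stats"').fetchall())
# [('A', '1.5'), ('B', '2')]
```

A plain string output could be handled the same way as a one-column, one-row table, or kept in a shared TEXT column keyed by the process run.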



Are you blocked on anything?

An error occurs when I run PyWPS, and I haven't been able to find out what is causing it. I will discuss this with my mentor on a call on Monday morning.



[1] https://github.com/janpisl/pywps/commit/89ab12a2fd883a0df6c070a4ca573f89b6c31a5a

[2] https://github.com/janpisl/pywps/commit/864d275fe8c47e41c07f8278b8037e80a0b0b838

[3] https://github.com/janpisl/pywps/commit/c0d698cefde9a5085cd7d76bb0e32156a6391880

[4] https://github.com/janpisl/pywps/commit/7f00fa06ac5fd7823d79a363f6163dcc38e7bbb0