Database Output Storage for PyWPS
Introduction
My name is Jan Pisl and I study at the Czech Technical University in Prague. I will be participating in this year's GSoC. My project’s aim is to develop database storage support for PyWPS that would allow output data to be stored in a database in a way that is compliant with the OGC WPS Standard. I will be using Python 3.6.
Project
Title: Database Output Storage for PyWPS
Student: Jan Pišl
Mentors: Jáchym Čepický, Martin Landa.
Project's repositories: GitHub repository, Project's page on GSoC website
Technology
PyWPS
PyWPS is a server side implementation of the OGC Web Processing Service (OGC WPS) standards 1.0.0. It is written in the Python programming language, it runs on Python 2.7, 3.3 or higher and it is tested and developed on Linux. It uses a ConfigParser format for configuration files. It supports a variety of geospatial software and tools such as GRASS GIS, R Project or the GDAL library. Synchronous and asynchronuous invocations are supported. As for request encoding, two options are available - key-value pairs (using HTTP-GET) or XML payload (using HTTPPOST). Every process that is to be deployed on the server is defined as a class and has several mandatory parameters. The key parameter called "handler" gets invoked every time there is an incoming request, it accepts the request and returns a response.
In 2016, it upgraded from PyWPS 3 to PyWPS 4. Some of the more significant changes include every input being considered a list of inputs and all inputs having file, data and stream attributes. These attributes allow better manipulation with data.
Python
Python is a high-level programming language that fully supports object-oriented and structured programming. Developed in the late 1980s, the first version 0.9.0 was released in 1991. In 2008, Python 3.0 was released. Currently, the most up-to-date version available is 3.6. It was designed as a syntactically simple language, using whitespace intendantion instead of brackets and English words rather than punctuation. It is a dynamicallytyped language, which means it is not neccessary to specify a data-type when defining a variable. For its simplicity and readability, Python is often considered a good first programming language to learn. One of the key advantages of Python is its high extensibility. It provides large standard libraries and also an extensive number of other modules, packages and libraries, so most of the common programming tasks are already solved, scripted and made available.
Schedule
Dates | Task | Deliverables |
---|---|---|
Bonding Period (April, 23 - May, 14) |
Initial research on GDAL, MapServer, SQLAlchemy, GeoAlchemy, relational databases (PostgreSQL, Oracle, Microsoft SQL Server, MySQL, SQLite,...) Select which databases will be supported in this project. Set up environment for work - virtual environment with a distribution of Linux, instances of selected database management systems for testing. |
Bonding Period Report |
Week 1 (May, 14 - May, 20) |
Write a test to ensure newly implemented functionality will work correctly. Define “DbStorage”, an abstract base class that will ensure derived classes implement particular methods from this class. Define abstract methods that are necessary for writing output data to a remote database. | |
Week 2 (May, 21 - May, 27) |
Write test script that checks “PgStorage” will work correctly and write output vector data to a PostgreSQL database. Develop “PgStorage“ class that inherits from “DbStorage” and implements proposed functionality. | |
Week 3 (May, 28 - June, 3) |
Write a test (or update test written in the previous week) to ensure correct functionality of newly developed classes. Develop several classes based on selection in week 1 – e.g. “MSSQLStorage”, “OCIStorage”, “MySQLStorage”, etc. | |
Week 4 (June, 4 - June, 10) |
Write a test to ensure newly implemented functionality will work correctly. Develop a mechanism for handling raster output data and implement it in the above-mentioned classes. If GDAL does not support writing raster data to a specific database by default, appropriate tool or library must be identified. For example, for writing raster data into PostgreSQL, ”raster2pgsql“ tool can be used. Decide how other types of data (such as string, table, CSV, etc.) will be handled and implement it. | |
Week 5 (June, 11 - June, 15) Phase 1 Evaluation |
Write a test that checks all data types will get correctly stored in database. Write and submit weekly report. Write and submit Phase 1 evaluation. Deliverables for Phase 1: Correctly functioning and tested module that enables output data (vector, raster or other) to be stored in one of several database management systems. | |
Week 6 (June, 16 - June, 24) |
Write a test to ensure newly implemented functionality will work correctly. Handle metadata. Currently, logs with information about run processes (such as when a process started and finished) are stored in a SQLite database. Implement a functionality that accesses logs, extracts desired metadata from them and saves them in the same database as output data. Use unique identifier of every process to join metadata (using primary key) with the corresponding output data (foreign key). For each run process, include expiration date to its output data. Develop a trigger that removes output data when it expires. | |
Week 7 (June, 25 - July, 1) |
Write a test to ensure newly implemented functionality will work correctly. Establish a connection between database and MapServer and allow MapServer to access output data stored in database. | |
Week 8 (July, 2 - July, 8) |
Write a test to ensure newly implemented functionality will work correctly. Enable output data to be viewed as WMS services through MapServer. Edit current code so the reference embodied in the responding document is a URL link to a running MapServer service. | |
Week 9 (July, 9 - July, 13) Phase 2 Evaluation |
Write and submit Phase 2 evaluation. Deliverables for Phase 2: Correctly functioning and tested module that enables output data to be viewed as a Web Map Service using MapServer. | |
Week 10 (July, 14 - July, 22) |
Update previously developed test to ensure newly implemented functionality will work correctly. Enable raster output data to be accessed and/or edited as OGC Web Coverage Service. | |
Week 11 (July, 23 - July, 29) |
Update previously developed test to ensure newly implemented functionality will work correctly. Enable vector output data to be accessed and/or edited as OGC Web Feature Service. | |
Week 12 (July, 30 - August, 5) |
Write a test that ensures newly implemented functionality will work correctly. Develop a demo that demonstrates implemented functionality. | |
Week 13 (August, 6 - August, 14) | Create documentation that can be added to the current PyWPS documentation.
Finalize work, clean code and prepare a pull request(s) to PyWPS GitHub repository. Write and submit final report. |
|
Reports
Bonding Period
Week 1
What did you get done this period?
I defined the first version of “DbStorage”, a class that will allow output data to be stored in one of RDBMSs supported (PostGIS, MS Server, Oracle spatial, SpatiaLite) [1] I downloaded PostgreSQL and PostGIS and run it locally on my computer. I tested it functions correctly by using code I had written previously to GSoC [2] that stores output data in PostGIS. I run some of PyWPS test processes and checked output data does get stored in PostGIS.
What do you plan on doing next week (period) ?
Implement PGStorage, a class within the DbStorage class that will store output data to PostGIS and returns database name, schema and table as a reference to the client. Start working on implementing MSServerStorage, OciStorage and SpatiaLiteStorage. Download, run and test SpatiaLite.
Are you blocked on anything?
No.
[1] https://github.com/janpisl/pywps/commit/45b98e04c428c2e62643a42bbebfedf8efc0603e
Week 2
What did you get done this period?
Develop and implement PGStorage, a class within the DbStorage class that will store output data to PostGIS and returns database name, schema and table as a reference to the client.
Study materials on Factory design pattern [2] recommended by my mentor.
Download, run and test SpatiaLite.
What do you plan on doing next week (period) ?
Develop, implement and test _MSSQLStorage and _OciStorage _MySQLStorage classes
Are you blocked on anything?
I have some questions related to work I will be doing next week. I will consult with my mentor.
[1] https://github.com/janpisl/pywps/commit/254b1e5697c0b4781544a6329b52eb4688e60a22
[2] http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Factory.html
Week 3
What did you get done this period?
- Following my mentor's advice, I made a decision that _MSSQLStorage, _OciStorage _MySQLStorage classes will not be implemented at this stage. There will only be PgStorage and SQLiteStorage that store data in a PostgreSQL and SQLite databases, respectively
- Define DbStorageAbstract, an abstract class from which PgStorage and SQLiteStorage will inherit [1]
- Develop and implement SQLiteStorage [2]
- Fix/Update DbStorage class [3]
- Update outputs.py so complex output data is either stored as files or in a database depending on configuration [4]
What do you plan on doing next week (period) ?
- Develop a mechanism for handling raster output data and implement it.
- Decide how other types of data (such as string, table, CSV, etc.) will be handled and implement it.
Are you blocked on anything?
There is an error occuring when I run PyWPS and I haven't been able to find out what is causing it. I will discuss this with my mentor on a call on Monday morning.
[1] https://github.com/janpisl/pywps/commit/89ab12a2fd883a0df6c070a4ca573f89b6c31a5a
[2] https://github.com/janpisl/pywps/commit/864d275fe8c47e41c07f8278b8037e80a0b0b838
[3] https://github.com/janpisl/pywps/commit/c0d698cefde9a5085cd7d76bb0e32156a6391880
[4] https://github.com/janpisl/pywps/commit/7f00fa06ac5fd7823d79a363f6163dcc38e7bbb0
Week 4 report
What did you get done this period?
-I changed the structure of the code. Classes PgStorage and SQLiteStorage are now stored in separate files in the db folder. FileStorage is in a separate file, too. Rest of the code is in the __init__ files. [1]
- I added the data_type attribute to inout/formats/__init__.py. At the moment, allowed values are either vector or raster using the enum module. This was done so when storing output data it can be differed between raster and vector data and appropriate method can be used. [2]
- The Dbstorage class has been simplified. Instead of creating an instance of the class in outputs.py and calling the get_db_type method in the constructor, get_db_type is now a static method and gets called directly in outputs.py [3]
- PgStorage accesses the data_type value and calls appropriate method (store_vector_output or store_raster_output) [4]
- I implemented the store_raster_output method in the PgStorage class [5]
What do you plan on doing next week (period) ?
- Implement store_raster_output for SQLiteStorage
- Implement another method for storing other types of data (string, CSV, etc.)
- Write a script that tests whether different data types get stored correctly
- Submit the Phase 1 evaluation.
Are you blocked on anything?
No.
[1] https://github.com/janpisl/pywps/commit/18c46864ff75fb01e3c7d76f3f4a1b6d2457a22f[[2] https://github.com/janpisl/pywps/commit/4bb079bfcde0b4fcc42393f531e86c47a91507d4
[3] https://github.com/janpisl/pywps/commit/80c61c92e183c06af741fe94d0b172f9a5940609
[4] https://github.com/janpisl/pywps/commit/e024b1e5d1eb8df1a79989a59b14276b2757d98d
[5] https://github.com/janpisl/pywps/commit/a09dfc61729b36eb81ddd0d9adb5bda2effe3029