User:Vikasmishra95

From OSGeo

I am an undergraduate student from India pursuing a B.E. in Electronics & Electrical Engineering and an M.Sc. (Hons.) in Economics at BITS Pilani, Goa Campus. I have been contributing to open source organisations for almost 3 years and have been fortunate to participate in Google Summer of Code twice, in 2014 and in 2016, both times for OSGeo. In 2014 I worked on implementing advanced groups and permission systems for GeoNode, and in 2016 I am working on implementing remote storage options in PyWPS.


GSoC 2016: Remote Output Storage for PyWPS

Student Name: Vikas Mishra
Organization: PyWPS
Mentors: Jachym Cepicky and Jonas Eberle
Title: Remote Output Storage for PyWPS
Repository: PyWPS-4, browse at: PyWPS
Blog: GSoC 2016 Blog


PyWPS is an open source, lightweight, Python-based implementation of the OGC Web Processing Service (WPS) standard. It provides users with a relatively seamless environment in which to code geo-spatial functions and models that are readily exposed to the Internet through the WWW.

PyWPS offers a straightforward WPS development framework in the increasingly popular Python language. Python offers easy access to a vast array of code libraries that can be used in the processes, in particular those for geo-spatial data manipulation (e.g. GRASS, GDAL/OGR, Fiona, Shapely), but also statistics packages (e.g. rpy2 for R statistics) and data analysis tools (e.g. pandas). PyWPS offers storage mechanisms for process inputs and outputs and spawns processes in the background for asynchronous execution requests.


This project idea was listed at GSoC 2016 Ideas.

Brief Description

PyWPS has to store the resulting data files somewhere. Currently, only simple file storage is implemented: all resulting files are stored in a defined output directory on the server. When this project is finished, PyWPS will support more sophisticated methods of data storage, such as external cloud services and FTP. Cloud storage is quickly becoming the method of choice for storing data. Storing files remotely rather than locally offers advantages such as syncing, accessibility, security, sharing, and backups.

The services and protocols to be implemented as part of the project are FTP storage, Google Drive, Dropbox, Microsoft OneDrive, and Box.
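
One way to picture the planned design is a small common interface that every storage backend implements, so that new services can be plugged in without touching the rest of the server. The sketch below is only an illustration under that assumption: the names OutputStorage, LocalFileStorage, BACKENDS and get_storage are hypothetical and are not taken from the PyWPS codebase.

```python
import os
import shutil
from abc import ABC, abstractmethod


class OutputStorage(ABC):
    """Hypothetical common interface shared by all storage backends."""

    @abstractmethod
    def store(self, source_path):
        """Persist the file at source_path and return a reference to it."""


class LocalFileStorage(OutputStorage):
    """Copies outputs into a directory on the server (today's behaviour)."""

    def __init__(self, output_dir):
        self.output_dir = output_dir

    def store(self, source_path):
        # Create the output directory on first use, then copy the file in.
        os.makedirs(self.output_dir, exist_ok=True)
        target = os.path.join(self.output_dir, os.path.basename(source_path))
        shutil.copy(source_path, target)
        return target


# Hypothetical registry: maps a configuration value to a backend class.
# Remote backends (FTP, Dropbox, ...) would be registered here as well.
BACKENDS = {"file": LocalFileStorage}


def get_storage(kind, **options):
    """Look up a backend by name and build it from its options."""
    try:
        backend = BACKENDS[kind]
    except KeyError:
        raise ValueError("unknown storage backend: %s" % kind)
    return backend(**options)
```

With such a registry, the calling code would only need something like get_storage("file", output_dir="outputs").store(path), and the choice of backend could come from the configuration file.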

Timeline

Please note that this timeline is tentative. Some parts may finish earlier than expected whereas some might take up more time if issues arise which need to be fixed before moving forward.

  • Community bonding period (22nd April - 22nd May): Use the community bonding period to interact with the mentors and build mutual understanding, as the mentor-student relationship plays a very important role in the outcome of the project. Also get involved with other developers through the mailing list and find out what other additions they might want to see in terms of remote storage. Study the codebase and fix some bugs to get familiar with it. This time will also be used to decide whether to add support for publishing data to GeoServer or to go for another remote storage service.
  • Week 1 (23rd May - 29th May): In the first week I will work on a concept for integrating new storage types, while also making it easier to add other services in the future by keeping everything modular and focussing on code reusability. The first thing to keep in mind while implementing remote storage is how to manage credentials. The best way to manage credentials is to keep them in a configuration file and pull them from it whenever they are required. It is also important to mention that storing passwords in plain text in any file is a big security risk. I’ll use py-bcrypt to encrypt the credentials in the conf file, which gives an added layer of security. This week will be used to create the infrastructure for managing credentials and configuration for the proposed services. I’ll use the current configuration file for the configuration. I’ll also create a simple Python script which will create the configuration file by prompting the instance owner for the required details.
  • Week 2 and 3 (30th May - 12th June): Implement the FTP storage option in PyWPS. The Python standard library contains ftplib, which implements the client side of the FTP protocol. This module allows us to perform a large range of automated jobs over FTP, e.g. uploading a file, downloading a file, mirroring a server, etc. We are only concerned with uploading here, since we want to save the output file on the FTP server defined in the config file. Use the configuration parser to read in the credentials for the server and create an FTP file storage class similar to the FileStorage class, which is currently used to store the output file on the server itself.
  • Week 4 and 5 (13th June - 26th June): In these weeks I’ll implement the Dropbox storage option in PyWPS. Dropbox has built a beautiful and easy to use dropbox-sdk-python, which I will use here to implement Dropbox storage. This module takes care of issues which might occur while uploading a file; for example, it can automatically rename the file in case of conflicts if autorename is selected. Since the API will not be exposed to the end user, options like autorename will be provided by the configuration file.
  • Week 6 and 7 (27th June - 10th July): OneDrive is a cloud storage service from Microsoft similar to Dropbox. In these couple of weeks I’ll use onedrive-sdk-python to implement output storage in OneDrive. As with the Dropbox implementation, the API won’t be exposed to the end user, and the user can give options, such as which directory to save the file in, using the configuration file developed in the first week.
  • Week 8 and 9 (11th July - 24th July): Google Drive is a file hosting service by Google and is one of the most used cloud storage services after Dropbox. I’ll use this time to provide Google Drive as a storage option in PyWPS. Google provides the Drive REST API, which can be used to implement this. I’ll read the documentation and implement Google Drive storage using the REST API. As with the other implementations, user preferences can be given in the configuration file.
  • Week 10 and 11 (25th July - 7th August): I have these 2 weeks to implement one more service. I can either implement Box or implement publishing of data to GeoServer using its REST API. Box is an online file sharing and content management service for businesses. It would be beneficial to provide Box as a remote storage option in PyWPS. I’ll use box-python-sdk to integrate this service in PyWPS. The choice will be made in the community bonding period by asking for input from all the developers and setting a priority. If we decide to add support for publishing data to GeoServer, I’ll use GeoServer’s REST API along with gsconfig, building upon previous work where UMN MapServer support was added in PyWPS 3, to implement support for using GeoServer in PyWPS 4 to serve output data via OGC services.
  • Week 12 and 13 (8th Aug - 23rd Aug): I’ll use the last couple of weeks to write documentation and tests. Since the configuration file is a very important part of this implementation, I need to write exhaustive docs on its usage and format, covering almost all the options. I plan to provide an example conf file for each of the services implemented, showing the usage of all the available options in the API. Since I believe in the test-based development philosophy, I cannot finish my project without writing unit tests for all the storage services implemented during the summer.
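
The configuration and FTP steps from Weeks 1 to 3 could look roughly like the sketch below. It is a minimal illustration, not the final implementation: the [ftp] section layout, the FTPStorage class, the load_config helper and the ftp_factory parameter are all assumptions made for this example, and the password is shown in plain text only to keep the example short. Only standard library calls from configparser and ftplib are used.

```python
import configparser
import ftplib
import os

# Hypothetical configuration layout; the real option names may differ.
EXAMPLE_CONFIG = """
[ftp]
host = ftp.example.org
user = pywps
password = secret
directory = outputs
"""


def load_config(text):
    """Parse configuration text into a ConfigParser object."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return config


class FTPStorage:
    """Hypothetical FTP backend; only uploading is needed here."""

    def __init__(self, host, user, password, directory="",
                 ftp_factory=ftplib.FTP):
        self.host = host
        self.user = user
        self.password = password
        self.directory = directory
        # Injectable so that unit tests can run without a real server.
        self.ftp_factory = ftp_factory

    @classmethod
    def from_config(cls, config, section="ftp", **kwargs):
        """Build the backend from a section of the configuration file."""
        opts = config[section]
        return cls(opts["host"], opts["user"], opts["password"],
                   opts.get("directory", fallback=""), **kwargs)

    def store(self, source_path):
        """Upload the file and return an ftp:// reference to it."""
        name = os.path.basename(source_path)
        ftp = self.ftp_factory(self.host)  # connects when a host is given
        try:
            ftp.login(self.user, self.password)
            if self.directory:
                ftp.cwd(self.directory)
            with open(source_path, "rb") as handle:
                ftp.storbinary("STOR " + name, handle)
        finally:
            ftp.quit()
        remote = "/".join(p for p in (self.directory, name) if p)
        return "ftp://%s/%s" % (self.host, remote)
```

A call such as FTPStorage.from_config(load_config(EXAMPLE_CONFIG)).store("result.tif") would then upload the file to the configured directory. Passing a fake class as ftp_factory is one way the unit tests planned for the final weeks could exercise this code without network access.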

Note: I have exams from 1st May - 15th May. In this period my contribution rate might drop, but I will pick up as soon as my exams are over. Other than this period, I have no other commitments and am able to work on this project full time. If for any reason my code does not get merged into master, I will continue working beyond the summer to ensure that it becomes mergeable.