GSoC 2020 CQL filter implementation on pygeoapi

Idea
pygeoapi is a Python server implementation of the OGC API suite of standards. OGC API standards define modular API building blocks to spatially enable Web APIs in a consistent way. One of them, OGC API - Features, specifies the fundamental building blocks to create, modify and query features of geospatial data collections on the Web.

A fundamental operation performed on a collection of features is that of querying in order to obtain a subset of the data which contains feature instances that satisfy some filtering criteria. This project aims to implement these enhanced filtering criteria in a request to a server. CQL will be used to specify how resource instances in a source collection should be filtered to identify a result set. Typically, CQL is used in query operations to identify the subset of resources that should be included in a response document. Each resource instance in the source collection is evaluated using a filtering expression. The overall filter expression always evaluates to true or false. If the expression evaluates to true, the resource instance satisfies the expression and is marked as being in the result set. If the overall filter expression evaluates to false, the data instance is not in the result set.

The project proposal is based on the OGC API - Features - Part 3: Common Query Language document, which defines the schema for a JSON document that exposes the set of properties or keys that may be used to construct CQL expressions for pygeoapi.

Project proposal
My proposal for GSoC 2020 can be found at: Develop CQL Filter implementation for pygeoapi.

Advantages from this project
With a CQL feature filter implementation using JSON encoding, pygeoapi will accept any combination of bbox, datetime and feature property parameters for filtering. The requirements on these parameters imply that only features matching all the predicates are in the result set, i.e., the logical operator between the predicates is 'AND'. Clients may use the API definition to determine details, e.g., on filter parameters, depending on their needs. Such clients are in general able to use multiple APIs as long as each implements OGC API Features, which increases the client's usage capabilities.
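
As a minimal sketch of how such a combined request could be assembled on the client side, the snippet below joins a bbox, a datetime interval and one property filter into a single query string. The server address, the collection name `obs` and the property `stn_id` are assumptions made up for illustration; only the parameter names `bbox` and `datetime` come from OGC API - Features.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server and collection, used only for illustration.
BASE_URL = "http://localhost:5000/collections/obs/items"


def build_items_url(bbox=None, datetime_range=None, **properties):
    """Combine filters into one query string; the server ANDs them together."""
    params = {}
    if bbox is not None:
        # bbox is sent as comma-separated coordinates.
        params["bbox"] = ",".join(str(c) for c in bbox)
    if datetime_range is not None:
        # A closed interval is written as start/end.
        params["datetime"] = "/".join(datetime_range)
    # Remaining keyword arguments become feature property filters.
    params.update(properties)
    return f"{BASE_URL}?{urlencode(params)}"


url = build_items_url(
    bbox=(-180, -90, 180, 90),
    datetime_range=("2020-06-01T00:00:00Z", "2020-06-30T23:59:59Z"),
    stn_id="35",
)
print(url)
```

A feature is returned only if it lies in the box, falls inside the interval and has the requested property value.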

Link to Github repository: Repository

5th May - 31st May
Community bonding period:
 * 1) What I have done during this period?
 * Introduced myself over the channel and shared my proposal over mailing list for suggestions
 * Communicated with mentors and learned about community, working, etc. It was a great experience talking with experts in the domain
 * Created a wiki page for the project "Develop CQL Filter implementation for pygeoapi"
 * Forked the repository of pygeoapi
 * Updated wiki User page and added my personal information
 * Updated links on the wiki Google_Summer_of_Code_2020_Accepted page
 * Joined the Gitter channel created by the mentors, which will be used as the mode of communication over the GSoC 2020 period
 * Went through the architecture and codebase of pygeoapi
 * Read the OGC API - Features - Part 1: Core and OGC API - Features - Part 3: Common Query Language documents to understand the OGC API standards.
 * Understood the implementation of CQL schema, standards and JSON encoding
 * Learned to work with the Python CQL parser pycql.
 * Held a Jitsi conference call with the mentors to understand the codebase of the project and discuss the tasks that need to be covered in the coding phase 1 period
 * Revised the GSoC proposal from an XML implementation of CQL to a JSON implementation
 * Discussed with the mentors about their expectations over the GSoC 2020 project period


 * 1) What am I going to achieve for next week?
 * Implement the CQL filter specifications (https://github.com/opengeospatial/ogcapi-features/tree/master/extensions/cql) as an OpenAPI Document


 * 1) Are there any blockers?
 * No blockers for now

Week 1 - 4 (1st June - 28th June)
Coding phase 1:
 * Week 1 (1st June - 7th June)
 * 1) What I have done during this period?
 * Created a new Swagger OpenAPI Document by implementing sections 6.2, 6.3, 6.4 of OGC API - Features - Part 3: Common Query Language standards.
 * Shared the doc with pygeoapi community for suggestions.


 * 1) What am I going to achieve for next week?
 * Add filter components and definition of various feature class, expressions and operators in the Swagger OpenAPI Document.
 * Consult with mentors on different approaches of implementation.


 * 1) Are there any blockers?
 * No blockers for now.


 * Week 2 (8th June - 14th June)
 * 1) What I have done during this period?
 * Added the feature class implementation of the OGC API - Features - Part 3: Common Query Language standards, in alignment with sections 7.2, 8.2, 8.3, 8.4, 8.5, 8.7, 8.8, 8.9, 8.10, 9.2, 9.3, 9.4, 9.5, 10.2 and 10.3
 * Observed that the JSON schema does not follow the latest Swagger OpenAPI specifications. Suggested an updated JSON schema that follows OpenAPI 3.0 standards.
 * An alternative solution considered was to convert the JSON document from OpenAPI 2.0 to OpenAPI 3.0.
 * The suggested CQL extension in the OpenAPI document targets version 3.0.
 * Opened a new issue on ogcapi-features concerning the above observation.
 * The document was reviewed by the mentors and got their approval to proceed with the proposal.
 * Created a new branch in my forked repository for committing all my changes related to CQL implementation.


 * 1) What am I going to achieve for next week?
 * Modify openapi.py file so that it generates the proposed OpenAPI document.
 * Test the code for compliance


 * 1) Are there any blockers?
 * No blockers for now


 * Week 3 (15th June - 21st June)
 * 1) What I have done during this period?
 * Read about Test-driven development (TDD) approach
 * Added filter key in test config file for cql-text and cql-json
 * Added test cases on get_oas_30 and test_cql_filters functions for every piece of the openapi document related to CQL extension in test openapi file
 * On executing the added test cases, all 20 test cases passed
 * Modified openapi file so that it generates the proposed OpenAPI document for filter, filter-lang, CQL components, responses and schemas.
 * Committed the branch changes to the forked repository.


 * 1) What am I going to achieve for next week?
 * Test the code for compliance
 * Start documentation


 * 1) Are there any blockers?
 * No blockers for now


 * Week 4 (22nd June - 28th June)
 * 1) What I have done during this period?
 * Modified the test cases and made them more generic, covering providers with and without CQL support
 * The test cases now pass for all types of providers
 * Modified openapi file to generate a generic CQL document depending on whether the resources/collections support CQL filters or not
 * CQL document generation now depends on different providers' config file
 * Tested the code for compliance
 * Refactored code according to flake8 standards
 * Committed the changes on the branch
 * Built the branch on Travis CI
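
The config-driven generation described above could look roughly like the sketch below. It is not the code in pygeoapi's openapi file: the function names and the `filters` config key are hypothetical, and only the parameter names `filter` and `filter-lang` and the encodings `cql-text` and `cql-json` are taken from this report and the OGC draft.

```python
def cql_parameters(languages):
    """Return OpenAPI 3.0 parameter objects for the CQL extension."""
    return [
        {
            "name": "filter",
            "in": "query",
            "description": "CQL filter expression applied to the items",
            "required": False,
            "schema": {"type": "string"},
        },
        {
            "name": "filter-lang",
            "in": "query",
            "description": "Encoding of the filter expression",
            "required": False,
            "schema": {"type": "string", "enum": list(languages)},
        },
    ]


def items_parameters(collection_config):
    """Build the query parameters for one collection's /items path."""
    parameters = [
        {"name": "limit", "in": "query", "schema": {"type": "integer"}},
        {"name": "bbox", "in": "query", "schema": {"type": "string"}},
    ]
    # "filters" is a hypothetical config key listing supported CQL encodings.
    languages = collection_config.get("filters") or []
    if languages:
        parameters.extend(cql_parameters(languages))
    return parameters


with_cql = items_parameters({"filters": ["cql-text", "cql-json"]})
without_cql = items_parameters({})
print([p["name"] for p in with_cql])
print([p["name"] for p in without_cql])
```

Collections without the key get the plain parameter list, so one generator can serve providers with and without CQL support.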


 * 1) What am I going to achieve for next week?
 * Implement documentation for cql branch with read the docs
 * Mockup newly added REST endpoints


 * 1) Are there any blockers?
 * No blockers for now

29th June – 3rd July

 * Evaluation Phase 1: First evaluation period. Mentors have evaluated my code written in week 1-4.

Week 5 - 8 (29th June - 26th July)
Coding phase 2:
 * Week 5 (29th June - 5th July)
 * 1) What I have done during this period?
 * Added documentation for the classes and methods related to CQL branch
 * Added CQL configuration to RTD
 * Checked for linting and flake8 standards on the code
 * Created a clean PR for the OpenAPI CQL branch with successful build


 * 1) What am I going to achieve for next week?
 * Read the pycql documentation and brainstorm about its implementation
 * Mockup newly added REST endpoints and filter configurations


 * 1) Are there any blockers?
 * No blockers for now


 * Week 6 (6th July - 12th July)
 * 1) What I have done during this period?
 * Made fix for queryables response and schema
 * Added no cql filter resource in config file
 * Added unit test for no cql filter resources functionality
 * Studied how to implement a Domain Specific Language(DSL) in python
 * Read about various tools and libraries (LR Parsing, PLY and pyparsing) that are used for parsing DSL in python.
 * Learnt about Lexical Analysis, Regular Expressions, Token Generation, Context Free Grammar, Syntactical Analysis, importance of Precedence and Associativity of the operators in an expression, Shift Reduce parser, computation for Ambiguous Grammars like resolving reduce/reduce conflict or shift/reduce conflict in parser, grammar validation, extensive error checking and creation and evaluation of Abstract Syntax Tree(AST)
 * Read the pycql documentation which can be used as DSL for CQL Filters
 * Experimented with the above using the pycql codebase
 * Worked with Tokens, Lexer, Parser, YACC, LALR(1) parser and successfully generated a CQL filter AST with the help of pycql package.
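
To illustrate the lexical analysis step, here is a small regex-based tokenizer for a subset of CQL text. It is a standalone sketch and not pycql's lexer: the token names and the supported subset are assumptions chosen for this example.

```python
import re

# Token kinds and their patterns; longer comparison operators come first.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r"'[^']*'"),
    ("OP", r"<>|<=|>=|=|<|>"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))
KEYWORDS = {"AND", "OR", "NOT"}


def tokenize(text):
    """Split a filter expression into (kind, value) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(
                f"Unexpected character {text[pos]!r} at position {pos}"
            )
        kind, value = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace carries no meaning
        if kind == "NAME" and value.upper() in KEYWORDS:
            # Keywords are case-insensitive and get their own token kind.
            kind = value = value.upper()
        tokens.append((kind, value))
    return tokens


tokens = tokenize("pop >= 1000 and name <> 'Rome'")
print(tokens)
```

A parser would then consume this token stream and apply precedence rules (for example NOT before AND before OR) to build the AST.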


 * 1) What am I going to achieve for next week?
 * Write a pytest example code with pycql that validates simple filter expressions
 * Draw a design diagram for a CQL class that handles the input filters and turn them into queries for the different backend providers
 * Write unit tests for the CQL class and its methods


 * 1) Are there any blockers?
 * No blockers for now


 * Week 7 (13th July - 19th July)
 * 1) What I have done during this period?
 * Designed a filter class and generated AST of the CQL filter expression using pycql lexer and parser.
 * Created cql_evaluate.py file for evaluating AST using Recursive Descent Approach.
 * Created filter.py file for defining all the filter evaluating functions to get filtered features.
 * Defined a function for evaluating AttributeExpression of AST.
 * Defined a function for evaluating LiteralExpression of AST.
 * Defined a function for evaluating CombinationConditionNode of AST for logical operators like "AND" and "OR".
 * Defined a function for evaluating ComparisonPredicateNode of AST for conditional operators like "<", ">", "=", "<=", ">=" and "<>".
 * Successfully performed the desired CQL operation on CSV Provider Dataset and retrieved filtered feature dataset.
 * Other query operations like start index, last index, page limit etc. are performed with the CQL query operations.
 * Added unit test cases for combination and comparison operations.
 * Successfully performed tests for Simple CQL filter expressions on CSV dataset('pygeoapi-test.csv').
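
The recursive evaluation of the four node types listed above can be sketched as follows. The dataclasses are simplified stand-ins that borrow the node names from this report; they are not pycql's actual classes, and the sample features are made up for illustration.

```python
import operator
from dataclasses import dataclass


# Simplified stand-ins for the AST node types named above.
@dataclass
class AttributeExpression:
    name: str


@dataclass
class LiteralExpression:
    value: object


@dataclass
class ComparisonPredicateNode:
    lhs: object
    rhs: object
    op: str


@dataclass
class CombinationConditionNode:
    lhs: object
    rhs: object
    op: str


COMPARISONS = {
    "<": operator.lt, ">": operator.gt, "=": operator.eq,
    "<=": operator.le, ">=": operator.ge, "<>": operator.ne,
}


def evaluate(node, feature):
    """Walk the tree recursively and evaluate it against one feature."""
    if isinstance(node, AttributeExpression):
        return feature["properties"].get(node.name)
    if isinstance(node, LiteralExpression):
        return node.value
    if isinstance(node, ComparisonPredicateNode):
        compare = COMPARISONS[node.op]
        return compare(evaluate(node.lhs, feature), evaluate(node.rhs, feature))
    if isinstance(node, CombinationConditionNode):
        left = evaluate(node.lhs, feature)
        if node.op == "AND":
            return left and evaluate(node.rhs, feature)
        if node.op == "OR":
            return left or evaluate(node.rhs, feature)
    raise ValueError(f"Unsupported node: {node!r}")


# Made-up sample features; equivalent filter text: stn_id > 100 AND value = 89.9
features = [
    {"properties": {"stn_id": 35, "value": 89.9}},
    {"properties": {"stn_id": 604, "value": 80.0}},
    {"properties": {"stn_id": 2147, "value": 89.9}},
]
ast = CombinationConditionNode(
    ComparisonPredicateNode(AttributeExpression("stn_id"), LiteralExpression(100), ">"),
    ComparisonPredicateNode(AttributeExpression("value"), LiteralExpression(89.9), "="),
    "AND",
)
matched = [f for f in features if evaluate(ast, f)]
print([f["properties"]["stn_id"] for f in matched])
```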


 * 1) What am I going to achieve for next week?
 * Define functions for evaluating NotConditionNode, BetweenPredicateNode, LikePredicateNode, ArithmeticExpressionNode, InPredicateNode, NullPredicateNode, TemporalPredicateNode, SpatialPredicateNode, and BBoxPredicateNode of AST.
 * Perform the evaluation of CQL Filter expressions for other data providers of pygeoapi.
 * Restructure filter class and make the implementation generic for all the data providers of pygeoapi.


 * 1) Are there any blockers?
 * No blockers for now


 * Week 8 (20th July - 26th July)
 * 1) What I have done during this period?
 * Restructured pycql implementation for CSV Provider.
 * Proposed a design plan for generic implementation of CQL filters for all pygeoapi data providers.
 * Added custom CQL Exception class to handle all the exceptions related to CQL filters.
 * Added more unit test cases.


 * 1) What am I going to achieve for next week?
 * Define functions for evaluating NotConditionNode, BetweenPredicateNode, LikePredicateNode, ArithmeticExpressionNode, InPredicateNode, NullPredicateNode, TemporalPredicateNode, SpatialPredicateNode, and BBoxPredicateNode of AST.
 * Work on the design plan proposed and deliver a generic code base.


 * 1) Are there any blockers?
 * No blockers for now

27th July - 31st July

 * Evaluation Phase 2: Second evaluation period. Mentors have evaluated my code written in week 5-8.

Week 9 - 13 (27th July - 31st August)
Coding phase 3:
 * Week 9 (27th July - 2nd August)
 * 1) What I have done during this period?
 * Worked on the implementation of the proposed CQL class design for delivering a generic code base.
 * Created a generic class in cql_evaluate.py file for parsing and evaluating AST on feature data provided by different pygeoapi data providers.
 * Restructured filter class.
 * Implemented CQL query filter for GeoJSON data provider.
 * Added routing for the API based on the query parameters limit, start-index and CQL filter expressions.
 * Added code for generating accurate output when result-type=hits or result-type=results.
 * Added pagination of the resultant feature list.
 * Added handling of invalid query parameters in CQL filter expressions.
 * Defined a function for evaluating BetweenPredicateNode of AST.
 * Defined a function for evaluating InPredicateNode of AST.
 * Defined a function for evaluating NullPredicateNode of AST.
 * Added unit tests for CSV and GeoJSON data providers.
 * Added functional tests for flask endpoints.
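
The interplay of pagination and the two result types might be sketched like this. The function and argument names are hypothetical, and the response keys `numberMatched` and `numberReturned` follow the usual OGC API - Features response layout rather than code quoted from the project.

```python
def build_response(features, start_index=0, limit=10, result_type="results"):
    """Page an already filtered feature list or report only the hit count."""
    if result_type not in ("results", "hits"):
        raise ValueError(f"Invalid result type: {result_type!r}")
    if start_index < 0 or limit < 1:
        raise ValueError("start_index must be >= 0 and limit must be >= 1")

    response = {"type": "FeatureCollection", "numberMatched": len(features)}
    if result_type == "hits":
        # Hits mode returns the count only, never the features themselves.
        response["features"] = []
        return response

    page = features[start_index:start_index + limit]
    response["numberReturned"] = len(page)
    response["features"] = page
    return response


# Made-up features that are assumed to have passed the CQL filter already.
filtered = [{"id": i} for i in range(25)]
last_page = build_response(filtered, start_index=20, limit=10)
hits = build_response(filtered, result_type="hits")
print(last_page["numberReturned"], hits["numberMatched"])
```

Filtering happens before paging, so `numberMatched` always reflects the whole filtered set.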


 * 1) What am I going to achieve for next week?
 * Define functions for evaluating LikePredicateNode, ArithmeticExpressionNode, TemporalPredicateNode, SpatialPredicateNode, and BBoxPredicateNode of AST.
 * Perform the evaluation of CQL Filter expressions for remaining data providers of pygeoapi.


 * 1) Are there any blockers?
 * No blockers for now


 * Week 10 (3rd August - 9th August)
 * 1) What I have done during this period?
 * Defined a plugin implementation of CQL for pygeoapi data providers.
 * Restructured filter class to be more clean and compact.
 * Changed the evaluation of CQL filter from traditional approach to lambda expressions for faster execution and concise coding.
 * Defined a function for evaluating LikePredicateNode of AST.
 * Defined generate_regex function to create a regular expression for the wildcards (e.g. '%') used in the LIKE operation.
 * Defined a function for evaluating NOT operators in CQL filters.
 * Refactored the query function of data providers.
 * Added more unit tests for CSV and GeoJSON data providers.
 * Fixed some minor issues in the code base.
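
A possible shape for the lambda-based predicates and the LIKE wildcard translation is sketched below. Only the name `generate_regex` comes from this report; its signature, the helper names `like`, `between` and `negate`, and the choice to handle only the '%' wildcard are assumptions.

```python
import re


def generate_regex(pattern, wildcard="%", ignore_case=False):
    """Translate a LIKE pattern into a compiled regular expression.

    Literal parts are escaped so that characters such as '.' keep their
    plain meaning; each wildcard becomes '.*'.
    """
    body = ".*".join(re.escape(part) for part in pattern.split(wildcard))
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.compile(body, flags)


def like(name, pattern):
    """Build a predicate that tests one property against a LIKE pattern."""
    regex = generate_regex(pattern)
    return lambda props: (
        isinstance(props.get(name), str)
        and regex.fullmatch(props[name]) is not None
    )


def between(name, low, high):
    """Build a predicate for an inclusive range test."""
    return lambda props: props.get(name) is not None and low <= props[name] <= high


def negate(predicate):
    """Wrap a predicate so that it implements NOT."""
    return lambda props: not predicate(props)


starts_with_ro = like("name", "Ro%")
not_in_range = negate(between("pop", 1000, 5000))
print(starts_with_ro({"name": "Rome"}), not_in_range({"pop": 3000}))
```

Each predicate is built once per request and then called for every feature, which avoids re-inspecting the AST inside the loop.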


 * 1) What am I going to achieve for next week?
 * Define functions for evaluating ArithmeticExpressionNode, TemporalPredicateNode, SpatialPredicateNode, and BBoxPredicateNode of AST.
 * Implement CQL-lang as a parameter query.
 * Perform the evaluation of CQL Filter expressions for SQLite data provider.


 * 1) Are there any blockers?
 * No blockers for now


 * Week 11 (10th August - 16th August)
 * 1) What I have done during this period?
 * Created CQLHandler interface for CQLParser and CQLEvaluator classes.
 * Defined CQLFilter class to provide a generic code base to all the providers to perform CQL filter evaluations.
 * Implemented CQL extension at provider level of collections.
 * Added provider's CQL extension in config file.
 * Added configuration of different CQL filter query language for providers.
 * Restructured CQL Openapi Document generation code.
 * Added more unit test cases to support provider level existence of CQL filter and Openapi Document generation.
 * Explored *shapely* and *fiona* packages to implement Spatial filters like "INTERSECTS", "DISJOINT", "CONTAINS", "WITHIN", "TOUCHES", "CROSSES", "OVERLAPS", "EQUALS", "RELATE", "DWITHIN" and "BEYOND" operations on collection feature set.
 * Defined "Point", "Line" and "Polygon" geometries for filter evaluations.
 * Defined a function for evaluating SpatialPredicateNode of AST.
 * Explored *pytz* and *datetime* packages to implement Temporal filters like "BEFORE", "BEFORE OR DURING", "DURING", "DURING OR AFTER", "AFTER" on collection feature set.
 * Defined a function for evaluating TemporalPredicateNode of AST.
 * Added test cases of spatial and temporal filters for CSV and GeoJSON providers.
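
The temporal operators named above can be illustrated with the standard library alone. This sketch assumes that naive timestamps are in UTC and that period boundaries are inclusive; both are choices made for the example, not statements about the project's code.

```python
from datetime import datetime, timezone


def parse_instant(text):
    """Parse an ISO 8601 instant; naive values are assumed to be UTC."""
    value = datetime.fromisoformat(text)
    return value if value.tzinfo else value.replace(tzinfo=timezone.utc)


# Each operator compares an instant t with a period [start, end].
# A single instant is treated as a period whose start equals its end.
TEMPORAL_OPS = {
    "BEFORE": lambda t, start, end: t < start,
    "BEFORE OR DURING": lambda t, start, end: t <= end,
    "DURING": lambda t, start, end: start <= t <= end,
    "DURING OR AFTER": lambda t, start, end: t >= start,
    "AFTER": lambda t, start, end: t > end,
}


def temporal_match(op, value, start, end=None):
    """Evaluate one temporal predicate for a feature's timestamp."""
    t = parse_instant(value)
    lower = parse_instant(start)
    upper = parse_instant(end) if end is not None else lower
    return TEMPORAL_OPS[op](t, lower, upper)


june = ("2020-06-01T00:00:00", "2020-06-30T23:59:59")
print(temporal_match("DURING", "2020-06-15T12:00:00+00:00", *june))
print(temporal_match("AFTER", "2020-08-01T00:00:00", *june))
```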


 * 1) What am I going to achieve for next week?
 * Develop a work flow for taking CQL filter query parameter in JSON format.
 * Perform the evaluation of CQL filter expressions for SQLite data provider.
 * Code refining and minor bug fixes.
 * Start preparing the documentation.


 * 1) Are there any blockers?
 * No blockers for now.


 * Week 12 (17th August - 23rd August)
 * 1) What I have done during this period?
 * Performed the evaluation of CQL filter expressions for SQLite data provider.
 * Implemented spatial filters for SQLite feature data.
 * Researched on how to translate CQL filter request to SQL, for using it as a request to the database.
 * Designed an implementation plan to carry out the above requirement.
 * Successfully structured SQL queries from AST generated by *pycql* for SQLite database.
 * Created sqlite_where_clause.py for creating all the 'where clauses' of SQL queries according to the CQL query parameter.
 * Refined sqlite_filter.py file for CQL Filter evaluation and to get filtered feature collection.
 * Restructured cql.py file for handling different aspect of CQL Filter evaluation.
 * Added CQL functional test cases.
 * Added CQL module test cases.
 * Added unit test cases for evaluation of CQL AST predicate nodes.
 * Added API unit test cases for CQL query parameter.
 * Added simple and complex CQL filter test cases for SQLite data provider.
 * Improved linting score of the code.
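
Translating a filter tree into a parameterised WHERE clause could work along these lines. This is not the content of sqlite_where_clause.py: the nested-tuple AST, the function name and the sample table are simplifications invented for the sketch. Values are bound through `?` placeholders, and column names are checked against a known set because identifiers cannot be bound as parameters.

```python
import sqlite3

COMPARISON_OPS = {"<", ">", "=", "<=", ">=", "<>"}


def where_clause(node, columns):
    """Return (sql, args) for a tree of nested tuples.

    Hypothetical node shapes: ("AND" | "OR", left, right) or
    (comparison_operator, column_name, value).
    """
    op = node[0]
    if op in ("AND", "OR"):
        left_sql, left_args = where_clause(node[1], columns)
        right_sql, right_args = where_clause(node[2], columns)
        return f"({left_sql} {op} {right_sql})", left_args + right_args
    if op in COMPARISON_OPS:
        _, column, value = node
        if column not in columns:
            raise ValueError(f"Unknown column: {column}")
        return f'"{column}" {op} ?', [value]
    raise ValueError(f"Unsupported operator: {op}")


# Made-up in-memory table standing in for a provider's SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (stn_id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)", [(35, 89.9), (604, 80.0), (2147, 89.9)]
)

# Equivalent filter text: stn_id > 100 AND value = 89.9
ast = ("AND", (">", "stn_id", 100), ("=", "value", 89.9))
sql, args = where_clause(ast, {"stn_id", "value"})
rows = conn.execute(f"SELECT stn_id FROM obs WHERE {sql}", args).fetchall()
print(sql, args, rows)
```

Pushing the filter into SQL lets the database do the selection instead of loading every row and filtering in Python.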


 * 1) What am I going to achieve for next week?
 * Perform the evaluation of CQL filter expressions for PostgreSQL data provider.
 * Code refining and minor bug fixes.
 * Complete the documentation and prepare a video of the final project.
 * Submit the code for final evaluation


 * 1) Are there any blockers?
 * No blockers for now.


 * Week 13 (24th August - 30th August)
 * 1) What I have done during this period?
 * Performed the evaluation of CQL filter expressions for PostgreSQL data provider.
 * Implemented spatial filters for PostgreSQL feature data.
 * Researched how to translate a CQL filter request to SQL with the psycopg2 module, for using it as a request to the database.
 * Designed an implementation plan to carry out the above requirement.
 * Successfully structured SQL queries from AST generated by *pycql* for PostgreSQL database.
 * Created postgres_where_clause.py for creating all the 'where clauses' of SQL queries according to the CQL query parameter.
 * Refined postgres_filter.py file for CQL Filter evaluation and to get filtered feature collection.
 * Added unit test cases for evaluation of CQL AST predicate nodes.
 * Added simple and complex CQL filter test cases for PostgreSQL data provider.
 * Improved linting score of the code.
 * Made the changes suggested by the mentors.
 * Completed the documentation and prepared a video of the final project.
 * Submitted the code for final evaluation
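
For the spatial filters on the database side, one plausible approach is to map each CQL spatial operator to a PostGIS function and bind the geometry through psycopg2-style `%s` placeholders. The sketch below assumes a PostGIS-enabled database, a geometry column named `geom` and SRID 4326; none of these details are stated in this report, and the function name is hypothetical. It only builds the SQL fragment and does not connect to a database.

```python
# CQL spatial operators mapped to PostGIS function names.
SPATIAL_FUNCTIONS = {
    "INTERSECTS": "ST_Intersects",
    "DISJOINT": "ST_Disjoint",
    "CONTAINS": "ST_Contains",
    "WITHIN": "ST_Within",
    "TOUCHES": "ST_Touches",
    "CROSSES": "ST_Crosses",
    "OVERLAPS": "ST_Overlaps",
    "EQUALS": "ST_Equals",
}


def spatial_where(op, wkt, geom_column="geom", srid=4326):
    """Return (sql, params) for one spatial predicate.

    The geometry is passed as WKT text through a placeholder, so the
    database driver handles quoting of the value.
    """
    try:
        function = SPATIAL_FUNCTIONS[op.upper()]
    except KeyError:
        raise ValueError(f"Unsupported spatial operator: {op}") from None
    # Column names cannot be bound as parameters, so validate them instead.
    if not geom_column.isidentifier():
        raise ValueError(f"Invalid column name: {geom_column}")
    sql = f'{function}("{geom_column}", ST_GeomFromText(%s, %s))'
    return sql, (wkt, srid)


sql, params = spatial_where("intersects", "POINT(1 2)")
print(sql, params)
```

With a live connection, the pair would be passed to a cursor as `cursor.execute("SELECT ... WHERE " + sql, params)`.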

GSoC 2020 Final Report
CQL filter capabilities have been added to pygeoapi as an extension. This implementation allows users to request features using multiple simple or complex filter expressions, providing greater flexibility and better control over response results. The added implementations are as follows:


 * OpenAPI Documentation: Implemented the CQL extension on pygeoapi following OGC standards and generated an OpenAPI Document with CQL specifications. Whether a data provider supports the CQL filter extension is decided from the configuration file. The related CQL schema, components and filter parameters are added to the document.


 * Abstract Syntax Tree for CQL filter expression: Validation of CQL filter expressions and generation of Abstract Syntax Tree from the filter expression. Usage of lexer and parser from pycql library.


 * CQL for CSV and GeoJSON data providers: Evaluation of the Abstract Syntax Tree to filter the feature collections supported by CSV and GeoJSON data providers. The pycql library integrates with databases through an ORM, but the pygeoapi data providers do not work with an ORM. So the evaluation of all the CQL query operations was developed from scratch, using an efficient methodology. The evaluated output is the response from the API.


 * CQL for SQLite data provider: Evaluation of the Abstract Syntax Tree to filter the feature collections supported by SQLite data provider. The AST of the CQL filter request is translated into SQL queries and then used as a request to the database. The evaluated output from the SQLite database is the response from the API.


 * CQL for PostgreSQL data provider: Evaluation of the Abstract Syntax Tree to filter the feature collections supported by PostgreSQL data provider. Like the SQLite queries, the AST of the CQL filter request is translated into PostgreSQL queries by following the syntax of the psycopg2 database adapter. The query is then used as a request to the database. The evaluated output from the PostgreSQL database is the response from the API.

 * CQL predicates: Implementation of the following CQL predicates in pygeoapi to support filtering functionality on features: Simple Condition Predicate, Combination Predicate, Not Condition Predicate, Between Predicate, Like Predicate, In Predicate, Null Predicate, BBox Predicate, Spatial Predicate and Temporal Predicate.

Here is a link to the Final Report of GSoC 2020. Final Report

Here is a link to a step-by-step execution flow of the project. Steps

Here is a link to a working example utilizing all the implemented functionalities. Examples

Here is a link to the RTD documentation of CQL filter implementation on pygeoapi. pygeoapi RTD on CQL filters

 All links related to the project Git Wiki

Student's Biography
Farheen is a strong engineering professional with a B.Tech focused in Computer Science Engineering from West Bengal University of Technology, Kolkata, India. She is currently pursuing an M.Tech in Geo-informatics and Natural Resource Engineering at the Centre of Studies in Resources Engineering, Indian Institute of Technology Bombay, Mumbai, India. More information about her can be found by following the link: User

Mentors

 * Francesco Bartoli
 * Jorge Samuel Mendes de Jesus