GSoC 2020 CQL filter implementation on pygeoapi

Idea
pygeoapi is a Python server implementation of the OGC API suite of standards. OGC API standards define modular API building blocks to spatially enable Web APIs in a consistent way. One of them, OGC API - Features, specifies the fundamental API building blocks to create, modify and query features of geospatial data collections on the Web.

A fundamental operation on a collection of features is querying it to obtain the subset of feature instances that satisfy some filtering criteria. This project aims to support such enhanced filtering criteria in requests to a pygeoapi server. CQL will be used to specify how resource instances in a source collection should be filtered to identify a result set. Typically, CQL is used in query operations to identify the subset of resources that should be included in a response document. Each resource instance in the source collection is evaluated using a filtering expression, and the overall filter expression always evaluates to true or false. If it evaluates to true, the resource instance satisfies the expression and is marked as being in the result set. If it evaluates to false, the resource instance is not in the result set.
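The evaluation model described above can be illustrated with a minimal Python sketch. The feature dictionaries and the `matches` predicate are hypothetical; the predicate only stands in for a parsed CQL expression and is not part of pygeoapi.

```python
# Hypothetical GeoJSON-like features; not taken from any pygeoapi dataset.
features = [
    {"id": 1, "properties": {"name": "A", "value": 10}},
    {"id": 2, "properties": {"name": "B", "value": 25}},
    {"id": 3, "properties": {"name": "C", "value": 40}},
]


def matches(feature):
    """Stand-in for a filter expression such as: value > 20."""
    return feature["properties"]["value"] > 20


# Every feature is evaluated against the expression; only the features
# for which it evaluates to True are placed in the result set.
result_set = [f for f in features if matches(f)]
print([f["id"] for f in result_set])
```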

The project proposal is based on the OGC API - Features - Part 3: Common Query Language document, which defines the schema for a JSON document that exposes the set of properties or keys that may be used to construct CQL expressions for pygeoapi.

Project proposal
My proposal for GSoC 2020 can be found at: Develop CQL Filter implementation for pygeoapi.

Advantages from this project
With a CQL feature filter implementation using the JSON encoding, pygeoapi will allow any combination of bbox, datetime and feature-property filtering parameters. The requirements on these parameters imply that only features matching all the predicates are in the result set, i.e., the logical operator between the predicates is 'AND'. Clients may use the API definition to determine details, e.g., on filter parameters, depending on their needs. Such clients are in general able to use multiple APIs as long as each one implements OGC API Features, which increases the client's usage capabilities.
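The 'AND' semantics between the predicates can be sketched as follows. This is an illustration only, assuming point features with a naive ISO 8601 `datetime` property; the helper names `in_bbox`, `in_interval` and `select` are hypothetical and are not pygeoapi functions.

```python
from datetime import datetime


def in_bbox(feature, bbox):
    """True if a point feature lies inside (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bbox
    x, y = feature["geometry"]["coordinates"]
    return minx <= x <= maxx and miny <= y <= maxy


def in_interval(feature, start, end):
    """True if the feature's datetime property falls within [start, end]."""
    stamp = datetime.fromisoformat(feature["properties"]["datetime"])
    return start <= stamp <= end


def select(features, bbox, start, end, **props):
    """Keep only the features that satisfy ALL predicates (logical AND)."""
    predicates = [
        lambda f: in_bbox(f, bbox),
        lambda f: in_interval(f, start, end),
        lambda f: all(f["properties"].get(k) == v for k, v in props.items()),
    ]
    return [f for f in features if all(p(f) for p in predicates)]


def point(fid, x, y, stamp, kind):
    """Build a hypothetical point feature for the example."""
    return {
        "id": fid,
        "geometry": {"type": "Point", "coordinates": [x, y]},
        "properties": {"datetime": stamp, "kind": kind},
    }


sample = [
    point(1, 10, 45, "2020-06-01T12:00:00", "station"),
    point(2, 10, 45, "2019-01-01T00:00:00", "station"),  # outside the interval
    point(3, 100, 45, "2020-06-01T12:00:00", "station"),  # outside the bbox
    point(4, 11, 46, "2020-06-15T08:00:00", "buoy"),  # property mismatch
]
selected = select(
    sample, (0, 40, 20, 50), datetime(2020, 1, 1), datetime(2020, 12, 31),
    kind="station",
)
print([f["id"] for f in selected])
```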

Link to Github repository: Repository

5th May - 31st May
Community bonding period:
 * 1) What I have done during this period?
 * Introduced myself over the channel and shared my proposal over mailing list for suggestions
 * Communicated with the mentors and learned about the community and its way of working. It was a great experience talking with experts in the domain
 * Created a wiki page for the project "Develop CQL Filter implementation for pygeoapi"
 * Forked the repository of pygeoapi
 * Updated wiki User page and added my personal information
 * Updated links on the wiki Google_Summer_of_Code_2020_Accepted page
 * Joined the Gitter channel created by the mentors, which will be used as the mode of communication over the GSoC 2020 period
 * Went through the architecture and codebase of pygeoapi
 * Read the OGC API - Features - Part 1: Core and OGC API - Features - Part 3: Common Query Language documents to understand the standards of OGC API.
 * Understood the implementation of CQL schema, standards and JSON encoding
 * Learned to work with the Python CQL parser pycql.
 * Held a Jitsi conference call with the mentors to understand the codebase of the project and discuss the tasks that need to be covered in coding phase 1
 * Revised the GSoC proposal from an XML implementation of CQL to a JSON implementation
 * Discussed with the mentors about their expectations over the GSoC 2020 project period


 * 1) What am I going to achieve for next week?
 * Implement the CQL filter specifications (https://github.com/opengeospatial/ogcapi-features/tree/master/extensions/cql) as an OpenAPI Document


 * 1) Are there any blockers?
 * No blockers for now

Week 1 - 4 (1st June - 28th June)
Coding phase 1:
 * Week 1 (1st June - 7th June)
 * 1) What I have done during this period?
 * Created a new Swagger OpenAPI Document by implementing sections 6.2, 6.3, 6.4 of OGC API - Features - Part 3: Common Query Language standards.
 * Shared the doc with pygeoapi community for suggestions.


 * 1) What am I going to achieve for next week?
 * Add filter components and definitions of the various feature classes, expressions and operators in the Swagger OpenAPI Document.
 * Consult with mentors on different approaches of implementation.


 * 1) Are there any blockers?
 * No blockers for now.


 * Week 2 (8th June - 14th June)
 * 1) What I have done during this period?
 * Added the feature class implementation of the OGC API - Features - Part 3: Common Query Language standards, in alignment with sections 7.2, 8.2, 8.3, 8.4, 8.5, 8.7, 8.8, 8.9, 8.10, 9.2, 9.3, 9.4, 9.5, 10.2 and 10.3
 * Observed that the JSON schema does not follow the latest Swagger OpenAPI specifications. Suggested an updated JSON schema that follows OpenAPI 3.0 standards.
 * The alternative solution considered here required converting the JSON document from supporting OpenAPI 2.0 to supporting 3.0.
 * The suggested CQL extension in the OpenAPI document targets the 3.0 version.
 * Opened a new issue on ogcapi-features concerning the above observation.
 * The document was reviewed by the mentors and got their approval to proceed with the proposal.
 * Created a new branch in my forked repository for committing all my changes related to CQL implementation.


 * 1) What am I going to achieve for next week?
 * Modify openapi.py file so that it generates the proposed OpenAPI document.
 * Test the code for compliance


 * 1) Are there any blockers?
 * No blockers for now


 * Week 3 (15th June - 21st June)
 * 1) What I have done during this period?
 * Read about Test-driven development (TDD) approach
 * Added filter key in test config file for cql-text and cql-json
 * Added test cases on get_oas_30 and test_cql_filters functions for every piece of the openapi document related to CQL extension in test openapi file
 * On executing the added test cases, all 20 test cases passed
 * Modified openapi file so that it generates the proposed OpenAPI document for filter, filter-lang, CQL components, responses and schemas.
 * Committed the branch changes to the forked repository.
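The two query parameters mentioned above could be described as OpenAPI 3.0 parameter objects roughly as follows. This is a minimal sketch under stated assumptions: the helper `cql_parameters` and the description strings are hypothetical, and the output is not the document that pygeoapi's openapi file actually generates. Only the parameter names (filter, filter-lang) and the two languages (cql-text, cql-json) come from the report.

```python
import json

# Filter languages named in the test configuration described above.
FILTER_LANGS = ["cql-text", "cql-json"]


def cql_parameters():
    """Return OpenAPI 3.0 parameter objects for the CQL extension (sketch)."""
    return {
        "filter": {
            "name": "filter",
            "in": "query",
            "required": False,
            "description": "CQL filter expression (hypothetical wording)",
            "schema": {"type": "string"},
        },
        "filter-lang": {
            "name": "filter-lang",
            "in": "query",
            "required": False,
            "description": "Encoding of the filter (hypothetical wording)",
            "schema": {"type": "string", "enum": FILTER_LANGS},
        },
    }


# Serialise the fragment the way it would appear in a JSON OpenAPI document.
fragment = json.dumps({"components": {"parameters": cql_parameters()}}, indent=2)
print(fragment)
```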


 * 1) What am I going to achieve for next week?
 * Test the code for compliance
 * Start documentation


 * 1) Are there any blockers?
 * No blockers for now


 * Week 4 (22nd June - 28th June)
 * 1) What I have done during this period?
 * Modified the test cases and made them more generic, covering providers both with and without CQL support
 * The test cases now pass for all types of providers
 * Modified openapi file to generate a generic CQL document depending on whether the resources/collections support CQL filters or not
 * CQL document generation now depends on different providers' config file
 * Tested the code for compliance
 * Refactored code according to flake8 standards
 * Committed the changes on the branch
 * Built the branch on Travis CI
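The idea of generating the CQL part of the document only for collections that support it can be sketched as below. The configuration layout, the `filters` key name and the function `items_parameters` are assumptions made for illustration; they may differ from the actual pygeoapi configuration and code.

```python
# Parameters every items endpoint gets, and those added only with CQL support.
BASE_PARAMS = ["bbox", "datetime", "limit"]
CQL_PARAMS = ["filter", "filter-lang"]


def items_parameters(config):
    """Map each collection name to the query parameters it should expose."""
    params = {}
    for name, resource in config["resources"].items():
        names = list(BASE_PARAMS)
        # Hypothetical 'filters' key: present and non-empty means CQL support.
        if resource.get("filters"):
            names.extend(CQL_PARAMS)
        params[name] = names
    return params


# Hypothetical configuration with one CQL-enabled and one plain collection.
config = {
    "resources": {
        "obs": {"filters": ["cql-text", "cql-json"]},
        "lakes": {},
    }
}
print(items_parameters(config))
```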


 * 1) What am I going to achieve for next week?
 * Write documentation for the cql branch with Read the Docs
 * Mock up the newly added REST endpoints


 * 1) Are there any blockers?
 * No blockers for now

Week 5 - 8 (29th June - 26th July)
Coding phase 2:
 * Week 5 (29th June - 5th July)
 * 1) What I have done during this period?
 * Added documentation for the classes and methods related to CQL branch
 * Added CQL configuration to RTD
 * Checked for linting and flake8 standards on the code
 * Created a clean PR for the OpenAPI CQL branch with successful build


 * 1) What am I going to achieve for next week?
 * Read the pycql documentation and brainstorm about its implementation
 * Mock up the newly added REST endpoints and filter configurations


 * 1) Are there any blockers?
 * No blockers for now


 * Week 6 (6th July - 12th July)
 * 1) What I have done during this period?
 * Made fix for queryables response and schema
 * Added no cql filter resource in config file
 * Added unit test for no cql filter resources functionality
 * Studied how to implement a Domain Specific Language (DSL) in Python
 * Read about various tools and libraries (LR parsing, PLY and pyparsing) that are used for parsing a DSL in Python.
 * Learnt about Lexical Analysis, Regular Expressions, Token Generation, Context Free Grammar, Syntactical Analysis, importance of Precedence and Associativity of the operators in an expression, Shift Reduce parser, computation for Ambiguous Grammars like resolving reduce/reduce conflict or shift/reduce conflict in parser, grammar validation, extensive error checking and creation and evaluation of Abstract Syntax Tree(AST)
 * Read the pycql documentation which can be used as DSL for CQL Filters
 * Experimented with the above using the pycql codebase
 * Worked with Tokens, Lexer, Parser, YACC, LALR(1) parser and successfully generated a CQL filter AST with the help of pycql package.
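The lexical analysis step studied this week can be illustrated with a small regular-expression tokenizer. This is a standalone sketch written with the standard library only; it is not pycql's lexer, and the token names and the tiny subset of CQL it accepts are assumptions for the example.

```python
import re

# Token patterns, tried in order; longer operators come before shorter ones.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("STRING", r"'[^']*'"),
    ("OP", r"<=|>=|<>|<|>|="),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("SKIP", r"\s+"),
]
KEYWORDS = {"AND", "OR", "NOT"}
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(text):
    """Split a simple filter expression into (kind, value) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"Unexpected character {text[pos]!r} at {pos}")
        kind, value = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace carries no meaning
        if kind == "NAME" and value.upper() in KEYWORDS:
            kind = value.upper()  # promote keywords to their own token kind
        tokens.append((kind, value))
    return tokens


print(tokenize("value >= 10 AND name = 'A'"))
```

A parser would then consume this token stream according to the grammar and its operator precedence rules to build the AST.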


 * 1) What am I going to achieve for next week?
 * Write a pytest example code with pycql that validates simple filter expressions
 * Draw a design diagram for a CQL class that handles the input filters and turn them into queries for the different backend providers
 * Write unit tests for the CQL class and its methods


 * 1) Are there any blockers?
 * No blockers for now


 * Week 7 (13th July - 19th July)
 * 1) What I have done during this period?
 * Designed a filter class and generated AST of the CQL filter expression using pycql lexer and parser.
 * Created cql_evaluate.py file for evaluating AST using Recursive Descent Approach.
 * Created filter.py file for defining all the filter evaluating functions to get filtered features.
 * Defined a function for evaluating AttributeExpression of AST.
 * Defined a function for evaluating LiteralExpression of AST.
 * Defined a function for evaluating CombinationConditionNode of AST for logical operators like "AND" and "OR".
 * Defined a function for evaluating ComparisonPredicateNode of AST for conditional operators like "<", ">", "=", "<=", ">=" and "<>".
 * Successfully performed the desired CQL operation on CSV Provider Dataset and retrieved filtered feature dataset.
 * Other query options such as start index, last index and page limit are applied together with the CQL query operations.
 * Added unit test cases for combination and comparison operations.
 * Successfully performed tests for Simple CQL filter expressions on CSV dataset('pygeoapi-test.csv').
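The recursive evaluation of an AST over feature properties, as described in the list above, could look roughly like this. The node classes here are simplified, hypothetical stand-ins defined for the example; they only mirror the roles of the attribute, literal, comparison and combination nodes and are not pycql's classes, nor the code in cql_evaluate.py or filter.py.

```python
import operator
from dataclasses import dataclass


@dataclass
class Attribute:  # stand-in for an attribute expression
    name: str


@dataclass
class Literal:  # stand-in for a literal expression
    value: object


@dataclass
class Comparison:  # stand-in for a comparison predicate node
    lhs: object
    rhs: object
    op: str


@dataclass
class Combination:  # stand-in for an AND/OR combination node
    lhs: object
    rhs: object
    op: str


COMPARATORS = {
    "<": operator.lt, ">": operator.gt, "=": operator.eq,
    "<=": operator.le, ">=": operator.ge, "<>": operator.ne,
}


def evaluate(node, properties):
    """Recursively evaluate an AST node against one feature's properties."""
    if isinstance(node, Attribute):
        return properties[node.name]
    if isinstance(node, Literal):
        return node.value
    if isinstance(node, Comparison):
        compare = COMPARATORS[node.op]
        return compare(evaluate(node.lhs, properties),
                       evaluate(node.rhs, properties))
    if isinstance(node, Combination):
        left = evaluate(node.lhs, properties)
        if node.op == "AND":
            return left and evaluate(node.rhs, properties)
        return left or evaluate(node.rhs, properties)
    raise TypeError(f"Unsupported node: {node!r}")


# AST for: value > 20 AND name <> 'C'
ast = Combination(
    Comparison(Attribute("value"), Literal(20), ">"),
    Comparison(Attribute("name"), Literal("C"), "<>"),
    "AND",
)
rows = [
    {"name": "A", "value": 10},
    {"name": "B", "value": 25},
    {"name": "C", "value": 40},
]
print([r["name"] for r in rows if evaluate(ast, r)])
```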


 * 1) What am I going to achieve for next week?
 * Define functions for evaluating NotConditionNode, BetweenPredicateNode, LikePredicateNode, ArithmeticExpressionNode, InPredicateNode, NullPredicateNode, TemporalPredicateNode, SpatialPredicateNode, and BBoxPredicateNode of AST.
 * Perform the evaluation of CQL Filter expressions for other data providers of pygeoapi.
 * Restructure filter class and make the implementation generic for all the data providers of pygeoapi.


 * 1) Are there any blockers?
 * No blockers for now


 * Week 8 (20th July - 26th July)
 * 1) What I have done during this period?
 * Restructured pycql implementation for CSV Provider.
 * Proposed a design plan for generic implementation of CQL filters for all pygeoapi data providers.
 * Added custom CQL Exception class to handle all the exceptions related to CQL filters.
 * Added more unit test cases.
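A custom exception class of the kind mentioned above might be sketched as follows. The class name `CQLException`, its `expression` attribute and the helper `require_filter_lang` are hypothetical choices for illustration and may not match the class added to the branch.

```python
class CQLException(Exception):
    """Hypothetical error type for problems related to CQL filters."""

    def __init__(self, message, expression=None):
        super().__init__(message)
        # Keep the offending input so callers can report it back to the client.
        self.expression = expression


def require_filter_lang(filter_lang, supported=("cql-text", "cql-json")):
    """Return filter_lang if it is supported, otherwise raise CQLException."""
    if filter_lang not in supported:
        raise CQLException(f"Unsupported filter-lang: {filter_lang}", filter_lang)
    return filter_lang


try:
    require_filter_lang("cql-xml")
except CQLException as err:
    print(f"{err} (input: {err.expression})")
```

Using a single exception type lets the API layer catch every CQL-related failure in one place and turn it into an error response.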


 * 1) What am I going to achieve for next week?
 * Define functions for evaluating NotConditionNode, BetweenPredicateNode, LikePredicateNode, ArithmeticExpressionNode, InPredicateNode, NullPredicateNode, TemporalPredicateNode, SpatialPredicateNode, and BBoxPredicateNode of AST.
 * Work on the design plan proposed and deliver a generic code base.


 * 1) Are there any blockers?
 * No blockers for now

Week 9 - 12 (27th July - 31st August)
Coding phase 3:
 * Week 9 (27th July - 2nd August)
 * 1) What I have done during this period?
 * Worked on the implementation of the proposed CQL class design for delivering a generic code base.
 * Created a generic class in cql_evaluate.py file for parsing and evaluating AST on feature data provided by different pygeoapi data providers.
 * Restructured filter class.
 * Implemented CQL query filter for GeoJSON data provider.
 * Added routing for the API based on the query parameters limit, start-index and CQL filter expression.
 * Added code for generating accurate output when result-type=hits or result-type=results.
 * Added pagination code for the resultant feature list.
 * Added handling of invalid query parameters for CQL filter expressions.
 * Defined a function for evaluating BetweenPredicateNode of AST.
 * Defined a function for evaluating InPredicateNode of AST.
 * Defined a function for evaluating NullPredicateNode of AST.
 * Added unit tests for CSV and GeoJSON data providers.
 * Added functional tests for flask endpoints.
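The interplay of pagination and the result type described above can be sketched as follows, assuming the filtered features are already available as a list. The function `build_response` and its parameter names are hypothetical and do not reproduce the pygeoapi API code; `numberMatched` and `numberReturned` are the response members defined by OGC API - Features - Part 1.

```python
def build_response(features, start_index=0, limit=10, result_type="results"):
    """Build a feature collection response from already filtered features."""
    if result_type not in ("results", "hits"):
        raise ValueError(f"Invalid result type: {result_type}")
    number_matched = len(features)
    if result_type == "hits":
        # Only the count is reported; no features are returned.
        return {
            "type": "FeatureCollection",
            "numberMatched": number_matched,
            "features": [],
        }
    # Pagination: return one page starting at start_index.
    page = features[start_index:start_index + limit]
    return {
        "type": "FeatureCollection",
        "numberMatched": number_matched,
        "numberReturned": len(page),
        "features": page,
    }


# 25 hypothetical features that passed the CQL filter.
filtered = [{"type": "Feature", "id": i} for i in range(25)]
last_page = build_response(filtered, start_index=20, limit=10)
print(last_page["numberMatched"], last_page["numberReturned"])
```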


 * 1) What am I going to achieve for next week?
 * Define functions for evaluating LikePredicateNode, ArithmeticExpressionNode, TemporalPredicateNode, SpatialPredicateNode, and BBoxPredicateNode of AST.
 * Perform the evaluation of CQL Filter expressions for remaining data providers of pygeoapi.


 * 1) Are there any blockers?
 * No blockers for now

Student's Biography
Farheen is a strong engineering professional with a B.Tech focused in Computer Science Engineering from West Bengal University of Technology, Kolkata, India. She is currently pursuing an M.Tech in Geo-informatics and Natural Resource Engineering at the Centre of Studies in Resources Engineering, Indian Institute of Technology Bombay, Mumbai, India. More information about her can be obtained by following the link: User

Mentors

 * Francesco Bartoli
 * Jorge Samuel Mendes de Jesus