MapGuide RFC 5 - Enhanced Join Support

From OSGeo
Jump to navigation Jump to search
This page contains an change request (RFC) for MapGuide Open Source.  
More MapGuide RFCs can be found on the MapGuide RFCs page.

Status

  • Submission Date: November 2, 2006
  • Last Modified Date: November 3, 2006
  • Author: Ronnie Louie
  • RFC Status: draft
  • Implementation Status: pending
  • Voting History:
  • Assigned PSC guide(s):

Overview

This proposal is to support additional functionality to feature source joins for chained joins, inner joins, and 1-to-many support.

Current join support is limited to left outer join type only. This means that the result consists of all feature records from the primary source, even if the attributes are null. An inner join is defined as a join that results in only the features that have a match in both joined feature sources.

A chained join is one in which the secondary source for a previously defined joined is used as a primary source for another join. This situation occurs when there is a need to join a secondary table to another secondary table, not just hang all secondary tables off the primary one.

It is possible that a secondary source within a join might have multiple records that match one primary record. This is a 1 : many case. Current functionality limits the joined records to only the first one, effectively enforcing a 1 : 1 case, such that the additional matching records are not available in the join result. By supporting the 1 : many case, all the joined secondary data is available for further analysis.

Motivation

The existing feature join functionality requires the feature source to define an extension to a primary feature class. This extension is defined by identifying the secondary feature source and feature class, along with the attributes from both classes to join on. The extension does not specify the type of join to perform, which actually is left outer join only. There are situations where an inner join type on the data is desired, however this option is currently not possible since there is no way to choose a join type with the current implementation.

Joins in MapGuide currently consist of one primary feature class joined to one or more secondary feature classes. The secondary class cannot be extended via join in MapGuide even though it may make sense to perform such a chained join. For example, a primary table “parcels” is joined to a secondary table of “owners”. But you want to join “owners” to “demographics” to do an analysis on age or income levels or family size.

When a primary feature record is joined to secondary source, there may be one or more matching secondary records. MapGuide is only retrieving the first matching secondary record and ignoring the rest. Users would need to interrogate the secondary source separately to obtain the additional data. This is cumbersome and possibly error prone as there is no indication provided that additional records may exist. We need to provide an option to the extension definition to facilitate a 1 : many join result. For example, a primary table "parcels" is joined to a secondary table of "owners". A particular parcel "A01" is can have 3 different owners. MapGuide currently implements the 1 : 1 case, and retrieves the first encountered owner and displays a single parcel geometry on the map. The other two owners are never displayed in any fashion. If MapGuide supports a 1 : many option, MapGuide will be able to provide the details for all three owners, should the need arise.

Funding/Resources

The effort to implement the proposed changes will be sponsored by Autodesk.

Proposed Changes

Enhanced join support will make use of schema[1] changes already incorporated into the trunk code stream. In particular, the relevant schema changes are:

1. New optional attribute for specifying the join type. If not specified the default will be left outer join.

2. New optional attribute for specifying the 1 : 1 or 1 : many option. If not specified the default will be 1 : 1.

3. New optional attribute for specifying the delimiter symbol for identifying extended attributes. If not specified, the default will be an empty string. This delimiter is a customizable string and is not limited to a single character.


Changes to join notation is summarized as follows:

1. To support chained joins, notation needs to be introduced for defining these joins. Particularly, the secondary attribute from a primary join needs to be specified as the primary join property for a chained join. The primary join name and feature class property will use the dot symbol "." to separate the join name from the class property in the FeatureClassProperty attribute. The FeatureClassProperty value will be prefixed by the primary name using the the following notation <Primary_Join_name.FeatureClassProperty_name>.

2. As previously mentioned, extended properties will be identified by using a delimiter to separate the property names from their join qualifiers. The current implementation, requires joins to be named, and that name is used to qualify the extended properties. For example, the join between "parcels" and "owners" is given the name "ParcelOwnerJoin". The "owners" table has attributes "Id" and "Name". The join result contains "ParcelOwnerJoinId" and "ParcelOwnerJoinName" to indicate the "Id" and "Name" are extended attributes. By introducing a delimiter, say a vertical bar "|", the extended properties will be "ParcelOwnerJoin|Id" and "ParcelOwnerJoin|Name". This makes it clearer as to which attributes are in fact extended attributes.


The join result when 1 : many support is enabled will be duplicate primary (typically geometry) features for each matching secondary record. For the parcels-owners join example, the parcel "A01" with three owners will result in three duplicate (and overlapping) parcels, each with the different attributes values for the respective owners.

Implications

Backwards compatibility should be maintained since these changes should not break existing applications built on the current join logic. Documentation will need to be updated for the new join functionality.

Test Plan

In addition to the existing tests for left outer joins, new test cases for inner join, chained joins, and 1 : many support will need to be created. These tests should cover joins to file-based sources as well as connection-based sources.