MS RFC 91: Layer Filter Normalization

Date:2014/7/14
Author:Steve Lime
Contact:sdlime@comcast.net
Status:Draft
Version:TBD

1. Overview

Layer filters are driver specific expressions that are used within MapServer in a couple of ways:

  1. to define feature subsets for rendering (independent of classes)
  2. to define feature subsets for selected query operations

The first use case generally applies to drivers that don’t support some sort expression language (e.g. SQL). While supported for PostGIS and Oracle Spatial there is little reason to use filters since the necessary SQL is better expressed in the layer data statement. For non-RDMS drivers filters are defined either by a item/value combination or a using MapServer’s standard expression syntax.

The second use case applies to all drivers where filters are defined using driver specific syntax to extract features from a data source. This leads to a number of issues:

  • confusion and inconsistencies in mapfile syntax
  • lack of portabilty of filters between drivers
  • performance limitations because only the simplest filters are easy to represent and secure

Furthermore, the above two use cases are mutually excluseive which again can lead to user confusion. Existing query operations that rely on filters (by attribute, by operator) cache the existing filter before defining and applying a new one. While this limitation could be addressed independently it becomes a much more manageable task when using a standardized filter definition.

2. Proposed solution

This SLA proposed standardizing filter definitions using the common expression syntax implemented as part of RFC XX. Driverscapable of handling complex expressions will translate MapServer expressions to an internal representation which should lead to very significant performance improvements. Drivers that don’t have that capability would apply filtering to features much the same way as the native MapServer shapefile and tiled shapefiles already do (via msXXXLayerNextShape())

2.1 Driver Filter Translation

Drivers that support complex expressions (e.g. PostGIS and Oracle Spatial) will need to implement a new layer API function (propose msXXXLayerTranslateFilter(layerObj *layer, expressionObj *filter, char *filteritem)) to translate both simple filters amd MapServer logical expressions to a native representation. MapServer logical exressions consist of a series of tokens that are processed from from left to right. It should be possible then to convert these tokens to a native representation of that the driver will handle independently of MapServer– typically some flavor of SQL.

2.2 Filter Merging

Filter merging is the process of creating one expression from two and it will only be needed with MapServer query functions that utilize filters as part of their stock processing, specifically msQueryByFilter() and msQueryByAttributes(). If an existing filter exists it will be merged with the new filter. The existing filter will be cached as it always has been. Because this operation is will result in a filter expressed using the common syntax this operation can be done in a general way:

expressionObj *msMergeFilters(expressionObj *filter1, expressionObj *filter2)

The resulting filter will then be translated to native representation if applicable later.

2.3 Filter Merging Examples

Case 1: FILTERITEM and FILTER “string” (or case-insensitive)

New FILTER becomes a logical expression: (([FILTERITEM] = “string”) AND (new filter))

Case 2: FILTERITEM and FILTER /string/ (or case-insensitive)

New FILTER becomes a logical expression: (([FILTERITEM] ~ “string”) AND (new filter))

Case 3: FILTER (string) with no FILTERITEM

New FILTER becomes a logical expression: ((string) AND (new filter))

In the case where the old filter is already native filter and the translation fails we will leave the old filter in place so the driver can at least process that portion and then fall back to MapServer for the new filter.

2.4 Query Function Simplification

The msQueryByAttribute() function can be turned into an alias for the msQueryByFilter() function. They do basically the same thing: define a new filter and execute a query using it. This simplifies the code base. A few functional changes will need to be made within msQueryByFilter(), specifically handling the case when only 1 result is returned (mode=itemquery) but that is a simple check at the bottom of the result processing loop.

This does not impact the CGi or MapScript APIs although we could choose to simplify those at a later date.

2.5 Future Enhancements

Work on this RFC indicates that the linked-list of tokens, while optimal for Bison/Yacc grammars, can be difficult to translate in some cases. For example, if a driver needs to add an argument to a function call: buffer([shape], 500) needs to translate to buffer(the_geom, 500, 1) if is difficult to know when to output the 1 in complex filter. It may be beneficial rewrite the token handling to use tree storage instead of a list. That change doesn’t affect the larger implementation approach, instead it offers an opportunity for more complete translations. Time will tell.

3. Implementation Details

3.1 Summary of Current Behaviour (by driver)

  • shapefile: supports FILTERITEM and FILTERs -> string, regular, logical expression
  • OGR: supports FILTERTITEM and FILTERs -> string, regular, logical expressions and native (starting with “WHERE ”)
  • PostGIS: supports FILTER -> native SQL snippet
  • Oracle Spatial: supports FILTERITEM and FILTERs -> string and native (constructs native)
  • SDE: supports FILTER -> native SQL snippet
  • MS SQL Server: supports FILTER -> native SQL snippet

3.2 Overview

  • limited additions to logial expression syntax are recommended:
    • spatial operator syntax (e.g. A INTERSECTS B) will be deprecated in favor of a functional notation (e.g. INTERSECTS(A, B) = TRUE). This is more consistent with spatialally enabled SQL syntax and easer to translate.
    • BEYOND/DWITHIN operators will only exist in the functional form, they are useless in logical expressions as they site now
  • there are no changes to the MapScript API
  • drivers that support complex expressions (e.g. PostGIS, Oracle, SDE, MS SQL Server, OGR) need to implement a translation function to construct a partial “WHERE” clause from a set of expression/filter tokens

3.2 Files affected

The following files will be modified by this RFC:

  • mapserver.h: extend expressionObj to contain a native_string (char *). This will hold a the native expression of the filter.
  • mapquery.c: call merging function in msQueryByAttribute() and msQueryByFilter(), call translation function in all query functions.
  • mapdraw.c: call filter translation function in msDrawVectorLayer()
  • maplayer.c: changes/additions to layer API function
  • mapshape.c, mappostgis.c, mapogr.c.... : changes/additions to driver functions
  • mapparser.y: add function-based versions of spatial operators, add BOOLEAN token type
  • maplexer.l: recognize TRUE/FALSE as boolean tokens (store in token->dblval) in expression context
  • mapogcfilter.c/mapogccommonfilter.c: update expression writing to use functional notation, fix dwithin/beyond processing

Notes:

Drivers: This can be implemented incrementally within drivers. At a minimum all drivers must implement the msXXXLayerTranslateFilter() and the function should at least detect the case where a native filter is already supplied (for backwards compatibility). For relational databases this is the case where the filter is a plain old string expression. For OGR this is where the plain old string expression starts with “WHERE ”. In these cases the new native_string should be populated with the filter string contents.

msLayerNextShape(): Evaluation of the common syntax (e.g. pure-MapServer) filter can happen here rather than within the driver code. The OGR, Shapefile and Tiled Shapefile drivers all do this now and would need to be changed. The filter would only be executed if the filter’s native_string is NULL (this is checked in msEvalExpression().

do {
  rv = layer->vtable->LayerNextShape(layer, shape);
  if(rv != MS_SUCCESS) return rv;

  filter_passed = MS_TRUE;  /* By default accept ANY shape */
  if(layer->numitems > 0 && layer->iteminfo) {
    filter_passed = msEvalExpression(layer, shape, &(layer->filter), layer->filteritemindex);
  }

  if(!filter_passed) msFreeShape(shape);
} while(!filter_passed);

msLayerSupportsCommonFilters(): This function can be removed from the layer API in favor of checking the filter->native_string after a call to msXXXLayerTranslateFilter().

3.3 MapScript

No changes necessary.

3.4 Backwards Compatibility Issues

By respecting a pre-defined FILTER with attribute and OGC filter queries it’s possible that users could get confused and/or have to re-architect certain aspects of an application. For example, they may have actually already build filter merging into their process.

4. Security implications

This RFC should make things more robust in this regard by forcing expressions through multiple levels of translation. The process for WFS getFeature filters would be OGC Filter (XML) => MapServer common expression => native expression. All token conversion for attribute bindings, string and date literals will be escaped using whatever mechanism the driver provides. For example, PostGIS provides msPostGISEscapeSQLParam().

5. Performance implications

The token replacement is only done in msLayerOpen(), the performance impact should be negligeable as this happens only once per rendered layer. Performance will improve greatly in some cases where operations are offloaded to the database - less network traffic, more efficient algorithms.

With WFS-based dwithin/beyond operators we currently apply the distance values as buffers and then hand off to MapServer. The suggested changes would remove that step and improve performane in all cases.

7. Voting history

Passed on 8/19/2014 with +1’s from Steve L, Thomas B, Mike S, Stephan M, Daniel M, Perry N, Stephen W, Jeff M, Tom K, Tamas S