MS RFC 122: A Tool for Coverage Analysis of MapCache Multi-SQLite Caches

Date: 2018-07-26
Author: Jerome Boue
Version: MapCache 1.8

1. Motivation

MapCache users may want to use a portion of a MapCache cache offline, covering a given geographical region. For that purpose, they need to identify the relevant files in an existing cache, check whether tiles are missing from these files (seeding the missing parts if necessary), and extract these files to build a new cache of the requested region.

The tool described in this RFC, named mapcache_detail, helps users with some of these activities. It works with multi-SQLite caches, where a single file may contain thousands of tiles. It is able to:

  • Determine which SQLite files from the cache are needed to cover a given geographical region at a given zoom level range;
  • Count how many tiles are needed to cover the region and how many are already present in each file, giving a coverage ratio on file level, zoom level and global level;
  • Estimate data size of missing tiles, which need to be downloaded to entirely cover the requested region, again on file level, zoom level and global level.

mapcache_detail takes the form of an independent CLI executable using the MapCache library. Because of its specific purpose (only SQLite caches are handled), it is proposed as a simple contribution to the MapCache project under a contrib/mapcache_detail/ folder.

2. Supported SQLite Caches

The mapcache_detail tool supports:

3. Functional Overview

From the initial cache configuration, the requested geographic region and the zoom levels, mapcache_detail finds out which SQLite files match the requested region. Then, for each SQLite file, the tool measures:

  • File size and number of tiles actually present in the file, regardless of their contribution to region coverage; these measures are used to estimate the mean size of a tile and the size of a cache fully covering the requested region.
  • Number of tiles actually present in the file and needed to cover the requested region.
  • Number of tiles needed to cover the requested region, whether present in the file or not. These last two measures are used to determine the coverage ratio of the cache relative to the requested region.

A report is then provided containing all measures for every SQLite file, and combined at zoom level and global level. This report takes the form of JSON data on standard output.

3.1 Input Parameters

mapcache_detail needs two kinds of parameters:

  • Cache description, composed of:

    • MapCache configuration file of initial cache,
    • Layer, SRS and dimensions if needed,
    • An SQL query to count tiles in a rectangular area. This query is not needed if the SQLite cache has been built with the default settings of MapCache.
  • Region description, composed of:

    • Lowest and highest requested zoom levels,

    • Geometry of requested region, provided in one of the following formats:

      • xmin, ymin, xmax, ymax of the rectangle defining the region
      • a vector file specifying the region of arbitrary shape in one of the file formats supported by OGR/GDAL library.
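
As an illustration of what a tile-counting query over a rectangular area might look like, the sketch below runs one against an in-memory SQLite database with Python's sqlite3 module. The table layout (a tiles table with x, y and z columns) and the query text are assumptions made for this example; they are not taken from the tool's source, and the tool's actual default query and parameter names may differ.

```python
import sqlite3

# Assumed, simplified schema: one row per tile, addressed by (x, y, z).
# A real MapCache SQLite cache may use additional columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiles (x INTEGER, y INTEGER, z INTEGER, data BLOB)")
conn.executemany(
    "INSERT INTO tiles (x, y, z, data) VALUES (?, ?, ?, ?)",
    [(11, 7, 1, b""), (12, 8, 1, b""), (30, 8, 1, b""), (12, 8, 2, b"")],
)

# Hypothetical counting query: tiles of one zoom level inside a rectangle
# given in tile coordinates, bounds included.
COUNT_QUERY = (
    "SELECT count(*) FROM tiles "
    "WHERE z = :z AND x BETWEEN :minx AND :maxx AND y BETWEEN :miny AND :maxy"
)


def count_tiles(conn, z, minx, miny, maxx, maxy):
    """Return the number of cached tiles inside the rectangle at zoom level z."""
    params = {"z": z, "minx": minx, "miny": miny, "maxx": maxx, "maxy": maxy}
    return conn.execute(COUNT_QUERY, params).fetchone()[0]


# Two of the four sample rows lie inside the rectangle at zoom level 1.
print(count_tiles(conn, 1, 11, 7, 27, 20))
```

A cache built with a custom schema would need a different query, which is the case the query parameter is meant to cover.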

Online help gives a description of available options to set input parameters. These options are similar to mapcache_seeder’s options:

$ mapcache_detail --help

Usage:   mapcache_detail <options>

    -h | --help
                Display this message and exit
    -c | --config <value>
                Configuration file (/path/to/mapcache.xml)
    -D | --dimension <value>
                Set the value of a dimension: format DIMENSIONNAME=VALUE.
                Can be used multiple times for multiple dimensions
    -t | --tileset <value>
                Tileset to analyze
    -g | --grid <value>
                Grid to analyze
    -e | --extent <value>
                Extent to analyze: format minx,miny,maxx,maxy. Cannot be
                used with --ogr-datasource.
    -d | --ogr-datasource
                OGR data source to get features from. Cannot be used with
                --extent.
    -l | --ogr-layer
                OGR layer inside OGR data source. Cannot be used with
                --ogr-sql.
    -w | --ogr-where
                Filter to apply on OGR layer features. Cannot be used with
                --ogr-sql.
    -s | --ogr-sql
                SQL query to filter inside OGR data source. Cannot be used
                with --ogr-layer or --ogr-where.
    -z | --zoom <value>
                Set min and max zoom levels to analyze, separated by a
                comma, eg: 12,15
    -q | --query <value>
                Set query for counting tiles in a rectangle. Default value
                works with default schema of SQLite caches.
    -o | --short-output
                Only existing SQLite files are reported; missing SQLite
                files are still taken into account for level and global
                coverage.
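
As a minimal sketch of how these options combine, the Python helper below assembles an argument list from the long options listed in the help text and enforces the one exclusion it covers (--extent versus --ogr-datasource). The helper name build_detail_args and all values passed to it are hypothetical placeholders; the sketch only builds the command line and does not run the tool.

```python
import shlex


def build_detail_args(config, tileset, grid, zoom, extent=None, ogr_datasource=None):
    """Assemble a mapcache_detail argument list from the options in the help text."""
    # The help text states that --extent and --ogr-datasource are exclusive.
    if extent is not None and ogr_datasource is not None:
        raise ValueError("--extent cannot be used with --ogr-datasource")
    args = ["mapcache_detail", "--config", config,
            "--tileset", tileset, "--grid", grid,
            "--zoom", "{},{}".format(*zoom)]
    if extent is not None:
        # Extent format from the help text: minx,miny,maxx,maxy
        args += ["--extent", ",".join(str(v) for v in extent)]
    if ogr_datasource is not None:
        args += ["--ogr-datasource", ogr_datasource]
    return args


# Placeholder values, for illustration only.
args = build_detail_args("/path/to/mapcache.xml", "example", "local",
                         zoom=(12, 15), extent=(0, 0, 1000, 1000))
print(" ".join(shlex.quote(a) for a in args))
```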

3.2 Identifying files and counting tiles

To illustrate the process, here is an example based on a fictional grid. Tiles are represented by the smallest squares on the grid. Larger squares of 25 tiles each represent SQLite files. Small indices denote tile coordinates, whereas large indices denote database coordinates. The colored rectangle represents the requested region for cache extraction. Darker tiles represent tiles present in the cache.

../../_images/mapcache-detail.png

Expressed in tiles, the region coordinates (xmin, ymin, xmax, ymax) are (11, 7, 27, 20). Expressed in SQLite files, these coordinates are (2, 1, 5, 4). All files whose coordinates lie between (2, 1) and (5, 4) inclusive are part of the cache extraction. All tiles whose coordinates lie between (11, 7) and (27, 20) inclusive are counted for the region coverage.
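
The conversion from tile coordinates to file coordinates in this example can be sketched in Python as follows. The sketch assumes a rectangular region, tile and file indices starting at 0, and 5 x 5 tiles per file as in the fictional grid; the function names are illustrative and do not come from the tool's source.

```python
TILES_PER_FILE = 5  # each SQLite file holds 5 x 5 tiles in the fictional grid


def file_range(region, n=TILES_PER_FILE):
    """Convert a region in tile coordinates to file coordinates (bounds included)."""
    xmin, ymin, xmax, ymax = region
    return (xmin // n, ymin // n, xmax // n, ymax // n)


def needed_tiles_in_file(region, fx, fy, n=TILES_PER_FILE):
    """Count the tiles of file (fx, fy) that fall inside the region."""
    xmin, ymin, xmax, ymax = region
    # Tile bounding box of the file, bounds included.
    fxmin, fymin = fx * n, fy * n
    fxmax, fymax = fxmin + n - 1, fymin + n - 1
    # Width and height of the intersection, clamped to zero when disjoint.
    width = max(min(xmax, fxmax) - max(xmin, fxmin) + 1, 0)
    height = max(min(ymax, fymax) - max(ymin, fymin) + 1, 0)
    return width * height


region = (11, 7, 27, 20)
fxmin, fymin, fxmax, fymax = file_range(region)
print(file_range(region))                  # (2, 1, 5, 4)
print(needed_tiles_in_file(region, 2, 1))  # 12

total = sum(needed_tiles_in_file(region, fx, fy)
            for fx in range(fxmin, fxmax + 1)
            for fy in range(fymin, fymax + 1))
print(total)                               # 238
```

The total of 238 equals the number of tiles in the region itself, 17 columns by 14 rows.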

The following table gives the tile count and coverage ratio for the requested region, as computed by the tool:

SQLite file   Tiles present in cache and covering region   Tiles needed to fully cover region   Coverage
(2,1)         0                                            12                                   0
(2,2)         4                                            20                                   0.2
(2,3)         3                                            20                                   0.15
(2,4)         0                                            4                                    0
(3,1)         4                                            15                                   0.267
(3,2)         9                                            25                                   0.36
(3,3)         8                                            25                                   0.32
(3,4)         0                                            5                                    0
(4,1)         4                                            15                                   0.267
(4,2)         6                                            25                                   0.24
(4,3)         3                                            25                                   0.12
(4,4)         0                                            5                                    0
(5,1)         2                                            9                                    0.222
(5,2)         0                                            15                                   0
(5,3)         0                                            15                                   0
(5,4)         0                                            3                                    0
Total         43                                           238                                  0.181
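
The coverage figures in the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, using the counts from the table; the variable names are illustrative. Note that the total coverage is the ratio of the summed counts (43 / 238), not the mean of the per-file ratios.

```python
# File coordinates in the same order as the table: (2,1), (2,2), ..., (5,4)
files = [(x, y) for x in range(2, 6) for y in range(1, 5)]
present = [0, 4, 3, 0, 4, 9, 8, 0, 4, 6, 3, 0, 2, 0, 0, 0]
needed = [12, 20, 20, 4, 15, 25, 25, 5, 15, 25, 25, 5, 9, 15, 15, 3]

# Per-file coverage: tiles present in the region / tiles needed in the region
coverage = {f: p / n for f, p, n in zip(files, present, needed)}

# Level coverage: ratio of the sums over all files of the level
total_coverage = sum(present) / sum(needed)

print(round(coverage[(3, 1)], 3))  # 0.267
print(sum(present), sum(needed))   # 43 238
print(round(total_coverage, 3))    # 0.181
```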

3.3 Output Report

The tool’s output, in JSON format, provides the user with the list of SQLite files to be extracted from the cache. Details are given on the number of tiles contributing to region coverage. A summary is also given for each zoom level and at the global level.

The following fictional example describes the information present in the tool’s output.

{                                   _______________________________________
    "layer": "example",            | Report starts with general information
    "grid": "local",               | on cache and requested region
    "unit": "m",
    "region": {
        "bounding_box": [ 11, 7, 27, 20 ],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[ [11,7], [11,20], [27,20], [27,7], [11,7] ]]
        }
    },
    "zoom-levels": [ {
            "level": 1,
            "files": [ {            _______________________________________
                                   | For each file, output report gives:
                                   | its name, its size, its bounding box
                                   | and intersection of that bounding box
                                   | with requested region

                "file_name": "/path/to/cache/example/1/2-1.sqlite",
                "file_size": 54632,
                "file_bounding_box": [ 10, 5, 14, 9 ],
                "region_in_file": {
                    "bounding_box": [ 11, 7, 14, 9 ],
                    "geometry": {
                        "type": "Polygon",
                        "coordinates":
                            [[ [11,7], [11,9], [14,9], [14,7], [11,7] ]]
                    }
                },
                "nb_tiles_in_region": { ___________________________________
                                   | Measures associated to a SQLite file
                                   | are: number of tiles belonging to
                                   | requested region and present in file,
                                   | number of tiles belonging to region
                                   | present or not in file, and resulting
                                   | coverage ratio

                    "cached_in_file": 0,
                    "max_in_file": 12,
                    "coverage": 0
                }
            }, {
                "file_name": ...
                ...
            } ],
            "nb_tiles_in_region": { _______________________________________
                                   | Measures associated to a zoom level
                                   | are the sum of the ones for each SQLite
                                   | file of that level

                "cached_in_level": 43,
                "max_in_level": 238,
                "coverage": 0.1807
            }
    }, {
            "level": 2,
            ...
    } ],
    "nb_tiles_in_region": {         _______________________________________
        "cached_in_cache": 43,     | Global measures are the sum of all
        "max_in_cache": 238,       | zoom level measures
        "coverage": 0.1807
    },
    "sizes": {                      _______________________________________
                                   | At global level estimations about
                                   | cache size to be extracted for a full
                                   | region coverage are also given. These
                                   | estimations are based on the mean size
                                   | of a tile obtained from all SQLite file
                                   | sizes and how many tiles they contain

        "total_size_of_files": 1599442,
        "total_nb_tiles_in_files": 60,
        "average_tile_size": 26658,
        "estimated_max_cache_size": 6344604,
        "estimated_cached_cache_size": 1146294,
        "estimated_missing_cache_size": 5198310
    }
}
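
A script consuming such a report might look like the Python sketch below. It works on a hand-written dictionary that mirrors a subset of the fields of the example above, without the annotations; the function names are illustrative. The size estimates in the example are consistent with multiplying the reported average_tile_size by the tile counts, which is what the sketch checks. The way average_tile_size itself is rounded is not specified here, so the reported value is taken as input rather than recomputed.

```python
# Subset of the example report, written as a Python dictionary.
report = {
    "zoom-levels": [{
        "level": 1,
        "files": [{
            "file_name": "/path/to/cache/example/1/2-1.sqlite",
            "nb_tiles_in_region": {"cached_in_file": 0, "max_in_file": 12},
        }],
    }],
    "nb_tiles_in_region": {"cached_in_cache": 43, "max_in_cache": 238},
    "sizes": {"average_tile_size": 26658},
}


def list_files(report):
    """Return the names of all SQLite files listed in the report."""
    return [f["file_name"]
            for level in report["zoom-levels"]
            for f in level["files"]]


def estimate_sizes(report):
    """Derive the three size estimates from the average tile size and tile counts."""
    avg = report["sizes"]["average_tile_size"]
    counts = report["nb_tiles_in_region"]
    max_size = avg * counts["max_in_cache"]
    cached_size = avg * counts["cached_in_cache"]
    return {
        "estimated_max_cache_size": max_size,
        "estimated_cached_cache_size": cached_size,
        "estimated_missing_cache_size": max_size - cached_size,
    }


print(list_files(report))
print(estimate_sizes(report))
```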

4. Implementation details

4.1. Dependencies

cJSON
    The output report is a JSON document, so mapcache_detail needs JSON printing features. For that purpose an off-the-shelf solution is proposed, namely cJSON, available on GitHub under an MIT license. In anticipation of the integration of the ElasticSearch Dimension back-end (see RFC-121), which also needs JSON, the files cJSON.h and cJSON.c are simply copied into the MapCache project, in the include/ and lib/ directories respectively.
OGR/GDAL, GEOS
    To specify a region other than a rectangle defined by xmin, ymin, xmax, ymax, the OGR/GDAL and GEOS third-party libraries are needed. If these libraries are not available, this feature can be disabled at the CMake stage with the -DWITH_GEOS=OFF -DWITH_OGR=OFF arguments.

4.2. Build

mapcache_detail is automatically built when building MapCache with CMake, see MapCache Compilation & Installation instructions.

4.3. Affected files

File name                                                                                Status     Description
contrib/mapcache_detail/: mapcache_detail.c, CMakeLists.txt, mapcache_detail_config.h.in New        Implementation of mapcache_detail tool
include/cJSON.h, lib/cJSON.c                                                             New        Embedded JSON parser and printer
CMakeLists.txt                                                                           Modified   Integration of tool in build chain

5. Credits

Thanks to funding from the French Ministry of Defence.