ARLAS Collection model

About

A Collection is an ARLAS object that references your indexed data and makes it explorable by ARLAS-server.

As ARLAS-server is meant to deliver spatio-temporal data analysis, your indexed data must contain identifier, time, centroid and geometry fields.

By referencing your data in a Collection, you also make your REST calls lighter, since ARLAS-server already knows which fields to query.

Model

A Collection has the following structure:

{
  "collection_name": "string",
  "params": {
    "index_name": "string",
    "id_path": "string",
    "geometry_path": "string",
    "centroid_path": "string",
    "timestamp_path": "string",
    "include_fields": "string",
    "exclude_fields": "string",
    "custom_params": {},
    "display_names": {
      "collection" : "string",
      "fields": { "${doc_json_path}" :  "string" },
      "shape_columns": { "${geojson_flatten_path}" :  "string" }
    },
    "organisations": {
      "owner": "string",
      "shared": [
        "string"
      ],
      "public": "boolean"
    },
    "atom_feed": {
      ...
    },
    "dublin_core_element_name": {
      ...
    },
    "inspire": {
      ...
    },
    "open_search": {
      ...
    },
    "raster_tile_url": {
      ...
    },
    "raster_tile_width":"integer",
    "raster_tile_height":"integer"
  }
}

The atom_feed, dublin_core_element_name, inspire, open_search and raster_tile_url nodes are optional.

The most important fields are:

| Attribute | Description | Mention |
| --- | --- | --- |
| index_name | Name of the index in Elasticsearch | Mandatory |
| id_path | Path to the id field in the indexed documents | Optional |
| geometry_path | Path to an Elasticsearch geometric field in the indexed documents | Optional |
| centroid_path | Path to an Elasticsearch Geo-point field in the indexed documents | Optional |
| timestamp_path | Path to a timestamp/date field in the indexed documents that meets the Elasticsearch date format set when indexing data | Optional |
| include_fields | Comma-separated field names that will be included in ARLAS-server responses. By default, all the fields are included | Optional |
| exclude_fields | Comma-separated field names that will be excluded from ARLAS-server responses. By default, none of the fields are excluded | Optional |
| raster_tile_width | In case the tile is too big, the crop width to apply. Set to -1 if no check must be applied | Optional |
| raster_tile_height | In case the tile is too big, the crop height to apply. Set to -1 if no check must be applied | Optional |
| taggable_fields | Comma-separated field names/paths that are allowed to be updated by the tag service. By default, no field is taggable | Optional |
| update_max_hits | Maximum number of hits you can tag with one tag request | Optional |
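As an illustration of the table above, the following Python sketch assembles a collection definition in which only index_name is required. The helper name build_collection, the collection name, the index name and the field paths are all hypothetical; this is not an ARLAS client API, only a way to show which attributes go where.

```python
import json


def build_collection(collection_name, index_name, **optional_params):
    """Assemble a collection definition; index_name is the only mandatory param."""
    if not index_name:
        raise ValueError("index_name is mandatory")
    params = {"index_name": index_name}
    # Keep only the optional attributes that were actually provided.
    params.update({k: v for k, v in optional_params.items() if v is not None})
    return {"collection_name": collection_name, "params": params}


# Hypothetical names and paths, for illustration only.
collection = build_collection(
    "ships",
    "ships_index",
    id_path="id",
    geometry_path="track",
    centroid_path="position",
    timestamp_path="timestamp",
    exclude_fields="internal.debug,internal.raw",
)
print(json.dumps(collection, indent=2))
```

The resulting JSON document follows the structure shown in the Model section and could be sent as the body of a collection creation request.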

Important 1

Taggable field paths must not contain the word tags: it is a reserved word.

Important 2

Taggable fields must initially be set to a value or to null at index time in Elasticsearch. For instance, if you use Logstash to index in Elasticsearch, you can add the following line to the Logstash config file to set the field value to null:

   ruby { code => "event.set('[labels][status]', nil);" }

where labels.status is the taggable field.
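If documents are prepared in Python rather than Logstash, the same initialization can be sketched as below. The function name ensure_taggable_field is hypothetical, and the sketch assumes that every intermediate node on the path is a JSON object. Unlike the Logstash line, it leaves an already present value untouched, which still satisfies the "a value or null" requirement.

```python
def ensure_taggable_field(doc, path):
    """Make sure the dotted path exists in doc, defaulting it to None (JSON null)."""
    keys = path.split(".")
    node = doc
    for key in keys[:-1]:
        # Create intermediate objects when they are missing.
        node = node.setdefault(key, {})
    # Only set None when the field is absent; existing values are kept.
    node.setdefault(keys[-1], None)
    return doc


doc = {"id": "a1"}
print(ensure_taggable_field(doc, "labels.status"))
# prints {'id': 'a1', 'labels': {'status': None}}
```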

Important 3

The geometry_path field value must have the same format in all documents within the same collection. Indexing some documents with a WKT geometry_path value and others with a GeoJSON value is not supported: a ParseException will be thrown. The same applies to centroid_path.
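A pre-indexing check for this constraint could look like the sketch below. It relies on a simplifying assumption: a geometry stored as a JSON object is treated as GeoJSON and a geometry stored as a string is treated as WKT. The function names are hypothetical, and the check does not cover the additional notations Elasticsearch accepts for Geo-point fields.

```python
def geometry_format(value):
    """Classify a geometry value (assumption: dict = GeoJSON, str = WKT)."""
    if isinstance(value, dict):
        return "geojson"
    if isinstance(value, str):
        return "wkt"
    return "unknown"


def geometry_formats(docs, geometry_path):
    """Return the set of geometry formats found at geometry_path in docs."""
    formats = set()
    for doc in docs:
        value = doc
        for key in geometry_path.split("."):
            value = value.get(key) if isinstance(value, dict) else None
        if value is not None:
            formats.add(geometry_format(value))
    return formats


docs = [
    {"geo": {"shape": "POINT (2.35 48.85)"}},
    {"geo": {"shape": {"type": "Point", "coordinates": [2.35, 48.85]}}},
]
formats = geometry_formats(docs, "geo.shape")
if len(formats) > 1:
    print("Mixed geometry formats:", sorted(formats))
```

A collection is consistent, in the sense described above, when the returned set contains at most one format.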

ATOM

When the ATOM output type is used for searches on a collection, the following properties can be set to customize the result:

   "atom_feed": {
     "author": {
       "name": "string",
       "email": "string",
       "uri": "string"
     },
     "contributor": {
       "name": "string",
       "email": "string",
       "uri": "string"
     },
     "icon": "string",
     "logo": "string",
     "rights": "string",
     "subtitle": "string",
     "generator": {
       "name": "string",
       "version": "string",
       "uri": "string"
     }
   }

| Attribute | Description | Mention |
| --- | --- | --- |
| author.name | Author name of the feed | Optional |
| author.email | Author email of the feed | Optional |
| author.uri | Author URI of the feed | Optional |
| contributor.name | Name of the person or other entity who contributed to the feed | Optional |
| contributor.email | Email of the person or other entity who contributed to the feed | Optional |
| contributor.uri | URI of the person or other entity who contributed to the feed | Optional |
| icon | IRI reference (RFC3987) that identifies an image that provides iconic visual identification for a feed | Optional |
| logo | IRI reference (RFC3987) that identifies an image that provides visual identification for a feed | Optional |
| rights | Text that conveys information about rights held in and over an entry or feed | Optional |
| subtitle | Text that conveys a human-readable description or subtitle for a feed | Optional |
| generator.name | Name of the agent used to generate a feed, for debugging and other purposes | Optional |
| generator.version | Version of the agent used to generate a feed, for debugging and other purposes | Optional |
| generator.uri | URI of the agent used to generate a feed, for debugging and other purposes | Optional |

DUBLIN CORE

The Dublin Core Description document of the collection can be customized with the following properties:

   "dublin_core_element_name": {
     "title": "string",
     "creator": "string",
     "subject": "string",
     "description": "string",
     "publisher": "string",
     "contributor": "string",
     "type": "string",
     "format": "string",
     "identifier": "string",
     "source": "string",
     "language": "string",
     "bbox": {
       "north": 0,
       "south": 0,
       "east": 0,
       "west": 0
     },
     "date": "string",
     "coverage": {
       "additionalProp1": {},
       "additionalProp2": {},
       "additionalProp3": {}
     },
     "coverage_centroid": "string"
   }

| Attribute | Description | Mention |
| --- | --- | --- |
| title | Title by which the resource is formally known | Optional |
| creator | Entity responsible for making the content of the resource | Optional |
| subject | The topic of the content of the resource | Optional |
| description | An account of the content of the resource | Optional |
| publisher | Entity responsible for making the resource available | Optional |
| contributor | Entity responsible for making contributions to the content of the resource | Optional |
| type | The nature or genre of the content of the resource | Optional |
| format | The physical or digital manifestation of the resource | Optional |
| identifier | An unambiguous reference to the resource within a given context | Generated by ARLAS-server |
| source | A reference to a resource from which the present resource is derived | Optional |
| language | A language of the intellectual content of the resource (taken from the ISO 639 standard) | Optional |
| bbox | Geographical extent of the resource content. Defaults to the whole globe | Optional |
| date | A date associated with an event in the life cycle of the resource | Generated by ARLAS-server at collection creation |
| coverage | Geographical extent of the resource content | Calculated by ARLAS-server from the bbox attribute |
| coverage_centroid | Centroid of the geographical extent of the resource content | Calculated by ARLAS-server from the bbox attribute |
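To make the relationship between bbox, coverage and coverage_centroid concrete, here is a minimal Python sketch of how such values can be derived from a bounding box. It is not the ARLAS-server implementation: the GeoJSON polygon encoding of coverage, the "lat,lon" string encoding of the centroid and both function names are assumptions, and bounding boxes crossing the antimeridian are not handled.

```python
def bbox_to_coverage(bbox):
    """Build a closed polygon ring (GeoJSON-style, assumed) from a bbox."""
    w, s, e, n = bbox["west"], bbox["south"], bbox["east"], bbox["north"]
    ring = [[w, s], [e, s], [e, n], [w, n], [w, s]]
    return {"type": "Polygon", "coordinates": [ring]}


def bbox_centroid(bbox):
    """Return the bbox centre as a 'lat,lon' string (assumed encoding)."""
    lon = (bbox["west"] + bbox["east"]) / 2
    lat = (bbox["south"] + bbox["north"]) / 2
    return f"{lat},{lon}"


# The default extent described above: the whole globe.
world = {"north": 90, "south": -90, "east": 180, "west": -180}
print(bbox_to_coverage(world))
print(bbox_centroid(world))
```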

INSPIRE

If the INSPIRE option is enabled in configuration.yaml, the following properties can be set to customize the result of WFS GetCapabilities and CSW GetCapabilities, GetRecords & GetRecordById:

   "inspire": {
     "keywords": [
       {
         "value": "string",
         "vocabulary": "string",
         "date_of_publication": "string"
       }
     ],
     "languages": ["eng", ...],
     "topic_categories": "string",
     "lineage": "string",
     "spatial_resolution": {
       "value": "number",
       "unit_of_measure": "string"
     },
     "inspire_uri": {
       "code": "string",
       "namespace": "string"
     },
     "inspire_use_conditions": "string",
     "inspire_limitation_access": {
       "access_constraints": "string",
       "otherConstraints": "string",
       "classification": "string"
     }
   }

The inspire node is mandatory only if the INSPIRE option is enabled in configuration.yaml.

| Attribute | Description | Mention |
| --- | --- | --- |
| keywords.value | Value of the keyword. If the keyword originates from the Classification of Spatial data Services vocabulary, the camel-case keyword should be set. For example: thematicImageProcessingService | Mandatory |
| keywords.vocabulary | Vocabulary from which the keyword value was taken. For example, GEMET Inspire-themes or Classification of Spatial data Services vocabulary | Mandatory for each keyword if the keyword value originates from a controlled vocabulary |
| keywords.date_of_publication | Date of publication of the vocabulary. Must be in YYYY-MM-DD format | Optional |
| languages | The language(s) used within the resource. The value domain of this metadata element is limited to the languages defined in ISO 639-2 | Mandatory if the resource includes textual information |
| topic_categories | List of topic categories. A topic category is a high-level classification scheme to assist in the grouping and topic-based search of available spatial data resources. Must be one of the predefined INSPIRE topic category values. The value should be in camel-case. For example, the topic category Climatology / Meteorology / Atmosphere must be set as: climatologyMeteorologyAtmosphere | Mandatory |
| lineage | (Free text) A statement on the process history and/or overall quality of the spatial data set. Where appropriate, it may include a statement on whether the data set has been validated or quality assured, and whether it is the official version (if multiple versions exist) | Mandatory |
| spatial_resolution | Spatial resolution refers to the level of detail of the data set. It shall be expressed as a set of zero to many resolution distances (typically for gridded data and imagery-derived products) or equivalent scales (typically for maps or map-derived products) | Mandatory if an equivalent scale or a resolution distance can be specified |
| spatial_resolution.value | An equivalent scale is expressed as an integer value expressing the scale denominator. A resolution distance should be expressed as a numerical value associated with a unit of length | Mandatory if an equivalent scale or a resolution distance can be specified |
| spatial_resolution.unit_of_measure | Unit of measure of the resolution distance. If it is not specified, the spatial resolution is an equivalent scale | Mandatory if a resolution distance can be specified |
| inspire_uri.code | A character string code uniquely identifying the collection reference (data set), assigned by the data owner | Mandatory. If not set, it takes the id value generated by ARLAS-server |
| inspire_uri.namespace | A character string namespace uniquely identifying the context of the identifier code (for example, the data owner). By default, its value is ARLAS.{COLLECTION-NAME} | Optional |
| inspire_use_conditions | Provides information on any fees necessary to access and use the data set | Optional; default: no conditions apply |
| inspire_limitation_access.access_constraints* | Possible values: copyright, patent, patentPending, trademark, license, intellectualPropertyRights, restricted, otherRestrictions | Mandatory. Default value is otherRestrictions |
| inspire_limitation_access.otherConstraints* | Free text, or a URL to a page that describes possible limitations. Default value: no limitations apply | Optional |
| inspire_limitation_access.classification* | Name of the handling restrictions on the WFS. One of: unclassified (default value), restricted, confidential, secret, topSecret | Optional |

*: The inspire_limitation_access object describes the access constraints applied to assure the protection of privacy or intellectual property, and any special restrictions or limitations on obtaining the INSPIRE-compliant WFS.
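Some of the constraints listed for the inspire node lend themselves to a check before the collection is submitted. The sketch below covers only three of them (mandatory keyword value, YYYY-MM-DD publication date, and the enumerated access_constraints and classification values). The function name validate_inspire is hypothetical, and ARLAS-server may apply different or additional checks.

```python
import re
from datetime import datetime

ACCESS_CONSTRAINTS = {
    "copyright", "patent", "patentPending", "trademark", "license",
    "intellectualPropertyRights", "restricted", "otherRestrictions",
}
CLASSIFICATIONS = {"unclassified", "restricted", "confidential", "secret", "topSecret"}


def is_iso_date(text):
    """True if text is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def validate_inspire(inspire):
    """Return a list of human-readable problems found in an inspire node."""
    errors = []
    for i, keyword in enumerate(inspire.get("keywords", [])):
        if not keyword.get("value"):
            errors.append(f"keywords[{i}].value is mandatory")
        date = keyword.get("date_of_publication")
        if date is not None and not is_iso_date(date):
            errors.append(f"keywords[{i}].date_of_publication must be YYYY-MM-DD")
    access = inspire.get("inspire_limitation_access", {})
    # Defaults taken from the table: otherRestrictions and unclassified.
    if access.get("access_constraints", "otherRestrictions") not in ACCESS_CONSTRAINTS:
        errors.append("access_constraints has an unsupported value")
    if access.get("classification", "unclassified") not in CLASSIFICATIONS:
        errors.append("classification has an unsupported value")
    return errors


node = {
    "keywords": [{"value": "thematicImageProcessingService",
                  "date_of_publication": "05/01/2020"}],
    "inspire_limitation_access": {"access_constraints": "license"},
}
print(validate_inspire(node))
```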

OPENSEARCH

The OPENSEARCH Description document of the collection can be customized with the following properties:

   "open_search": {
     "short_name": "string",
     "description": "string",
     "contact": "string",
     "tags": "string",
     "long_name": "string",
     "image_height": "string",
     "image_width": "string",
     "image_type": "string",
     "image_url": "string",
     "developer": "string",
     "attribution": "string",
     "syndication_right": "string",
     "adult_content": "string",
     "language": "string",
     "input_encoding": "string",
     "output_encoding": "string",
     "url_template_prefix": "string"
   }

| Attribute | Description | Mention |
| --- | --- | --- |
| short_name | Contains a brief human-readable title that identifies this search engine. The value must contain 16 or fewer characters of plain text. The value must not contain HTML or other markup | Optional |
| description | Contains a human-readable text description of the search engine. The value must contain 1024 or fewer characters of plain text. The value must not contain HTML or other markup | Optional |
| contact | Contains an email address at which the maintainer of the description document can be reached. The value must conform to the requirements of Section 3.4.1 "Addr-spec specification" in RFC 2822 | Optional |
| tags | Contains a set of words that are used as keywords to identify and categorize this search content. Each tag must be a single word; tags are delimited by the space character (' '). The value must contain 256 or fewer characters of plain text. The value must not contain HTML or other markup | Optional |
| long_name | Contains an extended human-readable title that identifies this search engine. The value must contain 48 or fewer characters of plain text. The value must not contain HTML or other markup | Optional |
| image_url | Contains a URL that identifies the location of an image that can be used in association with this search content | Optional |
| image_height | Contains the height, in pixels, of this image | Optional |
| image_width | Contains the width, in pixels, of this image | Optional |
| image_type | Contains the MIME type of this image | Optional |
| developer | Contains the human-readable name or identifier of the creator or maintainer of the description document. The value must contain 64 or fewer characters of plain text. The value must not contain HTML or other markup | Optional |
| attribution | Contains a list of all sources or entities that should be credited for the content contained in the search feed. The value must contain 256 or fewer characters of plain text. The value must not contain HTML or other markup | Optional |
| syndication_right | Contains a value that indicates the degree to which the search results provided by this search engine can be queried, displayed, and redistributed. The value must be one of the following strings: "open" / "limited" / "private" / "closed" | Optional |
| adult_content | Contains a boolean value that should be set to true if the search results may contain material intended only for adults: true / false | Optional |
| language | Contains a string that indicates that the search engine supports search results in the specified language. The value must conform to the XML 1.0 Language Identification, as specified by RFC 5646 | Optional |
| input_encoding | Contains a string that indicates that the search engine supports search requests encoded with the specified character encoding. The value must conform to the XML 1.0 Character Encodings, as specified by the IANA Character Set Assignments | Optional |
| output_encoding | Contains a string that indicates that the search engine supports search responses encoded with the specified character encoding. The value must conform to the XML 1.0 Character Encodings, as specified by the IANA Character Set Assignments | Optional |
| url_template_prefix | The URL prefix that is rendered in the URL template of the OpenSearch response | Optional |
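The length limits and the allowed syndication_right values listed above can be checked with a short Python sketch such as the one below. The function name validate_open_search is hypothetical, and the sketch does not verify the "plain text, no markup" requirement or the email, language and encoding formats.

```python
# Maximum lengths, in characters, taken from the table above.
MAX_LENGTHS = {
    "short_name": 16,
    "description": 1024,
    "tags": 256,
    "long_name": 48,
    "developer": 64,
    "attribution": 256,
}
SYNDICATION_RIGHTS = {"open", "limited", "private", "closed"}


def validate_open_search(open_search):
    """Return a list of problems found in an open_search node."""
    errors = []
    for field, limit in MAX_LENGTHS.items():
        value = open_search.get(field)
        if value is not None and len(value) > limit:
            errors.append(f"{field} exceeds {limit} characters")
    right = open_search.get("syndication_right")
    if right is not None and right not in SYNDICATION_RIGHTS:
        errors.append("syndication_right must be open, limited, private or closed")
    return errors


node = {
    "short_name": "A title that is far too long",
    "syndication_right": "open",
}
print(validate_open_search(node))
# prints ['short_name exceeds 16 characters']
```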

RASTER TILES

If the data in the collection are image metadata and the images are available as 256x256 px tiles through a WMTS or X/Y/Z service (the top left corner is 0/0), then ARLAS can, for a given tile, dynamically stack the tiles that are aligned with the requested one, for the images matching a given filter. If the tile server does not always provide tiles of the right size (too big), you can set raster_tile_width and raster_tile_height to the desired dimensions, such as 256 or 512. Tiles that are too big are cropped to the right size. The "too small" case is not handled.

   "raster_tile_url": {
     "url": "string",
     "id_path": "string",
     "min_z": "integer",
     "max_z": "integer",
     "check_geometry": "boolean"
   }

| Attribute | Description | Mention | Default value |
| --- | --- | --- | --- |
| url | The URL pattern of the WMTS or X/Y/Z service. It should contain variable placeholders for {x}, {y}, {z} and {id} | Mandatory | |
| id_path | JSON path of the image id that will be injected in the URL of the tile service | Optional | id |
| min_z | Min zoom supported by the tile service | Optional | 0 |
| max_z | Max zoom supported by the tile service | Optional | 18 |
| check_geometry | Whether ARLAS should check that the matching images have their geometry intersecting the requested tile. Useful if the search returns false positives on geometric queries | Optional | false |
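The following Python sketch illustrates how a URL pattern with {x}, {y}, {z} and {id} placeholders can be expanded, and how the crop dimensions interact with a tile that is too big. It is not the ARLAS-server implementation: the function names and the example URL are hypothetical, and cropping from the top left corner is an assumption.

```python
def tile_url(pattern, x, y, z, image_id, min_z=0, max_z=18):
    """Expand the URL pattern, or return None when z is outside [min_z, max_z]."""
    if not min_z <= z <= max_z:
        return None
    return (
        pattern.replace("{x}", str(x))
        .replace("{y}", str(y))
        .replace("{z}", str(z))
        .replace("{id}", str(image_id))
    )


def crop_box(width, height, raster_tile_width=-1, raster_tile_height=-1):
    """Return (left, top, right, bottom) of the area kept after cropping.

    -1 disables the check for that dimension. Tiles smaller than the target
    are returned unchanged, since the "too small" case is not handled.
    """
    new_width = width if raster_tile_width == -1 else min(width, raster_tile_width)
    new_height = height if raster_tile_height == -1 else min(height, raster_tile_height)
    return (0, 0, new_width, new_height)  # assumption: crop from the top left


# Hypothetical tile service URL pattern.
pattern = "https://tiles.example.com/{id}/{z}/{x}/{y}.png"
print(tile_url(pattern, x=3, y=5, z=4, image_id="img42"))
print(crop_box(300, 256, raster_tile_width=256, raster_tile_height=256))
```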