ARLAS Collection model

About

A Collection is an ARLAS object that references your indexed data and makes it explorable by ARLAS-server.

As ARLAS-server is meant to deliver spatial-temporal data analysis, your indexed data must contain identifier, time, centroid and geometry fields.

By referencing your data in a Collection, you also make your REST calls lighter, as ARLAS-server already knows which fields to query.

Model

A Collection has the following structure:

{
  "collection_name": "string",
  "params": {
    "index_name": "string",
    "id_path": "string",
    "geometry_path": "string",
    "centroid_path": "string",
    "timestamp_path": "string",
    "include_fields": "string",
    "exclude_fields": "string",
    "custom_params": {},
    "display_names": {
      "collection" : "string",
      "fields": { "${doc_json_path}" :  "string" },
      "shape_columns": { "${geojson_flatten_path}" :  "string" }
    },
    "organisations": {
      "owner": "string",
      "shared": [
        "string"
      ],
      "public": "boolean"
    },
    "atom_feed": {
      ...
    },
    "dublin_core_element_name": {
      ...
    },
    "inspire": {
      ...
    },
    "open_search": {
      ...
    },
    "raster_tile_url": {
      ...
    },
    "raster_tile_width": "integer",
    "raster_tile_height": "integer"
  }
}

The atom_feed, dublin_core_element_name, inspire, open_search and raster_tile_url nodes are optional.

The most important fields are:

Attribute	Description	Mention
index_name	Name of the index in Elasticsearch	Mandatory
id_path	Path to the id field in the indexed documents	Optional
geometry_path	Path to an Elasticsearch geometric field in the indexed documents	Optional
centroid_path	Path to an Elasticsearch Geo-point field in the indexed documents	Optional
timestamp_path	Path to a timestamp/date field in the indexed documents. Its values must match the Elasticsearch date format set when indexing the data	Optional
include_fields	Comma-separated field names that will be included in ARLAS-server responses. By default, all the fields are included	Optional
exclude_fields	Comma-separated field names that will be excluded from ARLAS-server responses. By default, none of the fields are excluded	Optional
raster_tile_width	In case the tile is too big, the crop width to apply. Set to -1 if no check must be applied	Optional
raster_tile_height	In case the tile is too big, the crop height to apply. Set to -1 if no check must be applied	Optional
taggable_fields	Comma-separated field names/paths that the tag service is allowed to update. By default, no field is taggable	Optional
update_max_hits	Maximum number of hits that can be tagged with one `tag request`	Optional
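
As an illustration of the attributes above, the following Python sketch assembles a minimal Collection definition. The helper `build_collection` is not part of ARLAS, and the collection name, index name and field paths are hypothetical; the sketch only mirrors the documented structure and the fact that index_name is the sole mandatory attribute.

```python
import json


def build_collection(name, index_name, **optional_params):
    """Assemble a Collection definition following the structure documented above."""
    if not index_name:
        raise ValueError("index_name is mandatory")
    params = {"index_name": index_name}
    for key, value in optional_params.items():
        # include_fields / exclude_fields are comma-separated strings in the model
        if key in ("include_fields", "exclude_fields") and not isinstance(value, str):
            value = ",".join(value)
        params[key] = value
    return {"collection_name": name, "params": params}


collection = build_collection(
    "ships",                            # hypothetical collection name
    "ais_index",                        # hypothetical Elasticsearch index
    id_path="id",                       # hypothetical field paths
    geometry_path="track.geometry",
    centroid_path="track.centroid",
    timestamp_path="track.timestamp",
    include_fields=["id", "track.timestamp"],
)
print(json.dumps(collection, indent=2))
```

The resulting dictionary can be serialized as the JSON body of the collection definition.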

Important 1

Taggable field paths should not contain `tags`, which is a reserved word.

Important 2

Taggable fields must initially be set to a value or to null at index time in Elasticsearch. For instance, if you use Logstash to index into Elasticsearch, you can add the following filter to the Logstash configuration file to set the field value to null:

    ruby {
        code => "event.set('[labels][status]', nil);"
    }

Here, labels.status is the taggable field.
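
If documents are prepared by a script rather than by Logstash, the same initialisation could be sketched in Python as follows. The helper name `ensure_taggable_fields` and the sample document are assumptions made for illustration; the sketch also assumes that every parent of a taggable path is either absent or a JSON object.

```python
def ensure_taggable_fields(document, taggable_paths):
    """Set every missing taggable field to None (null in JSON).

    Parent objects are created when absent; existing values are left untouched.
    """
    for path in taggable_paths:
        *parents, leaf = path.split(".")
        node = document
        for key in parents:
            node = node.setdefault(key, {})
        node.setdefault(leaf, None)
    return document


# Hypothetical document that has no labels.status field yet
doc = ensure_taggable_fields({"id": "42"}, ["labels.status"])
print(doc)
```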

Important 3

The geometry_path field value must have the same format in all documents of a collection: indexing some documents with a WKT geometry_path value and others with a GeoJSON one is not supported and causes a ParseException to be thrown. The same applies to centroid_path.
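
A pre-indexing check for this constraint could look like the Python sketch below. It is a simplified heuristic, not ARLAS code: it only tells a GeoJSON object (a dictionary with a `type` key) from a WKT string, and the helper names and sample documents are hypothetical.

```python
def get_path(document, path):
    """Follow a dot-separated path inside a nested dictionary."""
    node = document
    for key in path.split("."):
        node = node[key]
    return node


def geometry_format(value):
    """Classify a geometry value as 'geojson' (object) or 'wkt' (string)."""
    if isinstance(value, dict) and "type" in value:
        return "geojson"
    if isinstance(value, str):
        return "wkt"
    raise TypeError(f"unsupported geometry value: {value!r}")


def check_single_format(documents, geometry_path):
    """Return the single format used by all documents, or raise if formats are mixed."""
    formats = {geometry_format(get_path(doc, geometry_path)) for doc in documents}
    if len(formats) > 1:
        raise ValueError(f"mixed geometry formats at '{geometry_path}': {sorted(formats)}")
    return formats.pop() if formats else None


wkt_docs = [{"geom": "POINT (1 2)"}, {"geom": "POINT (3 4)"}]
print(check_single_format(wkt_docs, "geom"))
```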

ATOM

If the ATOM output type is used when searching a collection, the following properties can be set to customize the result:

   "atom_feed": {
     "author": {
       "name": "string",
       "email": "string",
       "uri": "string"
     },
     "contributor": {
       "name": "string",
       "email": "string",
       "uri": "string"
     },
     "icon": "string",
     "logo": "string",
     "rights": "string",
     "subtitle": "string",
     "generator": {
       "name": "string",
       "version": "string",
       "uri": "string"
     }
   }

Attribute	Description	Mention
author.name	Author name of the feed	Optional
author.email	Author email of the feed	Optional
author.uri	Author URI of the feed	Optional
contributor.name	Name of the person or other entity who contributed to the feed	Optional
contributor.email	Email of the person or other entity who contributed to the feed	Optional
contributor.uri	URI of the person or other entity who contributed to the feed	Optional
icon	IRI reference (RFC3987) that identifies an image that provides iconic visual identification for a feed	Optional
logo	IRI reference (RFC3987) that identifies an image that provides visual identification for a feed	Optional
rights	Text that conveys information about rights held in and over an entry or feed	Optional
subtitle	Text that conveys a human-readable description or subtitle for a feed	Optional
generator.name	Name of the agent used to generate a feed, for debugging and other purposes	Optional
generator.version	Version of the agent used to generate a feed, for debugging and other purposes	Optional
generator.uri	URI of the agent used to generate a feed, for debugging and other purposes	Optional

DUBLIN CORE

The Dublin Core Description document of the collection can be customized with the following properties:

   "dublin_core_element_name": {
       "title": "string",
       "creator": "string",
       "subject": "string",
       "description": "string",
       "publisher": "string",
       "contributor": "string",
       "type": "string",
       "format": "string",
       "identifier": "string",
       "source": "string",
       "language": "string",
       "bbox": {
         "north": 0,
         "south": 0,
         "east": 0,
         "west": 0
       },
       "date": "string",
       "coverage": {
         "additionalProp1": {},
         "additionalProp2": {},
         "additionalProp3": {}
       },
       "coverage_centroid": "string"
   }

Attribute	Description	Mention
title	Title by which the resource is formally known	Optional
creator	Entity responsible for making the content of the resource	Optional
subject	The topic of the content of the resource.	Optional
description	An account of the content of the resource	Optional
publisher	Entity responsible for making the resource available	Optional
contributor	Entity responsible for making contributions to the content of the resource	Optional
type	The nature or genre of the content of the resource.	Optional
format	The physical or digital manifestation of the resource	Optional
identifier	An unambiguous reference to the resource within a given context.	Generated by ARLAS-server
source	A reference to a resource from which the present resource is derived	Optional
language	A language of the intellectual content of the resource (taken from the ISO 639 standard)	Optional
bbox	Geographical extent of the resource content. Defaults to the whole globe	Optional
date	A date associated with an event in the life cycle of the resource	Generated by ARLAS-server at collection creation
coverage	Geographical extent of the resource content.	Calculated by ARLAS-server from `bbox` attribute
coverage_centroid	Centroid of the geographical extent of the resource content.	Calculated by ARLAS-server from `bbox` attribute
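
To make the relation between `bbox`, `coverage` and `coverage_centroid` concrete, here is a Python sketch of one plausible derivation. It is an illustration only and not ARLAS-server's actual implementation: the function names are hypothetical, the output formats are assumptions, and the bounding box is assumed not to cross the antimeridian.

```python
def bbox_to_polygon(north, south, east, west):
    """Return the bbox as a GeoJSON Polygon (closed, counterclockwise exterior ring)."""
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # a GeoJSON ring repeats its first position at the end
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def bbox_center(north, south, east, west):
    """Return the (latitude, longitude) of the middle of the bbox."""
    return ((north + south) / 2, (west + east) / 2)


# Whole-globe extent, the documented default for bbox
globe = {"north": 90, "south": -90, "east": 180, "west": -180}
print(bbox_to_polygon(**globe))
print(bbox_center(**globe))
```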

INSPIRE

If the INSPIRE option is enabled in configuration.yaml, the following properties can be set to customize the results of WFS GetCapabilities and of CSW GetCapabilities, GetRecords and GetRecordById:

   "inspire": {
        "keywords": [
        {
          "value": "string",
          "vocabulary": "string",
          "date_of_publication": "string"
        }
        ],
        "languages": ["eng", ...],
        "topic_categories": "string",
        "lineage": "string",
        "spatial_resolution": {
          "value": "number",
          "unit_of_measure": "string"
        },
        "inspire_uri": {
          "code": "string",
          "namespace": "string"
        },
        "inspire_use_conditions": "string",
        "inspire_limitation_access": {
          "access_constraints": "string",
          "otherConstraints": "string",
          "classification": "string"
        }
   }

The inspire node is mandatory only if the INSPIRE option is enabled in configuration.yaml.

Attribute	Description	Mention
keywords.value	Value of the keyword. If the keyword originates from the Classification of Spatial data Services vocabulary, the camel-case keyword should be set. For example: `thematicImageProcessingService`	Mandatory
keywords.vocabulary	Vocabulary from which the keyword value was taken. For example GEMET Inspire-themes or Classification of Spatial data Services vocabulary	Mandatory for each keyword if the keyword value originates from a controlled vocabulary
keywords.date_of_publication	Date of publication of the Vocabulary. Must be in `YYYY-MM-DD` format.	Optional
languages	The language(s) used within the resource. The value domain of this metadata element is limited to the languages defined in ISO 639-2.	Mandatory if the resource includes textual information.
topic_categories	List of topic categories. A topic category is a high-level classification scheme to assist in the grouping and topic-based search of available spatial data resources. Must be one of the values in this list. The value should be in camel case. For example, the topic category Climatology / Meteorology / Atmosphere must be set as `climatologyMeteorologyAtmosphere`	Mandatory
lineage	(Free text) This is a statement on process history and/or overall quality of the spatial data set. Where appropriate it may include a statement whether the data set has been validated or quality assured, whether it is the official version (if multiple versions exist)	Mandatory
spatial_resolution	Spatial resolution refers to the level of detail of the data set. It shall be expressed as a set of zero to many resolution distances (typically for gridded data and imagery-derived products) or equivalent scales (typically for maps or map-derived products)	Mandatory if an equivalent scale or a resolution distance can be specified
spatial_resolution.value	An equivalent scale is expressed as an integer value expressing the scale denominator. A resolution distance should be expressed as a numerical value associated with a unit of length.	Mandatory if an equivalent scale or a resolution distance can be specified
spatial_resolution.unit_of_measure	Unit of measure of the resolution distance. If it is not specified, the spatial resolution is an equivalent scale	Mandatory if a resolution distance can be specified
inspire_uri.code	A character string code uniquely identifying the collection reference (data set), assigned by the data owner.	Mandatory. If not set, it takes the id value generated by ARLAS-server
inspire_uri.namespace	A character string namespace uniquely identifying the context of the identifier code (for example, the data owner). By default its value is `ARLAS.{COLLECTION-NAME}`	Optional
inspire_use_conditions	Provides information on any fees necessary to access and use the data set	Optional; default: `no conditions apply`
inspire_limitation_access.access_constraints*	Possible values: `copyright`, `patent`, `patentPending`, `trademark`, `license`, `intellectualPropertyRights`, `restricted`, `otherRestrictions`.	Mandatory. Default value is `otherRestrictions`
inspire_limitation_access.otherConstraints*	Free text or a URL to a page that describes possible limitations. Default value: `no limitations apply`	Optional
inspire_limitation_access.classification*	Name of the handling restrictions on the WFS. One of: `unclassified` (default value), `restricted`, `confidential`, `secret`, `topSecret`	Optional

*: The inspire_limitation_access object describes the access constraints applied to protect privacy or intellectual property, and any special restrictions or limitations on obtaining the INSPIRE-compliant WFS.
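
The constraints listed in the table can be checked before a collection is submitted. The Python sketch below is a hypothetical client-side validator, not part of ARLAS: it covers only the mandatory attributes, the `YYYY-MM-DD` date format and the two enumerations documented above, and the sample values are placeholders.

```python
import re
from datetime import datetime

ACCESS_CONSTRAINTS = {
    "copyright", "patent", "patentPending", "trademark", "license",
    "intellectualPropertyRights", "restricted", "otherRestrictions",
}
CLASSIFICATIONS = {"unclassified", "restricted", "confidential", "secret", "topSecret"}


def is_iso_date(text):
    """True if text is a valid calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_inspire(inspire):
    """Return a list of problems found in an `inspire` node (empty if none)."""
    errors = []
    for field in ("topic_categories", "lineage"):
        if not inspire.get(field):
            errors.append(f"{field} is mandatory")
    for i, keyword in enumerate(inspire.get("keywords", [])):
        if not keyword.get("value"):
            errors.append(f"keywords[{i}].value is mandatory")
        published = keyword.get("date_of_publication")
        if published is not None and not is_iso_date(published):
            errors.append(f"keywords[{i}].date_of_publication must be YYYY-MM-DD")
    access = inspire.get("inspire_limitation_access", {})
    # Missing values fall back to the documented defaults
    constraint = access.get("access_constraints", "otherRestrictions")
    if constraint not in ACCESS_CONSTRAINTS:
        errors.append(f"access_constraints {constraint!r} is not an allowed value")
    classification = access.get("classification", "unclassified")
    if classification not in CLASSIFICATIONS:
        errors.append(f"classification {classification!r} is not an allowed value")
    return errors


example = {
    "keywords": [{"value": "thematicImageProcessingService",
                  "date_of_publication": "2020-01-31"}],  # placeholder date
    "topic_categories": "climatologyMeteorologyAtmosphere",
    "lineage": "Placeholder lineage statement",
    "inspire_limitation_access": {"classification": "public"},  # deliberately invalid
}
print(check_inspire(example))
```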

OPENSEARCH

The OPENSEARCH Description document of the collection can be customized with the following properties:

   "open_search": {
     "short_name": "string",
     "description": "string",
     "contact": "string",
     "tags": "string",
     "long_name": "string",
     "image_height": "string",
     "image_width": "string",
     "image_type": "string",
     "image_url": "string",
     "developer": "string",
     "attribution": "string",
     "syndication_right": "string",
     "adult_content": "string",
     "language": "string",
     "input_encoding": "string",
     "output_encoding": "string",
     "url_template_prefix": "string"
   }

Attribute	Description	Mention
short_name	Contains a brief human-readable title that identifies this search engine. The value must contain 16 or fewer characters of plain text. The value must not contain HTML or other markup.	Optional
description	Contains a human-readable text description of the search engine. The value must contain 1024 or fewer characters of plain text. The value must not contain HTML or other markup.	Optional
contact	Contains an email address at which the maintainer of the description document can be reached. The value must conform to the requirements of Section 3.4.1 "Addr-spec specification" in RFC 2822.	Optional
tags	Contains a set of words that are used as keywords to identify and categorize this search content. Tags must be a single word and are delimited by the space character (' '). The value must contain 256 or fewer characters of plain text. The value must not contain HTML or other markup.	Optional
long_name	Contains an extended human-readable title that identifies this search engine. The value must contain 48 or fewer characters of plain text. The value must not contain HTML or other markup.	Optional
image_url	Contains a URL that identifies the location of an image that can be used in association with this search content.	Optional
image_height	Contains the height, in pixels, of this image	Optional
image_width	Contains the width, in pixels, of this image	Optional
image_type	Contains the MIME type of this image	Optional
developer	Contains the human-readable name or identifier of the creator or maintainer of the description document. The value must contain 64 or fewer characters of plain text. The value must not contain HTML or other markup.	Optional
attribution	Contains a list of all sources or entities that should be credited for the content contained in the search feed. The value must contain 256 or fewer characters of plain text. The value must not contain HTML or other markup.	Optional
syndication_right	Contains a value that indicates the degree to which the search results provided by this search engine can be queried, displayed, and redistributed. The value must be one of the following strings: "open" / "limited" / "private" / "closed"	Optional
adult_content	Contains a boolean value that should be set to true if the search results may contain material intended only for adults: true / false	Optional
language	Contains a string that indicates that the search engine supports search results in the specified language. The value must conform to the XML 1.0 Language Identification, as specified by RFC 5646.	Optional
input_encoding	Contains a string that indicates that the search engine supports search requests encoded with the specified character encoding. The value must conform to the XML 1.0 Character Encodings, as specified by the IANA Character Set Assignments.	Optional
output_encoding	Contains a string that indicates that the search engine supports search responses encoded with the specified character encoding. The value must conform to the XML 1.0 Character Encodings, as specified by the IANA Character Set Assignments.	Optional
url_template_prefix	URL prefix that is rendered in the URL template of the OpenSearch response.	Optional
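
The length limits and allowed values in the table lend themselves to a quick check. The following Python sketch is a hypothetical helper, not part of ARLAS; it verifies only the character limits and the `syndication_right` values stated above.

```python
# Maximum lengths taken from the attribute descriptions above
MAX_LENGTHS = {
    "short_name": 16,
    "long_name": 48,
    "description": 1024,
    "tags": 256,
    "developer": 64,
    "attribution": 256,
}
SYNDICATION_RIGHTS = {"open", "limited", "private", "closed"}


def check_open_search(node):
    """Return a list of problems found in an `open_search` node (empty if none)."""
    errors = []
    for field, limit in MAX_LENGTHS.items():
        value = node.get(field)
        if value is not None and len(value) > limit:
            errors.append(f"{field} exceeds {limit} characters")
    right = node.get("syndication_right")
    if right is not None and right not in SYNDICATION_RIGHTS:
        errors.append(f"syndication_right {right!r} is not an allowed value")
    return errors


# Hypothetical values for illustration
print(check_open_search({"short_name": "ARLAS ships", "syndication_right": "open"}))
print(check_open_search({"short_name": "A short name that is too long"}))
```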

RASTER TILES

If the data in the collection are image metadata and the images are available as 256x256 px tiles through a WMTS or X/Y/Z service (the top-left corner is 0/0), then ARLAS can, for a given tile, dynamically stack the tiles that are aligned with the requested one and belong to the images matching a given filter. If the tile server does not always return tiles of the right size (tiles that are too big), you can set raster_tile_width and raster_tile_height to the desired dimensions, such as 256 or 512. Tiles that are too big are then cropped to that size. Tiles that are too small are not handled.

   "raster_tile_url": {
     "url": "string",
     "id_path": "string",
     "min_z": "integer",
     "max_z": "integer",
     "check_geometry": "boolean"
   }

Attribute	Description	Mention	Default value
url	The URL pattern of the WMTS or X/Y/Z service. It should contain variable placeholders for `{x}`, `{y}`, `{z}` and `{id}`	Mandatory
id_path	JSON path of the image id that will be injected in the URL of the tile service	Optional	id
min_z	Min zoom supported by the tile service	Optional	0
max_z	Max zoom supported by the tile service	Optional	18
check_geometry	Whether ARLAS should check that the matching images have their geometry intersecting the requested tile. Useful if the search returns false positives on geometric queries	Optional	false
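
The Python sketch below illustrates how these settings fit together: filling the URL pattern for one image and one tile, honouring the zoom range, and computing the size a tile is cropped to. It is a reading of the description above rather than ARLAS-server's code; the function names and the tile server host are hypothetical.

```python
def tile_url(pattern, x, y, z, image_id, min_z=0, max_z=18):
    """Fill the {x}, {y}, {z} and {id} placeholders, or return None outside the zoom range."""
    if not min_z <= z <= max_z:
        return None
    return (
        pattern.replace("{x}", str(x))
        .replace("{y}", str(y))
        .replace("{z}", str(z))
        .replace("{id}", str(image_id))
    )


def cropped_size(width, height, raster_tile_width, raster_tile_height):
    """Return the (width, height) kept after cropping a tile that is too big.

    A limit of -1 disables the check for that dimension; tiles that are too
    small are returned unchanged, since that case is not handled.
    """
    new_width = width if raster_tile_width == -1 else min(width, raster_tile_width)
    new_height = height if raster_tile_height == -1 else min(height, raster_tile_height)
    return (new_width, new_height)


# Hypothetical tile server and image id
pattern = "http://tiles.example.com/{id}/{z}/{x}/{y}.png"
print(tile_url(pattern, x=3, y=5, z=4, image_id="img-1"))
print(cropped_size(300, 256, raster_tile_width=256, raster_tile_height=-1))
```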