Indices
Elasticsearch index
To be explored in ARLAS dashboards, the data has to be indexed in an Elasticsearch (ES) index. An index contains the data and a mapping to describe how fields have to be interpreted (types).
arlas_cli
provide tools to infer mapping from data and manage the ES indices with the indices
command.
List index management commands
> arlas_cli indices --help
Usage: arlas_cli indices [OPTIONS] COMMAND [ARGS]...
Options:
--config TEXT Name of the ARLAS configuration to use from your
configuration file
(/home/willi/.arlas/cli/configuration.yaml).
--help Show this message and exit.
Commands:
clone Clone an index and set its name
create Create an index
data Index data
delete Delete an index
describe Describe an index
list List indices
mapping Generate the mapping based on the data
migrate Migrate an index on another arlas configuration, and set the...
sample Display a sample of an index
mapping
arlas_cli
provide tools to infer the ES mapping directly from a data file.
> arlas_cli indices --config local mapping --help
Usage: arlas_cli indices mapping [OPTIONS] FILE
Generate the mapping based on the data
Arguments:
FILE Path to the file containing the data. Format: NDJSON [required]
Options:
--nb-lines INTEGER Number of line to consider for generating the mapping.
Avoid going over 10. [default: 2]
--field-mapping TEXT Override the mapping with the provided field
path/type. Example: fragment.location:geo_point.
Important: the full field path must be provided.
--no-fulltext TEXT List of keyword or text fields that should not be in
the fulltext search. Important: the field name only
must be provided.
--no-index TEXT List of fields that should not be indexed.
--push-on TEXT Push the generated mapping for the provided index name
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
Data file
To generate a mapping, you need to provide a NDJSON file
(New line delimiter JSON).
The values of the first lines of the files are used to infer the mapping for each field of the data.
--nb_lines
The indices mapping
function uses the first rows to infer mapping.
If a field is not present in the first rows, it will not appear in the mapping.
Make sure to take enough rows to get all the fields with the option --nb_lines
Type identification
The mapping associates to each field of the data a type (see Elasticsearch type)
A geometry is identified as such if
- it is a geojson
- it is a WKT string
- the field name contains
geohash
- it is a string containing two float separated by a comma
A date is identified as such if
- its name is one of
timestamp
,date
,start
orend
and that it can be parsed as a date - its name contains
timestamp
,date
,start
orend
and its values are number within [631152000, 4102444800] or [631152000000, 4102444800000] (year 1990 to 2100)
--field-mapping
If the mapping is wrong, you can overwrite the typing with the --field-mapping
option.
It has the structure field_name:field_type (see Elasticsearch type)
Examples:
--field-mapping field_point:geo_point
--field-mapping field_geometry:geo_shape
--field-mapping field_short_text:keyword
--field-mapping field_long_text:text
--field-mapping field_float:double
--field-mapping field_int:long
The date fields have a format that can be specified as field_name:date-format with all format accepted by Elasticsearch date type
Examples:
--field-mapping field_time_epoch_second:date-epoch_second
--field-mapping field_time_epoch_millisecond:date-epoch_millis
--field-mapping field_time_pattern:date-"yyyy-MM-dd HH:mm:ss"
By default, the keywords and text fields are searchable as fulltext to be accessible in the search bar.
--no-fulltext
If searching through a field value is not needed, it can be deactivated. That would result in better performances for the fulltext search.
Example:
--no-fulltext field_keyword
--no-index
If a field doesn't need to be explored in the dashboard, it should be removed before indexing the data.
Alternatively, you can explicitly exclude the field from being indexed using the --no-index
option.
Example:
--no-index unused_field
The field will remain in the data but will not be indexed.
Created mapping
By default, the arlas_cli indices mapping
directly returns the mapping in the command line.
Once you're happy with the mapping, you can either store it in a file or directly push it on elasticsearch.
Store mapping in a file
To store the created mapping in a mapping.json
file, simply use >
as the end of your command.
Example:
> arlas_cli indices \
--config {local} \
mapping {path/to/data.json} \
--field-mapping {timestamps.start:date-epoch_second} \
--field-mapping {timestamps.end:date-epoch_second} \
> {path/to/mapping.json}
--push-on
To push the inferred mapping directly in an Elasticsearch index, use the --push-on
option with the target index name.
Example:
> arlas_cli indices \
--config {local} \
mapping {path/to/data.json} \
--push-on {index_name}
The index is then created and the index creation command can be skipped.
create
Before putting the data in an elasticsearch index, the index has to be initialised with the correct mapping.
The indices create
sub-function create the index from a mapping json file.
> arlas_cli indices --config local create --help
Usage: arlas_cli indices create [OPTIONS] INDEX
Create an index
Arguments:
INDEX index's name [required]
Options:
--mapping TEXT Name of the mapping within your configuration, or URL or
file path [required]
--shards INTEGER Number of shards for the index [default: 1]
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
Create an ES index with its mapping
The index name and the path to the mapping json file have to be used to create the ES index.
Warning
If the ARLAS deployment uses ARLAS IAM for authentication, the index must be associated with an organisation.
The index_name
must follow the pattern {organisation}@{data_index_name}
(e.g., gisaia.com@ais_courses
).
The indices create
sub-function create the index from a mapping json file.
Example:
> arlas_cli indices \
--config local \
create {index_name} \
--mapping {path/to/mapping.json}
Once the index is created, Elasticsearch can index data to fill that index.
data
To explore data in ARLAS, it has to be indexed in the created ES index.
The indices data
sub-function ingest the data in a given index.
> arlas_cli indices --config local data --help
Usage: arlas_cli indices data [OPTIONS] INDEX FILES...
Index data
Arguments:
INDEX index's name [required]
FILES... List of paths to the file(s) containing the data. Format: NDJSON
[required]
Options:
--bulk INTEGER Bulk size for indexing data [default: 5000]
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
Ingest data
To index data, you'll need to provide one or several NDJSON (New line delimiter JSON) file(s). Indexing uses bulks for optimal performances.
Example:
> arlas_cli indices \
--config {local} \
data {index_name} {path/to/data.json}
Tip
The data can be split in different NDJSON files in a folder:
part-00000-[...].json
part-00001-[...].json
...
files
argument can be filed with a pattern such as path/to/data.json/part-0000*.json
to reference all the different files.
Warning
If the index already contains data, the data is added to the index.
To reindex the same data, delete the index, and do not forget to recreate it with the correct mapping before ingesting the data.
--bulk
Indexing uses bulks for optimal performances.
The size of bulk can be changed with the --bulk
option
list
To list the available ES indices, simply use the indices list
sub-function. No arguments are required.
> arlas_cli indices --config local list --help
Usage: arlas_cli indices list [OPTIONS]
List indices
Options:
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
List available ES indices
It displays for each ES index its status, the number of elements it contains and the size of the index.
Example:
> arlas_cli indices --config {local} list
+--------------+--------+-------+--------+
| name | status | count | size |
+--------------+--------+-------+--------+
| .arlas | open | 4 | 11.9kb |
| index_name | open | 100 | 1mb |
+--------------+--------+-------+--------+
describe
Once the index is created, the description of the fields it contains (corresponding to the mapping) can be displayed with the indices describe
sub function:
> arlas_cli indices --config local data --help
Usage: arlas_cli indices data [OPTIONS] INDEX FILES...
Index data
Arguments:
INDEX index's name [required]
FILES... List of paths to the file(s) containing the data. Format: NDJSON
[required]
Options:
--bulk INTEGER Bulk size for indexing data [default: 5000]
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
Describe the index mapping
For a given index, the description of its fields and their type can be displayed.
For example:
> arlas_cli indices --config {local} describe {index_name}
+------------------+-----------+
| field name | type |
+------------------+-----------+
| field_keyword | keyword |
| field_point | geo_point |
| field_long | long |
| field_shape | geo_shape |
| field_double | double |
| field_date | date |
| field_text | text |
| field_object | object |
| field_boolean | boolean |
+------------------+-----------+
sample
The first rows of the data contained in an index can be displayed with the indices sample
sub function.
> arlas_cli indices --config local delete --help
Usage: arlas_cli indices delete [OPTIONS] INDEX
Delete an index
Arguments:
INDEX index's name [required]
Options:
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
Visualize few rows of your dataset
For a given index index_name
, the first rows of data can be displayed as a valid json dictionary.
--size
The number of rows to display (default 100) can be changed
Example:
> arlas_cli indices --config {local} sample {index_name} --size {10}
By default, the json representation of the data is pretty printed (clear indentation and one line per field)
--no-pretty
The pretty printing can be deactivated and data is displayed in a compact way
Example:
> arlas_cli indices --config {local} sample {index_name} --no-pretty
clone
Duplicate an index with a new index name
An ES index can be cloned on the same ES deployment with the indices clone
sub-command:
> arlas_cli indices --config local clone --help
Usage: arlas_cli indices clone [OPTIONS] SOURCE TARGET
Clone an index and set its name
Arguments:
SOURCE Source index name [required]
TARGET Target cloned index name [required]
Options:
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
Both indices co-exist with exactly the same mapping and data content.
migrate
Copy an index in another arlas configuration
An index can be copied from an ES instance to another.
Note
The two instances have to be accessible by arlas_cli
with two configurations (see Configuration guide).
The target configuration and the name of the new created index are given to the indices migrate
sub-command.
> arlas_cli indices --config local clone --help
Usage: arlas_cli indices clone [OPTIONS] SOURCE TARGET
Clone an index and set its name
Arguments:
SOURCE Source index name [required]
TARGET Target cloned index name [required]
Options:
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
Both indices co-exist with exactly the same mapping and data content.
delete
The ES index can be deleted with indices delete
sub command to free space on the ES cluster.
> arlas_cli indices --config local delete --help
Usage: arlas_cli indices delete [OPTIONS] INDEX
Delete an index
Arguments:
INDEX index's name [required]
Options:
--help Show this message and exit.
See full arlas_cli documentation at https://gisaia.github.io/arlas_cli/
Delete an ES index
To delete an ES index index_name
on local
configuration, run the following command:
> arlas_cli indices --config {local} delete {index_name}
Warning
A deleted index cannot be restored.
Note
By default, it is not allowed to delete an index for a given configuration.
To allow deleting, edit the configuration file and set allow_delete
to True
.
Good practice: Set admin confs
For a given ARLAS deployment, it is advised to set two configurations, with only the admin one that can delete an index.
For example:
+----------------------+------------------------------------------+
| name | url |
+----------------------+------------------------------------------+
| cloud.arlas.io | https://cloud.arlas.io/arlas/server |
| cloud.arlas.io-admin | https://cloud.arlas.io/arlas/server |
| local | http://localhost/arlas |
+----------------------+------------------------------------------+
Here the configuration --config cloud.arlas.io-admin
has to be used to delete any index.