

APROC Configuration

The APROC configuration is defined in the following files:

Framework Configuration

The conf/aproc.yaml file sets the configuration of the APROC framework for:

  • the framework's dependencies on Celery and AIRS
  • the registration of the process drivers
  • the storage access management

Example of configuration for the framework dependencies:

# The message queue used for dispatching the tasks
celery_broker_url: pyamqp://guest:guest@127.0.0.1:5672// 
# The backend database for storing the tasks
celery_result_backend: redis://127.0.0.1:6379/0
# ARLAS Item Registration Service (AIRS) endpoint
airs_endpoint: http://127.0.0.1:8000/arlas/airs

The APROC driver registration section lists the process drivers and their respective configuration files, e.g.:

processes:
  -
    name: ingest
    class_name: extensions.aproc.proc.ingest.ingest_process
    configuration:
      drivers: conf/drivers.yaml

Storage access configuration

The access manager configuration specifies which storages can be accessed and how. The example below declares three storages: one on the local file system, one on Google Cloud Storage and one on S3:

access_manager:
  tmp_dir: /tmp/
  storages:
    -
      type: file
      writable_paths:
        - /tmp
        - /outbox
      readable_paths:
        - /inputs
    -
      type: gs
      bucket: gisaia-public
    -
      type: s3
      bucket: arlas
      endpoint: "https://s3.my.cloud.provider.com"
      region: eu-1
      api_key:
        access_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        secret_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Four types of storage are available:

  • file: local file storage
  • gs: Google Cloud object storage
  • s3: S3-compliant object storage
  • http: HTTP/HTTPS storage (see the sketch below)
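
The example above does not show an http storage. Below is a minimal sketch of such a declaration, using the fields listed in the reference documentation; the domain and header values are illustrative:

access_manager:
  storages:
    -
      type: http
      # Domain used for the HTTP storage endpoint
      domain: example.com
      # Additional HTTP headers to include in each request (illustrative value)
      headers:
        Authorization: "Bearer xxxxxxxxxxxx"
      # If true, always download the file instead of caching it
      force_download: false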

Ingest drivers

The conf/drivers.yaml configuration file references the ingest drivers. The example below registers one DIMAP driver. Ingestion can only be done in a directory contained in a referenced storage (see Storage access configuration). The driver that supports the archive and has the smallest priority number is used for ingestion.

# The directory that can be explored for ingestion
inputs_directory: gs://gisaia-public/inputs
# The maximum number of archives that can be ingested in one request (the request will create as many jobs as archives found)
max_number_of_archive_for_ingest: 1000000
# The APROC endpoint for submission of sub requests
aproc_endpoint: $APROC_ENDPOINT_FROM_APROC|http://localhost:8001
resource_id_hash_starts_at: $APROC_RESOURCE_ID_HASH_STARTS_AT|1

drivers:
  -
    name: dimap
    class_name: extensions.aproc.proc.ingest.drivers.impl.dimap
    assets_dir: /tmp/aproc/dimap
    configuration:
    priority: 1

Download drivers

The conf/download.yaml configuration file lists the drivers for exporting archives. When a download request is received, the archive is transformed and placed in the outbox directory, which can be on the local file storage or on an S3 bucket. ARLAS Server is used to check whether the user requesting the download is allowed to access the archive. If SMTP is configured, emails are sent to the administrator and to the users requesting the downloads. Download requests are logged in an ARLAS Collection so that the administrator gets a clear view of what is downloaded.

Just like the ingest drivers, download driver registration must set a priority. The driver that supports the item download request and that has the smallest priority is used for the download.
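
A minimal sketch of a conf/download.yaml is shown below. It assumes the same name/class_name/priority driver layout as the ingest drivers; the driver name and class path are illustrative, and the other fields come from the reference documentation below:

drivers:
  -
    # Illustrative driver; use the class of an actual download driver
    name: archive
    class_name: extensions.aproc.proc.download.drivers.impl.archive
    priority: 1
# Directory where the downloads will be placed (required even if outbox_s3 is configured)
outbox_directory: /outbox
# Comma-separated list of admin emails receiving download notifications
notification_admin_emails: admin@example.com
# Elasticsearch index used for reporting download requests
index_for_download:
  endpoint_url: http://localhost:9200
  index_name: aproc_downloads
# SMTP configuration for sending the notification emails
smtp:
  host: smtp.example.com
  port: 587
  login: aproc
  password: xxxxxxxx
  from_addr: noreply@example.com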

Enrich drivers

The conf/enrich.yaml configuration file registers the enrichment drivers. An enrichment driver adds an asset to an existing item.
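
A minimal sketch of a conf/enrich.yaml, assuming the same driver registration layout as the ingest drivers (the driver name and class path are illustrative):

drivers:
  -
    # Illustrative driver that would add a derived asset to each item
    name: cog
    class_name: extensions.aproc.proc.enrich.drivers.impl.cog
    priority: 1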

dc3build drivers

The conf/dc3build.yaml configuration file references the drivers for building a datacube based on a list of items.
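
A minimal sketch of a conf/dc3build.yaml, again assuming the same driver registration layout (the driver name and class path are illustrative; the arlas_url_search field name is inferred from the reference documentation below, and its value reuses the example given there):

# ARLAS URL Search used to resolve the items of the datacube
arlas_url_search: http://arlas-server:9999/arlas/explore/{collection}/_search?f=id:eq:{item}
drivers:
  -
    name: dc3
    class_name: extensions.aproc.proc.dc3build.drivers.impl.dc3
    priority: 1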

Reference documentation

APROC Configuration reference documentation

ProcessSettings pydantic-model

Bases: BaseModel

Fields:

class_name pydantic-field

Name of the process class

configuration pydantic-field

Configuration that is specific to the process (dictionary of key/value pairs)

name pydantic-field

Name of the process

Settings pydantic-model

Bases: BaseModel

Fields:

access_manager pydantic-field

Configuration for the AccessManager

airs_endpoint pydantic-field

ARLAS Item Registration Service endpoint

celery_broker_url pydantic-field

Celery broker URL, of the form transport://userid:password@hostname:port/virtual_host

celery_result_backend pydantic-field

Celery's backend used to store task results

processes pydantic-field

List of APROC processes

AccessManager Configuration reference documentation

AccessManagerSettings pydantic-model

Bases: BaseModel

Fields:

storages pydantic-field

List of configurations for the available storages

tmp_dir pydantic-field

Temporary directory in which to write files that will be deleted

FileStorageConfiguration pydantic-model

Bases: StorageConfiguration

Fields:

is_local = True pydantic-field

Whether the storage is local or remote

readable_paths = [] pydantic-field

List of paths from which files can be read

type = 'file' pydantic-field

Indicates the storage type, fixed to 'file'

writable_paths = [] pydantic-field

List of paths where files can be written

GoogleStorageApiKey pydantic-model

Bases: BaseModel

Fields:

auth_provider_x509_cert_url = GoogleStorageConstants.AUTH_PROVIDER_CERT_URL.value pydantic-field

URL for the provider's X.509 certificate

auth_uri = GoogleStorageConstants.AUTH_URI.value pydantic-field

OAuth2 auth endpoint URI

client_email pydantic-field

Service account email address

client_id = None pydantic-field

Optional client ID of the service account

private_key pydantic-field

The private key content in PEM format

private_key_id pydantic-field

ID of the private key used for authentication

project_id pydantic-field

Google Cloud project identifier

token_uri = GoogleStorageConstants.TOKEN_URI.value pydantic-field

OAuth2 token endpoint URI

type = 'service_account' pydantic-field

Must be 'service_account'.

universe_domain = GoogleStorageConstants.UNIVERSE_DOMAIN.value pydantic-field

Domain of the target universe (typically 'googleapis.com')

GoogleStorageConfiguration pydantic-model

Bases: StorageConfiguration

Fields:

api_key = None pydantic-field

API key for storage authentication

bucket pydantic-field

Name of the Google Cloud Storage bucket

is_local = False pydantic-field

Whether the storage is local or remote

type = 'gs' pydantic-field

Indicates the storage type, fixed to 'gs'
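
For a bucket that is not public, an api_key can be added to the gs storage declaration. A sketch, assuming the values are copied from a Google service account key file (all values are illustrative):

access_manager:
  storages:
    -
      type: gs
      bucket: my-private-bucket
      api_key:
        type: service_account
        project_id: my-project
        private_key_id: xxxxxxxxxxxx
        # PEM content of the service account private key
        private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
        client_email: my-service-account@my-project.iam.gserviceaccount.com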

HttpStorageConfiguration pydantic-model

Bases: StorageConfiguration

Fields:

domain pydantic-field

Domain used for HTTP storage endpoint, e.g. 'example.com'

force_download = False pydantic-field

If true, always download the file instead of caching.

headers = {} pydantic-field

Additional HTTP headers to include in each request

is_local = False pydantic-field

Whether the storage is local or remote

type = 'http' pydantic-field

Indicates the storage type, fixed to 'http'

S3ApiKey pydantic-model

Bases: BaseModel

Fields:

access_key pydantic-field

Access key for S3 storage authentication

secret_key pydantic-field

Secret key for S3 storage authentication

S3StorageConfiguration pydantic-model

Bases: StorageConfiguration

Fields:

api_key = None pydantic-field

API key for storage authentication

bucket pydantic-field

Name of the S3 bucket

endpoint pydantic-field

Endpoint to access S3 storage

is_local = False pydantic-field

Whether the storage is local or remote

max_objects = 1000 pydantic-field

Maximum number of objects to fetch when listing elements in a directory

region = 'auto' pydantic-field

Region of the bucket

type = 's3' pydantic-field

Indicates the storage type, fixed to 's3'

StorageConfiguration pydantic-model

Bases: BaseModel

Fields:

is_local pydantic-field

Whether the storage is local or remote

type pydantic-field

Type of the storage used

APROC Ingestion drivers reference documentation

Settings pydantic-model

Bases: BaseModel

Fields:

alternative_asset_href_field = None pydantic-field

By default, data are fetched from the href of the asset named "data". If this field is set, data are retrieved from the given item property instead.

aproc_endpoint pydantic-field

APROC endpoint for submitting sub tasks

drivers pydantic-field

Configuration of the ingestion drivers.

inputs_directory pydantic-field

Location of the archive tree that can be explored and ingested.

max_number_of_archive_for_ingest = 1000000 pydantic-field

Maximum number of archives to ingest when ingesting a directory

resource_id_hash_starts_at = 1 pydantic-field

For some drivers, the resource id is the hash of the URL path. This property allows a prefix of the path to be ignored.
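
For example, the following line in conf/drivers.yaml would make ingestion read the data location from an item property instead of the href of the "data" asset (the property path is illustrative):

# Illustrative property path; by default the href of the asset named "data" is used
alternative_asset_href_field: properties.alternate_href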

APROC Download drivers reference documentation

Index pydantic-model

Bases: BaseModel

Fields:

endpoint_url pydantic-field

Elasticsearch URL for indexing download requests

index_name pydantic-field

Elasticsearch index name for indexing download requests

login = '' pydantic-field

Elasticsearch login

pwd = '' pydantic-field

Elasticsearch password

SMPTConfiguration pydantic-model

Bases: BaseModel

Fields:

from_addr pydantic-field

Email address of the system that sends the emails

host pydantic-field

SMTP host

login pydantic-field

SMTP user login

password pydantic-field

SMTP user password

port pydantic-field

SMTP port

Settings pydantic-model

Bases: BaseModel

Fields:

arlas_url_search pydantic-field

ARLAS URL Search (ex http://arlas-server:9999/arlas/explore/{collection}/_search?f=id:eq:{item})

arlaseo_mapping_url pydantic-field

Location of the ARLAS EO index mapping for STAC Items

clean_outbox_directory = True pydantic-field

Clean the outbox directory once the files are copied to S3

download_mapping_url pydantic-field

Location of the download requests mapping

drivers pydantic-field

Configuration of the download drivers

email_content_admin = '' pydantic-field

Content of the email to be sent to the admin

email_content_error_download = '' pydantic-field

Content of the email to be sent to the user when a download fails

email_content_user = '' pydantic-field

Content of the email to be sent to the user

email_path_prefix_add = '' pydantic-field

Prefix to add to the download paths presented to the users/admin

email_path_to_windows = False pydantic-field

Whether to convert the path separators to Windows style

email_request_content_admin = '' pydantic-field

Content of the email to be sent to the admins when a download request is submitted

email_request_content_user = '' pydantic-field

Content of the email to be sent to the user when a download request is submitted

email_request_subject_admin = '' pydantic-field

Subject of the email to be sent to the admins when a download request is submitted

email_request_subject_user = '' pydantic-field

Subject of the email to be sent to the user when a download request is submitted

email_subject_admin = '' pydantic-field

Subject of the email to be sent to the admin

email_subject_error_download = '' pydantic-field

Subject of the email to be sent to the user when a download fails

email_subject_user = '' pydantic-field

Subject of the email to be sent to the user

index_for_download pydantic-field

Configuration of the elasticsearch index for reporting downloads

notification_admin_emails = '' pydantic-field

List of admin emails for receiving download notifications, comma separated.

outbox_directory pydantic-field

Directory where the downloads will be placed. Must be configured even if outbox_s3 is enabled.

outbox_s3 pydantic-field

S3 bucket where the downloads will be placed. If configured, outbox_directory will be cleaned

smtp = None pydantic-field

SMTP configuration used for sending the emails

APROC enrich drivers reference documentation

Settings pydantic-model

Bases: BaseModel

Fields:

drivers pydantic-field

List of driver configurations for item enrichment.

APROC datacube build drivers reference documentation

Settings pydantic-model

Bases: BaseModel

Fields:

arlas_url_search pydantic-field

ARLAS URL Search (ex http://arlas-server:9999/arlas/explore/{collection}/_search?f=id:eq:{item})

drivers pydantic-field

Configuration of the dc3build drivers