Configuration
APROC Configuration
APROC is configured in the following files:
- conf/aproc.yaml: configures the APROC framework, the driver registrations and the storage access manager
- conf/drivers.yaml: configures the ingestion process
- conf/download.yaml: configures the download process
- conf/enrich.yaml: configures the enrichment process
- conf/dc3build.yaml: configures the datacube building process
Framework Configuration
The conf/aproc.yaml file sets the configuration of the APROC framework for:
- the framework's dependencies on Celery and AIRS
- the registration of the process drivers
- the storage access management
Example of configuration for the framework dependencies:
```yaml
# The message queue used for dispatching the tasks
celery_broker_url: pyamqp://guest:guest@127.0.0.1:5672//
# The backend database for storing the tasks
celery_result_backend: redis://127.0.0.1:6379/0
# ARLAS Item Registration Service (AIRS) endpoint
airs_endpoint: http://127.0.0.1:8000/arlas/airs
```
The APROC driver registration section lists the process drivers and their respective configuration files, e.g.:
```yaml
processes:
  -
    name: ingest
    class_name: extensions.aproc.proc.ingest.ingest_process
    configuration:
      drivers: conf/drivers.yaml
```
Storage access configuration
The access manager configuration specifies which storages can be accessed and how. The example below declares three storages: one local, one on Google Cloud Storage and one on S3:
```yaml
access_manager:
  tmp_dir: /tmp/
  storages:
    -
      type: file
      writable_paths:
        - /tmp
        - /outbox
      readable_paths:
        - /inputs
    -
      type: gs
      bucket: gisaia-public
    -
      type: s3
      bucket: arlas
      endpoint: "https://s3.my.cloud.provider.com"
      region: eu-1
      api_key:
        access_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        secret_key: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Four types of storage are available:
- file: local file storage
- gs: Google Cloud object storage
- s3: S3-compliant object storage
- http: HTTP/HTTPS storage
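An http storage gives read access to files served over HTTP/HTTPS. The declaration below is a minimal sketch, assuming an illustrative domain and authorization header; the available fields are described in the HttpStorageConfiguration reference at the end of this page:

```yaml
access_manager:
  storages:
    -
      type: http
      # Illustrative domain serving the files
      domain: my.data.provider.com
      # Optional headers added to each request (illustrative token)
      headers:
        Authorization: "Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
      # If true, always download the file instead of caching
      force_download: false
```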
Ingest drivers
The conf/drivers.yaml configuration file references the ingest drivers. The example below registers one DIMAP driver. Ingestion can only be done in a directory contained in a referenced storage (see Storage access configuration). The driver that supports the archive and that has the smallest priority number is used for ingestion.
```yaml
# The directory that can be explored for ingestion
inputs_directory: gs://gisaia-public/inputs
# The maximum number of archives that can be ingested in one request (the request will create as many jobs as archives found)
max_number_of_archive_for_ingest: 1000000
# The APROC endpoint for submission of sub-requests
aproc_endpoint: $APROC_ENDPOINT_FROM_APROC|http://localhost:8001
resource_id_hash_starts_at: $APROC_RESOURCE_ID_HASH_STARTS_AT|1
drivers:
  -
    name: dimap
    class_name: extensions.aproc.proc.ingest.drivers.impl.dimap
    assets_dir: /tmp/aproc/dimap
    configuration:
      priority: 1
```
Download drivers
The conf/download.yaml configuration file lists the drivers for exporting archives. When a download request is received, the archive is transformed and placed in the outbox directory, which can be on the local file storage or on an S3 bucket. ARLAS Server is used to check whether the user requesting the download is allowed to access the archive. If SMTP is configured, emails are sent to the administrator and to the users requesting the downloads. Download requests are logged in an ARLAS collection so that the administrator gets a clear view of what is downloaded.
Just like the ingest drivers, download driver registration must set a priority. The driver that supports the item download request and that has the smallest priority is used for the download.
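The sketch below illustrates what a minimal conf/download.yaml could look like; the driver name, class and placeholder values are illustrative, and the full list of fields (including the email templates and mapping URLs) is given in the reference documentation at the end of this page:

```yaml
# ARLAS URL used to check that the user is allowed to access the item
arlas_url_search: http://arlas-server:9999/arlas/explore/{collection}/_search?f=id:eq:{item}
# Directory where the downloads will be placed (must be set even if an S3 outbox is used)
outbox_directory: /outbox
# Comma-separated list of admin emails receiving download notifications
notification_admin_emails: admin@my.domain.com
# SMTP configuration; if absent, no email is sent (illustrative values)
smtp:
  host: smtp.my.domain.com
  port: 587
  login: aproc
  password: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  from_addr: noreply@my.domain.com
# Elasticsearch index used for logging the download requests
index_for_download:
  index_name: aproc_downloads
  endpoint_url: http://elasticsearch:9200
  login: elastic
  pwd: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
drivers:
  -
    # Hypothetical driver: use the class name of an actual download driver
    name: my_download_driver
    class_name: extensions.aproc.proc.download.drivers.impl.my_download_driver
    configuration:
      priority: 1
```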
Enrich drivers
The conf/enrich.yaml configuration file registers the enrichment drivers. An enrichment driver adds an asset to an existing item.
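Since the enrich settings only declare a list of drivers, a conf/enrich.yaml sketch can be as simple as the following (the driver name and class are hypothetical):

```yaml
drivers:
  -
    # Hypothetical driver that would add a COG asset to an existing item
    name: cog
    class_name: extensions.aproc.proc.enrich.drivers.impl.cog
    configuration:
      priority: 1
```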
dc3build drivers
The conf/dc3build.yaml configuration file references the drivers for building a datacube based on a list of items.
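By analogy with the other driver files, a conf/dc3build.yaml sketch could look like the following (the driver name and class are hypothetical; the available fields are listed in the reference documentation below):

```yaml
# ARLAS URL used to fetch the items that make up the datacube
arlas_url_search: http://arlas-server:9999/arlas/explore/{collection}/_search?f=id:eq:{item}
drivers:
  -
    # Hypothetical driver: use the class name of an actual dc3build driver
    name: my_dc3build_driver
    class_name: extensions.aproc.proc.dc3build.drivers.impl.my_dc3build_driver
    configuration:
      priority: 1
```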
Reference documentation
APROC Configuration reference documentation
ProcessSettings
pydantic-model
Bases: BaseModel
Fields:
- name (str | None)
- class_name (str | None)
- configuration (dict | None)
class_name
pydantic-field
Name of the process class
configuration
pydantic-field
Configuration specific to the process (key/value dictionary)
name
pydantic-field
Name of the process
Settings
pydantic-model
Bases: BaseModel
Fields:
- celery_broker_url (str | None)
- celery_result_backend (str | None)
- processes (list[ProcessSettings])
- airs_endpoint (str | None)
- access_manager (AccessManagerSettings)
access_manager
pydantic-field
Configuration for the AccessManager
airs_endpoint
pydantic-field
ARLAS Item Registration Service endpoint
celery_broker_url
pydantic-field
Celery broker URL, of the form transport://userid:password@hostname:port/virtual_host
celery_result_backend
pydantic-field
Celery's backend used to store task results
processes
pydantic-field
List of APROC processes
AccessManager Configuration reference documentation
AccessManagerSettings
pydantic-model
FileStorageConfiguration
pydantic-model
Bases: StorageConfiguration
Fields:
- type (Literal['file'])
- is_local (Literal[True])
- writable_paths (list[str])
- readable_paths (list[str])
is_local = True
pydantic-field
Whether the storage is local or remote
readable_paths = []
pydantic-field
List of paths from which files can be read
type = 'file'
pydantic-field
Indicates the storage type, fixed to 'file'
writable_paths = []
pydantic-field
List of paths where files can be written
GoogleStorageApiKey
pydantic-model
Bases: BaseModel
Fields:
- type (Literal['service_account'])
- project_id (str)
- private_key_id (str)
- private_key (str)
- client_email (str)
- client_id (str | None)
- auth_uri (Literal[AUTH_URI])
- token_uri (Literal[TOKEN_URI])
- auth_provider_x509_cert_url (Literal[AUTH_PROVIDER_CERT_URL])
- universe_domain (Literal[UNIVERSE_DOMAIN])
auth_provider_x509_cert_url = GoogleStorageConstants.AUTH_PROVIDER_CERT_URL.value
pydantic-field
URL for the provider's X.509 certificate
auth_uri = GoogleStorageConstants.AUTH_URI.value
pydantic-field
OAuth2 auth endpoint URI
client_email
pydantic-field
Service account email address
client_id = None
pydantic-field
Optional client ID of the service account
private_key
pydantic-field
The private key content in PEM format
private_key_id
pydantic-field
ID of the private key used for authentication
project_id
pydantic-field
Google Cloud project identifier
token_uri = GoogleStorageConstants.TOKEN_URI.value
pydantic-field
OAuth2 token endpoint URI
type = 'service_account'
pydantic-field
Must be 'service_account'.
universe_domain = GoogleStorageConstants.UNIVERSE_DOMAIN.value
pydantic-field
Domain of the target universe (typically 'googleapis.com')
GoogleStorageConfiguration
pydantic-model
Bases: StorageConfiguration
Fields:
- type (Literal['gs'])
- is_local (Literal[False])
- bucket (str)
- api_key (GoogleStorageApiKey | None)
api_key = None
pydantic-field
API key for storage authentication
bucket
pydantic-field
Name of the Google Cloud Storage bucket
is_local = False
pydantic-field
Whether the storage is local or remote
type = 'gs'
pydantic-field
Indicates the storage type, fixed to 'gs'
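Combining the two models above, a gs storage with explicit service-account credentials could be declared as in the sketch below; the project, key and email values are illustrative, and the remaining GoogleStorageApiKey fields keep their defaults:

```yaml
access_manager:
  storages:
    -
      type: gs
      bucket: gisaia-public
      # Optional service-account credentials (illustrative values)
      api_key:
        type: service_account
        project_id: my-project
        private_key_id: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n"
        client_email: aproc@my-project.iam.gserviceaccount.com
```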
HttpStorageConfiguration
pydantic-model
Bases: StorageConfiguration
Fields:
- type (Literal['http'])
- is_local (Literal[False])
- headers (dict[str, str])
- domain (str)
- force_download (bool)
domain
pydantic-field
Domain used for HTTP storage endpoint, e.g. 'example.com'
force_download = False
pydantic-field
If true, always download the file instead of caching.
headers = {}
pydantic-field
Additional HTTP headers to include in each request
is_local = False
pydantic-field
Whether the storage is local or remote
type = 'http'
pydantic-field
Indicates the storage type, fixed to 'http'
S3ApiKey
pydantic-model
Bases: BaseModel
Fields:
- access_key (str)
- secret_key (str)
access_key
pydantic-field
Access key for S3 storage authentication
secret_key
pydantic-field
Secret key for S3 storage authentication
S3StorageConfiguration
pydantic-model
Bases: StorageConfiguration
Fields:
- type (Literal['s3'])
- is_local (Literal[False])
- bucket (str)
- region (str)
- endpoint (str)
- api_key (S3ApiKey | None)
- max_objects (int)
api_key = None
pydantic-field
API key for storage authentication
bucket
pydantic-field
Name of the S3 bucket
endpoint
pydantic-field
Endpoint to access S3 storage
is_local = False
pydantic-field
Whether the storage is local or remote
max_objects = 1000
pydantic-field
Maximum number of objects to fetch when listing elements in a directory
region = 'auto'
pydantic-field
Region of the bucket
type = 's3'
pydantic-field
Indicates the storage type, fixed to 's3'
APROC Ingestion drivers reference documentation
Settings
pydantic-model
Bases: BaseModel
Fields:
- drivers (list[DriverConfiguration])
- inputs_directory (str)
- max_number_of_archive_for_ingest (int)
- aproc_endpoint (str | None)
- resource_id_hash_starts_at (int)
- alternative_asset_href_field (str | None)
alternative_asset_href_field = None
pydantic-field
By default, data are fetched from the href of the asset named "data". If this field is set, data are retrieved from the named item property instead.
aproc_endpoint
pydantic-field
APROC endpoint for submitting sub-tasks
drivers
pydantic-field
Configuration of the ingestion drivers.
inputs_directory
pydantic-field
Location of the archive tree that can be explored and ingested.
max_number_of_archive_for_ingest = 1000000
pydantic-field
Maximum number of archives to ingest when ingesting a directory
resource_id_hash_starts_at = 1
pydantic-field
For some drivers, the resource id is the hash of the URL path. A path prefix can be ignored with this property.
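As an illustration, the following conf/drivers.yaml line (with a hypothetical property name) would make ingestion read the data location from an item property instead of the href of the asset named "data":

```yaml
# Hypothetical property name; by default the href of the asset named "data" is used
alternative_asset_href_field: properties.data_location
```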
APROC Download drivers reference documentation
Index
pydantic-model
Bases: BaseModel
Fields:
- index_name (str)
- endpoint_url (str)
- login (str)
- pwd (str)
endpoint_url
pydantic-field
Elasticsearch URL for indexing download requests
index_name
pydantic-field
Elasticsearch index name for indexing download requests
login = ''
pydantic-field
Elasticsearch login
pwd = ''
pydantic-field
Elasticsearch password
SMPTConfiguration
pydantic-model
Bases: BaseModel
Fields:
- from_addr
- host
- login
- password
- port
from_addr
pydantic-field
Email address of the system that sends the emails
host
pydantic-field
SMTP host
login
pydantic-field
SMTP user login
password
pydantic-field
SMTP user password
port
pydantic-field
SMTP port
Settings
pydantic-model
Bases: BaseModel
Fields:
- arlas_url_search (str)
- drivers (list[DriverConfiguration])
- outbox_directory (str)
- outbox_s3 (S3 | None)
- clean_outbox_directory (bool)
- notification_admin_emails (str)
- smtp (SMPTConfiguration | None)
- email_content_user (str)
- email_content_error_download (str)
- email_content_admin (str)
- email_subject_user (str)
- email_subject_error_download (str)
- email_subject_admin (str)
- email_path_prefix_add (str)
- email_path_to_windows (bool)
- email_request_subject_user (str)
- email_request_content_user (str)
- email_request_subject_admin (str)
- email_request_content_admin (str)
- index_for_download (Index)
- arlaseo_mapping_url (str)
- download_mapping_url (str)
arlas_url_search
pydantic-field
ARLAS URL Search (e.g. http://arlas-server:9999/arlas/explore/{collection}/_search?f=id:eq:{item})
arlaseo_mapping_url
pydantic-field
Location of the ARLAS EO index mapping for STAC Items
clean_outbox_directory = True
pydantic-field
Clean the outbox directory once files are copied to S3
download_mapping_url
pydantic-field
Location of the download requests mapping
drivers
pydantic-field
Configuration of the download drivers
email_content_admin = ''
pydantic-field
Content of the email to be sent to the admin
email_content_error_download = ''
pydantic-field
Content of the email to be sent to the user when a download fails
email_content_user = ''
pydantic-field
Content of the email to be sent to the user
email_path_prefix_add = ''
pydantic-field
Prefix to add to the download paths presented to the users/admin
email_path_to_windows = False
pydantic-field
Whether or not to change the path separators for Windows
email_request_content_admin = ''
pydantic-field
Content of the email to be sent to the admins when a download request is submitted
email_request_content_user = ''
pydantic-field
Content of the email to be sent to the user when a download request is submitted
email_request_subject_admin = ''
pydantic-field
Subject of the email to be sent to the admins when a download request is submitted
email_request_subject_user = ''
pydantic-field
Subject of the email to be sent to the user when a download request is submitted
email_subject_admin = ''
pydantic-field
Subject of the email to be sent to the admin
email_subject_error_download = ''
pydantic-field
Subject of the email to be sent to the user when a download fails
email_subject_user = ''
pydantic-field
Subject of the email to be sent to the user
index_for_download
pydantic-field
Configuration of the elasticsearch index for reporting downloads
notification_admin_emails = ''
pydantic-field
List of admin emails for receiving download notifications, comma-separated.
outbox_directory
pydantic-field
Directory where the downloads will be placed. Must be configured even if outbox_s3 is enabled
outbox_s3
pydantic-field
S3 bucket where the downloads will be placed. If configured, outbox_directory will be cleaned
smtp = None
pydantic-field
SMTP configuration of the system that sends the emails
APROC enrich drivers reference documentation
Settings
pydantic-model
Bases: BaseModel
Fields:
- drivers (list[DriverConfiguration])
drivers
pydantic-field
List of driver configurations for item enrichment.
APROC datacube build drivers reference documentation
Settings
pydantic-model
Bases: BaseModel
Fields:
- arlas_url_search (str)
- drivers (list[DriverConfiguration])
arlas_url_search
pydantic-field
ARLAS URL Search (e.g. http://arlas-server:9999/arlas/explore/{collection}/_search?f=id:eq:{item})
drivers
pydantic-field
Configuration of the dc3build drivers