Databricks Job
laktory.models.pipeline.DatabricksJobOrchestrator
Bases: Job, PipelineChild
Databricks job used as an orchestrator to execute a Laktory pipeline.
The job orchestrator supports incremental workloads with Spark Structured Streaming, but it does not support continuous processing.
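As a quick orientation, the sketch below shows the shape of an orchestrator block that could be attached to a Laktory pipeline definition. The field names are taken from the attribute tables on this page; the DATABRICKS_JOB type string, the nested cron schedule schema, and all values are illustrative assumptions rather than documented defaults.

```python
# Hypothetical `orchestrator` block for a Laktory pipeline definition.
# Field names come from the attribute tables below; every value (including the
# "DATABRICKS_JOB" type string) is an illustrative assumption.
orchestrator = {
    "type": "DATABRICKS_JOB",  # type of orchestrator
    "schedule": {
        # Assumed to follow the Databricks Jobs cron schedule schema.
        "quartz_cron_expression": "0 0 6 * * ?",
        "timezone_id": "UTC",
    },
    "tags": {"env": "dev"},   # optional map of tags associated with the job
    "timeout_seconds": 3600,  # optional timeout applied to each run
}
```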
References
| BASE | DESCRIPTION |
|---|---|
| always_running | (Bool) Whether the job is always running, like a Spark Streaming application: on every update, restart the current active run or start it again if it is not running. False by default. Any job runs are started with |
| budget_policy_id | The ID of the user-specified budget policy to use for this job. If not specified, a default budget policy may be applied when creating or modifying the job. |
| continuous | |
| control_run_state | (Bool) If true, the Databricks provider will stop and start the job as needed to ensure that the active run for the job reflects the deployed configuration. For continuous jobs, the provider respects the |
| dbt_task | |
| deployment | |
| description | Description for this task. |
| edit_mode | If |
| email_notifications | An optional block to specify a set of email addresses notified when this task begins, completes, or fails. The default behavior is to not send any emails. This block is documented below. |
| environment | |
| existing_cluster_id | Identifier of the interactive cluster to run the job on. Note: running tasks on interactive clusters may lead to increased costs! |
| format | |
| git_source | Specifies a Git repository for task source code. See the git_source Configuration Block below. |
| health | Block described below that specifies health conditions for a given task. |
| job_cluster | A list of job databricks_cluster specifications that can be shared and reused by tasks of this job. Libraries cannot be declared in a shared job cluster; you must declare dependent libraries in task settings. Multi-task syntax |
| library | (Set) An optional list of libraries to be installed on the cluster that will execute the job. |
| max_concurrent_runs | (Integer) An optional maximum allowed number of concurrent runs of the job. Defaults to 1. |
| max_retries | (Integer) An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with a |
| min_retry_interval_millis | (Integer) An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. |
| name | The name of the defined parameter. May only contain alphanumeric characters, |
| new_cluster | Block with almost the same set of parameters as for the databricks_cluster resource, except the following (check the REST API documentation for the full list of supported parameters): |
| notebook_task | |
| notification_settings | An optional block controlling the notification settings on the job level, documented below. |
| parameter | Specifies a job parameter for the job. See the parameter Configuration Block. |
| performance_target | The performance mode on a serverless job. The performance target determines the level of compute performance or cost-efficiency for the run. Supported values are: |
| pipeline_task | |
| python_wheel_task | |
| queue | The queue status for the job. See the queue Configuration Block below. |
| retry_on_timeout | (Bool) An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. |
| run_as | The user or the service principal the job runs as. See the run_as Configuration Block below. |
| run_job_task | |
| schedule | An optional periodic schedule for this job. The default behavior is that the job runs when triggered by clicking Run Now in the Jobs UI or sending an API request to runNow. See the schedule Configuration Block below. |
| spark_jar_task | |
| spark_python_task | |
| spark_submit_task | |
| tags | An optional map of the tags associated with the job. See the tags Configuration Map. |
| task | Task to run against the |
| timeout_seconds | (Integer) An optional timeout applied to each run of this job. The default behavior is to have no timeout. |
| timeouts | |
| trigger | The conditions that trigger the job to start. See the trigger Configuration Block below. |
| usage_policy_id | |
| webhook_notifications | (List) An optional set of system destinations (for example, webhook destinations or Slack) to be notified when runs of this task begin, complete, or fail. The default behavior is to not send any notifications. This field is a block and is documented below. |
| LAKTORY | DESCRIPTION |
|---|---|
| access_controls | Access controls list |
| config_file | Pipeline configuration (JSON) file deployed to the workspace and used by the job to read and execute the pipeline. |
| name_prefix | Prefix added to the job name |
| name_suffix | Suffix added to the job name |
| node_max_retries | An optional maximum number of times to retry an unsuccessful run for each node. |
| serverless_environment_version | Serverless environment version |
| type | Type of orchestrator |
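To make the Laktory-specific options concrete, here is a hedged sketch of how they might be combined in the same orchestrator block. Field names come from the table above; the access-control entry shape, the permission level string, and all other values are assumptions.

```python
# Laktory-specific orchestrator options (names from the table above; values assumed).
orchestrator_options = {
    "name_prefix": "dev-",   # prefix added to the job name
    "name_suffix": "-job",   # suffix added to the job name
    "node_max_retries": 2,   # max retries for an unsuccessful run of each node
    "access_controls": [
        # Assumed entry shape and permission level; adjust to your workspace groups.
        {"group_name": "data-engineers", "permission_level": "CAN_MANAGE_RUN"},
    ],
}
```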
| METHOD | DESCRIPTION |
|---|---|
| to_dab_resource | Convert to a DABs Python Job resource object for use with laktory.dab.build_resources. |
| ATTRIBUTE | DESCRIPTION |
|---|---|
| additional_core_resources | |
additional_core_resources (property)

- configuration workspace file
- configuration workspace file permissions
to_dab_resource()

Convert to a DABs Python Job resource object for use with laktory.dab.build_resources.
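A minimal usage sketch, assuming an already-configured DatabricksJobOrchestrator instance named orchestrator; only the calls documented on this page are used, and how the result feeds into laktory.dab.build_resources is left to that function's documentation.

```python
# Assuming `orchestrator` is a configured DatabricksJobOrchestrator attached to
# a pipeline. Convert it to a Databricks Asset Bundles (DABs) Python Job
# resource, intended for use with laktory.dab.build_resources (see above).
dab_job = orchestrator.to_dab_resource()

# Resources deployed alongside the job: the pipeline configuration workspace
# file and its permissions (see the additional_core_resources property above).
extra_resources = orchestrator.additional_core_resources
```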
Source code in laktory/models/pipeline/orchestrators/databricksjoborchestrator.py