Declarative Automation Bundles

API Documentation

laktory.dab.build_resources

Declarative Automation Bundles (DAB) is Databricks' native infrastructure-as-code solution. Laktory integrates with it through a Python resource hook that discovers pipeline YAML files, generates the required configuration files, and returns Lakeflow Job and Declarative Pipeline resources, bypassing Laktory's own deployment mechanism.

If your team already uses the Databricks CLI and databricks.yml to manage workspace resources, the DABs integration is the quickest way to get started.

When to use Laktory alongside DABs

DAB is excellent at deploying Databricks Jobs and DLT Pipelines, but it has two gaps that Laktory fills:

1. Pipeline definitions and data transformations: DAB manages the deployment of a pipeline (compute, scheduling, target catalog/schema) but has no concept of the transformations that run inside it. Laktory adds a declarative, code-first layer for defining pipeline logic, from DataFrame and SQL transformations to source/sink wiring, all of which can be tested locally.

2. Resources outside DAB's scope: DAB covers some workspace-level resources (jobs, clusters, dashboards, schemas, etc.) but does not support others, such as users, groups, external locations, credentials, and the metastore. Laktory handles all of these through a Terraform-backed stack, giving your data team full ownership of the platform without a separate infrastructure team.

A common adoption path: start by adding the Laktory extension for pipeline definitions alongside your existing DAB setup, then build a Laktory stack to manage the resources DAB does not cover.

Deployment strategies

Laktory supports two approaches:

1. DABs + Laktory stack (hybrid)

Use DABs to manage workspace-level resources (jobs, clusters, SQL warehouses, dashboards, etc. - see the full DABs resource list) and the Laktory DABs integration to deploy your data pipelines. Manage resources outside DABs' scope - in particular account-level resources like users, groups, metastore configuration, and the Unity Catalog hierarchy - separately with a Laktory stack backed by Terraform.

2. Laktory stack only

Use a Laktory stack backed by Terraform for all resources, including workspace resources, pipelines, and account-level infrastructure.

The two deployments in strategy 1 are independent - run them in whichever order suits your CI/CD workflow. Unity Catalog infrastructure is typically deployed first so catalogs and schemas exist before pipelines run.

How It Works

When databricks bundle deploy is executed, the Databricks CLI calls the registered Python hook laktory.dab:build_resources. Laktory then:

1. Scans the configured pipeline directory for *.yaml / *.yml files
2. Loads each pipeline and injects bundle variables
3. Writes a JSON config file per pipeline to laktory/.build/pipelines/ for the pipeline executor to read at runtime
4. Returns a DABs Resources object containing one Job or Declarative Pipeline resource per pipeline

DABs then syncs the laktory/.build/ directory (including the generated config files and the LDP notebook) to the Databricks workspace, and deploys the Job/Pipeline resources.
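The build steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not Laktory's actual implementation: `build_pipeline_configs`, the dict shape of a loaded pipeline, the `{name}.json` file naming, and the returned name-to-kind mapping are all hypothetical. The real hook returns a DABs Resources object rather than a dict.

```python
import json
import tempfile
from pathlib import Path


def build_pipeline_configs(pipelines, build_root):
    """Write one JSON config per pipeline and return a name -> resource kind map.

    `pipelines` is a list of already-loaded pipeline dicts (hypothetical shape).
    """
    out_dir = Path(build_root) / "pipelines"
    out_dir.mkdir(parents=True, exist_ok=True)

    resources = {}
    for pipeline in pipelines:
        orchestrator = pipeline.get("orchestrator")
        if not orchestrator:
            # Pipelines without an orchestrator produce no resource
            continue
        # Assumed naming scheme: one file named after the pipeline
        config_path = out_dir / f"{pipeline['name']}.json"
        config_path.write_text(json.dumps(pipeline, indent=2))
        is_ldp = orchestrator["type"] == "LAKEFLOW_DECLARATIVE_PIPELINE"
        resources[pipeline["name"]] = "pipeline" if is_ldp else "job"
    return resources


with tempfile.TemporaryDirectory() as tmp:
    build_root = Path(tmp) / "laktory" / ".build"
    demo = [
        {"name": "pl-stock-prices", "orchestrator": {"type": "LAKEFLOW_DECLARATIVE_PIPELINE"}},
        {"name": "pl-taxi-trips", "orchestrator": {"type": "LAKEFLOW_JOB"}},
        {"name": "pl-local-only"},  # no orchestrator, so it is skipped
    ]
    demo_resources = build_pipeline_configs(demo, build_root)
    demo_files = sorted(p.name for p in (build_root / "pipelines").iterdir())
```

Keeping the generated files under a single build directory is what lets one sync rule cover everything the runtime needs.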

Setup `databricks.yml`

Add the following to your bundle configuration:

databricks.yml

variables:
  dab_workspace_root:        # required: workspace path where Laktory files will be synced
    default: ${workspace.root_path}
  laktory_pipelines_dir:     # local directory containing pipeline YAML files
    default: ./laktory/pipelines

sync:
  paths:
    - ./laktory            # sync pipeline YAMLs and the generated .build/ directory
  include:
    - ./laktory/.build/**  # force-include even if laktory/.build/ is in .gitignore

python:
  venv_path: .venv
  resources:
    - 'laktory.dab:build_resources'

The ./laktory/.build/ directory contains generated files (pipeline config JSON, LDP notebook) and is typically added to .gitignore. The sync.include directive tells DABs to sync it to the workspace regardless.

The dab_workspace_root variable must be set to ${workspace.root_path} so Laktory can compute the correct workspace path for the config files.

Pipeline Discovery

Laktory scans {laktory_pipelines_dir} for *.yaml and *.yml files. Each file is expected to contain a single pipeline definition. Pipelines without an orchestrator are skipped silently.

Multiple directories are supported by setting the variable to a comma-separated list of paths. A single directory is configured as follows:

databricks.yml

variables:
  laktory_pipelines_dir:
    default: ./laktory/pipelines/
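As a rough sketch of how discovery over a comma-separated list could behave, the snippet below splits the setting, then collects *.yaml and *.yml files from each directory. The function name `discover_pipeline_files`, the non-recursive glob, and the `pipelines_extra` directory are assumptions made for illustration; Laktory's own scanning logic may differ.

```python
import tempfile
from pathlib import Path


def discover_pipeline_files(pipelines_dir):
    """Return sorted *.yaml / *.yml files from a comma-separated list of directories."""
    files = []
    for raw in pipelines_dir.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty entries such as a trailing comma
        directory = Path(raw)
        for pattern in ("*.yaml", "*.yml"):
            # Non-recursive: only files directly inside the directory (assumption)
            files.extend(directory.glob(pattern))
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "pipelines"
    second = Path(tmp) / "pipelines_extra"  # hypothetical second directory
    first.mkdir()
    second.mkdir()
    (first / "pl-stock-prices.yml").write_text("name: pl-stock-prices\n")
    (first / "notes.txt").write_text("not a pipeline\n")
    (second / "pl-taxi-trips.yaml").write_text("name: pl-taxi-trips\n")

    found = discover_pipeline_files(f"{first}, {second}")
    found_names = [p.name for p in found]
```

Files with other extensions are ignored, so notes or helper scripts can sit next to the pipeline definitions.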

Orchestrators

Each pipeline declares its orchestrator type, which determines what DABs resource is created.

Lakeflow Declarative Pipeline (LDP)

Setting type: LAKEFLOW_DECLARATIVE_PIPELINE generates a databricks.bundles.pipelines.Pipeline resource. All Laktory LDP pipelines share a single entry-point notebook (laktory_ldp.py) that is automatically copied to the build directory and synced to the workspace.

laktory/pipelines/pl-stock-prices.yml

name: pl-stock-prices

orchestrator:
  type: LAKEFLOW_DECLARATIVE_PIPELINE
  serverless: true
  catalog: ${var.catalog}
  schema: market

nodes:
  - name: brz_stock_prices
    source:
      path: /Volumes/${var.catalog}/sources/landing/stock_prices/
      format: JSON
    sinks:
      - table_name: brz_stock_prices

  - name: slv_stock_prices
    source:
      node_name: brz_stock_prices
    sinks:
      - table_name: slv_stock_prices
    transformer:
      nodes:
        - expr: SELECT symbol, close, created_at FROM {df}
          func_type: SQL

Limitations of LDP orchestrators:

Pipeline nodes with views are not supported

Lakeflow Job

Setting type: LAKEFLOW_JOB generates a databricks.bundles.jobs.Job resource. Each pipeline node becomes a separate job task, and task dependencies mirror the pipeline DAG.

laktory/pipelines/pl-taxi-trips.yml

name: pl-taxi-trips

orchestrator:
  type: LAKEFLOW_JOB
  serverless_environment_version: "2"
  name: pl-taxi-trips

nodes:
  - name: brz_taxi_trips
    source:
      table_name: samples.nyctaxi.trips
    sinks:
      - catalog_name: ${var.catalog}
        schema_name: taxis
        table_name: brz_taxi_trips

  - name: slv_taxi_trips
    source:
      node_name: brz_taxi_trips
    sinks:
      - catalog_name: ${var.catalog}
        schema_name: taxis
        table_name: slv_taxi_trips
    transformer:
      nodes:
        - expr: SELECT *, trip_distance * 1.60934 AS trip_distance_km FROM {df}
          func_type: SQL

The Job orchestrator supports views and both serverless and classic job clusters. Set serverless_environment_version for serverless execution, or define job_clusters for classic compute.
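To make the node-to-task mapping concrete, here is a minimal sketch of how task dependencies might be derived from the pipeline DAG, assuming that a node depends on another node whenever its source sets `node_name`. The helper `derive_task_dependencies` and the dict layout are hypothetical and only mirror the YAML above; they are not Laktory's API.

```python
def derive_task_dependencies(nodes):
    """Map each node name to the list of upstream node names it reads from."""
    names = {node["name"] for node in nodes}
    dependencies = {}
    for node in nodes:
        upstream = node.get("source", {}).get("node_name")
        # Only sources that reference another pipeline node create a dependency;
        # table or file sources leave the task without upstream tasks.
        dependencies[node["name"]] = [upstream] if upstream in names else []
    return dependencies


taxi_nodes = [
    {"name": "brz_taxi_trips", "source": {"table_name": "samples.nyctaxi.trips"}},
    {"name": "slv_taxi_trips", "source": {"node_name": "brz_taxi_trips"}},
]
deps = derive_task_dependencies(taxi_nodes)
```

For the pl-taxi-trips example, this yields one task with no upstream and one task that waits for brz_taxi_trips.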

Variable Injection

Bundle variables declared in databricks.yml are automatically injected into pipeline models at load time. Both DABs-style (${var.name}) and Laktory-style (${vars.name}) syntax are supported. Pipeline-level variables take precedence over bundle variables when names conflict.

databricks.yml

variables:
  catalog:
    description: The Unity Catalog to deploy into

targets:
  dev:
    variables:
      catalog: dev
  prod:
    variables:
      catalog: prod

laktory/pipelines/pl-stock-prices.yml

name: pl-stock-prices

orchestrator:
  type: LAKEFLOW_DECLARATIVE_PIPELINE
  catalog: ${var.catalog}   # resolved from bundle variable at deploy time
  schema: market
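The substitution rules described in this section can be approximated with a small regex-based resolver. This is a sketch under assumptions, not Laktory's implementation: `inject_variables` is a hypothetical name, the "sandbox" value is made up, and leaving unknown references untouched is a choice made here for illustration.

```python
import re

# Matches both ${var.name} (DABs style) and ${vars.name} (Laktory style)
_VAR_PATTERN = re.compile(r"\$\{vars?\.([A-Za-z_][A-Za-z0-9_]*)\}")


def inject_variables(value, bundle_vars, pipeline_vars=None):
    """Recursively substitute variable references in strings, dicts, and lists."""
    # Pipeline-level variables override bundle variables on name conflicts
    merged = {**bundle_vars, **(pipeline_vars or {})}

    def _replace(match):
        name = match.group(1)
        # Assumption: unknown references are left as-is rather than raising
        return str(merged[name]) if name in merged else match.group(0)

    if isinstance(value, str):
        return _VAR_PATTERN.sub(_replace, value)
    if isinstance(value, dict):
        return {k: inject_variables(v, bundle_vars, pipeline_vars) for k, v in value.items()}
    if isinstance(value, list):
        return [inject_variables(v, bundle_vars, pipeline_vars) for v in value]
    return value


orchestrator = {
    "type": "LAKEFLOW_DECLARATIVE_PIPELINE",
    "catalog": "${var.catalog}",
    "schema": "market",
}
resolved = inject_variables(orchestrator, {"catalog": "dev"})
overridden = inject_variables("${vars.catalog}", {"catalog": "dev"}, {"catalog": "sandbox"})
```

Merging the two variable sources once, with pipeline variables applied last, is a simple way to express the precedence rule.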

Settings

Two settings control where Laktory writes and reads files during bundle resolution:

| Setting | Environment variable | Description |
|---|---|---|
| `build_root` | `LAKTORY_BUILD_ROOT` | Local directory for generated config JSON files and the LDP notebook |
| `workspace_root` | `LAKTORY_WORKSPACE_ROOT` | Workspace path where Laktory files are synced by DABs |

Both are auto-configured from the bundle context when left at their defaults. The build_root is set to {bundle_root}/laktory/.build/ and workspace_root is derived as {dab_workspace_root}/files/laktory/.build/. Explicit overrides via environment variables take priority.
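The resolution order can be illustrated with a short sketch: an environment variable wins when set, otherwise the default is derived from the bundle context. `resolve_settings` and the example paths are hypothetical, and the sketch omits the trailing slash shown in the defaults above; only the two environment variable names come from the table.

```python
import os
from pathlib import PurePosixPath


def resolve_settings(bundle_root, dab_workspace_root, environ=None):
    """Resolve build_root and workspace_root, letting env vars override defaults."""
    environ = os.environ if environ is None else environ
    default_build = str(PurePosixPath(bundle_root) / "laktory" / ".build")
    default_workspace = str(
        PurePosixPath(dab_workspace_root) / "files" / "laktory" / ".build"
    )
    return {
        "build_root": environ.get("LAKTORY_BUILD_ROOT", default_build),
        "workspace_root": environ.get("LAKTORY_WORKSPACE_ROOT", default_workspace),
    }


# Hypothetical bundle paths, passed with an explicit (empty) environment
defaults = resolve_settings("/repo", "/Workspace/bundles/demo/dev", environ={})
custom = resolve_settings(
    "/repo",
    "/Workspace/bundles/demo/dev",
    environ={"LAKTORY_BUILD_ROOT": "/tmp/laktory-build"},
)
```

Passing the environment as a parameter keeps the function easy to test without mutating the real process environment.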
