Build with AI
The fastest way to work with Laktory is through an AI coding agent. After you run `laktory setup-agent`, the agent has live access to model schemas and can generate, validate, and insert correct YAML directly into your project. The examples below illustrate the kinds of tasks you can describe in plain language.
Data Pipeline
Create a bronze pipeline node called `brz_orders` that reads JSONL files from `dbfs:/landing/orders/` and writes to a Unity Catalog table `dev.raw.orders` in OVERWRITE mode.
Add a silver node `slv_orders` that reads from `brz_orders` (static), selects `order_id`, `customer_id`, `amount`, and `CAST(created_at AS TIMESTAMP)`, drops duplicates on `(order_id, created_at)`, and writes to `dev.silver.orders`.
Add a gold node `gld_orders_by_day` that aggregates `slv_orders` by `DATE(created_at)` and `customer_id`, computing `COUNT(order_id) AS order_count` and `SUM(amount) AS total_amount`.
Change the source of `slv_orders` to streaming (`as_stream: true`) so the node only processes new records on each run.
Add a custom transformer step in `slv_orders` that calls `lake.with_last_modified` from the `lake` wheel package. Wire up the dependency and import.
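To make the intent of the silver and gold prompts concrete, here is a minimal plain-Python sketch of the transformations they describe. It is not Laktory code and not what the agent would emit (that would be YAML). The sample rows and the helper names `to_silver` and `to_gold` are hypothetical, and the sketch assumes `amount` is carried through the silver node so the gold aggregate can sum it.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows standing in for the contents of brz_orders.
brz_orders = [
    {"order_id": 1, "customer_id": "c1", "created_at": "2024-05-01T09:00:00", "amount": 10.0},
    {"order_id": 1, "customer_id": "c1", "created_at": "2024-05-01T09:00:00", "amount": 10.0},  # exact duplicate
    {"order_id": 2, "customer_id": "c1", "created_at": "2024-05-01T17:30:00", "amount": 5.5},
    {"order_id": 3, "customer_id": "c2", "created_at": "2024-05-02T08:15:00", "amount": 7.0},
]


def to_silver(rows):
    """Cast created_at to a timestamp and drop duplicates on (order_id, created_at)."""
    seen = set()
    silver = []
    for row in rows:
        created_at = datetime.fromisoformat(row["created_at"])
        key = (row["order_id"], created_at)
        if key in seen:
            continue  # keep only the first row for each key
        seen.add(key)
        silver.append(
            {
                "order_id": row["order_id"],
                "customer_id": row["customer_id"],
                "created_at": created_at,
                "amount": row["amount"],  # assumption: amount is kept for the gold node
            }
        )
    return silver


def to_gold(rows):
    """Aggregate by (DATE(created_at), customer_id): order count and total amount."""
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for row in rows:
        key = (row["created_at"].date(), row["customer_id"])
        totals[key]["order_count"] += 1
        totals[key]["total_amount"] += row["amount"]
    return dict(totals)


slv_orders = to_silver(brz_orders)
gld_orders_by_day = to_gold(slv_orders)
```

Running the sketch on sample data like this is a quick way to check that a generated pipeline's output matches what the prompt asked for.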
Orchestration
Configure the pipeline to run on a Lakeflow Job with serverless environment version 3, scheduled every day at 6am UTC.
Switch from serverless to a dedicated cluster with 2–8 workers on `Standard_DS3_v2`, Databricks Runtime 16.3, using `USER_ISOLATION` security mode.
Add email notifications to the pipeline job that alert `data-team@example.com` on failure and `ops@example.com` on success.
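As a rough illustration of what the orchestration prompts above translate to, the sketch below builds the schedule, cluster, and notification settings as a Python dictionary. The field names follow the Databricks Jobs API as commonly documented; the Laktory YAML keys the agent writes may differ. The helper `daily_quartz_cron`, the cluster key `main`, and the runtime string `16.3.x-scala2.12` are assumptions made for this example.

```python
def daily_quartz_cron(hour: int, minute: int = 0) -> str:
    """Return a Quartz cron expression that fires once a day.

    Quartz fields: seconds minutes hours day-of-month month day-of-week.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    return f"0 {minute} {hour} * * ?"


job_settings = {
    # Every day at 6am UTC.
    "schedule": {
        "quartz_cron_expression": daily_quartz_cron(6),
        "timezone_id": "UTC",
    },
    # Dedicated cluster that autoscales between 2 and 8 workers.
    "job_clusters": [
        {
            "job_cluster_key": "main",  # hypothetical key
            "new_cluster": {
                "spark_version": "16.3.x-scala2.12",  # assumed string for runtime 16.3
                "node_type_id": "Standard_DS3_v2",
                "autoscale": {"min_workers": 2, "max_workers": 8},
                "data_security_mode": "USER_ISOLATION",
            },
        }
    ],
    # Separate recipients for failure and success.
    "email_notifications": {
        "on_failure": ["data-team@example.com"],
        "on_success": ["ops@example.com"],
    },
}
```

Comparing the generated configuration against a reference like this helps confirm that each part of the request was applied.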
Resources
Create a Unity Catalog named `dev` with `OPEN` isolation, grant `USE_CATALOG` and `USE_SCHEMA` to `account users`, and add two schemas: `finance` and `sandbox`.
Define a group `data-engineers` with workspace USER permission, and a user `john.doe@example.com` who belongs to that group.
Create a secret scope named `integrations` with a secret `api-token` and READ permission for `data-engineers`.
Define a SQL warehouse named `analytics`: 2X-Small, serverless, auto-stop after 10 minutes, accessible by all users.
Create a Databricks job `job-ingest` with two tasks: `ingest` (notebook `/jobs/ingest.py`) and `transform` (notebook `/jobs/transform.py`) that depends on `ingest`. Both run on a shared job cluster with 2 workers.
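The last prompt describes a small task graph. The sketch below models it as a Python dictionary and derives a valid run order from the declared dependency. Field names mirror the Databricks Jobs API as commonly documented, not necessarily the Laktory YAML the agent would write. The cluster key `shared` and the helpers `run_order` and `undefined_clusters` are hypothetical.

```python
from graphlib import TopologicalSorter

job = {
    "name": "job-ingest",
    # One job cluster shared by both tasks.
    "job_clusters": [
        {"job_cluster_key": "shared", "new_cluster": {"num_workers": 2}},
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/jobs/ingest.py"},
            "job_cluster_key": "shared",
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "notebook_task": {"notebook_path": "/jobs/transform.py"},
            "job_cluster_key": "shared",
        },
    ],
}


def run_order(job_spec):
    """Return task keys in an order that respects every depends_on entry."""
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in job_spec["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


def undefined_clusters(job_spec):
    """Return cluster keys that tasks reference but the job does not define."""
    defined = {c["job_cluster_key"] for c in job_spec["job_clusters"]}
    used = {t["job_cluster_key"] for t in job_spec["tasks"]}
    return used - defined


order = run_order(job)
missing = undefined_clusters(job)
```

Here `order` places `ingest` before `transform`, and `missing` is empty because both tasks reference the one shared cluster.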