Laktory Documentation¤
Laktory is built on top of a collection of nested pydantic models.
Each model is a subclass of pydantic.BaseModel and offers a few additional methods and properties.
The serializable nature of these models makes it possible to define a lakehouse using a declarative approach.
Types¤
Models generally fall into two categories:
- Deployable: Resources defined with the intent of deploying them to a cloud or data platform provider. These resources are found under
laktory.models.resources. - Executable: Resources defined with the intent of executing them on a local workstation or a remote compute resource. A notable example is the
laktory.models.pipelinemodel.
Declaration¤
Because all models are serializable, they can be declared either directly in Python code or as external YAML or JSON files. The latter is often preferable in a DataOps approach, where the focus is on configuring models rather than writing boilerplate code.
Below is an example of how Catalog, Schema, and Table models can be declared using both approaches.
name: prod
comment: Production catalog
schemas:
- name: finance
tables:
- name: bronze
- name: silver
- name: gold
- name: engineering
tables:
- name: bronze
- name: silver
- name: gold
from laktory import models
with open("catalog.yaml", "r") as fp:
catalog = models.resources.databricks.Catalog.model_validate_yaml(fp)
from laktory.models.resources.databricks import Catalog
from laktory.models.resources.databricks import Schema
from laktory.models.resources.databricks import Table
bronze = Table(name="bronze")
silver = Table(name="bronze")
gold = Table(name="gold")
catalog = Catalog(
name="prod",
comment="Production catalog",
schemas=[
Schema(name="finance", tables=[bronze, silver, gold]),
Schema(name="engineering", tables=[bronze, silver, gold]),
]
)
Both approaches result in the exact same catalog Python object.
YAML nesting¤
Laktory supports nested YAML files, allowing you to reference another YAML file within a YAML file using custom tags.
Direct injection¤
With the !use tag, you can inject the content of a YAML file directly where it is referenced. The example above can be
rewritten as:
name: "production"
comment: Production catalog
schemas: !use schemas.yaml
- name: finance
tables: !use tables.yaml
- name: engineering
tables: !use tables.yaml
- name: bronze
- name: silver
- name: gold
List Concatenation¤
In addition to direct injection, Laktory also supports the !extend tag to concatenate two lists. For example:
Using this tag, the model
name: production
schemas:
- name: sandbox
- name: bronze
- name: silver
- name: gold
could be re-written as:
name: production
schemas:
- name: sandbox
- !extend common_schemas.yaml
- name: bronze
- name: silver
- name: gold
Dictionary merge¤
The !update tag allows merging the content of two dictionaries. For instance:
name: production
<<: !update catalog_properties.yaml
isolation_mode: OPEN
owner: laktory
Is equivalent to:
name: production
isolation_mode: OPEN
owner: laktory
Variables¤
Laktory models support variables to facilitate parameterization or to reference values unavailable at declaration time.
name: ${vars.env}
comment: Production catalog
schemas:
- name: ${vars.env}-finance
- name: ${vars.env}-engineering
variables:
env: prod
For more information, refer to the variables documentation.
Stack¤
The laktory.models.Stack model acts as a container for declaring a collection of cloud-deployable resources. It serves as the main entry point for the Laktory CLI. For
more information, refer to the stack documentation.