# Python Package
When developing production-grade code, it is generally recommended to use Python packages. This promotes reusability, modularity, clean deployment, and better unit testing.
Laktory supports this approach through resources such as `models.resources.databricks.PythonPackage`, which automates the building and deployment of wheel files from local Python package source code.
Consider the following directory structure:
```
.
├── lake
│   ├── lake
│   │   ├── __init__.py
│   │   ├── _version.py
│   │   └── dataframe_ext.py
│   ├── pyproject.toml
│   └── README.md
├── notebooks
│   └── jobs
│       └── job_hello.py
├── requirements.txt
├── resources
│   ├── pl-stocks-job.yaml
│   └── pythonpackages.yaml
└── stack.yaml
```
The `lake` directory contains the `lake` Python package, defined by a `pyproject.toml` file.
Inside the `lake` package, a custom Narwhals extension is declared for data transformations:
```python
from datetime import datetime

import narwhals as nw

import laktory as lk


# Register this class as the `lake` namespace on any Narwhals-compatible
# DataFrame or LazyFrame.
@lk.api.register_anyframe_namespace("lake")
class LakeNamespace:
    def __init__(self, _df):
        self._df = _df

    def with_last_modified(self):
        # Stamp each row with the current timestamp.
        return self._df.with_columns(last_modified=nw.lit(datetime.now()))
```
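For illustration, here is how the extension could be invoked directly, assuming the code above lives in `lake/dataframe_ext.py` and that a Polars backend is installed (a minimal sketch; the sample data is made up):

```python
import narwhals as nw
import polars as pl

import lake.dataframe_ext  # noqa: F401  (assumed module path; importing it registers the namespace)

# Wrap a native Polars frame with Narwhals.
df = nw.from_native(pl.DataFrame({"symbol": ["AAPL", "GOOGL"], "close": [187.0, 139.5]}))

# The registered `lake` namespace is now available on the frame.
df = df.lake.with_last_modified()

print(df.to_native())
```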
The stack file declares:

- a `PythonPackage` Databricks resource
- a `wheel_filepath` variable that defines the workspace path to which the wheel file will be deployed
```yaml
name: workflows

resources:
  databricks_pythonpackages: !use resources/pythonpackages.yaml
  pipelines:
    pl-stocks-job: !use resources/pl-stocks-job.yaml

variables:
  wheel_filepath: /Workspace${vars.workspace_laktory_root}wheels/lake-0.0.1-py3-none-any.whl

environments:
  dev:
    variables:
      env: dev
      is_dev: true
```
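In the stack file above, the `!use` tag injects the content of the referenced YAML file at that location, which keeps each resource definition in its own file.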
The `PythonPackage` resource declares:

- the name of the package
- the path to the `pyproject.toml` file
- the target directory in the Databricks workspace, under the Laktory root
```yaml
workspace-file-lake-package:
  package_name: lake
  config_filepath: ./lake/pyproject.toml
  dirpath: wheels/
  access_controls:
    - group_name: account users
      permission_level: CAN_READ
```
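Since `dirpath` is resolved relative to the Laktory root, the deployed wheel lands at the path referenced by the `wheel_filepath` variable declared in the stack file.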
Finally, the pipeline references the wheel file as a dependency and uses the `lake` namespace to apply a custom transformation. The `dependencies` section ensures the package is installed at runtime (in Databricks Jobs or Pipelines) and imported during execution.
```yaml
name: pl-stocks-job

orchestrator:
  type: DATABRICKS_JOB

dependencies:
  - laktory==<laktory_version>
  - ${vars.wheel_filepath}

nodes:
  - name: slv_stock_prices
    source:
      table_name: brz_stock_prices
    sinks:
      - table_name: slv_stock_prices_job
        mode: OVERWRITE
    transformer:
      nodes:
        - func_name: lake.with_last_modified
```
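To make the `func_name` mapping concrete, the transformer node above boils down to calling the registered namespace method on the node's DataFrame. The sketch below is hypothetical resolution logic, not Laktory's actual implementation, and again assumes the extension module and a Polars backend are available:

```python
import narwhals as nw
import polars as pl

import lake.dataframe_ext  # noqa: F401  (assumed module path; registers the `lake` namespace)


def apply_transformer_func(df, func_name):
    # Hypothetical: split "lake.with_last_modified" into a namespace
    # attribute and a method name, then call the method on the frame.
    namespace, method = func_name.split(".")
    return getattr(getattr(df, namespace), method)()


brz = nw.from_native(pl.DataFrame({"symbol": ["AAPL"], "close": [187.0]}))
slv = apply_transformer_func(brz, "lake.with_last_modified")
print(slv.to_native())
```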
When you run `laktory deploy`, the Python package is:

- built into a wheel file using the configuration in `pyproject.toml`
- deployed as a Databricks workspace file to `${vars.workspace_laktory_root}/wheels/`
This workflow enables clean, repeatable deployment of custom transformation logic alongside your data pipelines.