Databricks Pipeline

laktory.models.pipeline.orchestrators.databrickspipelineorchestrator.DatabricksPipelineOrchestrator

Bases: Pipeline, PipelineChild

Databricks Pipeline used as an orchestrator to execute a Laktory pipeline.

The DLT orchestrator does not support pipeline nodes with views (as opposed to materialized tables). It also does not support writing to multiple schemas within the same pipeline.

Selecting this orchestrator requires adding the supporting notebook to the stack.

PARAMETER DESCRIPTION
dataframe_backend_

Type of DataFrame backend

TYPE: DataFrameBackends | VariableType DEFAULT: None

dataframe_api_

DataFrame API to use in DataFrame Transformer nodes. Either 'NATIVE' (backend-specific) or 'NARWHALS' (backend-agnostic).

TYPE: Literal['NARWHALS', 'NATIVE'] | VariableType DEFAULT: None

resource_name_

Name of the resource in the context of infrastructure as code. If None, default_resource_name will be used instead.

TYPE: str | VariableType DEFAULT: None

options

Resources options specifications

TYPE: ResourceOptions | VariableType DEFAULT: ResourceOptions(variables={}, is_enabled=True, depends_on=[], provider=None, ignore_changes=None, aliases=None, delete_before_replace=True, import_=None, parent=None, replace_on_changes=None, moved_from=None)

lookup_existing

Lookup resource instead of creating a new one.

TYPE: ResourceLookup | VariableType DEFAULT: None

variables

Dict of variables to be injected in the model at runtime

TYPE: dict[str, Any] DEFAULT: {}

access_controls

Pipeline access controls

TYPE: list[Union[AccessControl, VariableType]] | VariableType DEFAULT: []

allow_duplicate_names

If False, deployment will fail if name conflicts with that of another pipeline.

TYPE: bool | VariableType DEFAULT: None

budget_policy_id

Optional string specifying the ID of the budget policy for this DLT pipeline.

TYPE: str | VariableType DEFAULT: None

catalog

Name of the unity catalog storing the pipeline tables

TYPE: str | None | VariableType DEFAULT: None

cause

TYPE: str | VariableType DEFAULT: None

channel

Name of the release channel for the Spark version used by the DLT pipeline.

TYPE: Literal['CURRENT', 'PREVIEW'] | VariableType DEFAULT: 'PREVIEW'

cluster_id

TYPE: str | VariableType DEFAULT: None

clusters

Clusters to run the pipeline. If none are specified, a default cluster configuration is selected automatically.

TYPE: list[Union[PipelineCluster, VariableType]] | VariableType DEFAULT: []

creator_user_name

TYPE: str | VariableType DEFAULT: None

configuration

Configuration key-value pairs to apply to the entire pipeline.

TYPE: dict[Union[str, VariableType], Union[str, VariableType]] | VariableType DEFAULT: {}

continuous

If True, the pipeline is run continuously.

TYPE: bool | VariableType DEFAULT: None

deployment

Deployment type of this pipeline.

TYPE: PipelineDeployment | VariableType DEFAULT: None

development

If True, the pipeline is run in development mode.

TYPE: bool | VariableType DEFAULT: None

edition

Name of the product edition

TYPE: Literal['CORE', 'PRO', 'ADVANCED'] | VariableType DEFAULT: None

event_log

An optional block specifying a table where the DLT event log will be stored.

TYPE: PipelineEventLog | VariableType DEFAULT: None

expected_last_modified

TYPE: int | VariableType DEFAULT: None

filters

Filters on which Pipeline packages to include in the deployed graph.

TYPE: PipelineFilters | VariableType DEFAULT: None

gateway_definition

The definition of a gateway pipeline to support CDC.

TYPE: PipelineGatewayDefinition | VariableType DEFAULT: None

health

TYPE: str | VariableType DEFAULT: None

last_modified

TYPE: int | VariableType DEFAULT: None

latest_updates

TYPE: list[Union[PipelineLatestUpdate, VariableType]] | VariableType DEFAULT: None

libraries

Specifies pipeline code (notebooks) and required artifacts.

TYPE: list[Union[PipelineLibrary, VariableType]] | VariableType DEFAULT: None

name

Pipeline name

TYPE: str | VariableType

name_prefix

Prefix added to the DLT pipeline name

TYPE: str | VariableType DEFAULT: None

name_suffix

Suffix added to the DLT pipeline name

TYPE: str | VariableType DEFAULT: None

notifications

Notifications specifications

TYPE: list[Union[PipelineNotifications, VariableType]] | VariableType DEFAULT: []

photon

If True, the Photon engine is enabled.

TYPE: bool | VariableType DEFAULT: None

restart_window

TYPE: PipelineRestartWindow | VariableType DEFAULT: None

root_path

An optional string specifying the root path for this pipeline. This is used as the root directory when editing the pipeline in the Databricks user interface and it is added to sys.path when executing Python sources during pipeline execution.

TYPE: str | VariableType DEFAULT: None

run_as

TYPE: PipelineRunAs | VariableType DEFAULT: None

run_as_user_name

TYPE: str | VariableType DEFAULT: None

schema_

The default schema (database) where tables are read from or published to. The presence of this attribute implies that the pipeline is in direct publishing mode.

TYPE: str | VariableType DEFAULT: None

serverless

If True, serverless compute is enabled.

TYPE: bool | VariableType DEFAULT: None

state

TYPE: str | VariableType DEFAULT: None

storage

A location on DBFS or cloud storage where output data and metadata required for pipeline execution are stored. By default, tables are stored in a subdirectory of this location. Change of this parameter forces recreation of the pipeline. (Conflicts with catalog).

TYPE: str | VariableType DEFAULT: None

tags

A map of tags associated with the pipeline. These are forwarded to the cluster as cluster tags, and are therefore subject to the same limitations. A maximum of 25 tags can be added to the pipeline.

TYPE: dict[Union[str, VariableType], Union[str, VariableType]] | VariableType DEFAULT: None

target

The name of a database (in either the Hive metastore or in a UC catalog) for persisting pipeline output data. Configuring the target setting allows you to view and query the pipeline output data from the Databricks UI.

TYPE: str | VariableType DEFAULT: None

trigger

TYPE: PipelineTrigger | VariableType DEFAULT: None

url

URL of the DLT pipeline on the given workspace.

TYPE: str | VariableType DEFAULT: None

type

Type of orchestrator

TYPE: Literal['DATABRICKS_PIPELINE'] | VariableType DEFAULT: 'DATABRICKS_PIPELINE'

config_file

Pipeline configuration (json) file deployed to the workspace and used by the job to read and execute the pipeline.

TYPE: PipelineConfigWorkspaceFile | VariableType

METHOD DESCRIPTION
inject_vars

Inject model variable values into model attributes.

inject_vars_into_dump

Inject model variable values into a model dump.

model_validate_json_file

Load model from json file object

model_validate_yaml

Load model from yaml file object using laktory.yaml.RecursiveLoader. Supports references to external yaml and sql files using !use, !extend and !update tags.

push_vars

Push variable values to all children recursively

validate_assignment_disabled

Updating a model attribute inside a model validator when validate_assignment is True causes an infinite recursion by design and must be turned off temporarily.

ATTRIBUTE DESCRIPTION
additional_core_resources
  • configuration workspace file
  • configuration workspace file permissions

TYPE: list[PulumiResource]

core_resources

List of core resources to be deployed with this laktory model:

resource_key

Resource key used to build default resource name. Equivalent to name properties if available. Otherwise, empty string.

TYPE: str

self_as_core_resources

Flag set to True if self must be included in core resources

additional_core_resources property

  • configuration workspace file
  • configuration workspace file permissions

core_resources property

List of core resources to be deployed with this laktory model:

  • class instance (self)

resource_key property

Resource key used to build default resource name. Equivalent to name properties if available. Otherwise, empty string.

self_as_core_resources property

Flag set to True if self must be included in core resources

inject_vars(inplace=False, vars=None)

Inject model variable values into model attributes.

PARAMETER DESCRIPTION
inplace

If True, the model is modified in place. Otherwise, a new model instance is returned.

TYPE: bool DEFAULT: False

vars

A dictionary of variables to be injected in addition to the model internal variables.

TYPE: dict DEFAULT: None

RETURNS DESCRIPTION

Model instance.

Examples:

```py
from typing import Union

from laktory import models


class Cluster(models.BaseModel):
    name: str = None
    size: Union[int, str] = None


c = Cluster(
    name="cluster-${vars.my_cluster}",
    size="${{ 4 if vars.env == 'prod' else 2 }}",
    variables={
        "env": "dev",
    },
).inject_vars()
print(c)
# > variables={'env': 'dev'} name='cluster-${vars.my_cluster}' size=2
```
References

  • variables: https://www.laktory.ai/concepts/variables/
Source code in laktory/models/basemodel.py
def inject_vars(self, inplace: bool = False, vars: dict = None):
    """
    Inject model variables values into a model attributes.

    Parameters
    ----------
    inplace:
        If `True` model is modified in place. Otherwise, a new model
        instance is returned.
    vars:
        A dictionary of variables to be injected in addition to the
        model internal variables.


    Returns
    -------
    :
        Model instance.

    Examples
    --------
    ```py
    from typing import Union

    from laktory import models


    class Cluster(models.BaseModel):
        name: str = None
        size: Union[int, str] = None


    c = Cluster(
        name="cluster-${vars.my_cluster}",
        size="${{ 4 if vars.env == 'prod' else 2 }}",
        variables={
            "env": "dev",
        },
    ).inject_vars()
    print(c)
    # > variables={'env': 'dev'} name='cluster-${vars.my_cluster}' size=2
    ```

    References
    ----------
    * [variables](https://www.laktory.ai/concepts/variables/)
    """

    # Fetching vars
    if vars is None:
        vars = {}
    vars = deepcopy(vars)
    vars.update(self.variables)

    # Create copy
    if not inplace:
        self = self.model_copy(deep=True)

    # Inject into field values
    for k in list(self.model_fields_set):
        if k == "variables":
            continue
        o = getattr(self, k)

        if isinstance(o, BaseModel) or isinstance(o, dict) or isinstance(o, list):
            # Mutable objects will be updated in place
            _resolve_values(o, vars)
        else:
            # Simple objects must be updated explicitly
            setattr(self, k, _resolve_value(o, vars))

    # Inject into child resources
    if hasattr(self, "core_resources"):
        for r in self.core_resources:
            if r == self:
                continue
            r.inject_vars(vars=vars, inplace=True)

    if not inplace:
        return self
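The two substitution syntaxes used in the example above can be sketched with the standard library alone. This is a simplified, hypothetical stand-in, not laktory's actual resolver (the `_resolve_value`/`_resolve_values` helpers in the source handle nesting and more cases):

```py
import re
from types import SimpleNamespace

# Simplified sketch of the two variable syntaxes shown in the example:
#   ${vars.x}    -> simple substitution, left as-is when x is undefined
#   ${{ expr }}  -> Python expression evaluated with `vars` in scope
def resolve(value, variables):
    if not isinstance(value, str):
        return value
    m = re.fullmatch(r"\$\{\{\s*(.*?)\s*\}\}", value)
    if m:
        # Expression form: evaluate with a `vars` namespace (illustration only)
        return eval(m.group(1), {"vars": SimpleNamespace(**variables)})

    # Simple form: substitute known variables, keep unknown ones verbatim
    def sub(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)

    return re.sub(r"\$\{vars\.(\w+)\}", sub, value)

print(resolve("${{ 4 if vars.env == 'prod' else 2 }}", {"env": "dev"}))  # 2
print(resolve("cluster-${vars.my_cluster}", {"env": "dev"}))
# cluster-${vars.my_cluster}
```

This reproduces the behavior in the example output: the expression resolves to 2 because env is not 'prod', while ${vars.my_cluster} is left untouched because my_cluster is not defined.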

inject_vars_into_dump(dump, inplace=False, vars=None)

Inject model variable values into a model dump.

PARAMETER DESCRIPTION
dump

Model dump (or any other general purpose mutable object)

TYPE: dict[str, Any]

inplace

If True, the dump is modified in place. Otherwise, a new dump is returned.

TYPE: bool DEFAULT: False

vars

A dictionary of variables to be injected in addition to the model internal variables.

TYPE: dict[str, Any] DEFAULT: None

RETURNS DESCRIPTION

Model dump with injected variables.

Examples:

```py
from laktory import models

m = models.BaseModel(
    variables={
        "env": "dev",
    },
)
data = {
    "name": "cluster-${vars.my_cluster}",
    "size": "${{ 4 if vars.env == 'prod' else 2 }}",
}
print(m.inject_vars_into_dump(data))
# > {'name': 'cluster-${vars.my_cluster}', 'size': 2}
```
References

  • variables: https://www.laktory.ai/concepts/variables/
Source code in laktory/models/basemodel.py
def inject_vars_into_dump(
    self, dump: dict[str, Any], inplace: bool = False, vars: dict[str, Any] = None
):
    """
    Inject model variables values into a model dump.

    Parameters
    ----------
    dump:
        Model dump (or any other general purpose mutable object)
    inplace:
        If `True` model is modified in place. Otherwise, a new model
        instance is returned.
    vars:
        A dictionary of variables to be injected in addition to the
        model internal variables.


    Returns
    -------
    :
        Model dump with injected variables.


    Examples
    --------
    ```py
    from laktory import models

    m = models.BaseModel(
        variables={
            "env": "dev",
        },
    )
    data = {
        "name": "cluster-${vars.my_cluster}",
        "size": "${{ 4 if vars.env == 'prod' else 2 }}",
    }
    print(m.inject_vars_into_dump(data))
    # > {'name': 'cluster-${vars.my_cluster}', 'size': 2}
    ```

    References
    ----------
    * [variables](https://www.laktory.ai/concepts/variables/)
    """

    # Setting vars
    if vars is None:
        vars = {}
    vars = deepcopy(vars)
    vars.update(self.variables)

    # Create copy
    if not inplace:
        dump = copy.deepcopy(dump)

    # Inject into field values
    _resolve_values(dump, vars)

    if not inplace:
        return dump

model_validate_json_file(fp) classmethod

Load model from json file object

PARAMETER DESCRIPTION
fp

file object structured as a json file

TYPE: TextIO

RETURNS DESCRIPTION
Model

Model instance

Source code in laktory/models/basemodel.py
@classmethod
def model_validate_json_file(cls: Type[Model], fp: TextIO) -> Model:
    """
    Load model from json file object

    Parameters
    ----------
    fp:
        file object structured as a json file

    Returns
    -------
    :
        Model instance
    """
    data = json.load(fp)
    return cls.model_validate(data)
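Since the method simply pairs json.load with model_validate, the call pattern can be illustrated with the standard library alone. The validator below is a hypothetical stand-in so the snippet runs without laktory installed; in practice cls.model_validate performs full schema validation:

```py
import io
import json

# Hypothetical stand-in for cls.model_validate: any callable that checks
# the loaded dict and returns the validated model.
def model_validate(data):
    if "name" not in data:
        raise ValueError("name is required")
    return data

# A JSON file object, as expected by model_validate_json_file
fp = io.StringIO('{"name": "pl-stock-prices", "serverless": true}')
model = model_validate(json.load(fp))
print(model["name"])  # pl-stock-prices
```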

model_validate_yaml(fp) classmethod

Load model from yaml file object using laktory.yaml.RecursiveLoader. Supports references to external yaml and sql files using !use, !extend and !update tags. Paths to external files can be defined using model or environment variables.

Referenced paths should always be relative to the files they are referenced from.

Custom Tags
  • !use {filepath}: Directly inject the content of the file at filepath

  • - !extend {filepath}: Extend the current list with the elements found in the file at filepath. Similar to Python's list.extend method.

  • <<: !update {filepath}: Merge the current dictionary with the content of the dictionary defined at filepath. Similar to Python's dict.update method.
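The !extend and !update tags mirror the built-in Python container semantics they are named after. A minimal sketch, with plain dicts and lists standing in for hypothetical loaded yaml content:

```py
# <<: !update common.yaml  -> dict.update semantics
business = {"symbol": "aapl"}
common = {"currency": "USD", "country": "USA"}  # content of common.yaml (hypothetical)
business.update(common)
print(business)
# {'symbol': 'aapl', 'currency': 'USD', 'country': 'USA'}

# - !extend emails.yaml  -> list.extend semantics
emails = ["jane.doe@apple.com"]
extra_emails = ["support@apple.com"]  # content of emails.yaml (hypothetical)
emails.extend(extra_emails)
print(emails)
# ['jane.doe@apple.com', 'support@apple.com']
```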

PARAMETER DESCRIPTION
fp

file object structured as a yaml file

TYPE: TextIO

RETURNS DESCRIPTION
Model

Model instance

Examples:

```yaml
businesses:
  apple:
    symbol: aapl
    address: !use addresses.yaml
    <<: !update common.yaml
    emails:
      - jane.doe@apple.com
      - !extend emails.yaml
  amazon:
    symbol: amzn
    address: !use addresses.yaml
    <<: !update common.yaml
    emails:
      - john.doe@amazon.com
      - !extend emails.yaml
```
Source code in laktory/models/basemodel.py
@classmethod
def model_validate_yaml(cls: Type[Model], fp: TextIO) -> Model:
    """
    Load model from yaml file object using laktory.yaml.RecursiveLoader. Supports
    reference to external yaml and sql files using `!use`, `!extend` and `!update` tags.
    Path to external files can be defined using model or environment variables.

    Referenced path should always be relative to the file they are referenced from.

    Custom Tags
    -----------
    - `!use {filepath}`:
        Directly inject the content of the file at `filepath`

    - `- !extend {filepath}`:
        Extend the current list with the elements found in the file at `filepath`.
        Similar to python list.extend method.

    - `<<: !update {filepath}`:
        Merge the current dictionary with the content of the dictionary defined at
        `filepath`. Similar to python dict.update method.

    Parameters
    ----------
    fp:
        file object structured as a yaml file

    Returns
    -------
    :
        Model instance

    Examples
    --------
    ```yaml
    businesses:
      apple:
        symbol: aapl
        address: !use addresses.yaml
        <<: !update common.yaml
        emails:
          - jane.doe@apple.com
          - !extend emails.yaml
      amazon:
        symbol: amzn
        address: !use addresses.yaml
        <<: !update common.yaml
        emails:
          - john.doe@amazon.com
          - !extend emails.yaml
    ```
    """

    data = RecursiveLoader.load(fp)
    return cls.model_validate(data)

push_vars(update_core_resources=False)

Push variable values to all children recursively

Source code in laktory/models/basemodel.py
def push_vars(self, update_core_resources=False) -> Any:
    """Push variable values to all child recursively"""

    def _update_model(m):
        if not isinstance(m, BaseModel):
            return
        for k, v in self.variables.items():
            m.variables[k] = m.variables.get(k, v)
        m.push_vars()

    def _push_vars(o):
        if isinstance(o, list):
            for _o in o:
                _push_vars(_o)
        elif isinstance(o, dict):
            for _o in o.values():
                _push_vars(_o)
        else:
            _update_model(o)

    for k in self.model_fields.keys():
        _push_vars(getattr(self, k))

    if update_core_resources and hasattr(self, "core_resources"):
        for r in self.core_resources:
            if r != self:
                _push_vars(r)

    return None

validate_assignment_disabled()

Updating a model attribute inside a model validator while validate_assignment is True causes an infinite recursion by design, so assignment validation must be turned off temporarily.

Source code in laktory/models/basemodel.py
@contextmanager
def validate_assignment_disabled(self):
    """
    Updating a model attribute inside a model validator when `validate_assignment`
    is `True` causes an infinite recursion by design and must be turned off
    temporarily.
    """
    original_state = self.model_config["validate_assignment"]
    self.model_config["validate_assignment"] = False
    try:
        yield
    finally:
        self.model_config["validate_assignment"] = original_state
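The disable-then-restore pattern above can be exercised on its own with a generic stand-in config. A plain dict replaces the pydantic model_config here so the sketch runs anywhere; the try/finally guarantees restoration even if the body raises:

```py
from contextlib import contextmanager

# Stand-in for model_config: a plain dict with the same key (hypothetical)
model_config = {"validate_assignment": True}

@contextmanager
def validate_assignment_disabled(config):
    # Temporarily disable, then restore even if the body raises
    original_state = config["validate_assignment"]
    config["validate_assignment"] = False
    try:
        yield
    finally:
        config["validate_assignment"] = original_state

with validate_assignment_disabled(model_config):
    # Safe to assign attributes here without re-triggering validation
    assert model_config["validate_assignment"] is False

print(model_config["validate_assignment"])  # True
```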