Pipeline cluster

`laktory.models.resources.databricks.pipeline.PipelineCluster` ¤

Bases: Cluster

Pipeline Cluster. Same attributes as laktory.models.Cluster, except for

autotermination_minutes
cluster_id
data_security_mode
enable_elastic_disk
idempotency_token
is_pinned
libraries
no_wait
node_type_id
runtime_engine
single_user_name
spark_version

that are not allowed.

PARAMETER	DESCRIPTION
`resource_name_`	Name of the resource in the context of infrastructure as code. If None, `default_resource_name` will be used instead. TYPE: `str \| VariableType` DEFAULT: `None`
`options`	Resources options specifications TYPE: `ResourceOptions \| VariableType` DEFAULT: `ResourceOptions(variables={}, is_enabled=True, depends_on=[], provider=None, ignore_changes=None, aliases=None, delete_before_replace=True, import_=None, parent=None, replace_on_changes=None, moved_from=None)`
`lookup_existing`	Specifications for looking up existing resource. Other attributes will be ignored. TYPE: `ClusterLookup \| VariableType` DEFAULT: `None`
`variables`	Dict of variables to be injected in the model at runtime TYPE: `dict[str, Any]` DEFAULT: `{}`
`access_controls`	List of access controls TYPE: `list[Union[AccessControl, VariableType]] \| VariableType` DEFAULT: `[]`
`apply_policy_default_values`	Whether to use policy default values for missing cluster attributes. TYPE: `bool \| VariableType` DEFAULT: `None`
`autoscale`	Autoscale specifications TYPE: `ClusterAutoScale \| VariableType` DEFAULT: `None`
`autotermination_minutes`	TYPE: `int \| VariableType` DEFAULT: `None`
`cluster_id`	TYPE: `str \| VariableType` DEFAULT: `None`
`cluster_name`	Cluster name, which doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string. TYPE: `str \| VariableType` DEFAULT: `None`
`custom_tags`	Additional tags for cluster resources. Databricks will tag all cluster resources (e.g., AWS EC2 instances and EBS volumes) with these tags in addition to default_tags. If a custom cluster tag has the same name as a default cluster tag, the custom tag is prefixed with an x_ when it is propagated. TYPE: `dict[Union[str, VariableType], Union[str, VariableType]] \| VariableType` DEFAULT: `None`
`data_security_mode`	TYPE: `str \| VariableType` DEFAULT: `None`
`driver_instance_pool_id`	Similar to instance_pool_id, but for driver node. If omitted, and instance_pool_id is specified, then the driver will be allocated from that pool. TYPE: `str \| VariableType` DEFAULT: `None`
`driver_node_type_id`	The node type of the Spark driver. This field is optional; if unset, API will set the driver node type to the same value as node_type_id defined above. TYPE: `str \| VariableType` DEFAULT: `None`
`enable_elastic_disk`	TYPE: `bool \| VariableType` DEFAULT: `None`
`enable_local_disk_encryption`	Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or temporary data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data stored temporarily on your cluster’s local disks, you can enable local disk encryption. When local disk encryption is enabled, Databricks generates an encryption key locally unique to each cluster node and uses it to encrypt all data stored on local disks. The scope of the key is local to each cluster node and is destroyed along with the cluster node itself. During its lifetime, the key resides in memory for encryption and decryption and is stored encrypted on the disk. Your workloads may run more slowly because of the performance impact of reading and writing encrypted data to and from local volumes. This feature is not available for all Azure Databricks subscriptions. Contact your Microsoft or Databricks account representative to request access. TYPE: `bool \| VariableType` DEFAULT: `None`
`idempotency_token`	TYPE: `str \| VariableType` DEFAULT: `None`
`init_scripts`	List of init scripts specifications TYPE: `list[Union[ClusterInitScript, VariableType]] \| VariableType` DEFAULT: `[]`
`instance_pool_id`	To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. If the pool does not have sufficient idle resources to accommodate the cluster’s request, it expands by allocating new instances from the instance provider. When an attached cluster changes its state to TERMINATED, the instances it used are returned to the pool and reused by a different cluster. TYPE: `str \| VariableType` DEFAULT: `None`
`is_pinned`	TYPE: `bool \| VariableType` DEFAULT: `None`
`is_single_node`	When set to true, Databricks will automatically set single node related custom_tags, spark_conf, and num_workers. TYPE: `bool \| VariableType` DEFAULT: `None`
`kind`	The kind of compute described by this compute specification. Possible values (see API docs for full list): CLASSIC_PREVIEW (if corresponding public preview is enabled). TYPE: `str \| VariableType` DEFAULT: `None`
`libraries`	TYPE: `list[Union[Any, VariableType]] \| VariableType` DEFAULT: `None`
`node_type_id`	TYPE: `str \| VariableType` DEFAULT: `None`
`no_wait`	TYPE: `bool \| VariableType` DEFAULT: `None`
`num_workers`	Number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes. TYPE: `int \| VariableType` DEFAULT: `None`
`policy_id`	TYPE: `str \| VariableType` DEFAULT: `None`
`runtime_engine`	TYPE: `str \| VariableType` DEFAULT: `None`
`remote_disk_throughput`	TYPE: `int \| VariableType` DEFAULT: `None`
`single_user_name`	TYPE: `str \| VariableType` DEFAULT: `None`
`spark_conf`	Map with key-value pairs to fine-tune Spark clusters, where you can provide custom Spark configuration properties in a cluster configuration. TYPE: `dict[Union[str, VariableType], Union[str, VariableType]] \| VariableType` DEFAULT: `{}`
`spark_env_vars`	Map with environment variable key-value pairs to fine-tune Spark clusters. Key-value pairs of the form (X,Y) are exported (i.e., X='Y') while launching the driver and workers. TYPE: `dict[Union[str, VariableType], Union[str, VariableType]] \| VariableType` DEFAULT: `{}`
`spark_version`	TYPE: `str \| VariableType` DEFAULT: `None`
`ssh_public_keys`	SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to login with the user name ubuntu on port 2200. You can specify up to 10 keys. TYPE: `list[Union[str, VariableType]] \| VariableType` DEFAULT: `[]`
`total_initial_remote_disk_size`	TYPE: `int \| VariableType` DEFAULT: `None`
`use_ml_runtime`	Whenever ML runtime should be selected or not. Actual runtime is determined by spark_version (DBR release), this field use_ml_runtime, and whether node_type_id is GPU node or not. TYPE: `bool \| VariableType` DEFAULT: `None`

METHOD	DESCRIPTION
`inject_vars`	Inject model variables values into a model attributes.
`inject_vars_into_dump`	Inject model variables values into a model dump.
`model_validate_json_file`	Load model from json file object
`model_validate_yaml`	Load model from yaml file object using laktory.yaml.RecursiveLoader. Supports
`push_vars`	Push variable values to all child recursively
`validate_assignment_disabled`	Updating a model attribute inside a model validator when `validate_assignment`

ATTRIBUTE	DESCRIPTION
`additional_core_resources`	permissions TYPE: `list[PulumiResource]`
`core_resources`	List of core resources to be deployed with this laktory model:
`pulumi_properties`	Resources properties formatted for pulumi: TYPE: `dict`
`pulumi_renames`	Map of fields to rename when dumping model to pulumi TYPE: `dict[str, str]`
`resource_key`	Resource key used to build default resource name. Equivalent to TYPE: `str`
`resource_type_id`	Resource type id used to build default resource name. Equivalent to TYPE: `str`
`self_as_core_resources`	Flag set to `True` if self must be included in core resources
`terraform_properties`	Resources properties formatted for terraform: TYPE: `dict`
`terraform_renames`	Map of fields to rename when dumping model to terraform TYPE: `dict[str, str]`

`additional_core_resources` `property` ¤

permissions

`core_resources` `property` ¤

List of core resources to be deployed with this laktory model: - class instance (self)

`pulumi_properties` `property` ¤

Resources properties formatted for pulumi:

Serialization (model dump)
Removal of excludes defined in self.pulumi_excludes
Renaming of keys according to self.pulumi_renames
Injection of variables

RETURNS	DESCRIPTION
`dict`	Pulumi-safe model dump

`pulumi_renames` `property` ¤

Map of fields to rename when dumping model to pulumi

`resource_key` `property` ¤

Resource key used to build default resource name. Equivalent to name properties if available. Otherwise, empty string.

`resource_type_id` `property` ¤

Resource type id used to build default resource name. Equivalent to class name converted to kebab case. e.g.: SecretScope -> secret-scope

`self_as_core_resources` `property` ¤

Flag set to True if self must be included in core resources

`terraform_properties` `property` ¤

Resources properties formatted for terraform:

Serialization (model dump)
Removal of excludes defined in self.terraform_excludes
Renaming of keys according to self.terraform_renames
Injection of variables

RETURNS	DESCRIPTION
`dict`	Terraform-safe model dump

`terraform_renames` `property` ¤

Map of fields to rename when dumping model to terraform

`inject_vars(inplace=False, vars=None)` ¤

Inject model variables values into a model attributes.

PARAMETER	DESCRIPTION
`inplace`	If `True` model is modified in place. Otherwise, a new model instance is returned. TYPE: `bool` DEFAULT: `False`
`vars`	A dictionary of variables to be injected in addition to the model internal variables. TYPE: `dict` DEFAULT: `None`

RETURNS	DESCRIPTION
	Model instance.

Examples:

from typing import Union

from laktory import models


class Cluster(models.BaseModel):
    name: str = None
    size: Union[int, str] = None


c = Cluster(
    name="cluster-${vars.my_cluster}",
    size="${{ 4 if vars.env == 'prod' else 2 }}",
    variables={
        "env": "dev",
    },
).inject_vars()
print(c)
# > variables={'env': 'dev'} name='cluster-${vars.my_cluster}' size=2

References

variables

Source code in laktory/models/basemodel.py

def inject_vars(self, inplace: bool = False, vars: dict = None):
    """
    Inject model variables values into a model attributes.

    Parameters
    ----------
    inplace:
        If `True` model is modified in place. Otherwise, a new model
        instance is returned.
    vars:
        A dictionary of variables to be injected in addition to the
        model internal variables.


    Returns
    -------
    :
        Model instance.

    Examples
    --------
    ```py
    from typing import Union

    from laktory import models


    class Cluster(models.BaseModel):
        name: str = None
        size: Union[int, str] = None


    c = Cluster(
        name="cluster-${vars.my_cluster}",
        size="${{ 4 if vars.env == 'prod' else 2 }}",
        variables={
            "env": "dev",
        },
    ).inject_vars()
    print(c)
    # > variables={'env': 'dev'} name='cluster-${vars.my_cluster}' size=2
    ```

    References
    ----------
    * [variables](https://www.laktory.ai/concepts/variables/)
    """

    # Fetching vars
    if vars is None:
        vars = {}

    vars = deepcopy(vars)
    vars.update(self.variables)

    # TODO: Review implementation as it results in serious performance hits
    # from laktory.models.pipeline import Pipeline
    # from laktory.models.pipeline import PipelineNode
    #
    # if isinstance(self, Pipeline):
    #     vars["_pl"] = self
    #
    # if isinstance(self, PipelineNode):
    #     vars["_pl_node"] = self

    # Create copy
    if not inplace:
        self = self.model_copy(deep=True)

    # Inject into field values
    for k in list(self.model_fields_set):
        if k == "variables":
            continue
        o = getattr(self, k)

        if isinstance(o, BaseModel) or isinstance(o, dict) or isinstance(o, list):
            # Mutable objects will be updated in place
            _resolve_values(o, vars)
        else:
            # Simple objects must be updated explicitly
            setattr(self, k, _resolve_value(o, vars))

    # Inject into child resources
    if hasattr(self, "core_resources"):
        for r in self.core_resources:
            if r == self:
                continue
            r.inject_vars(vars=vars, inplace=True)

    if not inplace:
        return self

`inject_vars_into_dump(dump, inplace=False, vars=None)` ¤

Inject model variables values into a model dump.

PARAMETER	DESCRIPTION
`dump`	Model dump (or any other general purpose mutable object) TYPE: `dict[str, Any]`
`inplace`	If `True` model is modified in place. Otherwise, a new model instance is returned. TYPE: `bool` DEFAULT: `False`
`vars`	A dictionary of variables to be injected in addition to the model internal variables. TYPE: `dict[str, Any]` DEFAULT: `None`

RETURNS	DESCRIPTION
	Model dump with injected variables.

Examples:

from laktory import models

m = models.BaseModel(
    variables={
        "env": "dev",
    },
)
data = {
    "name": "cluster-${vars.my_cluster}",
    "size": "${{ 4 if vars.env == 'prod' else 2 }}",
}
print(m.inject_vars_into_dump(data))
# > {'name': 'cluster-${vars.my_cluster}', 'size': 2}

References

variables

Source code in laktory/models/basemodel.py

def inject_vars_into_dump(
    self, dump: dict[str, Any], inplace: bool = False, vars: dict[str, Any] = None
):
    """
    Inject model variables values into a model dump.

    Parameters
    ----------
    dump:
        Model dump (or any other general purpose mutable object)
    inplace:
        If `True` model is modified in place. Otherwise, a new model
        instance is returned.
    vars:
        A dictionary of variables to be injected in addition to the
        model internal variables.


    Returns
    -------
    :
        Model dump with injected variables.


    Examples
    --------
    ```py
    from laktory import models

    m = models.BaseModel(
        variables={
            "env": "dev",
        },
    )
    data = {
        "name": "cluster-${vars.my_cluster}",
        "size": "${{ 4 if vars.env == 'prod' else 2 }}",
    }
    print(m.inject_vars_into_dump(data))
    # > {'name': 'cluster-${vars.my_cluster}', 'size': 2}
    ```

    References
    ----------
    * [variables](https://www.laktory.ai/concepts/variables/)
    """

    # Setting vars
    if vars is None:
        vars = {}
    vars = deepcopy(vars)
    vars.update(self.variables)

    # Create copy
    if not inplace:
        dump = copy.deepcopy(dump)

    # Inject into field values
    _resolve_values(dump, vars)

    if not inplace:
        return dump

`model_validate_json_file(fp)` `classmethod` ¤

Load model from json file object

PARAMETER	DESCRIPTION
`fp`	file object structured as a json file TYPE: `TextIO`

RETURNS	DESCRIPTION
`Model`	Model instance

Source code in laktory/models/basemodel.py

@classmethod
def model_validate_json_file(cls: Type[Model], fp: TextIO) -> Model:
    """
    Load model from json file object

    Parameters
    ----------
    fp:
        file object structured as a json file

    Returns
    -------
    :
        Model instance
    """
    data = json.load(fp)
    return cls.model_validate(data)

`model_validate_yaml(fp, vars=None)` `classmethod` ¤

Load model from yaml file object using laktory.yaml.RecursiveLoader. Supports reference to external yaml and sql files using !use, !extend and !update tags. Path to external files can be defined using model or environment variables.

Referenced path should always be relative to the file they are referenced from.

Custom Tags

!use {filepath}: Directly inject the content of the file at filepath
- !extend {filepath}: Extend the current list with the elements found in the file at filepath. Similar to python list.extend method.
<<: !update {filepath}: Merge the current dictionary with the content of the dictionary defined at filepath. Similar to python dict.update method.

PARAMETER	DESCRIPTION
`fp`	file object structured as a yaml file TYPE: `TextIO`
`vars`	Dict of variables available when parsing filepaths references in yaml files i.e. `!use catalog_${vars.env}.yaml` DEFAULT: `None`

RETURNS	DESCRIPTION
`Model`	Model instance

Examples:

businesses:
  apple:
    symbol: aapl
    address: !use addresses.yaml
    <<: !update common.yaml
    emails:
      - jane.doe@apple.com
      - extend! emails.yaml
  amazon:
    symbol: amzn
    address: !use addresses.yaml
    <<: update! common.yaml
    emails:
      - john.doe@amazon.com
      - extend! emails.yaml

Source code in laktory/models/basemodel.py

@classmethod
def model_validate_yaml(cls: Type[Model], fp: TextIO, vars=None) -> Model:
    """
    Load model from yaml file object using laktory.yaml.RecursiveLoader. Supports
    reference to external yaml and sql files using `!use`, `!extend` and `!update` tags.
    Path to external files can be defined using model or environment variables.

    Referenced path should always be relative to the file they are referenced from.

    Custom Tags
    -----------
    - `!use {filepath}`:
        Directly inject the content of the file at `filepath`

    - `- !extend {filepath}`:
        Extend the current list with the elements found in the file at `filepath`.
        Similar to python list.extend method.

    - `<<: !update {filepath}`:
        Merge the current dictionary with the content of the dictionary defined at
        `filepath`. Similar to python dict.update method.

    Parameters
    ----------
    fp:
        file object structured as a yaml file
    vars:
        Dict of variables available when parsing filepaths references in yaml files
        i.e. `!use catalog_${vars.env}.yaml`

    Returns
    -------
    :
        Model instance

    Examples
    --------
    ```yaml
    businesses:
      apple:
        symbol: aapl
        address: !use addresses.yaml
        <<: !update common.yaml
        emails:
          - jane.doe@apple.com
          - extend! emails.yaml
      amazon:
        symbol: amzn
        address: !use addresses.yaml
        <<: update! common.yaml
        emails:
          - john.doe@amazon.com
          - extend! emails.yaml
    ```
    """

    data = RecursiveLoader.load(fp, vars=vars)
    return cls.model_validate(data)

`push_vars(update_core_resources=False)` ¤

Push variable values to all child recursively

Source code in laktory/models/basemodel.py

def push_vars(self, update_core_resources=False) -> Any:
    """Push variable values to all child recursively"""

    def _update_model(m):
        if not isinstance(m, BaseModel):
            return
        for k, v in self.variables.items():
            m.variables[k] = m.variables.get(k, v)
        m.push_vars()

    def _push_vars(o):
        if isinstance(o, list):
            for _o in o:
                _push_vars(_o)
        elif isinstance(o, dict):
            for _o in o.values():
                _push_vars(_o)
        else:
            _update_model(o)

    for k in self.model_fields.keys():
        _push_vars(getattr(self, k))

    if update_core_resources and hasattr(self, "core_resources"):
        for r in self.core_resources:
            if r != self:
                _push_vars(r)

    return None

`validate_assignment_disabled()` ¤

Updating a model attribute inside a model validator when validate_assignment is True causes an infinite recursion by design and must be turned off temporarily.

Source code in laktory/models/basemodel.py

@contextmanager
def validate_assignment_disabled(self):
    """
    Updating a model attribute inside a model validator when `validate_assignment`
    is `True` causes an infinite recursion by design and must be turned off
    temporarily.
    """
    original_state = self.model_config["validate_assignment"]
    self.model_config["validate_assignment"] = False
    try:
        yield
    finally:
        self.model_config["validate_assignment"] = original_state

Pipeline cluster

laktory.models.resources.databricks.pipeline.PipelineCluster ¤

additional_core_resources property ¤

core_resources property ¤

pulumi_properties property ¤

pulumi_renames property ¤

resource_key property ¤

resource_type_id property ¤

self_as_core_resources property ¤

terraform_properties property ¤

terraform_renames property ¤

inject_vars(inplace=False, vars=None) ¤

inject_vars_into_dump(dump, inplace=False, vars=None) ¤

model_validate_json_file(fp) classmethod ¤

model_validate_yaml(fp, vars=None) classmethod ¤

push_vars(update_core_resources=False) ¤

validate_assignment_disabled() ¤

`laktory.models.resources.databricks.pipeline.PipelineCluster` ¤

`additional_core_resources` `property` ¤

`core_resources` `property` ¤

`pulumi_properties` `property` ¤

`pulumi_renames` `property` ¤

`resource_key` `property` ¤

`resource_type_id` `property` ¤

`self_as_core_resources` `property` ¤

`terraform_properties` `property` ¤

`terraform_renames` `property` ¤

`inject_vars(inplace=False, vars=None)` ¤

`inject_vars_into_dump(dump, inplace=False, vars=None)` ¤

`model_validate_json_file(fp)` `classmethod` ¤

`model_validate_yaml(fp, vars=None)` `classmethod` ¤

`push_vars(update_core_resources=False)` ¤

`validate_assignment_disabled()` ¤