DataQualityMonitor

laktory.models.resources.databricks.DataQualityMonitor ¤

Bases: DataQualityMonitorBase

Databricks Data Quality Monitor

Manages data quality monitoring on Unity Catalog objects. For table-level data profiling, set object_type to "table" and configure data_profiling_config. For schema-level anomaly detection, set object_type to "schema" and configure anomaly_detection_config.

When attached to a UnityCatalogDataSink via the data_profiling_config field, object_type and object_id are populated automatically from the sink. The object_id and output_schema_id inside data_profiling_config accept either a UUID or a human-readable name (e.g. dev.finance.slv_prices or dev.monitoring); Laktory resolves names to UUIDs at execution time.

Examples:

import io

from laktory import models

dqm_yaml = '''
object_type: table
object_id: dev.finance.slv_stock_prices
data_profiling_config:
  output_schema_id: dev.monitoring
  snapshot: {}
'''
dqm = models.resources.databricks.DataQualityMonitor.model_validate_yaml(
    io.StringIO(dqm_yaml)
)
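
The description above says identifier fields accept either a UUID or a human-readable name. As a rough illustration of that distinction, the sketch below classifies a value using only the standard library. The helper names `is_uuid` and `classify_identifier` are hypothetical and are not part of Laktory; how Laktory actually tells names from UUIDs, and how it resolves them, is not shown here.

```python
import uuid


def is_uuid(value: str) -> bool:
    """Return True when value is a canonical, hyphenated UUID string."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def classify_identifier(value: str) -> str:
    """Label an identifier as 'uuid' or 'name' (hypothetical helper)."""
    return "uuid" if is_uuid(value) else "name"


# Names taken from the description above, plus an arbitrary sample UUID.
for candidate in (
    "dev.finance.slv_prices",
    "dev.monitoring",
    "123e4567-e89b-12d3-a456-426614174000",
):
    print(candidate, "->", classify_identifier(candidate))
```

Under this assumption, only values classified as `name` would need a lookup before being sent to Databricks.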
PARAMETER DESCRIPTION
anomaly_detection_config

Anomaly Detection Configuration, applicable to schema object types

TYPE: DataQualityMonitorAnomalyDetectionConfig | None | VariableType DEFAULT: None

data_profiling_config

Data Profiling Configuration, applicable to table object types. Exactly one Analysis Configuration (inference_log, snapshot or time_series) must be present

TYPE: DataQualityMonitorDataProfilingConfig | None | VariableType DEFAULT: None

object_id

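The UUID of the monitored object: the schema_id when object_type is schema, or the table_id when object_type is table. A human-readable name (e.g. dev.finance.slv_prices) is also accepted; Laktory resolves it to a UUID at execution time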

TYPE: str | VariableType

object_type

The type of the monitored object. Can be one of the following: schema or table

TYPE: str | VariableType

provider_config

Provider configuration, used when the resource is managed through an account-level provider

TYPE: DataQualityMonitorProviderConfig | None | VariableType DEFAULT: None
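
The data_profiling_config description requires exactly one analysis configuration. A minimal sketch of that rule on a plain dictionary might look like the following. It assumes the three analysis fields are inference_log, snapshot and time_series (the ones documented for DataQualityMonitorDataProfilingConfig); the helper `selected_analysis` is hypothetical and does not reproduce how Laktory or Databricks enforce the constraint.

```python
ANALYSIS_KEYS = ("inference_log", "snapshot", "time_series")


def selected_analysis(config: dict) -> str:
    """Return the single analysis key set in config, or raise ValueError."""
    # An empty mapping such as `snapshot: {}` still counts as "present".
    present = [key for key in ANALYSIS_KEYS if config.get(key) is not None]
    if len(present) != 1:
        raise ValueError(
            f"expected exactly one of {ANALYSIS_KEYS}, found {present}"
        )
    return present[0]


profiling = {"output_schema_id": "dev.monitoring", "snapshot": {}}
print(selected_analysis(profiling))
```

A configuration with none of the three keys, or with more than one, raises `ValueError` in this sketch.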


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorAnomalyDetectionConfig ¤

Bases: BaseModel

PARAMETER DESCRIPTION
excluded_table_full_names

List of fully qualified table names to exclude from anomaly detection

TYPE: list[str] | None | VariableType DEFAULT: None


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfig ¤

Bases: BaseModel

PARAMETER DESCRIPTION
assets_dir

Absolute path to a custom directory in which to store data-monitoring assets. Normally prepopulated with a default user location by the UI and Python APIs

TYPE: str | None | VariableType DEFAULT: None

baseline_table_name

Baseline table name. Baseline data is used to compute drift from the data in the monitored table. The baseline table and the monitored table must have the same schema

TYPE: str | None | VariableType DEFAULT: None

custom_metrics

List of custom metrics to compute for the monitored table

TYPE: list[DataQualityMonitorDataProfilingConfigCustomMetrics] | None | VariableType DEFAULT: None

inference_log

Analysis Configuration for monitoring inference log tables

TYPE: DataQualityMonitorDataProfilingConfigInferenceLog | None | VariableType DEFAULT: None

notification_settings

Field for specifying notification settings

TYPE: DataQualityMonitorDataProfilingConfigNotificationSettings | None | VariableType DEFAULT: None

output_schema_id

ID of the schema where output tables are created

TYPE: str | VariableType

schedule

The cron schedule on which the monitor runs

TYPE: DataQualityMonitorDataProfilingConfigSchedule | None | VariableType DEFAULT: None

skip_builtin_dashboard

Whether to skip creating a default dashboard summarizing data quality metrics

TYPE: bool | None | VariableType DEFAULT: None

slicing_exprs

List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement. For example, slicing_exprs=["col_1", "col_2 > 10"] generates the following slices: two slices for col_2 > 10 (True and False), and one slice per unique value in col_1. For high-cardinality columns, only the top 100 unique values by frequency generate slices

TYPE: list[str] | None | VariableType DEFAULT: None

snapshot

Analysis Configuration for monitoring snapshot tables

TYPE: DataQualityMonitorDataProfilingConfigSnapshot | None | VariableType DEFAULT: None

time_series

Analysis Configuration for monitoring time series tables

TYPE: DataQualityMonitorDataProfilingConfigTimeSeries | None | VariableType DEFAULT: None

warehouse_id

Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used

TYPE: str | None | VariableType DEFAULT: None
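
To make the slicing_exprs semantics more concrete, the sketch below mimics the described behaviour on an in-memory list of rows: a column expression yields one slice per unique value (capped at the most frequent values), and a predicate yields a True slice and a False slice. This is only an illustration in plain Python under those assumptions. The function names are hypothetical, and it does not claim to reproduce how Databricks computes slices.

```python
from collections import Counter

MAX_SLICES_PER_COLUMN = 100  # cap stated in the slicing_exprs description


def column_slices(rows, column, limit=MAX_SLICES_PER_COLUMN):
    """Return the slice values for a column, most frequent first."""
    counts = Counter(row[column] for row in rows)
    return [value for value, _ in counts.most_common(limit)]


def predicate_slices(rows, predicate):
    """Split rows into the True slice and its complement."""
    return {
        True: [row for row in rows if predicate(row)],
        False: [row for row in rows if not predicate(row)],
    }


rows = [
    {"col_1": "a", "col_2": 5},
    {"col_1": "a", "col_2": 12},
    {"col_1": "b", "col_2": 30},
]

col_1_values = column_slices(rows, "col_1")
col_2_split = predicate_slices(rows, lambda row: row["col_2"] > 10)

print(col_1_values)  # ['a', 'b']
print(len(col_2_split[True]), len(col_2_split[False]))  # 2 1
```

Each expression is evaluated independently, so this example produces four slices in total: two for col_1 and two for col_2 > 10.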


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigCustomMetrics ¤

Bases: BaseModel

PARAMETER DESCRIPTION
definition

Jinja template for a SQL expression that specifies how to compute the metric. See create metric definition

TYPE: str | VariableType

input_columns

A list of column names in the input table the metric should be computed for. Can use ':table' to indicate that the metric needs information from multiple columns

TYPE: list[str] | VariableType

name

Name of the metric in the output tables

TYPE: str | VariableType

output_data_type

The output type of the custom metric

TYPE: str | VariableType

type

The type of the custom metric. Possible values are: DATA_PROFILING_CUSTOM_METRIC_TYPE_AGGREGATE, DATA_PROFILING_CUSTOM_METRIC_TYPE_DERIVED, DATA_PROFILING_CUSTOM_METRIC_TYPE_DRIFT

TYPE: str | VariableType
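
Because definition is a Jinja template evaluated once per entry in input_columns, it can help to see what a rendered expression might look like. The sketch below assumes the template uses an `{{input_column}}` placeholder, as in the Databricks custom-metric documentation, and uses plain string replacement as a stand-in for real Jinja rendering. The column names and the helper `render_definition` are hypothetical.

```python
def render_definition(definition: str, input_column: str) -> str:
    """Substitute the assumed {{input_column}} placeholder in a definition."""
    # Stand-in for Jinja rendering; only this one placeholder is handled.
    return definition.replace("{{input_column}}", input_column)


definition = "avg({{input_column}} * {{input_column}})"
input_columns = ["open", "close"]  # hypothetical column names

rendered = [render_definition(definition, column) for column in input_columns]
for expression in rendered:
    print(expression)
```

In this sketch each input column yields its own SQL expression, which matches the idea of a metric being computed per column.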


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigInferenceLog ¤

Bases: BaseModel

PARAMETER DESCRIPTION
granularities

List of granularities to use when aggregating data into time windows based on their timestamp

TYPE: list[str] | VariableType

label_column

Column for the label

TYPE: str | None | VariableType DEFAULT: None

model_id_column

Column for the model identifier

TYPE: str | VariableType

prediction_column

Column for the prediction

TYPE: str | VariableType

problem_type

Problem type the model aims to solve. Possible values are: INFERENCE_PROBLEM_TYPE_CLASSIFICATION, INFERENCE_PROBLEM_TYPE_REGRESSION

TYPE: str | VariableType

timestamp_column

Column for the timestamp

TYPE: str | VariableType


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigNotificationSettings ¤

Bases: BaseModel

PARAMETER DESCRIPTION
on_failure

Destinations to send notifications on failure/timeout

TYPE: DataQualityMonitorDataProfilingConfigNotificationSettingsOnFailure | None | VariableType DEFAULT: None


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigNotificationSettingsOnFailure ¤

Bases: BaseModel

PARAMETER DESCRIPTION
email_addresses

The list of email addresses to send the notification to. A maximum of 5 email addresses is supported

TYPE: list[str] | None | VariableType DEFAULT: None


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigSchedule ¤

Bases: BaseModel

PARAMETER DESCRIPTION
quartz_cron_expression

The expression that determines when to run the monitor. See examples

TYPE: str | VariableType

timezone_id

A Java timezone ID (e.g., America/Los_Angeles) in which the Quartz expression is evaluated. The schedule is resolved with respect to this timezone. See Java TimeZone (http://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html) for details

TYPE: str | VariableType
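
A Quartz cron expression has six mandatory fields (seconds, minutes, hours, day of month, month, day of week) and an optional seventh (year). As a quick sanity check before deploying a schedule, the sketch below splits an expression into named fields. The helper `split_quartz` is hypothetical and only counts fields; it does not validate the values themselves.

```python
QUARTZ_FIELDS = (
    "seconds",
    "minutes",
    "hours",
    "day_of_month",
    "month",
    "day_of_week",
    "year",  # optional seventh field
)


def split_quartz(expression: str) -> dict:
    """Map each part of a Quartz cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


schedule = {
    "quartz_cron_expression": "0 0 6 * * ?",  # every day at 06:00:00
    "timezone_id": "America/Los_Angeles",
}

fields = split_quartz(schedule["quartz_cron_expression"])
print(fields["hours"], fields["day_of_week"])  # 6 ?
```

A five-field Unix-style expression such as `0 6 * * *` fails this check, which is a common mistake when porting schedules to Quartz.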


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigSnapshot ¤

Bases: BaseModel


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigTimeSeries ¤

Bases: BaseModel

PARAMETER DESCRIPTION
granularities

List of granularities to use when aggregating data into time windows based on their timestamp

TYPE: list[str] | VariableType

timestamp_column

Column for the timestamp

TYPE: str | VariableType


laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorProviderConfig ¤

Bases: BaseModel

PARAMETER DESCRIPTION
workspace_id

Workspace ID which the resource belongs to. This workspace must be part of the account which the provider is configured with

TYPE: str | None | VariableType DEFAULT: None