DataQualityMonitor
laktory.models.resources.databricks.DataQualityMonitor
Bases: DataQualityMonitorBase
Databricks Data Quality Monitor
Manages data quality monitoring on Unity Catalog objects. For table-level
data profiling, set object_type to "table" and configure
data_profiling_config. For schema-level anomaly detection, set
object_type to "schema" and configure anomaly_detection_config.
When attached to a UnityCatalogDataSink via the data_profiling_config
field, object_type and object_id are populated automatically from the
sink. The object_id and output_schema_id inside data_profiling_config
accept either a UUID or a human-readable name (e.g. dev.finance.slv_prices
or dev.monitoring); Laktory resolves names to UUIDs at execution time.
Examples:
import io

from laktory import models

# YAML definition of a table-level monitor that uses snapshot profiling
dqm_yaml = '''
object_type: table
object_id: dev.finance.slv_stock_prices
data_profiling_config:
  output_schema_id: dev.monitoring
  snapshot: {}
'''
dqm = models.resources.databricks.DataQualityMonitor.model_validate_yaml(
    io.StringIO(dqm_yaml)
)
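The description above notes that object_id and output_schema_id accept either a UUID or a dotted name. As a rough illustration of how the two forms can be told apart, the sketch below uses only the standard library. The helper `looks_like_uuid` is hypothetical and is not part of Laktory; Laktory's own resolution logic is not shown on this page and may work differently.

```python
import uuid


def looks_like_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID (hypothetical helper, not Laktory API)."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


# A dotted Unity Catalog name is not a UUID, so it would have to be resolved.
needs_resolution = not looks_like_uuid("dev.finance.slv_stock_prices")

# An arbitrary, made-up UUID is accepted as-is.
already_uuid = looks_like_uuid("123e4567-e89b-12d3-a456-426614174000")
```

A check like this only classifies the string; mapping a name to the actual UUID requires a lookup against the workspace, which this sketch does not attempt.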
| PARAMETER | DESCRIPTION |
|---|---|
| anomaly_detection_config | Anomaly Detection Configuration, applicable to |
| data_profiling_config | Data Profiling Configuration, applicable to |
| object_id | The UUID of the request object. It is |
| object_type | The type of the monitored object. Can be one of the following: |
| provider_config | Configure the provider for management through account provider |
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorAnomalyDetectionConfig
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| excluded_table_full_names | List of fully qualified table names to exclude from anomaly detection |
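To make the schema-level case concrete, here is a minimal sketch of a monitor definition written as a plain Python dictionary. The field names come from the tables on this page; the schema name `dev.finance`, the table name `dev.finance.tmp_prices`, and the helper `is_fully_qualified` are made up for illustration. The check assumes the usual three-level Unity Catalog naming (catalog.schema.table) and is not something Laktory is known to enforce.

```python
# Plain-dict sketch of a schema-level monitor (values are illustrative only).
schema_monitor = {
    "object_type": "schema",
    "object_id": "dev.finance",
    "anomaly_detection_config": {
        "excluded_table_full_names": ["dev.finance.tmp_prices"],
    },
}


def is_fully_qualified(name: str) -> bool:
    """Hypothetical check: exactly three non-empty, dot-separated parts."""
    parts = name.split(".")
    return len(parts) == 3 and all(parts)


excluded = schema_monitor["anomaly_detection_config"]["excluded_table_full_names"]
all_qualified = all(is_fully_qualified(name) for name in excluded)
```

A dictionary shaped like this mirrors the YAML form used in the example at the top of the page, so it should be straightforward to express the same content in YAML.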
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfig
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| assets_dir | Field for specifying the absolute path to a custom directory to store data-monitoring assets. Normally prepopulated to a default user location via UI and Python APIs |
| baseline_table_name | Baseline table name. Baseline data is used to compute drift from the data in the monitored |
| custom_metrics | Custom metrics |
| inference_log | |
| notification_settings | Field for specifying notification settings |
| output_schema_id | ID of the schema where output tables are created |
| schedule | The cron schedule |
| skip_builtin_dashboard | Whether to skip creating a default dashboard summarizing data quality metrics |
| slicing_exprs | List of column expressions to slice data with for targeted analysis. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complements. For example |
| snapshot | |
| time_series | |
| warehouse_id | Optional argument to specify the warehouse for dashboard creation. If not specified, the first running warehouse will be used |
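The table lists three profile-type fields: snapshot, time_series, and inference_log. This page does not state how they interact. The sketch below assumes, based on how Databricks monitors generally behave, that a configuration selects exactly one of them; treat that as an assumption to verify rather than a documented Laktory rule. The helper `selected_profile_types` is hypothetical.

```python
PROFILE_TYPES = ("snapshot", "time_series", "inference_log")


def selected_profile_types(config: dict) -> list:
    """Return the profile-type keys that are set in `config` (hypothetical helper)."""
    return [key for key in PROFILE_TYPES if config.get(key) is not None]


# Same content as data_profiling_config in the example at the top of the page.
config = {"output_schema_id": "dev.monitoring", "snapshot": {}}
selected = selected_profile_types(config)

# Under the stated assumption, a usable config selects exactly one profile type.
is_unambiguous = len(selected) == 1
```

Note that an empty mapping such as `snapshot: {}` still counts as set, which is why the check compares against `None` rather than relying on truthiness.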
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigCustomMetrics
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| definition | Jinja template for a SQL expression that specifies how to compute the metric. See create metric definition |
| input_columns | A list of column names in the input table the metric should be computed for. Can use |
| name | Name of the metric in the output tables |
| output_data_type | The output type of the custom metric |
| type | The type of the custom metric. Possible values are: |
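Since definition is described as a Jinja template for a SQL expression, a small illustration of what template rendering means may help. The sketch below substitutes `{{name}}` placeholders with a regular expression instead of the real Jinja engine, so it covers only the simplest case. The placeholder name `input_column`, the column name `close`, and the helper `render_definition` are assumptions made for illustration; consult the Databricks documentation for the placeholders that are actually supported.

```python
import re


def render_definition(definition: str, **params: str) -> str:
    """Replace {{name}} placeholders with values; a tiny stand-in for Jinja."""

    def substitute(match: re.Match) -> str:
        # Raises KeyError if the template uses a placeholder that was not supplied.
        return params[match.group(1)]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, definition)


# Illustrative definition: mean of the squared column value.
definition = "avg(`{{input_column}}` * `{{input_column}}`)"
rendered = render_definition(definition, input_column="close")
```

With one entry per name in input_columns, the same definition would be rendered once per column, which is the practical benefit of templating the expression.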
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigInferenceLog
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| granularities | List of granularities to use when aggregating data into time windows based on their timestamp |
| label_column | Column for the label |
| model_id_column | Column for the model identifier |
| prediction_column | Column for the prediction |
| problem_type | Problem type the model aims to solve. Possible values are: |
| timestamp_column | Column for the timestamp |
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigNotificationSettings
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| on_failure | Destinations to send notifications on failure/timeout |
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigNotificationSettingsOnFailure
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| email_addresses | The list of email addresses to send the notification to. A maximum of 5 email addresses is supported |
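The five-address limit is easy to exceed when lists are assembled programmatically, so a pre-flight check can be useful. The sketch below is a hypothetical helper, not part of Laktory, and this page does not say whether Laktory or Databricks performs the same validation before deployment. The addresses are placeholders.

```python
MAX_EMAIL_ADDRESSES = 5  # limit stated in the email_addresses description


def check_email_addresses(addresses: list) -> list:
    """Return `addresses` unchanged, or raise if the documented limit is exceeded."""
    if len(addresses) > MAX_EMAIL_ADDRESSES:
        raise ValueError(
            f"at most {MAX_EMAIL_ADDRESSES} email addresses are supported, "
            f"got {len(addresses)}"
        )
    return addresses


# Nested shape follows the notification_settings and on_failure tables.
notification_settings = {
    "on_failure": {
        "email_addresses": check_email_addresses(["data-team@example.com"]),
    },
}
```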
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigSchedule
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| quartz_cron_expression | The expression that determines when to run the monitor. See examples |
| timezone_id | A Java timezone id. The schedule for a job will be resolved with respect to this timezone. See |
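Quartz cron expressions differ from Unix cron in that they start with a seconds field and allow an optional trailing year, giving six or seven whitespace-separated fields. The sketch below shows a schedule as a plain dictionary together with a coarse field-count check. The helper `has_quartz_field_count` is hypothetical and does not validate the contents of each field; the expression and timezone are illustrative values.

```python
def has_quartz_field_count(expression: str) -> bool:
    """Coarse check: Quartz expressions have 6 fields, or 7 with the optional year."""
    return len(expression.split()) in (6, 7)


# Fields: second minute hour day-of-month month day-of-week
schedule = {
    "quartz_cron_expression": "0 0 6 * * ?",  # every day at 06:00
    "timezone_id": "UTC",
}

schedule_looks_valid = has_quartz_field_count(schedule["quartz_cron_expression"])

# A five-field Unix-style expression fails the check.
unix_style_rejected = not has_quartz_field_count("0 6 * * *")
```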
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigSnapshot
Bases: BaseModel
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorDataProfilingConfigTimeSeries
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| granularities | List of granularities to use when aggregating data into time windows based on their timestamp |
| timestamp_column | Column for the timestamp |
laktory.models.resources.databricks.dataqualitymonitor.DataQualityMonitorProviderConfig
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| workspace_id | Workspace ID which the resource belongs to. This workspace must be part of the account which the provider is configured with |