BaseDataSource
laktory.models.datasources.BaseDataSource
Bases: BaseModel, PipelineChild
Base class for data sources.
| PARAMETER | DESCRIPTION |
|---|---|
| as_stream | If |
| drop_duplicates | Remove duplicated rows from source using all columns if |
| drops | List of columns to drop |
| filter | SQL expression used to select specific rows from the source table |
| renames | Mapping between the source column names and desired column names |
| selects | Columns to select from the source. Can be specified as a list or as a dictionary to rename the source columns |
| type | Name of the data source type |
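To make the row and column options above more concrete, here is a minimal sketch that mimics their described effect on an in-memory table using only the standard-library `sqlite3` module. It is not Laktory's implementation: the helper `apply_source_options`, its signature, and the order in which the options are applied (filter, selects, renames, drops, then duplicate removal) are all assumptions made for illustration.

```python
import sqlite3


def apply_source_options(rows, columns, *, filter=None, selects=None,
                         renames=None, drops=None, drop_duplicates=False):
    """Hypothetical helper (not part of laktory) that imitates the options."""
    con = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    con.execute(f"CREATE TABLE src ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(f"INSERT INTO src VALUES ({placeholders})", rows)

    # selects: a list keeps the names, a dict maps source name -> new name.
    if selects is None:
        mapping = {c: c for c in columns}
    elif isinstance(selects, dict):
        mapping = dict(selects)
    else:
        mapping = {c: c for c in selects}

    # renames: assumed here to apply to the names produced by selects.
    renames = renames or {}
    mapping = {src: renames.get(dst, dst) for src, dst in mapping.items()}

    # drops: assumed here to refer to the names after renaming.
    dropped = set(drops or [])
    mapping = {src: dst for src, dst in mapping.items() if dst not in dropped}

    select_sql = ", ".join(f'"{s}" AS "{d}"' for s, d in mapping.items())
    distinct = "DISTINCT " if drop_duplicates else ""
    where = f" WHERE {filter}" if filter else ""  # filter is a SQL expression
    cur = con.execute(f"SELECT {distinct}{select_sql} FROM src{where}")
    out_cols = [d[0] for d in cur.description]
    result = [dict(zip(out_cols, r)) for r in cur.fetchall()]
    con.close()
    return result


rows = [
    ("AAPL", 190.0, "2024-01-02"),
    ("AAPL", 190.0, "2024-01-02"),  # exact duplicate
    ("MSFT", 370.0, "2024-01-02"),
    ("GOOG", -1.0, "2024-01-02"),   # removed by the filter
]
result = apply_source_options(
    rows,
    ["symbol", "close", "date"],
    filter="close > 0",
    renames={"symbol": "ticker"},
    drops=["date"],
    drop_duplicates=True,
)
```

With these inputs, `result` holds one row per remaining ticker, each with the keys `ticker` and `close`. The actual order of operations inside a Laktory data source may differ from the one assumed in this sketch.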
| METHOD | DESCRIPTION |
|---|---|
| read | Read data with options specified in attributes. |
read(**kwargs)
Read data with options specified in attributes.
| RETURNS | DESCRIPTION |
|---|---|
| AnyFrame | Resulting dataframe |
Source code in laktory/models/datasources/basedatasource.py
laktory.models.datasources.basedatasource.DataFrameSample
Bases: BaseModel
| PARAMETER | DESCRIPTION |
|---|---|
| fraction | |
| n | |
| seed | |
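The parameters above carry no description here, so the following is only a sketch under the assumption that they follow the usual sampling convention: `n` as a fixed number of rows, `fraction` as a per-row keep probability, and `seed` for reproducibility. The helper `sample_rows` is hypothetical and does not reflect how Laktory samples a dataframe.

```python
import random


def sample_rows(rows, n=None, fraction=None, seed=None):
    """Hypothetical helper (not part of laktory) illustrating n/fraction/seed."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    if n is not None:
        # Fixed-size sample without replacement, capped at the row count.
        return rng.sample(rows, min(n, len(rows)))
    if fraction is not None:
        # Keep each row independently with probability `fraction`.
        return [r for r in rows if rng.random() < fraction]
    return list(rows)


data = list(range(100))
first = sample_rows(data, n=5, seed=42)
second = sample_rows(data, n=5, seed=42)  # same seed, same sample
```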
laktory.models.datasources.basedatasource.Watermark
Bases: BaseModel
References
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
| PARAMETER | DESCRIPTION |
|---|---|
| column | Event time column name |
| threshold | How late, expressed in seconds, the data is expected to be with respect to event time. |
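As a rough illustration of how `column` and `threshold` interact, the sketch below classifies events as on time or late by comparing each event time against the maximum event time seen so far minus the threshold. It is a simplification in plain Python: the helper `split_late_events` is hypothetical, the threshold is assumed to be a number of seconds, and Spark updates its watermark per micro-batch rather than per event, so real behavior can differ.

```python
from datetime import datetime, timedelta


def split_late_events(events, column, threshold_seconds):
    """Hypothetical helper (not part of laktory or Spark)."""
    max_event_time = None
    accepted, late = [], []
    for event in events:  # events are processed in arrival order
        ts = event[column]
        if max_event_time is not None:
            watermark = max_event_time - timedelta(seconds=threshold_seconds)
            if ts < watermark:
                late.append(event)  # older than the watermark: too late
                continue
        accepted.append(event)
        if max_event_time is None or ts > max_event_time:
            max_event_time = ts
    return accepted, late


base = datetime(2024, 1, 1, 12, 0, 0)
events = [
    {"id": 1, "event_time": base},
    {"id": 2, "event_time": base + timedelta(seconds=30)},
    {"id": 3, "event_time": base + timedelta(seconds=10)},  # 20 s behind the max
    {"id": 4, "event_time": base + timedelta(seconds=20)},  # 10 s behind the max
]
accepted, late = split_late_events(events, "event_time", threshold_seconds=15)
```

In this example, event 3 trails the latest event time by more than the 15-second threshold and ends up in `late`, while event 4 is within the threshold and is accepted.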