DataFrameDataSource
laktory.models.datasources.DataFrameDataSource
Bases: BaseDataSource
Data source using an in-memory DataFrame.
Examples:
From data with PySpark backend.
import laktory as lk
data = {
"x": [0, 1],
"y": ["a", "b"],
}
# From data using PySpark
source = lk.models.DataFrameDataSource(
data=data,
dataframe_backend="PYSPARK",
)
df = source.read()
print(df.collect(backend="pandas"))
'''
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| x y |
| 0 0 a |
| 1 1 b |
└──────────────────┘
'''
From a Polars DataFrame.
import polars as pl
import laktory as lk
data = {
"x": [0, 1],
"y": ["a", "b"],
}
source = lk.models.DataFrameDataSource(
df=pl.DataFrame(data),
)
df = source.read()
print(df)
'''
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
| | x | y | |
| |---|---| |
| | 0 | a | |
| | 1 | b | |
└──────────────────┘
'''
| PARAMETER | DESCRIPTION |
|---|---|
| `as_stream` | If |
| `data` | Serialized data used to build source |
| `df` | DataFrame object acting as source |
| `drop_duplicates` | Remove duplicated rows from source using all columns if |
| `drops` | List of columns to drop |
| `filter` | SQL expression used to select specific rows from the source table |
| `renames` | Mapping between the source column names and desired column names |
| `selects` | Columns to select from the source. Can be specified as a list or as a dictionary to rename the source columns |
| `type` | Source Type |
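The column-level parameters above (`selects`, `renames`, `drops`, `drop_duplicates`) can be pictured with a small plain-Python sketch operating on a list of row dictionaries. This is only an illustration of the documented meaning of each option, not laktory's implementation: the helper name `apply_column_options` is hypothetical, and the order in which the options are applied here is an assumption.

```python
from __future__ import annotations

from typing import Any

Rows = list[dict[str, Any]]


def apply_column_options(
    rows: Rows,
    selects: list[str] | dict[str, str] | None = None,
    renames: dict[str, str] | None = None,
    drops: list[str] | None = None,
    drop_duplicates: bool = False,
) -> Rows:
    """Hypothetical helper: mimic the documented column options on plain rows.

    The application order (selects, renames, drops, drop_duplicates) is an
    assumption made for this sketch only.
    """
    out = [dict(r) for r in rows]

    if selects is not None:
        if isinstance(selects, dict):
            # Dictionary form: keep the listed columns and rename them.
            out = [{new: r[old] for old, new in selects.items()} for r in out]
        else:
            # List form: keep the listed columns under their original names.
            out = [{name: r[name] for name in selects} for r in out]

    if renames:
        # Map source column names to the desired column names.
        out = [{renames.get(k, k): v for k, v in r.items()} for r in out]

    if drops:
        # Remove the listed columns.
        out = [{k: v for k, v in r.items() if k not in drops} for r in out]

    if drop_duplicates:
        # Remove duplicated rows, comparing all columns.
        seen = set()
        unique = []
        for r in out:
            key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
            if key not in seen:
                seen.add(key)
                unique.append(r)
        out = unique

    return out


rows = [
    {"x": 0, "y": "a", "z": True},
    {"x": 1, "y": "b", "z": False},
    {"x": 1, "y": "b", "z": False},
]

result = apply_column_options(
    rows,
    selects={"x": "x_new", "y": "y"},
    drop_duplicates=True,
)
print(result)  # [{'x_new': 0, 'y': 'a'}, {'x_new': 1, 'y': 'b'}]
```

The dictionary form of `selects` both selects and renames in one step, which is why it can stand in for a `selects` list followed by `renames`.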
| METHOD | DESCRIPTION |
|---|---|
| `read` | Read data with options specified in attributes. |
read(**kwargs)
Read data with options specified in attributes.
| RETURNS | DESCRIPTION |
|---|---|
| `AnyFrame` | Resulting dataframe |
Source code in laktory/models/datasources/basedatasource.py