DataFrameDataSource

laktory.models.datasources.DataFrameDataSource

Bases: BaseDataSource

Data source using in-memory DataFrame.

Examples:

From data with PySpark backend.

import laktory as lk

data = {
    "x": [0, 1],
    "y": ["a", "b"],
}

# From data using PySpark
source = lk.models.DataFrameDataSource(
    data=data,
    dataframe_backend="PYSPARK",
)
df = source.read()
print(df.collect(backend="pandas"))
'''
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|        x  y      |
|     0  0  a      |
|     1  1  b      |
└──────────────────┘
'''

From a Polars DataFrame.

import polars as pl

import laktory as lk

data = {
    "x": [0, 1],
    "y": ["a", "b"],
}

source = lk.models.DataFrameDataSource(
    df=pl.DataFrame(data),
)
df = source.read()
print(df)
'''
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|    | x | y |     |
|    |---|---|     |
|    | 0 | a |     |
|    | 1 | b |     |
└──────────────────┘
'''
PARAMETER DESCRIPTION
as_stream

If True, the source is read as a streaming DataFrame. Currently only supported by the Spark DataFrame backend.

TYPE: bool | VariableType DEFAULT: False

data

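Serialized data used to build the source, given either as a mapping of column names to lists of values or as a list of row dictionaries.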

TYPE: dict[str, list[Any]] | list[dict[str, Any]] | VariableType DEFAULT: None

df

DataFrame object acting as source

TYPE: Any DEFAULT: None

drop_duplicates

Remove duplicate rows from the source. If True, rows are compared on all columns; if a list of column names is provided, rows are compared on those columns only.

TYPE: bool | list[str] | VariableType DEFAULT: None

drops

List of columns to drop

TYPE: list | VariableType DEFAULT: None

filter

SQL expression used to select specific rows from the source table

TYPE: str | VariableType DEFAULT: None

renames

Mapping between the source column names and desired column names

TYPE: dict[str | VariableType, str | VariableType] | VariableType DEFAULT: None

selects

Columns to select from the source. Can be specified as a list, or as a dictionary to also rename the selected columns.

TYPE: list[str] | dict[str, str] | VariableType DEFAULT: None

type

Source Type

TYPE: Literal['DATAFRAME'] | VariableType DEFAULT: 'DATAFRAME'
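
Two of the parameters above accept more than one shape: `data` can be column-oriented or row-oriented, and `drop_duplicates` can be a flag or a list of column names. The sketch below illustrates those semantics in plain Python. It is not Laktory's implementation. The helper names `records_to_columns` and `drop_duplicate_rows` are hypothetical, and keeping the first occurrence of each duplicate is an assumption, since the row that survives may depend on the DataFrame backend.

```python
from __future__ import annotations

from typing import Any


def records_to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Convert row-oriented records into a column-oriented mapping.

    Hypothetical helper: shows that both accepted `data` layouts
    describe the same table.
    """
    columns: dict[str, list[Any]] = {}
    for row in records:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def drop_duplicate_rows(
    records: list[dict[str, Any]],
    drop_duplicates: bool | list[str] | None,
) -> list[dict[str, Any]]:
    """Mimic the documented `drop_duplicates` semantics on plain records.

    True          -> compare rows on all columns
    list of names -> compare rows on those columns only
    None / False  -> keep every row

    Assumption: the first occurrence of each duplicate is kept.
    """
    if not records or not drop_duplicates:
        return list(records)

    if drop_duplicates is True:
        keys = list(records[0])
    else:
        keys = list(drop_duplicates)

    seen: set[tuple[Any, ...]] = set()
    unique_rows = []
    for row in records:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


if __name__ == "__main__":
    rows = [{"x": 0, "y": "a"}, {"x": 0, "y": "b"}, {"x": 0, "y": "a"}]
    print(records_to_columns(rows))
    print(drop_duplicate_rows(rows, True))
    print(drop_duplicate_rows(rows, ["x"]))
```

With `True`, the third row is dropped because it matches the first on every column. With `["x"]`, only the first row survives because all three rows share the same `x`.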

METHOD DESCRIPTION
read

Read data with options specified in attributes.

read(**kwargs)

Read data with options specified in attributes.

RETURNS DESCRIPTION
AnyFrame

Resulting dataframe

Source code in laktory/models/datasources/basedatasource.py
def read(self, **kwargs) -> AnyFrame:
    """
    Read data with options specified in attributes.

    Returns
    -------
    :
        Resulting dataframe
    """
    logger.info(
        f"Reading `{self.__class__.__name__}` {self._id} with {self.dataframe_backend}"
    )
    df = self._read(**kwargs)

    # Convert to Narwhals
    if not isinstance(df, (nw.LazyFrame, nw.DataFrame)):
        df = nw.from_native(df)

    # Post read
    df = self._post_read(df)

    logger.info("Read completed.")

    return df
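
The `read` listing follows a template-method shape: a backend-specific `_read`, a wrapping step that only runs when the result is not already a Narwhals frame, and a shared `_post_read`. The sketch below reproduces that control flow with stand-in classes so it runs without Laktory or Narwhals. `Wrapped` and `SketchSource` are hypothetical names, and applying `drops` inside `_post_read` is an assumption made for illustration. The listing does not show what `_post_read` does.

```python
from __future__ import annotations

from typing import Any


class Wrapped:
    """Stand-in for a Narwhals frame: a thin wrapper around native data."""

    def __init__(self, native: Any) -> None:
        self.native = native


class SketchSource:
    """Hypothetical source mirroring the read -> wrap -> post-read flow."""

    def __init__(self, rows: list[dict[str, Any]], drops: list[str] | None = None) -> None:
        self.rows = rows
        self.drops = drops or []

    def _read(self, **kwargs: Any) -> Any:
        # Backend-specific step: returns a "native" object (a plain list here).
        return self.rows

    def _post_read(self, df: Wrapped) -> Wrapped:
        # Assumed behavior: apply source-level options after reading.
        # Only column dropping is sketched.
        df.native = [
            {k: v for k, v in row.items() if k not in self.drops}
            for row in df.native
        ]
        return df

    def read(self, **kwargs: Any) -> Wrapped:
        df = self._read(**kwargs)

        # Wrap only if the backend did not already return a wrapped frame,
        # analogous to the isinstance check before nw.from_native.
        if not isinstance(df, Wrapped):
            df = Wrapped(df)

        return self._post_read(df)


if __name__ == "__main__":
    source = SketchSource([{"x": 0, "y": "a"}, {"x": 1, "y": "b"}], drops=["y"])
    print(source.read().native)
```

Keeping the wrapping step in `read` means each `_read` implementation can return whatever its backend produces, while callers always receive the same frame type.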