DataFrameDataSource

`laktory.models.datasources.DataFrameDataSource`

Bases: BaseDataSource

Data source using an in-memory DataFrame.

Examples:

From `data` using the PySpark backend:

```python
import laktory as lk

data = {
    "x": [0, 1],
    "y": ["a", "b"],
}

# From data using PySpark
source = lk.models.DataFrameDataSource(
    data=data,
    dataframe_backend="PYSPARK",
)
df = source.read()
print(df.collect(backend="pandas"))
'''
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|        x  y      |
|     0  0  a      |
|     1  1  b      |
└──────────────────┘
'''
```

From a Polars DataFrame:

```python
import polars as pl

import laktory as lk

data = {
    "x": [0, 1],
    "y": ["a", "b"],
}

source = lk.models.DataFrameDataSource(
    df=pl.DataFrame(data),
)
df = source.read()
print(df)
'''
┌──────────────────┐
|Narwhals DataFrame|
|------------------|
|    | x | y |     |
|    |---|---|     |
|    | 0 | a |     |
|    | 1 | b |     |
└──────────────────┘
'''
```
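The examples above pass `data` in column-oriented form (a dict of lists), but the `data` parameter also accepts row-oriented records (`list[dict[str, Any]]`). The minimal sketch below shows how the two forms correspond. `records_to_columns` is a hypothetical helper written for illustration, not part of laktory, and it assumes every record has the same keys.

```python
from __future__ import annotations

from typing import Any


def records_to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Convert row-oriented records into a column-oriented dict of lists.

    Hypothetical helper (not laktory's API). Assumes all records share the
    same keys; otherwise the resulting columns would have different lengths.
    """
    columns: dict[str, list[Any]] = {}
    for record in records:
        for key, value in record.items():
            # Create the column on first sight, then append in row order
            columns.setdefault(key, []).append(value)
    return columns


# Row-oriented equivalent of the `data` dict used in the examples above
records = [
    {"x": 0, "y": "a"},
    {"x": 1, "y": "b"},
]

columns = records_to_columns(records)
print(columns)  # {'x': [0, 1], 'y': ['a', 'b']}
```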

References

Data Sources and Sinks

| Parameter | Description | Type | Default |
| --- | --- | --- | --- |
| `as_stream` | If `True`, the source is read as a streaming DataFrame. Currently only supported by the Spark DataFrame backend. | `bool \| VariableType` | `False` |
| `data` | Serialized data used to build the source. | `dict[str, list[Any]] \| list[dict[str, Any]] \| VariableType` | `None` |
| `df` | DataFrame object acting as the source. | `Any` | `None` |
| `drop_duplicates` | Remove duplicated rows from the source, using all columns if `True` or only the provided column names if a list is given. | `bool \| list[str] \| VariableType` | `None` |
| `drops` | List of columns to drop. | `list \| VariableType` | `None` |
| `filter` | SQL expression used to select specific rows from the source table. | `str \| VariableType` | `None` |
| `renames` | Mapping between the source column names and the desired column names. | `dict[str \| VariableType, str \| VariableType] \| VariableType` | `None` |
| `selects` | Columns to select from the source. Can be specified as a list, or as a dictionary to rename the source columns. | `list[str] \| dict[str, str] \| VariableType` | `None` |
| `type` | Source type. | `Literal['DATAFRAME'] \| VariableType` | `'DATAFRAME'` |
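Both `drop_duplicates` and `selects` accept two shapes: `drop_duplicates` compares all columns when `True` or only a subset when given a list, and `selects` either keeps listed columns or keeps and renames them through a dict. The stdlib sketch below illustrates only those semantics on plain Python rows. `apply_selects` and `apply_drop_duplicates` are hypothetical helpers, not laktory's implementation (which works on DataFrames), and keeping the first duplicate is an assumption of this sketch; DataFrame engines may not guarantee which duplicate survives.

```python
from __future__ import annotations

from typing import Any

Row = dict[str, Any]


def apply_selects(rows: list[Row], selects: list[str] | dict[str, str]) -> list[Row]:
    """Keep only the selected columns; a dict maps source name -> new name."""
    mapping = selects if isinstance(selects, dict) else {c: c for c in selects}
    return [{new: row[old] for old, new in mapping.items()} for row in rows]


def apply_drop_duplicates(rows: list[Row], drop_duplicates: bool | list[str] | None) -> list[Row]:
    """Drop repeated rows, comparing all columns if True or only the listed ones."""
    if not drop_duplicates:
        return list(rows)  # option disabled: rows pass through unchanged
    seen: set[tuple[Any, ...]] = set()
    kept: list[Row] = []
    for row in rows:
        columns = sorted(row) if drop_duplicates is True else drop_duplicates
        key = tuple(row[c] for c in columns)
        # Assumption of this sketch: the first occurrence of each key wins
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"x": 0, "y": "a"},
    {"x": 0, "y": "a"},
    {"x": 1, "y": "a"},
]
print(apply_drop_duplicates(rows, True))        # [{'x': 0, 'y': 'a'}, {'x': 1, 'y': 'a'}]
print(apply_drop_duplicates(rows, ["y"]))       # [{'x': 0, 'y': 'a'}]
print(apply_selects(rows[:1], {"y": "label"}))  # [{'label': 'a'}]
```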

| Method | Description |
| --- | --- |
| `read` | Read data with options specified in attributes. |

`read(**kwargs)`

Read data with options specified in attributes.

| Returns | Description |
| --- | --- |
| `AnyFrame` | Resulting dataframe |

Source code in `laktory/models/datasources/basedatasource.py`:

```python
def read(self, **kwargs) -> AnyFrame:
    """
    Read data with options specified in attributes.

    Returns
    -------
    :
        Resulting dataframe
    """
    logger.info(
        f"Reading `{self.__class__.__name__}` {self._id} with {self.dataframe_backend}"
    )
    df = self._read(**kwargs)

    # Convert to Narwhals
    if not isinstance(df, (nw.LazyFrame, nw.DataFrame)):
        df = nw.from_native(df)

    # Post read
    df = self._post_read(df)

    logger.info("Read completed.")

    return df
```
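`read()` appears to follow a template-method pattern: the backend-specific `_read()` loads the data, the base class normalizes the result to a Narwhals frame with `nw.from_native` unless it already is one, and `_post_read()` then applies the shared options. The stdlib sketch below reproduces only that control flow. `Wrapped`, `SketchSource`, and `InMemorySource` are hypothetical names; `Wrapped` stands in for a Narwhals frame so the example runs without laktory or Narwhals installed.

```python
from typing import Any


class Wrapped:
    """Hypothetical stand-in for a Narwhals DataFrame/LazyFrame."""

    def __init__(self, native: Any) -> None:
        self.native = native


class SketchSource:
    """Hypothetical base class mirroring the structure of `read()` above."""

    def read(self, **kwargs: Any) -> Wrapped:
        df = self._read(**kwargs)        # backend-specific loading
        if not isinstance(df, Wrapped):  # normalize, like nw.from_native
            df = Wrapped(df)
        return self._post_read(df)       # shared post-read step

    def _read(self, **kwargs: Any) -> Any:
        raise NotImplementedError        # subclasses provide the loading logic

    def _post_read(self, df: Wrapped) -> Wrapped:
        return df                        # no-op here; filters, renames, etc. would go here


class InMemorySource(SketchSource):
    """Returns an object it already holds, loosely like passing `df=`."""

    def __init__(self, df: Any) -> None:
        self.df = df

    def _read(self, **kwargs: Any) -> Any:
        return self.df


source = InMemorySource({"x": [0, 1], "y": ["a", "b"]})
result = source.read()
print(type(result).__name__, result.native)  # Wrapped {'x': [0, 1], 'y': ['a', 'b']}
```

Splitting loading from normalization this way lets each source subclass stay backend-specific while every source returns the same frame type and shares one post-read implementation.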
