FileDataSource
laktory.models.datasources.FileDataSource
Bases: BaseDataSource
Data source using disk files, such as data events (JSON/CSV) or full dataframes.
Examples:
from laktory import models

source = models.FileDataSource(
    path="/Volumes/sources/landing/events/yahoo-finance/stock_price",
    format="JSON",
    dataframe_backend="POLARS",
)
# df = source.read()

# With Explicit Schema
source = models.FileDataSource(
    path="/Volumes/sources/landing/events/yahoo-finance/stock_price",
    format="JSON",
    dataframe_backend="PYSPARK",
    schema={
        "columns": {
            "symbol": "String",
            "open": "Float64",
            "close": "Float64",
        }
    },
)
# df = source.read()
| PARAMETER | DESCRIPTION |
|---|---|
| as_stream | If |
| drop_duplicates | Remove duplicated rows from source using all columns if |
| drops | List of columns to drop |
| filter | SQL expression used to select specific rows from the source table |
| format | Format of the data files. |
| has_header | Indicate if the first row of the dataset is a header or not. Only applicable to 'CSV' format. |
| infer_schema | When |
| path | File path on a local disk, remote storage or Databricks volume. |
| reader_kwargs | Keyword arguments passed directly to dataframe backend reader. Passed to |
| reader_methods | DataFrame backend reader methods. |
| renames | Mapping between the source column names and desired column names |
| schema_definition | Target schema specified as a list of columns, as a dict or a json serialization. Only used when reading data from non-strongly typed files such as JSON or csv files. |
| schema_location_ | Path for schema inference when reading data as a stream. If |
| selects | Columns to select from the source. Can be specified as a list or as a dictionary to rename the source columns |
| type | Source Type |
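The `selects` parameter accepts either a list of column names or a dictionary that also renames the selected columns. The sketch below illustrates that difference in plain Python on made-up rows. `apply_selects` is a hypothetical helper written only for this illustration; it is not part of laktory, whose own implementation works on dataframes and may differ in detail.

```python
from typing import Union

Row = dict[str, object]


def apply_selects(rows: list[Row], selects: Union[list[str], dict[str, str]]) -> list[Row]:
    """Hypothetical stand-in for the `selects` option (not laktory code)."""
    if isinstance(selects, list):
        # A list keeps the listed columns under their original names,
        # so it is treated as an identity mapping.
        mapping = {name: name for name in selects}
    else:
        # A dictionary maps each source column name to its new name.
        mapping = dict(selects)
    return [{new: row[old] for old, new in mapping.items()} for row in rows]


# Made-up rows, used only to exercise the helper.
rows = [
    {"symbol": "AAA", "open": 1.0, "close": 2.0},
    {"symbol": "BBB", "open": 3.0, "close": 4.0},
]

kept = apply_selects(rows, ["symbol", "close"])
renamed = apply_selects(rows, {"symbol": "ticker", "close": "close_price"})

print(kept[0])     # {'symbol': 'AAA', 'close': 2.0}
print(renamed[0])  # {'ticker': 'AAA', 'close_price': 2.0}
```

With laktory itself, the same intent would presumably be expressed by passing `selects=["symbol", "close"]` or `selects={"symbol": "ticker", "close": "close_price"}` to `models.FileDataSource`.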
| METHOD | DESCRIPTION |
|---|---|
| read | Read data with options specified in attributes. |
read(**kwargs)
Read data with options specified in attributes.
| RETURNS | DESCRIPTION |
|---|---|
| AnyFrame | Resulting dataframe |
Source code in laktory/models/datasources/basedatasource.py