pytd.writer.SparkWriter

class pytd.writer.SparkWriter(td_spark_path: str | None = None, download_if_missing: bool = True, spark_configs: dict[str, Any] | None = None)

A writer module that loads Python data to Treasure Data.

Parameters:

td_spark_path (str, optional) – Path to td-spark-assembly-{td-spark-version}_spark{spark-version}.jar. If not given, the path returned by TDSparkContextBuilder.default_jar_path() is used.
download_if_missing (bool, default: True) – Download td-spark if it does not exist at the time of initialization.
spark_configs (dict, optional) – Additional Spark configurations to be set via SparkConf’s set method.
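
The constructor parameters above can be combined as in the following minimal sketch. It assumes pytd and its Spark dependencies are installed and that Treasure Data credentials are already configured. The helper name `make_spark_writer` and the two Spark settings are illustrative assumptions, not recommended values. The import is deferred into the function so the snippet itself loads without pytd.

```python
from typing import Any, Dict, Optional

# Illustrative Spark settings (assumed values); per the parameter description,
# each key/value pair is applied through SparkConf's set method.
SPARK_CONFIGS: Dict[str, Any] = {
    "spark.executor.memory": "2g",
    "spark.sql.shuffle.partitions": "8",
}


def make_spark_writer(jar_path: Optional[str] = None):
    """Build a SparkWriter (hypothetical helper, not part of pytd)."""
    # Deferred import: only needed when a writer is actually created.
    from pytd.writer import SparkWriter

    return SparkWriter(
        td_spark_path=jar_path,    # None: fall back to the default jar path
        download_if_missing=True,  # download td-spark if the jar is absent
        spark_configs=SPARK_CONFIGS,
    )
```

Calling `make_spark_writer()` with no argument relies on the default jar location. Pass an explicit path when the jar is managed outside pytd.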

td_spark_path

Path to td-spark-assembly-{td-spark-version}_spark{spark-version}.jar.

Type: str

download_if_missing

Download td-spark if it does not exist at the time of initialization.

Type: bool

spark_configs

Additional Spark configurations to be set via SparkConf’s set method.

Type: dict

td_spark

Connection to td-spark.

Type: td_pyspark.TDSparkContext

__init__(td_spark_path: str | None = None, download_if_missing: bool = True, spark_configs: dict[str, Any] | None = None) → None

Methods

`__init__`([td_spark_path, ...])
`close`()	Close a PySpark session connected to Treasure Data.
`from_string`(writer, **kwargs)
`write_dataframe`(dataframe, table, if_exists)	Write a given DataFrame to a Treasure Data table.

Attributes

closed

property closed: bool

write_dataframe(dataframe: DataFrame, table: Table, if_exists: Literal['error', 'overwrite', 'append', 'ignore']) → None

Write a given DataFrame to a Treasure Data table.

This method internally converts a given pandas.DataFrame into a Spark DataFrame and writes it directly to Plazma, Treasure Data’s main storage, through a PySpark session.

Parameters:

dataframe (pandas.DataFrame) – Data to load into the target table.
table (pytd.table.Table) – Target table.
if_exists ({'error', 'overwrite', 'append', 'ignore'}) – What happens when a target table already exists.
- error: raise an exception.
- overwrite: drop the table, recreate it, and insert data.
- append: insert data. Create the table if it does not exist.
- ignore: do nothing.
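
The four `if_exists` modes can be illustrated with a small in-memory simulation. This is a sketch of the documented semantics only, not pytd's implementation: tables are modeled as a plain dict, `apply_if_exists` is a hypothetical name, and the choice of `RuntimeError` for the error mode is an assumption. The sketch also assumes that a missing table is created in every mode, which the description above states only for append.

```python
from typing import Dict, List

VALID_MODES = ("error", "overwrite", "append", "ignore")


def apply_if_exists(
    tables: Dict[str, List[dict]], name: str, rows: List[dict], if_exists: str
) -> None:
    """Simulate if_exists handling against an in-memory table store."""
    if if_exists not in VALID_MODES:
        raise ValueError(f"invalid if_exists value: {if_exists!r}")

    if name not in tables:
        # Assumption: a missing table is created regardless of mode.
        tables[name] = list(rows)
        return

    if if_exists == "error":
        raise RuntimeError(f"table {name!r} already exists")
    if if_exists == "overwrite":
        tables[name] = list(rows)  # drop, recreate, insert
    elif if_exists == "append":
        tables[name].extend(rows)  # insert into the existing table
    # "ignore": leave the existing table untouched
```

For example, appending twice to the same name grows the stored rows, while a later overwrite replaces them with only the new rows.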

close() → None

Close a PySpark session connected to Treasure Data.
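
Because the writer exposes `close()`, one way to make sure the session is released even when a write fails is `contextlib.closing` from the standard library. The sketch below uses a hypothetical `StandInWriter` that mimics only the `close()` method and the `closed` property, so it runs without Spark. With pytd installed, a SparkWriter instance would take its place, assuming `close()` behaves as documented.

```python
from contextlib import closing


class StandInWriter:
    """Hypothetical stand-in with the same close()/closed surface as SparkWriter."""

    def __init__(self) -> None:
        self._closed = False

    @property
    def closed(self) -> bool:
        return self._closed

    def write_dataframe(self, dataframe, table, if_exists) -> None:
        # Placeholder: a real writer would upload the data here.
        pass

    def close(self) -> None:
        self._closed = True


writer = StandInWriter()
with closing(writer):
    # closing() calls writer.close() on exit, including when an exception is raised.
    writer.write_dataframe(None, None, "append")
```

After the `with` block exits, `writer.closed` reports that the writer has been closed.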