pytd.writer.SparkWriter

class pytd.writer.SparkWriter(td_spark_path=None, download_if_missing=True, spark_configs=None)[source]

A writer module that loads Python data to Treasure Data.

Parameters
td_spark_path : string, optional

Path to td-spark-assembly_x.xx-x.x.x.jar. If not given, the path returned by TDSparkContextBuilder.default_jar_path() is used by default.

download_if_missing : boolean, default: True

Download td-spark if it does not exist at the time of initialization.

spark_configs : dict, optional

Additional Spark configurations to be set via SparkConf’s set method.

__init__(self, td_spark_path=None, download_if_missing=True, spark_configs=None)[source]

Initialize self. See help(type(self)) for accurate signature.
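For illustration, additional Spark settings can be collected in a plain dict and passed as spark_configs. The option names below are standard Spark configuration keys, not pytd-specific; the SparkWriter call itself is commented out because it requires td-spark and valid Treasure Data credentials.

```python
# Standard Spark configuration keys, passed through to SparkConf.set().
spark_configs = {
    "spark.executor.memory": "2g",
    "spark.driver.memory": "1g",
}

# Requires td-spark and Treasure Data credentials; shown for illustration:
# import pytd.writer
# writer = pytd.writer.SparkWriter(spark_configs=spark_configs)
```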

Methods

__init__(self[, td_spark_path, …])

Initialize self.

close(self)

Close a PySpark session connected to Treasure Data.

from_string(writer, **kwargs)

write_dataframe(self, dataframe, table, …)

Write a given DataFrame to a Treasure Data table.

Attributes

closed


property closed
write_dataframe(self, dataframe, table, if_exists)[source]

Write a given DataFrame to a Treasure Data table.

This method internally converts the given pandas.DataFrame into a Spark DataFrame and writes it directly to Plazma, Treasure Data's main storage, through a PySpark session.

Parameters
dataframe : pandas.DataFrame

Data to be loaded into the target table.

table : pytd.table.Table

Target table.

if_exists : {‘error’, ‘overwrite’, ‘append’, ‘ignore’}

What happens when a target table already exists.

  • error: raise an exception.

  • overwrite: drop it, recreate it, and insert data.

  • append: insert data; create the table if it does not exist.

  • ignore: do nothing.
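The four modes above can be sketched with a small local emulation. The helper apply_if_exists below is illustrative only and not part of pytd; it mimics each mode's behavior on an in-memory dict of pandas DataFrames.

```python
import pandas as pd

def apply_if_exists(tables, name, df, if_exists):
    """Emulate the four if_exists modes on an in-memory dict of tables.

    Illustrative sketch only; the real SparkWriter operates on
    Treasure Data tables, not a local dict.
    """
    exists = name in tables
    if if_exists == "error":
        if exists:
            raise RuntimeError(f"table {name!r} already exists")
        tables[name] = df
    elif if_exists == "overwrite":
        # Drop, recreate, and insert data.
        tables[name] = df
    elif if_exists == "append":
        # Insert data; create the table if it does not exist.
        tables[name] = (
            pd.concat([tables[name], df], ignore_index=True) if exists else df
        )
    elif if_exists == "ignore":
        # Do nothing when the table already exists.
        if not exists:
            tables[name] = df
    else:
        raise ValueError(f"unknown if_exists mode: {if_exists!r}")
    return tables
```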

close(self)[source]

Close a PySpark session connected to Treasure Data.