PySpark Integration
pytd.spark.download_td_spark(spark_binary_version='2.11', version='latest', destination=None)

Download a td-spark jar file from S3.

- Parameters
  - spark_binary_version : string, default: '2.11'
    Apache Spark binary version.
  - version : string, default: 'latest'
    td-spark version.
  - destination : string, optional
    Path where the downloaded jar file will be stored.
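A minimal usage sketch, assuming pytd is installed (e.g. `pip install pytd[spark]`) and S3 is reachable. The `FETCH_TD_SPARK_JAR` environment flag is a hypothetical opt-in guard added here so the download only runs when explicitly requested; it is not part of pytd.

```python
import os
import tempfile

try:
    from pytd.spark import download_td_spark
    HAVE_PYTD = True
except ImportError:
    HAVE_PYTD = False  # pytd not installed; the call below is illustrative only

# Guarded by a hypothetical opt-in flag to avoid an unintended network fetch.
if HAVE_PYTD and os.getenv("FETCH_TD_SPARK_JAR"):
    dest = os.path.join(tempfile.mkdtemp(), "td-spark-assembly.jar")
    # Fetch the jar built against Spark's Scala 2.11 binary from S3.
    download_td_spark(spark_binary_version="2.11", version="latest",
                      destination=dest)
```

If `destination` is omitted, the jar is stored at the library's default location, which is what `fetch_td_spark_context` looks up later.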
pytd.spark.fetch_td_spark_context(apikey=None, endpoint=None, td_spark_path=None, download_if_missing=True, spark_configs=None)

Build a TDSparkContext via td-pyspark.

- Parameters
  - apikey : string, optional
    Treasure Data API key. If not given, the value of the environment variable TD_API_KEY is used by default.
  - endpoint : string, optional
    Treasure Data API server. If not given, https://api.treasuredata.com is used by default. The list of available endpoints is: https://support.treasuredata.com/hc/en-us/articles/360001474288-Sites-and-Endpoints
  - td_spark_path : string, optional
    Path to td-spark-assembly_x.xx-x.x.x.jar. If not given, the path returned by TDSparkContextBuilder.default_jar_path() is used by default.
  - download_if_missing : boolean, default: True
    Download td-spark if it does not exist at the time of initialization.
  - spark_configs : dict, optional
    Additional Spark configurations to be set via SparkConf's set method.
- Returns
  - td_pyspark.TDSparkContext

td_pyspark.TDSparkContext

pytd.spark.fetch_td_spark_context() returns a td_pyspark.TDSparkContext. See the documentation below and the sample usage on Google Colab for more information.
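A hedged end-to-end sketch, assuming pytd and td-pyspark are installed and TD_API_KEY is set in the environment. The `spark_configs` value and the `sample_datasets.www_access` table (Treasure Data's public sample dataset) are illustrative choices, not requirements.

```python
import os

try:
    from pytd.spark import fetch_td_spark_context
    HAVE_PYTD = True
except ImportError:
    HAVE_PYTD = False  # pytd not installed; the calls below are illustrative only

# Only run against a live account when an API key is actually configured.
if HAVE_PYTD and os.getenv("TD_API_KEY"):
    td = fetch_td_spark_context(
        # Extra entries are forwarded to SparkConf.set before the session starts.
        spark_configs={"spark.executor.memory": "2g"},
    )
    # TDSparkContext exposes Treasure Data tables as Spark DataFrames.
    df = td.table("sample_datasets.www_access").df()
    df.show(5)
```

Because the returned object is a plain td_pyspark.TDSparkContext, everything in the td-pyspark API (table reads, queries, writes) is available on it directly.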