pytd.Client

class pytd.Client(apikey=None, endpoint=None, database='sample_datasets', default_engine='presto', header=True, **kwargs)

Treasure Data client interface.

A client instance establishes a connection to Treasure Data. This interface gives easy and efficient access to the Presto/Hive query engines and to Plazma, Treasure Data's primary storage.

Parameters
apikey : string, optional

Treasure Data API key. If not given, the value of the environment variable TD_API_KEY is used.

endpoint : string, optional

Treasure Data API server. If not given, https://api.treasuredata.com is used by default. For a list of available endpoints, see https://support.treasuredata.com/hc/en-us/articles/360001474288-Sites-and-Endpoints

database : string, default: 'sample_datasets'

Name of the database to connect to.

default_engine : string, {'presto', 'hive'}, or pytd.query_engine.QueryEngine, default: 'presto'

Query engine. If a QueryEngine instance is given, apikey, endpoint, and database are overridden by the values configured in that instance.

header : string or boolean, default: True

Prepend a comment string, in the form "-- comment", to each query as a header. Set to False to disable the header.
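A minimal sketch of creating a client, assuming the API key is supplied through the TD_API_KEY environment variable as described above. The helper name `make_client` and its explicit environment check are hypothetical additions for illustration, not part of pytd; pytd is imported inside the function so the module loads even where pytd is not installed.

```python
import os


def make_client(database="sample_datasets", engine="presto", header=True):
    """Hypothetical helper: build a pytd.Client from environment settings."""
    # Illustrative check (not pytd behavior): fail early with a clear message.
    if not os.environ.get("TD_API_KEY"):
        raise RuntimeError("TD_API_KEY is not set; pass apikey=... to pytd.Client instead")

    import pytd  # lazy import: only needed when a client is actually created

    # apikey and endpoint are omitted, so TD_API_KEY and the default
    # endpoint https://api.treasuredata.com are used.
    return pytd.Client(database=database, default_engine=engine, header=header)
```

For example, `make_client(header=False)` would create a Presto-backed client whose queries are sent without the header comment.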

__init__(self, apikey=None, endpoint=None, database='sample_datasets', default_engine='presto', header=True, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self[, apikey, endpoint, database, …])

Initialize self.

close(self)

Close a client I/O session to Treasure Data.

get_job(self, job_id)

Get a td-client-python Job object from job_id.

get_table(self, database, table)

Create a pytd table control instance.

list_databases(self)

Get a list of td-client-python Database objects.

list_jobs(self)

Get a list of td-client-python Job objects.

list_tables(self[, database])

Get a list of td-client-python Table objects.

load_table_from_dataframe(self, dataframe, …)

Write a given DataFrame to a Treasure Data table.

query(self, query[, engine])

Run a query and get its results.


list_databases(self)

Get a list of td-client-python Database objects.

Returns
list of tdclient.models.Database
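A small sketch that turns the returned Database objects into a sorted list of names. It assumes each tdclient.models.Database exposes a `name` attribute; `database_names` is a hypothetical helper name.

```python
def database_names(client):
    """Return the names of all databases visible to a pytd.Client, sorted."""
    # Each element is assumed to be a tdclient.models.Database with a .name attribute.
    return sorted(db.name for db in client.list_databases())
```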
list_tables(self, database=None)

Get a list of td-client-python Table objects.

Parameters
database : string, optional

Database name. If not given, list tables in the database associated with this pytd.Client instance.

Returns
list of tdclient.models.Table
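The `database` argument lets one client inspect databases other than the one it is bound to. The sketch below builds a database-to-tables inventory, assuming tdclient.models.Database and tdclient.models.Table both expose a `name` attribute; `table_inventory` is a hypothetical helper. It issues one list_tables call per database.

```python
def table_inventory(client):
    """Map each database name to the names of the tables it contains."""
    inventory = {}
    for db in client.list_databases():
        # Passing database= overrides the database bound to the client.
        tables = client.list_tables(database=db.name)
        inventory[db.name] = [table.name for table in tables]
    return inventory
```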
list_jobs(self)

Get a list of td-client-python Job objects.

Returns
list of tdclient.models.Job
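A short sketch that extracts job IDs from the returned list, assuming each tdclient.models.Job exposes a `job_id` attribute. The helper `job_ids` and its `limit` parameter are hypothetical, and no particular ordering of the returned jobs is assumed.

```python
def job_ids(client, limit=10):
    """Return the IDs of up to `limit` jobs from client.list_jobs()."""
    # Each element is assumed to be a tdclient.models.Job with a job_id attribute.
    return [job.job_id for job in client.list_jobs()[:limit]]
```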
get_job(self, job_id)

Get a td-client-python Job object from job_id.

Parameters
job_id : integer

Job ID.

Returns
tdclient.models.Job
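Because get_job returns a td-client-python Job, the Job's own methods can be used to follow a job to completion. This sketch assumes tdclient's `Job.wait(wait_interval=...)` and `Job.success()` methods; `wait_for_job` is a hypothetical helper.

```python
def wait_for_job(client, job_id, wait_interval=5):
    """Block until a job finishes and report whether it succeeded."""
    job = client.get_job(job_id)
    # Assumed tdclient behavior: Job.wait() polls the job status until it finishes.
    job.wait(wait_interval=wait_interval)
    return job.success()
```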
close(self)

Close a client I/O session to Treasure Data.
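One way to guarantee that close is called even when a query fails is the standard library's `contextlib.closing`, which works with any object that has a `close()` method. The helper `count_rows` is hypothetical, and the table name is assumed to be a trusted identifier because it is interpolated directly into SQL.

```python
import contextlib


def count_rows(client, table):
    """Run a COUNT(*) on `table`, then close the client."""
    # contextlib.closing calls client.close() on exit, even if the query raises.
    with contextlib.closing(client):
        # `table` is assumed to be a trusted identifier (no escaping is done).
        result = client.query(f"SELECT COUNT(*) FROM {table}")
    return result["data"][0][0]
```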

query(self, query, engine=None, **kwargs)

Run a query and get its results.

Parameters
query : string

Query to issue on the specified query engine.

engine : string, {'presto', 'hive'}, or pytd.query_engine.QueryEngine, optional

Query engine. If not given, the default query engine created in the constructor is used.

**kwargs

Treasure Data-specific optional query parameters. Passing any of these keyword arguments forces the query to be issued via the Treasure Data REST API provided by tdclient; that is, if the engine is Presto, you lose the efficient direct access to the query engine provided by prestodb.

  • db (str): database to use

  • result_url (str): result output URL

  • priority (int or str): job priority
    • -2: "VERY LOW"

    • -1: "LOW"

    • 0: "NORMAL"

    • 1: "HIGH"

    • 2: "VERY HIGH"

  • retry_limit (int): maximum number of automatic retries

  • wait_interval (int): sleep interval between status checks while waiting for the job to finish

  • wait_callback (function): called at every interval with the job object as its argument

Returns
dict with keys ('data', 'columns')
'data'

List of rows. Each row is represented as a list of column values.

'columns'

List of column names.
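Since the result pairs a list of rows with a list of column names, zipping them gives one dict per row. This sketch (the helper `query_records` is hypothetical) also forwards any Treasure Data-specific keyword arguments such as `priority`, which, as noted above, routes the query through the REST API instead of direct Presto access.

```python
def query_records(client, sql, **td_params):
    """Run a query and return each row as a dict keyed by column name."""
    # td_params (e.g. priority=1) are passed through unchanged; any of them
    # makes pytd issue the query via the Treasure Data REST API.
    result = client.query(sql, **td_params)
    columns = result["columns"]
    return [dict(zip(columns, row)) for row in result["data"]]
```

For tabular work, the same result converts directly with `pandas.DataFrame(result["data"], columns=result["columns"])`.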

get_table(self, database, table)

Create a pytd table control instance.

Parameters
database : string

Database name.

table : string

Table name.

Returns
pytd.table.Table
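get_table takes the database and table names separately. If table names are kept in a combined "database.table" form (an assumption of this sketch, not a pytd requirement), a hypothetical helper like `table_from_name` can split them before calling get_table.

```python
def table_from_name(client, qualified_name):
    """Split a 'database.table' name and return the pytd table control instance."""
    # partition splits on the first dot only.
    database, sep, table = qualified_name.partition(".")
    if not sep or not database or not table:
        raise ValueError(f"expected 'database.table', got {qualified_name!r}")
    return client.get_table(database, table)
```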
load_table_from_dataframe(self, dataframe, destination, writer='bulk_import', if_exists='error', **kwargs)

Write a given DataFrame to a Treasure Data table.

This function may initialize a Writer instance. Note that when a SparkWriter is initialized, the latest version of td-spark is downloaded as part of the process.

Parameters
dataframe : pandas.DataFrame

Data to load into the target table.

destination : string, or pytd.table.Table

Target table.

writer : string, {'bulk_import', 'insert_into', 'spark'}, or pytd.writer.Writer, default: 'bulk_import'

Writer that determines how data is written to Treasure Data. If a string is given (including the default), a temporary Writer instance is created.

if_exists : {'error', 'overwrite', 'append', 'ignore'}, default: 'error'

What happens when the target table already exists:

  • error: raise an exception.

  • overwrite: drop it, recreate it, and insert data.

  • append: insert data. Create the table if it does not exist.

  • ignore: do nothing.
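A minimal sketch of writing a DataFrame through a client. The helper `upload_dataframe` and its default of `if_exists="append"` are hypothetical choices; the up-front check simply mirrors the four documented if_exists values before delegating to load_table_from_dataframe.

```python
IF_EXISTS_MODES = {"error", "overwrite", "append", "ignore"}


def upload_dataframe(client, dataframe, destination, if_exists="append", writer="bulk_import"):
    """Write a pandas DataFrame to a Treasure Data table via a pytd.Client."""
    # Illustrative check: reject unsupported modes before calling pytd.
    if if_exists not in IF_EXISTS_MODES:
        raise ValueError(f"if_exists must be one of {sorted(IF_EXISTS_MODES)}, got {if_exists!r}")
    # destination may be a table name string or a pytd.table.Table instance.
    client.load_table_from_dataframe(dataframe, destination, writer=writer, if_exists=if_exists)
```

For example, with a hypothetical table name, `upload_dataframe(client, pandas.DataFrame({"a": [1, 2]}), "my_db.my_table", if_exists="overwrite")` would replace the table's contents using the default bulk_import writer.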