pytd.Client

class pytd.Client(apikey: str | None = None, endpoint: str | None = None, database: str = 'sample_datasets', default_engine: Literal['presto', 'hive'] | QueryEngine = 'presto', header: str | bool = True, **kwargs: Any)[source]

Treasure Data client interface.

A client instance establishes a connection to Treasure Data. This interface gives easy and efficient access to the Presto and Hive query engines and to the Plazma primary storage.

Parameters:

apikey (str, optional) – Treasure Data API key. If not given, the value of the environment variable TD_API_KEY is used by default.
endpoint (str, optional) – Treasure Data API server. If not given, https://api.treasuredata.com is used by default. For the list of available endpoints, see https://api-docs.treasuredata.com/en/overview/aboutendpoints#treasure-data-api-baseurls
database (str, default: 'sample_datasets') – Name of the database to connect to.
default_engine (str, {‘presto’, ‘hive’}, or pytd.query_engine.QueryEngine, default: ‘presto’) – Query engine. If a QueryEngine instance is given, apikey, endpoint, and database are overwritten by the values configured in the instance.
header (str or bool, default: True) – Prepend comment strings, in the form "-- comment", as a header of queries. Set to False to disable the header.
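
The fallbacks described for apikey, endpoint, and database can be sketched as a small helper. This is an illustration only, not pytd's own code: `resolve_client_settings` and `DEFAULT_ENDPOINT` are hypothetical names, and raising `ValueError` when no key is available is an assumption of this sketch.

```python
import os

# Documented default endpoint for pytd.Client.
DEFAULT_ENDPOINT = "https://api.treasuredata.com"


def resolve_client_settings(apikey=None, endpoint=None, database="sample_datasets"):
    """Hypothetical helper mirroring the documented constructor defaults."""
    # Fall back to the TD_API_KEY environment variable, as documented.
    apikey = apikey or os.environ.get("TD_API_KEY")
    if apikey is None:
        # Assumption of this sketch: fail early when no key is available.
        raise ValueError("no API key given and TD_API_KEY is not set")
    return {
        "apikey": apikey,
        "endpoint": endpoint or DEFAULT_ENDPOINT,
        "database": database,
    }


# Usage (requires the pytd package and a valid API key):
# import pytd
# client = pytd.Client(**resolve_client_settings(database="sample_datasets"))
```

Passing the values explicitly is equivalent to relying on the defaults; the helper only makes the resolution order visible.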

api_client

Connection to Treasure Data.

Type: tdclient.Client

query_executed

Query execution result returned from DB-API Cursor object.

Examples

A Presto query executed via trino returns a TrinoResult object:

>>> import pytd
>>> client = pytd.Client()
>>> client.query_executed
>>> client.query('select 1')
>>> client.query_executed
<trino.client.TrinoResult object at 0x10b9826a0>

In contrast, tdclient runs the query as a job on Treasure Data, and the Cursor returns its job ID:

>>> client.query('select 1', priority=0)
>>> client.query_executed
'669563342'

Note that the optional argument priority forces the client to query via tdclient.

Type: str or trino.client.TrinoResult, default: None

__init__(apikey: str | None = None, endpoint: str | None = None, database: str = 'sample_datasets', default_engine: Literal['presto', 'hive'] | QueryEngine = 'presto', header: str | bool = True, **kwargs: Any) → None[source]

Methods

`__init__`([apikey, endpoint, database, ...])
`close`()	Close a client I/O session to Treasure Data.
`create_database_if_not_exists`(database)	Create a database on Treasure Data if it does not exist.
`exists`(database[, table])	Check if a database and table exist.
`get_job`(job_id)	Get a td-client-python Job object from `job_id`.
`get_table`(database, table)	Create a pytd table control instance.
`list_databases`()	Get a list of td-client-python Database objects.
`list_jobs`()	Get a list of td-client-python Job objects.
`list_tables`([database])	Get a list of td-client-python Table objects.
`load_table_from_dataframe`(dataframe, destination)	Write a given DataFrame to a Treasure Data table.
`query`(query[, engine])	Run query and get results.

list_databases() → list[Database][source]

Get a list of td-client-python Database objects.

Return type: list of tdclient.models.Database

list_tables(database: str | None = None) → list[Table][source]

Get a list of td-client-python Table objects.

Parameters: database (string, optional) – Database name. If not given, list tables in the database associated with this pytd.Client instance.
Return type: list of tdclient.models.Table
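
As a minimal sketch of consuming the returned list, the helper below collects table names. `table_names` is a hypothetical name, and the sketch assumes each returned Table object exposes a `name` attribute; check the td-client-python model documentation before relying on it.

```python
def table_names(client, database=None):
    """Return sorted table names from any object exposing list_tables()."""
    # Assumption: each Table object in the list has a `name` attribute.
    return sorted(table.name for table in client.list_tables(database))


# Usage (requires the pytd package and a valid API key):
# import pytd
# client = pytd.Client(database="sample_datasets")
# print(table_names(client))
```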

list_jobs() → list[Job][source]

Get a list of td-client-python Job objects.

Return type: list of tdclient.models.Job

get_job(job_id: int) → Job[source]

Get a td-client-python Job object from job_id.

Parameters: job_id (integer) – Job ID.
Return type: tdclient.models.Job

close() → None[source]: Close a client I/O session to Treasure Data.

query(query: str, engine: Literal['presto', 'hive'] | QueryEngine | None = None, **kwargs: Any) → QueryResult[source]

Run query and get results.

The execution result stored in the QueryEngine is retained in self.query_executed.

Parameters:

query (str) – Query issued on a specified query engine.
engine (str, {‘presto’, ‘hive’}, or pytd.query_engine.QueryEngine, optional) – Query engine. If not given, default query engine created in the constructor will be used.
**kwargs –
Treasure Data-specific optional query parameters. Giving any of these keyword arguments forces the query engine to issue the query via the Treasure Data REST API provided by tdclient; that is, if the engine is Presto, the query does not use the efficient direct access to the query engine provided by trino.
- db (str): use the database
- result_url (str): result output URL
- priority (int or str): priority
  - -2: “VERY LOW”
  - -1: “LOW”
  - 0: “NORMAL”
  - 1: “HIGH”
  - 2: “VERY HIGH”
- retry_limit (int): max number of automatic retries
- wait_interval (int): sleep interval while waiting for the job to finish
- wait_callback (function): callback invoked at every interval with the job itself
- engine_version (str): run the query with Hive 2 if this parameter is set to "stable" and engine denotes Hive. See https://api-docs.treasuredata.com/en/tools/hive/writing_hive_queries
In addition, when the following argument is set to True, the query is always issued via tdclient.
- force_tdclient (bool): force Presto engines to issue a query via tdclient rather than its default trino interface.
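
The priority levels listed above can be sketched as a lookup table that builds the keyword arguments for `query`. `PRIORITY_LEVELS` and `tdclient_options` are hypothetical names introduced for this illustration; only the keyword names and the numeric levels come from the list above.

```python
# Documented priority levels, keyed by their labels.
PRIORITY_LEVELS = {
    "VERY LOW": -2,
    "LOW": -1,
    "NORMAL": 0,
    "HIGH": 1,
    "VERY HIGH": 2,
}


def tdclient_options(priority="NORMAL", retry_limit=None, result_url=None):
    """Build keyword arguments that route a query through tdclient."""
    if priority not in PRIORITY_LEVELS:
        raise ValueError(f"unknown priority: {priority!r}")
    options = {"priority": PRIORITY_LEVELS[priority]}
    # Only include optional settings that were actually given.
    if retry_limit is not None:
        options["retry_limit"] = retry_limit
    if result_url is not None:
        options["result_url"] = result_url
    return options


# Usage (requires the pytd package and a valid API key):
# client.query("select 1", **tdclient_options(priority="HIGH", retry_limit=3))
```

Because any of these keyword arguments switches the query to the REST API path, leaving them all out is the way to keep the direct trino access for Presto.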

Returns:

dict with the following keys:

‘data’: List of rows. Each row is represented as a list of column values.
‘columns’: List of column names.

Return type:

dict with keys (‘data’, ‘columns’)
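
Since the result pairs a list of column names with a list of rows, it is often convenient to turn each row into a mapping. The sketch below does that in plain Python; `rows_as_dicts` is a hypothetical helper, and the sample result is made-up data that only imitates the documented shape.

```python
def rows_as_dicts(result):
    """Convert a {'data': ..., 'columns': ...} result into a list of dicts."""
    columns = result["columns"]
    # Pair each row's values with the column names, in order.
    return [dict(zip(columns, row)) for row in result["data"]]


# Made-up sample with the documented shape (not real query output).
sample = {
    "columns": ["symbol", "volume"],
    "data": [["AAPL", 100], ["MSFT", 250]],
}
print(rows_as_dicts(sample))
# → [{'symbol': 'AAPL', 'volume': 100}, {'symbol': 'MSFT', 'volume': 250}]
```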

get_table(database: str, table: str) → Table[source]

Create a pytd table control instance.

Parameters:

database (str) – Database name.
table (str) – Table name.

Return type:

pytd.table.Table

exists(database: str, table: str | None = None) → bool[source]

Check if a database and table exist.

Parameters:

database (str) – Database name.
table (str, optional) – Table name. If not given, just check the database existence.

Return type:

bool
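
The two calling forms, with and without a table name, can be combined to report what is missing. This is a minimal sketch; `describe_target` is a hypothetical helper that works with any object exposing `exists(database, table=None)`.

```python
def describe_target(client, database, table):
    """Report whether a database and table are present."""
    # Without a table name, only the database is checked.
    if not client.exists(database):
        return "missing database"
    # With a table name, the table inside that database is checked.
    if not client.exists(database, table):
        return "missing table"
    return "ok"


# Usage (requires the pytd package and a valid API key):
# print(describe_target(client, "sample_datasets", "www_access"))
```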

create_database_if_not_exists(database: str) → None[source]

Create a database on Treasure Data if it does not exist.

Parameters: database (str) – Database name.

load_table_from_dataframe(dataframe: DataFrame, destination: str | Table, writer: Literal['bulk_import', 'insert_into', 'spark'] | Writer = 'bulk_import', if_exists: Literal['error', 'overwrite', 'append', 'ignore'] = 'error', **kwargs: Any) → None[source]

Write a given DataFrame to a Treasure Data table.

This function may initialize a Writer instance. Note that, as a part of the initialization process for SparkWriter, the latest version of td-spark will be downloaded.

Parameters:

dataframe (pandas.DataFrame) – Data loaded to a target table.
destination (str, or pytd.table.Table) – Target table.
writer (str, {‘bulk_import’, ‘insert_into’, ‘spark’}, or pytd.writer.Writer, default: ‘bulk_import’) – A Writer that determines the method used to write to Treasure Data. If omitted or given as a string, a temporary Writer instance is created.
if_exists (str, {'error', 'overwrite', 'append', 'ignore'}, default: 'error') –
What happens when a target table already exists.
- error: raise an exception.
- overwrite: drop it, recreate it, and insert data.
- append: insert data. Create the table if it does not exist.
- ignore: do nothing.
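
A thin wrapper can validate the `if_exists` mode before handing the DataFrame over. The sketch below is an illustration, not part of pytd: `upload` and `IF_EXISTS_MODES` are hypothetical names, and the `"my_db.my_table"` destination string in the usage comment is a placeholder whose format is an assumption.

```python
# Documented values for the if_exists parameter.
IF_EXISTS_MODES = ("error", "overwrite", "append", "ignore")


def upload(client, dataframe, destination, if_exists="error"):
    """Validate if_exists, then write the DataFrame with the bulk_import writer."""
    if if_exists not in IF_EXISTS_MODES:
        raise ValueError(f"if_exists must be one of {IF_EXISTS_MODES}")
    client.load_table_from_dataframe(
        dataframe,
        destination,
        writer="bulk_import",  # the documented default writer
        if_exists=if_exists,
    )


# Usage (requires pytd, pandas, and a valid API key):
# import pandas as pd
# df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
# upload(client, df, "my_db.my_table", if_exists="append")
```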