pytd.Client
- class pytd.Client(apikey: str | None = None, endpoint: str | None = None, database: str = 'sample_datasets', default_engine: Literal['presto', 'hive'] | QueryEngine = 'presto', header: str | bool = True, **kwargs: Any)
Treasure Data client interface.
A client instance establishes a connection to Treasure Data. This interface provides efficient access to the Presto and Hive query engines and to Plazma primary storage.
- Parameters:
apikey (str, optional) – Treasure Data API key. If not given, the value of the environment variable TD_API_KEY is used by default.
endpoint (str, optional) – Treasure Data API server. If not given, https://api.treasuredata.com is used by default. For the list of available endpoints, see https://api-docs.treasuredata.com/en/overview/aboutendpoints#treasure-data-api-baseurls
database (str, default: 'sample_datasets') – Name of connected database.
default_engine (str, {‘presto’, ‘hive’}, or pytd.query_engine.QueryEngine, default: ‘presto’) – Query engine. If a QueryEngine instance is given, apikey, endpoint, and database are overwritten by the values configured in the instance.
header (str or bool, default: True) – Prepend comment strings, in the form "-- comment", as a header of queries. Set False to disable the header.
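The fallback rules for the constructor parameters can be sketched as follows. This is a minimal illustration of the documented defaults, not the library's own implementation: `resolve_client_settings` and `connect` are hypothetical helper names, and raising `ValueError` when no key is found is an assumption made for the sketch.

```python
import os

DEFAULT_ENDPOINT = "https://api.treasuredata.com"  # documented default endpoint


def resolve_client_settings(apikey=None, endpoint=None,
                            database="sample_datasets", environ=None):
    """Return keyword arguments for pytd.Client (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Documented fallback: use TD_API_KEY when apikey is not given.
    apikey = apikey or environ.get("TD_API_KEY")
    if not apikey:
        # Assumption: fail early here instead of relying on the library's
        # own error handling, which this reference does not describe.
        raise ValueError("no API key: pass apikey or set TD_API_KEY")
    return {
        "apikey": apikey,
        "endpoint": endpoint or DEFAULT_ENDPOINT,
        "database": database,
    }


def connect(**overrides):
    """Build a client from the resolved settings."""
    # Imported lazily so the helper above can be used without pytd installed.
    import pytd

    return pytd.Client(**resolve_client_settings(**overrides))
```

Calling `connect(database="my_db")` would then create a client for `my_db` (a placeholder name) using the key from the environment.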
- api_client
Connection to Treasure Data.
- Type:
tdclient.Client
- query_executed
Query execution result returned from DB-API Cursor object.
Examples
A Presto query executed via trino returns a TrinoResult object:
>>> import pytd
>>> client = pytd.Client()
>>> client.query_executed
>>> client.query('select 1')
>>> client.query_executed
<trino.client.TrinoResult object at 0x10b9826a0>
Meanwhile, tdclient runs a job on Treasure Data, and Cursor returns its job id:
>>> client.query('select 1', priority=0)
>>> client.query_executed
'669563342'
Note that the optional argument priority forces the client to query via tdclient.
- Type:
str or trino.client.TrinoResult, default: None
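Because query_executed holds either nothing, a job id string, or a result object, its type indicates which path the last query took. The sketch below shows one way to inspect it; `describe_execution` is a hypothetical helper, and treating every non-string value as a direct trino result is an assumption based on the two cases described above.

```python
def describe_execution(query_executed):
    """Classify the value of client.query_executed (hypothetical helper)."""
    if query_executed is None:
        # Default value before any query has run.
        return "no query executed yet"
    if isinstance(query_executed, str):
        # tdclient path: the value is a job id such as '669563342'.
        return f"tdclient job {query_executed}"
    # Assumption: anything else is the result object from the trino path.
    return f"direct result ({type(query_executed).__name__})"
```

For example, `describe_execution(client.query_executed)` could be logged after each call to `client.query`.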
- __init__(apikey: str | None = None, endpoint: str | None = None, database: str = 'sample_datasets', default_engine: Literal['presto', 'hive'] | QueryEngine = 'presto', header: str | bool = True, **kwargs: Any) -> None
Methods
__init__([apikey, endpoint, database, ...])
close() – Close a client I/O session to Treasure Data.
create_database_if_not_exists(database) – Create a database on Treasure Data if it does not exist.
exists(database[, table]) – Check whether a database, and optionally a table in it, exists.
get_job(job_id) – Get a td-client-python Job object from job_id.
get_table(database, table) – Create a pytd table control instance.
list_databases() – Get a list of td-client-python Database objects.
list_jobs() – Get a list of td-client-python Job objects.
list_tables([database]) – Get a list of td-client-python Table objects.
load_table_from_dataframe(dataframe, destination) – Write a given DataFrame to a Treasure Data table.
query(query[, engine]) – Run query and get results.
- list_databases() -> list[Database]
Get a list of td-client-python Database objects.
- Return type:
list of tdclient.models.Database
- list_tables(database: str | None = None) -> list[Table]
Get a list of td-client-python Table objects.
- Parameters:
database (string, optional) – Database name. If not given, list tables in the database associated with this pytd.Client instance.
- Return type:
list of tdclient.models.Table
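As a rough usage sketch, the returned Table objects can be reduced to a list of names. `table_names` is a hypothetical helper, and it assumes each returned object exposes a `name` attribute, which this reference does not state.

```python
def table_names(client, database=None):
    """Return sorted table names for a database (hypothetical helper).

    `client` is expected to be a pytd.Client. Assumption: every object
    returned by list_tables() has a `name` attribute.
    """
    if database is None:
        # Falls back to the database associated with the client.
        tables = client.list_tables()
    else:
        tables = client.list_tables(database)
    return sorted(table.name for table in tables)
```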
- list_jobs() -> list[Job]
Get a list of td-client-python Job objects.
- Return type:
list of tdclient.models.Job
- get_job(job_id: int) -> Job
Get a td-client-python Job object from job_id.
- Parameters:
job_id (integer) – Job ID.
- Return type:
tdclient.models.Job
- query(query: str, engine: Literal['presto', 'hive'] | QueryEngine | None = None, **kwargs: Any) -> QueryResult
Run query and get results.
The executed result stored in QueryEngine is retained in self.query_executed.
- Parameters:
query (str) – Query issued on a specified query engine.
engine (str, {‘presto’, ‘hive’}, or pytd.query_engine.QueryEngine, optional) – Query engine. If not given, the default query engine created in the constructor is used.
**kwargs – Treasure Data-specific optional query parameters. Giving any of these keyword arguments forces the query engine to issue the query via the Treasure Data REST API provided by tdclient; that is, if engine is Presto, the query does not benefit from the efficient direct access to the query engine provided by trino.
db (str): use the database
result_url (str): result output URL
priority (int or str): priority, one of:
-2: “VERY LOW”
-1: “LOW”
0: “NORMAL”
1: “HIGH”
2: “VERY HIGH”
retry_limit (int): max number of automatic retries
wait_interval (int): sleep interval until the job finishes
wait_callback (function): called every interval against the job itself
engine_version (str): run the query with Hive 2 if this parameter is set to "stable" and engine denotes Hive. https://api-docs.treasuredata.com/en/tools/hive/writing_hive_queries
In addition, when the following argument is set to True, the query is always issued via tdclient:
force_tdclient (bool): force Presto engines to issue a query via tdclient rather than the default trino interface.
- Returns:
dict with the following keys:
‘data’ – List of rows. Every row is represented as a list of column values.
‘columns’ – List of column names.
- Return type:
dict with keys (‘data’, ‘columns’)
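The result shape and the priority levels above can be combined in a small wrapper. This is a sketch under stated assumptions: `query_records` and `PRIORITY_LEVELS` are hypothetical names, `client` is expected to be a pytd.Client, and translating priority labels to integers before the call is a choice made here, not something the library requires.

```python
# Label-to-level mapping copied from the priority list above.
PRIORITY_LEVELS = {
    "VERY LOW": -2,
    "LOW": -1,
    "NORMAL": 0,
    "HIGH": 1,
    "VERY HIGH": 2,
}


def query_records(client, sql, priority=None):
    """Run a query and return rows as dicts keyed by column name."""
    kwargs = {}
    if priority is not None:
        if isinstance(priority, str):
            # Raises KeyError for labels that are not in the documented list.
            priority = PRIORITY_LEVELS[priority.strip().upper()]
        # Passing priority routes the query through tdclient, as noted above.
        kwargs["priority"] = priority
    result = client.query(sql, **kwargs)
    columns = result["columns"]
    # Pair every row with the column names: one dict per row.
    return [dict(zip(columns, row)) for row in result["data"]]
```

Since the keys match the `data` and `columns` keyword arguments of `pandas.DataFrame`, `pandas.DataFrame(**client.query(sql))` is another way to consume the same result.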
- get_table(database: str, table: str) -> Table
Create a pytd table control instance.
- Parameters:
database (str) – Database name.
table (str) – Table name.
- Return type:
pytd.table.Table
- exists(database: str, table: str | None = None) -> bool
Check whether a database, and optionally a table in it, exists.
- Parameters:
database (str) – Database name.
table (str, optional) – Table name. If not given, just check the database existence.
- Return type:
bool
- create_database_if_not_exists(database: str) -> None
Create a database on Treasure Data if it does not exist.
- Parameters:
database (str) – Database name.
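The two methods above can be combined to prepare a write target. In this sketch, `prepare_destination` is a hypothetical helper and `client` is expected to be a pytd.Client; it only uses the documented `exists` and `create_database_if_not_exists` calls.

```python
def prepare_destination(client, database, table):
    """Ensure `database` exists and report whether `table` is already there."""
    # Safe to call repeatedly: creates the database only when it is missing.
    client.create_database_if_not_exists(database)
    # True when the table already exists, False when it still has to be created.
    return client.exists(database, table)
```

The returned flag can guide the choice of `if_exists` when loading data into the table afterwards.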
- load_table_from_dataframe(dataframe: DataFrame, destination: str | Table, writer: Literal['bulk_import', 'insert_into', 'spark'] | Writer = 'bulk_import', if_exists: Literal['error', 'overwrite', 'append', 'ignore'] = 'error', **kwargs: Any) -> None
Write a given DataFrame to a Treasure Data table.
This function may initialize a Writer instance. Note that, as a part of the initialization process for SparkWriter, the latest version of td-spark will be downloaded.
- Parameters:
dataframe (pandas.DataFrame) – Data loaded to a target table.
destination (str, or pytd.table.Table) – Target table.
writer (str, {‘bulk_import’, ‘insert_into’, ‘spark’}, or pytd.writer.Writer, default: ‘bulk_import’) – A Writer that determines how data is written to Treasure Data. If not given, or if given as a string, a temporary Writer instance is created.
if_exists (str, {'error', 'overwrite', 'append', 'ignore'}, default: 'error') –
What happens when a target table already exists.
error: raise an exception.
overwrite: drop it, recreate it, and insert data.
append: insert data. Create the table if it does not exist.
ignore: do nothing.
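A thin wrapper can check the documented option values before any data is sent. This is a minimal sketch: `upload`, `IF_EXISTS_MODES`, and `WRITER_NAMES` are hypothetical names, `client` is expected to be a pytd.Client, and the up-front validation is an addition made here, not a description of what the library does internally.

```python
# Allowed values copied from the parameter descriptions above.
IF_EXISTS_MODES = ("error", "overwrite", "append", "ignore")
WRITER_NAMES = ("bulk_import", "insert_into", "spark")


def upload(client, dataframe, destination, writer="bulk_import", if_exists="error"):
    """Validate options, then delegate to load_table_from_dataframe."""
    if if_exists not in IF_EXISTS_MODES:
        raise ValueError(f"if_exists must be one of {IF_EXISTS_MODES}")
    # A Writer instance is passed through unchanged; only names are checked.
    if isinstance(writer, str) and writer not in WRITER_NAMES:
        raise ValueError(f"writer must be one of {WRITER_NAMES}")
    client.load_table_from_dataframe(
        dataframe, destination, writer=writer, if_exists=if_exists
    )
```

For instance, `upload(client, df, table, if_exists="append")` would add the rows of a pandas DataFrame `df` to an existing target, where `table` is a destination in a form the client accepts.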