Apache Impala setup
- Maintained by: Cloudera
- Authors: Cloudera
- GitHub repo: cloudera/dbt-impala
- PyPI package: dbt-impala
- Slack channel: #db-impala
- Supported dbt Core version: v1.1.0 and newer
- dbt Cloud support: Not Supported
- Minimum data platform version: n/a
Installing dbt-impala
Use pip to install the adapter. Before 1.8, installing the adapter would automatically install dbt-core and any additional dependencies. Beginning in 1.8, installing an adapter does not automatically install dbt-core. This is because adapter and dbt Core versions have been decoupled from each other, so installing an adapter no longer overwrites an existing dbt-core installation.
Use the following command for installation:
python -m pip install dbt-core dbt-impala
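Because the adapter no longer pulls in dbt-core from 1.8 onward, it can help to confirm that both packages are present in the active environment after installing. The following is a minimal sketch of such a check. The helper name missing_packages is hypothetical, and it assumes pip is available for the python (or python3) interpreter on your PATH.

```shell
# Print the name of every requested package that pip cannot find
# in the active environment. If no interpreter is found at all,
# every package is reported as missing.
missing_packages() {
  local py pkg
  py="$(command -v python || command -v python3 || echo python)"
  for pkg in "$@"; do
    "$py" -m pip show "$pkg" >/dev/null 2>&1 || echo "$pkg"
  done
}

# Check the two packages named in the install command above.
missing_packages dbt-core dbt-impala
```

Empty output means both packages were found. Any name printed should be installed before running dbt.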
Configuring dbt-impala
For Impala-specific configuration, please refer to Impala configs.
Connection Methods
dbt-impala can connect to Apache Impala and Cloudera Data Platform clusters.
The Impyla library is used to establish connections to Impala.
Two transport mechanisms are supported:
- binary
- HTTP(S)
The default mechanism is binary. To use HTTP transport, use the boolean option use_http_transport: [true / false].
Authentication Methods
dbt-impala supports three authentication mechanisms:
- insecure: No authentication is used; recommended only for testing.
- ldap: Authentication via LDAP
- kerberos: Authentication via Kerberos (GSSAPI)
Insecure
This method is only recommended if you have a local install of Impala and want to test out the dbt-impala adapter.
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [host] # default value: localhost
      port: [port] # default value: 21050
      dbname: [db name] # this should be the same as the schema name provided below; optional starting with 1.1.2
      schema: [schema name]
LDAP
LDAP allows you to authenticate with a username & password when Impala is configured with LDAP Auth. LDAP is supported over Binary & HTTP connection mechanisms.
This is the recommended authentication mechanism to use with Cloudera Data Platform (CDP).
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [host name]
      http_path: [optional, http path to Impala]
      port: [port] # default value: 21050
      auth_type: ldap
      use_http_transport: [true / false] # default value: true
      use_ssl: [true / false] # TLS should always be used with LDAP to ensure secure transmission of credentials, default value: true
      username: [username]
      password: [password]
      dbname: [db name] # this should be the same as the schema name provided below; optional starting with 1.1.2
      schema: [schema name]
      retries: [retries] # number of times impyla retries the connection to the warehouse, default value: 3
Note: When creating a workload user in CDP, ensure that the user has CREATE, SELECT, ALTER, INSERT, UPDATE, DROP, INDEX, READ, and WRITE permissions. If the user needs to execute GRANT statements, for instance through grants (https://docs.getdbt.com/reference/resource-configs/grants) or on-run-start/on-run-end hooks (https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end), the appropriate GRANT permissions should also be configured. When using Apache Ranger, permission to execute GRANT is typically set using the "Delegate Admin" option.
Kerberos
The Kerberos authentication mechanism uses GSSAPI to share Kerberos credentials when Impala is configured with Kerberos Auth.
your_profile_name:
  target: dev
  outputs:
    dev:
      type: impala
      host: [hostname]
      port: [port] # default value: 21050
      auth_type: GSSAPI
      kerberos_service_name: [kerberos service name] # default value: None
      use_http_transport: true # default value: true
      use_ssl: true # enables TLS for the connection, default value: true
      dbname: [db name] # this should be the same as the schema name provided below; optional starting with 1.1.2
      schema: [schema name]
      retries: [retries] # number of times impyla retries the connection to the warehouse, default value: 3
Note: A typical Cloudera EDH setup involves the following steps to set up Kerberos before you can execute dbt commands:
- Get the correct realm config file for your installation (krb5.conf)
- Set the environment variable that points to the config file (export KRB5_CONFIG=/path/to/krb5.conf)
- Set the correct permissions for the config file (sudo chmod 644 /path/to/krb5.conf)
- Obtain a Kerberos ticket using kinit (kinit username@YOUR_REALM.YOUR_DOMAIN)
- The ticket is valid for a limited period, after which you need to run kinit again to obtain a new one.
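The steps above can be sketched as a small shell script. This is an illustration only: the function names prepare_krb5_config and ensure_ticket are hypothetical, the path and principal are the placeholders from the list, and the ticket check assumes an MIT Kerberos style klist whose -s flag exits non-zero when no valid credentials are cached. The sketch calls chmod without sudo, so prefix it with sudo if you do not own the file.

```shell
# Point Kerberos tooling at the realm config file and set its permissions.
# Returns non-zero if the file does not exist.
prepare_krb5_config() {
  local conf="$1"
  if [ ! -f "$conf" ]; then
    echo "krb5 config not found: $conf" >&2
    return 1
  fi
  chmod 644 "$conf" || return 1
  export KRB5_CONFIG="$conf"
}

# Run kinit only when no valid ticket is cached.
# Assumption: `klist -s` is silent and exits non-zero if the
# credential cache is missing or expired.
ensure_ticket() {
  local principal="$1"
  if ! klist -s 2>/dev/null; then
    kinit "$principal"
  fi
}

# Example usage (uncomment and adjust the placeholders):
# prepare_krb5_config /path/to/krb5.conf
# ensure_ticket username@YOUR_REALM.YOUR_DOMAIN
# dbt debug
```

Calling ensure_ticket before each dbt run avoids an unnecessary password prompt while the cached credentials are still valid, and triggers kinit again once they have expired.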
Instrumentation
By default, the adapter sends instrumentation events to Cloudera to help improve functionality and understand bugs. If you want to switch this off, for instance in a production environment, you can explicitly set the flag usage_tracking: false in your profiles.yml file.
Relatedly, if you'd like to turn off dbt Labs' anonymous usage tracking, see YAML Configurations: Send anonymous usage stats for more info.
Supported Functionality
Name | Supported |
---|---|
Materialization: Table | Yes |
Materialization: View | Yes |
Materialization: Incremental - Append | Yes |
Materialization: Incremental - Insert+Overwrite | Yes |
Materialization: Incremental - Merge | No |
Materialization: Ephemeral | No |
Seeds | Yes |
Tests | Yes |
Snapshots | Yes |
Documentation | Yes |
Authentication: LDAP | Yes |
Authentication: Kerberos | Yes |