Impyla is returning values in bytes format


I'm trying to fetch data in JH using Impyla. Everything works fine, except that tables in one DB return values as bytes (b'' format).

Code:

from impala.dbapi import connect

conn = connect(host=host, port=21050, user=userName, use_ssl=True, auth_mechanism='GSSAPI', kerberos_service_name='impala', database=db)
cursor = conn.cursor()
cursor.execute(sql)
data = cursor.fetchall()

Example output:

b'', b'UK', b'X', b'Hlavn\xc3\xad 51',

It happens with only one DB. The other DBs and tables I have tested (4 DBs in total) return proper utf-8 strings. Also, not every column comes back as b''.
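Since only some columns are affected, it may help to list exactly which ones come back as bytes. The following is a minimal sketch, not part of my original code: `bytes_columns` is a hypothetical helper, and the sample `description` and `rows` are made-up stand-ins for `cursor.description` and `cursor.fetchall()`. It only assumes the DB-API convention that the first item of each description entry is the column name.

```python
def bytes_columns(description, rows):
    """Return the names of columns that contain at least one bytes value."""
    # DB-API: the first item of each description entry is the column name.
    names = [col[0] for col in description]
    flagged = []
    for i, name in enumerate(names):
        if any(isinstance(row[i], bytes) for row in rows):
            flagged.append(name)
    return flagged


# Made-up sample data (simplified description entries, illustration only).
description = [("country", "VARCHAR"), ("street", "STRING"), ("id", "INT")]
rows = [(b"UK", "Hlavní 51", 1), (b"CZ", "Dlouhá 3", 2)]

flagged = bytes_columns(description, rows)
print(flagged)
```

With a real connection this would be called as `bytes_columns(cursor.description, data)` after `fetchall()`.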

Packages:

impyla 0.17.0 pypi_0 pypi
bitarray 2.1.0 pypi_0 pypi
six 1.14.0 py_1 conda-forge
thrift 0.11.0 pypi_0 pypi
thrift-cpp 0.13.0 h62aa4f2_2 conda-forge
thrift-sasl 0.4.3 pypi_0 pypi
thriftpy 0.3.9 py37h516909a_1001 conda-forge
thriftpy2 0.4.14 py37h5e8e339_0 conda-forge
krb5 1.17.2 h926e7f8_0 conda-forge

However, if I run the same query directly on the server instead of from JH, the output has the correct encoding (no bytes).

Packages on server:

impyla 0.16.3 py37hc8dfbb8_0 conda-forge
bitarray 2.0.1 py37h5e8e339_0 conda-forge
thrift 0.13.0 py37hcd2ae1e_2 conda-forge
thrift_sasl 0.4.2 py37h8f50634_0 conda-forge
thriftpy 0.3.9 py37h516909a_1001 conda-forge
thriftpy2 0.4.14 py37h5e8e339_0 conda-forge
six 1.15.0 pyh9f0ad1d_0 conda-forge
krb5 1.19.1 hcc1bbae_0 conda-forge

Any clues? :) Thank you.


EDIT: 07. 06. The values are in bytes because the columns are varchar. String columns are returned as utf-8 decoded strings, but varchar and char columns are returned as bytes. It appears that this changed with a version upgrade, which matches the server/JH difference described above (different impyla versions). I would have solved this by downgrading, but the lower version returns "invalid query handle" when selecting a large number of rows :(

I'm adding this link, which describes the issue, a workaround and future progress: https://github.com/cloudera/impyla/issues/455
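Independently of the linked issue, one possible client-side workaround is to decode the bytes values after fetching. This is only a sketch under the assumption that the affected values are utf-8 encoded (as in the example output above). `decode_row` is a hypothetical helper, not an impyla function, and the sample rows stand in for the result of `cursor.fetchall()`.

```python
def decode_row(row, encoding="utf-8"):
    """Decode bytes values in a row; leave every other value untouched."""
    return tuple(
        value.decode(encoding) if isinstance(value, bytes) else value
        for value in row
    )


# Stand-in for `data = cursor.fetchall()`; the last two values are made up.
data = [(b"", b"UK", b"X", b"Hlavn\xc3\xad 51", 42, None)]

decoded = [decode_row(row) for row in data]
print(decoded)
```

Values that are already str, numbers or None pass through unchanged, so the same helper can be applied to results from all DBs.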
