Python MySQL doesn't encode surrogates for query parameters


I've tried this with both Python 3.7 and Python 3.8, with mysql-connector-python 8.0.13 and 8.1.0.

MySQL 5.7.42

Collation on the database is set to 'utf8mb4_unicode_520_ci'

Connection from Python is:

db =  None
db = mysql.connector.connect(
    host="localhost",
    user=username,
    passwd=password,
    database=eventdb,
    charset="utf8mb4",
    use_unicode=True
)

cur = None
cur = db.cursor(dictionary=True)

I have a string that comes from a json.dump, and I'm attempting to run a parameterized query with it:

data["name"] = '\udced\udca0\udcbe\udced\udcb7\udca1\n\n\udced\udca0\udcbe\udced\udcb7\udca1\n\n♡ADANA♡♡EOMON♡'

sql = "SELECT db_name_id FROM db_name WHERE name = %s"
val = (data["name"],)
cur.execute(sql, val)

mysql-connector-python 8.0.13 on both versions of Python raises UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-5: surrogates not allowed

mysql-connector-python 8.1.0 on Python 3.8 raises _mysql_connector.MySQLInterfaceError: Failed converting Python 'str'
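The first error does not seem to need a database at all. As a minimal sketch (the names name_prefix and try_encode are made up for illustration), encoding the first six characters of the string with plain Python appears to fail the same way, and the surrogateescape error handler shows which raw bytes those escapes stand for:

```python
# The first six characters of data["name"] from the question.
name_prefix = '\udced\udca0\udcbe\udced\udcb7\udca1'


def try_encode(text):
    """Return the UTF-8 bytes, or the error message if encoding fails."""
    try:
        return text.encode('utf-8')
    except UnicodeEncodeError as exc:
        return str(exc)


# Strict UTF-8 encoding rejects lone surrogate code points.
result = try_encode(name_prefix)
print(result)

# surrogateescape maps each U+DC80..U+DCFF code point back to one raw byte,
# which reveals the bytes that were originally mis-decoded.
raw = name_prefix.encode('utf-8', 'surrogateescape')
print(raw)
```

If the bytes printed at the end look like a UTF-8-style encoding of a UTF-16 surrogate pair, that would suggest the string was already damaged before it reached the connector.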

However, if I execute a simple query with the value inlined:

cur.execute("SELECT db_name_id FROM db_name WHERE name = '\udced\udca0\udcbe\udced\udcb7\udca1\n\n\udced\udca0\udcbe\udced\udcb7\udca1\n\n♡ADANA♡♡EOMON♡'")

Then it executes without error. This is a user-entered field, though, and I really DON'T want to run the query without parameters.

The simplest example that replicates the exception error I'm seeing is using the C Extension directly:

import _mysql_connector

ccnx = _mysql_connector.MySQL()
ccnx.connect(
    host="localhost",
    user="user",
    password="password",
    database="database"
)

bad_str = 'just_an_��_example'

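try:
    # convert_to_mysql takes the values as positional arguments
    str_converted = ccnx.convert_to_mysql(*[bad_str])
    print(f"str converted is {str_converted}")
except Exception as e:
    # !r prints the escaped form, so printing the bad string cannot itself fail
    print(f"can't convert bad str {bad_str!r}")
    print(e)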

I've only tested this with mysql-connector-python 8.1.0.

If I make the following change, based on information in MySQL Bug 99757, then convert_to_mysql works:

import _mysql_connector

ccnx = _mysql_connector.MySQL()
ccnx.connect(
    host="localhost",
    user="user",
    password="password",
    database="database"
)
ccnx.set_character_set('utf8')
bad_str = 'just_an_��_example'

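try:
    # convert_to_mysql takes the values as positional arguments
    str_converted = ccnx.convert_to_mysql(*[bad_str])
    print(f"str converted is {str_converted}")
except Exception as e:
    # !r prints the escaped form, so printing the bad string cannot itself fail
    print(f"can't convert bad str {bad_str!r}")
    print(e)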

It seems like the conversion to a mysql string is broken for some cases, including parameterized strings with surrogates. I'm hoping there's just something I missed.
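One thing that might be worth trying is cleaning the string before it is passed as a parameter. The sketch below rests on an assumption that has not been verified against the real data: that the \udcXX escapes are raw bytes mis-decoded with surrogateescape, and that those bytes are a UTF-8-style encoding of UTF-16 surrogate pairs. The helper name repair_surrogates is hypothetical and not part of mysql-connector-python.

```python
def repair_surrogates(text):
    """Hypothetical helper: rebuild real characters from escaped bytes.

    Assumes the lone surrogates in `text` came from surrogateescape
    decoding of bytes that encode UTF-16 surrogate pairs.
    """
    # Step 1: turn each U+DC80..U+DCFF escape back into its raw byte.
    raw = text.encode('utf-8', 'surrogateescape')
    # Step 2: decode the bytes, letting surrogate code points through.
    paired = raw.decode('utf-8', 'surrogatepass')
    # Step 3: a UTF-16 round trip joins each high/low surrogate pair
    # into a single code point.
    return paired.encode('utf-16', 'surrogatepass').decode('utf-16')


name = '\udced\udca0\udcbe\udced\udcb7\udca1\n\n♡ADANA♡'
clean = repair_surrogates(name)

# The repaired string can now be encoded strictly.
encoded = clean.encode('utf-8')
print(ascii(clean))
```

If the assumption does not hold (for example, a surrogate is left unpaired after step 2), the final decode raises UnicodeDecodeError, so the call would need a try/except around it. The cleaned value would then be passed as the query parameter in place of data["name"].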
