I am trying to use PyHive and SQLAlchemy to bulk insert data into a Hive database on a Hadoop cluster.
Here is the relevant part of my code:
from sqlalchemy import DateTime, String, Float
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
engine = create_engine(...)
meta = MetaData()
con = engine.connect()
dataTable = Table(
    'data', meta,
    Column("timestamp", DateTime),
    Column("id", String),
    ...
)
dbdata = []
...
for data in some_source:
    dbdata.append({
        "timestamp": data.time,  # this is a Python datetime object
        "id": data.id,
        ...
    })
con.execute(dataTable.insert(), dbdata)
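For completeness, the engine above is created with PyHive's SQLAlchemy dialect; the host, port, and database in this sketch are placeholders for my real cluster settings:

from sqlalchemy.engine import create_engine

# "hive://" is the dialect name registered by PyHive;
# hiveserver2-host, 10000, and default are placeholders
engine = create_engine("hive://hiveserver2-host:10000/default")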
I am receiving the following error:
(pyhive.exc.OperationalError) TExecuteStatementResp(status=TStatus(statusCode=3, infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Error running query:
[INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_SAFELY_CAST] org.apache.spark.sql.AnalysisException:
[INCOMPATIBLE_DATA_FOR_TABLE.CANNOT_SAFELY_CAST] Cannot write incompatible data for the table `spark_catalog`.`default`.`data`: Cannot safely cast `timestamp` "STRING" to "TIMESTAMP".:26:25',