I have the following table - note there is a composite primary key consisting of 5 columns.

test_ts = Table('test_ts', meta,
    Column('metric_id', ForeignKey('metric.id'), primary_key = True), 
    Column('entity_id', Integer, primary_key = True), 
    Column('date', DateTime, primary_key = True),              
    Column('freq', String, primary_key = True),           
    Column('context', String, primary_key = True, server_default=""),
    Column('value', String),
    Column("update_time", DateTime, server_default=func.now(), onupdate=func.current_timestamp()),
    Column('update_by', String, server_default=func.current_user()))

I am trying to bulk insert some data into this POSTGRES table, but where the 'context' column may be an empty string (''):

                       date context  value freq  entity_id  metric_id
1  1999-02-01T05:00:00.000Z    test    101    D       1105          4
2  1999-02-02T05:00:00.000Z    test    102    D       1105          4
8  1999-02-01T05:00:00.000Z            201    D       1105          4
9  1999-02-02T05:00:00.000Z            202    D       1105          4

When trying to do this however, i get the following error:

Traceback (most recent call last):
  File "/workspaces/service/data_svc.py", line 121, in bulk_insert
    cur.execute(sql_insert)
  File "/usr/local/lib/python3.11/site-packages/psycopg/cursor.py", line 732, in execute
    raise ex.with_traceback(None)
psycopg.errors.NotNullViolation: null value in column "context" of relation "test_ts" violates not-null constraint
DETAIL:  Failing row contains (4, 1105, 1999-02-01 05:00:00, D, null, 201, 2024-03-27 21:23:42.778517, db_user).        

You can see above, the row value contains a 'null' when it should be an empty string. I am not sure why the empty string is getting converted to null here - as it's not None or Nan, but explictly set to '' in my pandas dataframe.

The SQL im using are the following:

sql_create:
drop table if exists tmp_tbl; CREATE UNLOGGED TABLE tmp_tbl AS SELECT date,context,value,freq,entity_id,metric_id FROM test_ts LIMIT 0

sql_copy:
COPY tmp_tbl (date,context,value,freq,entity_id,metric_id) FROM STDIN (FORMAT CSV,  DELIMITER "\t")

sql_insert:
insert into test_ts(date,context,value,freq,entity_id,metric_id) 
select * from tmp_tbl on conflict(metric_id,entity_id,date,freq,context) 
do update set value = EXCLUDED.value;drop table if exists tmp_tbl;

and the code snippet looks like this:

      with psycopg.connect(self.connect_str, autocommit=True) as conn:
         io_buf = io.StringIO()
         df.to_csv(io_buf, sep='\t', header=False, index=False)
         io_buf.seek(0)
         
         with conn.cursor() as cur:
               cur.execute(sql_create)
               with cur.copy(sql_copy) as copy:
                  while data:=io_buf.read(self.batch_size):
                     copy.write(data)

               cur.execute(sql_insert)
         conn.commit()

I can work around this by using something like 'N/A' instead of ''...but would much prefer it to be an empty string. Any help is appreciated.

0

There are 0 best solutions below