I am trying to populate some calculated columns after each new row INSERT in PostgreSQL, but I can't get it to work.
I have already read these posts (I know these are for SQLite - this question was originally for SQLite but I have changed it to be for PostgreSQL - I will look at SQLite once I get PG version working):
and
but I want to know if I have to use a SELECT statement and the FOR EACH ROW statement within the TRIGGER because I only want to calculate values for the row just inserted, not all rows.
For more history, this question follows on from my previous question here:
Here is the table setup:
CREATE TABLE myTable (
datetime TEXT,
sys_id INT,
cputil REAL,
memfree REAL,
sessnum INT,
util_lag REAL, -- to be calculated AFTER INSERT via TRIGGER
mem_lag REAL, -- to be calculated AFTER INSERT via TRIGGER
util_diff REAL, -- to be calculated AFTER INSERT via TRIGGER
mem_diff REAL -- to be calculated AFTER INSERT via TRIGGER
util_change TEXT -- to be calculated AFTER INSERT via TRIGGER
);
Now for the TRIGGER setup.
I have tried the following SQL TRIGGER syntax, but I cannot get it to work:
CREATE TRIGGER tr_fill_calculated_columns
AFTER INSERT ON myTable
BEGIN
UPDATE myTable
SET util_lag = LAG(cputil) OVER (ORDER BY datetime),
SET mem_lag = LAG(memfree) OVER (ORDER BY datetime),
SET util_diff = cputil - util_lag,
SET mem_diff = memfree - mem_lag,
SET util_change = CASE
WHEN util_diff > 0 THEN 'Up'
WHEN util_diff < 0 THEN 'Down'
WHERE datetime = NEW.datetime AND sys_id = NEW.sys_id
END;
I have also tried the following (which I think is closer to the way it needs to be done in PostgreSQL):
CREATE OR REPLACE FUNCTION fn_calculate_columns_after_insert
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
NEW.util_lag := (SELECT LAG(cputil) OVER (ORDER BY datetime) FROM myTable);
NEW.mem_lag := (SELECT LAG(memfree) OVER (ORDER BY datetime) FROM myTable);
NEW.util_diff := cputil - util_lag;
NEW.mem_diff := memfree - mem_lag;
IF NEW.util_diff > 0 THEN
NEW.util_change := 'Up';
IF NEW.util_diff < 0 THEN
NEW.util_change := 'Down';
ELSE
NEW.util_change := '';
RETURN NEW;
END;
$$
CREATE TRIGGER tr_fill_calculated_columns
AFTER INSERT ON myTable
FOR EACH ROW
EXECUTE FUNCTION fn_calculate_columns_after_insert();
No matter what I try I keep getting SQL error [42601]: ERROR: syntax error at or near ";" or near some other character. This is driving me absolutely nuts. None of the official documentation or example tutorials give an example of exactly what I want to do, which I do not think is hard - update a simple calculated column based upon existing column values in a newly inserted row. Pretty basic in my view.
The full dataset contains hundreds of millions of rows, so performance is important - I only want the TRIGGER to calculate the last row immediately AFTER INSERT, and thereafter never query it again, that is, I only want the TRIGGER to act on the new row being inserted (each time a new row is inserted).
To help clarify exactly what I am looking for in the outcome, here is some sample PRE-TRIGGER data:
datetime,sys_id,cputil,memfree,sessnum
2019/05/03 08:06:14,100,0.57,0.51,47
2019/05/03 08:11:14,100,0.47,0.62,43
2019/05/03 08:16:14,100,0.56,0.57,62
2019/05/03 08:21:14,100,0.57,0.56,50
2019/05/03 08:26:14,100,0.35,0.46,43
2019/05/03 08:31:14,100,0.41,0.58,48
2019/05/03 08:36:14,100,0.57,0.35,58
2019/05/03 08:41:14,100,0.41,0.4,58
2019/05/03 08:46:14,100,0.53,0.35,62
2019/05/03 08:51:14,100,0.51,0.6,45
2019/05/03 08:56:14,100,0.32,0.37,47
2019/05/03 09:01:14,100,0.62,0.59,60
2019/05/03 09:06:14,100,0.66,0.72,57
2019/05/03 09:11:14,100,0.54,0.54,44
2019/05/03 09:16:14,100,0.29,0.4,47
2019/05/03 09:21:14,100,0.43,0.68,66
2019/05/03 09:26:14,100,0.49,0.66,65
2019/05/03 09:31:14,100,0.64,0.55,66
2019/05/03 09:36:14,100,0.42,0.6,42
2019/05/03 09:41:14,100,0.55,0.59,63
and here is a sample of the desired POST-TRIGGER result with calculated columns filled by the TRIGGER/FUNCTION:
datetime,sys_id,cputil,memfree,sessnum,util_lag,mem_lag,util_diff,mem_diff,util_change
2019/05/03 08:06:14,100,0.57,0.51,47,[NULL],[NULL],[NULL],[NULL],[NULL]
2019/05/03 08:11:14,100,0.47,0.62,43,0.57,0.51,-0.1,0.11,Down
2019/05/03 08:16:14,100,0.56,0.57,62,0.47,0.62,0.0900000000000001,-0.05,Up
2019/05/03 08:21:14,100,0.57,0.56,50,0.56,0.57,0.0099999999999999,-0.0099999999999999,Up
2019/05/03 08:26:14,100,0.35,0.46,43,0.57,0.56,-0.22,-0.1,Down
2019/05/03 08:31:14,100,0.41,0.58,48,0.35,0.46,0.06,0.12,Up
2019/05/03 08:36:14,100,0.57,0.35,58,0.41,0.58,0.16,-0.23,Up
2019/05/03 08:41:14,100,0.41,0.4,58,0.57,0.35,-0.16,0.05,Down
2019/05/03 08:46:14,100,0.53,0.35,62,0.41,0.4,0.12,-0.05,Up
2019/05/03 08:51:14,100,0.51,0.6,45,0.53,0.35,-0.02,0.25,Down
2019/05/03 08:56:14,100,0.32,0.37,47,0.51,0.6,-0.19,-0.23,Down
2019/05/03 09:01:14,100,0.62,0.59,60,0.32,0.37,0.3,0.22,Up
2019/05/03 09:06:14,100,0.66,0.72,57,0.62,0.59,0.04,0.13,Up
2019/05/03 09:11:14,100,0.54,0.54,44,0.66,0.72,-0.12,-0.18,Down
2019/05/03 09:16:14,100,0.29,0.4,47,0.54,0.54,-0.25,-0.14,Down
2019/05/03 09:21:14,100,0.43,0.68,66,0.29,0.4,0.14,0.28,Up
2019/05/03 09:26:14,100,0.49,0.66,65,0.43,0.68,0.06,-0.02,Up
2019/05/03 09:31:14,100,0.64,0.55,66,0.49,0.66,0.15,-0.11,Up
2019/05/03 09:36:14,100,0.42,0.6,42,0.64,0.55,-0.22,0.0499999999999999,Down
2019/05/03 09:41:14,100,0.55,0.59,63,0.42,0.6,0.13,-0.01,Up
Finally, I would like to know how to do this in SQLite too, i.e. what is the correct SQLite syntax to do exactly the same thing?
In PL/pgSQL, the
IFconditional statement requires a subsequentEND IF. The errorwas misleading because the parser was looking for that, found the nearest
ENDwhich was the final one. It wanted anEND IF, got anEND;instead - hence the complaint.You had missing parentheses in the signature.
The last semicolon in the function body is optional. The one right after it, is not optional.
You treated
NEW.xandxinterchangeably. The field is only defined within the scope of a query where it's a column in one of the source tables. Outside of it, you need to refer to the pseudo-recordNEWand its fields.You can use PL/pgSQL
SELECT..INTOto populate both variables at once (that's slightly different from SQLSELECT..INTOwhich is a worseCREATE TABLE AS).Use
IF..ELSIF..ELSIF..END IF;to specifically state what you want and when and avoid quietly handlednullinELSE.This has to happen in a
TRIGGER..BEFOREINSERT. Doing this in anAFTERaction you're not affecting the rows written into the table.This doesn't do what you expect it to:
it attempts to select all rows from the table, and in each, show the value from the previous one. It tried to assign the entire, slightly shifted set of rows to the variable.
If you happen to insert an out-of-order row that'd squeeze in between two existing rows, it will get the correct values, but it would invalidate the one that was previously comparing against what is now 2 rows back. You might want to detect and handle this scenario as well as decrease the table and index
fillfactoradequately, to always have some room to "slide-in" the out-of-order rows and update the ones that they invalidate.If you do large, multi-row inserts, a statement-level trigger should perform better - you could only fetch one last row from the table to start it off, then bulk-update the whole batch at once using the window function, the way you intially tried to.
Switch the
datetimecolumn totimestamptztype to save space, speed things up, index properly and enable adequate functions and operators:If you're struggling with
timestamptzinput format, you can useto_timestamp()with a specific mask: demoor you can change your
DateStylesetting:Add an index on the
datetimecolumn, strapcputilandmemfreefields to it as payload, increase itsfillfactorfrom default 90 to 100%, then additionallyCLUSTERthe table on it:That way, the query that's looking for the latest row doesn't need to touch the table and finds the values right in the index. Since you're only ever adding more and more recent rows, you don't need the 10% empty space in the index that's normally there to accommodate updates/deletes: both the table and the index will just keep appending new entries.
CLUSTERtells the db the table is meant to be physically written in the order corresponding to the index - in your case that happens naturally, but declaring that explicitly might help some execution plans.After some performance tests:
INCLUDEmight be unnecessary after all. It's doing an index-only scan, but since the table's being written to, it has to do a heap fetch to make sure the index payload matches the actual values in the row. So you might as well let it do a regular index scan by removing the payload, which might get you some more performance thanks to decreasing the overall index size, similar to howfillfactordid it.If you really only insert more recent rows, you can remove
WHERE datetime < NEW.datetime- that way you'll just peek the most recent row from the table, removing the extra check and speeding things up a bit further. This breaks out-of-order inserts.Demo at db<>fiddle:
Here's what it comes up with if you try to insert your sample batch:
Performance test on 220k rows:
where(doesn't finish)