I've a simple function that calculates z-scores:
CREATE OR REPLACE FUNCTION z_score(x FLOAT)
RETURNS FLOAT
COMMENT = 'Simple z-score calculation.'
AS 'NVL2(x, (x - AVG(x) OVER()) / STDDEV(x) OVER(), NULL)'
;
How can I extend the above definition to enable end user passing optional PARTITION BY / ORDER BY to the OVER() call so the score can be calculate within a group.
Details
In a non-functional form the one can execute
SELECT
NVL2(x, (x - AVG(x) OVER(PARTITION BY group_col)) / STDDEV(x) OVER(
PARTITION BY group_col), NULL)
FROM ...
Is it possible to mimic this behaviour in the function? It would like to make it possible to call function in two ways:
z_score(x)→ No grouping argument. Executes:NVL2(x, (x - AVG(x) OVER()) / STDDEV(x) OVER(), NULL)z_score(x, PARTITION BY group_col)→ With grouping argument, generating statement from example. Executes:NVL2(x, (x - AVG(x) OVER(PARTITION BY group_col)) / STDDEV(x) OVER( PARTITION BY group_col), NULL)
Example
--- Function experiment
CREATE
OR REPLACE TEMPORARY TABLE tmp_some_values (group_col VARCHAR, x_val FLOAT);
INSERT INTO
tmp_some_values
VALUES
('A', 786783),
('A', 7653.22),
('A', 75553.11),
('B', 1.33),
('B', 1.1),
('B', 1.2);
SELECT
x_val,
NVL2(x_val, (x_val - AVG(x_val) OVER()) / STDDEV(x_val) OVER(), NULL) AS z_score,
NVL2(x_val,(x_val - AVG(x_val) OVER(PARTITION BY group_col)) / STDDEV(x_val) OVER(PARTITION BY group_col), NULL) AS z_score_group
FROM
tmp_some_values;
Results
| X_VAL | Z_SCORE | Z_SCORE_GROUP |
|---|---|---|
| 786783 | 2.032252526 | 1.151122095 |
| 7653.22 | -0.4349140533 | -0.654227693 |
| 75553.11 | -0.2199045084 | -0.4968944024 |
| 1.33 | -0.4591442749 | 1.040531963 |
| 1.1 | -0.4591450032 | -0.9538209665 |
| 1.2 | -0.4591446866 | -0.08671099695 |
Workaround solution
I was able to come up with a workaround using IFF
-- Define Z-Score calculating function
CREATE OR REPLACE FUNCTION z_score(x FLOAT, group_col VARCHAR DEFAULT NULL)
RETURNS FLOAT
COMMENT = 'Z-score calculation'
AS
$$
IFF(group_col IS NULL,
NVL2(x, (x - AVG(x) OVER ()) / STDDEV(x) OVER (), NULL),
NVL2(x, (x - AVG(x) OVER (PARTITION BY group_col)) / STDDEV(x) OVER (PARTITION BY group_col), NULL)
)
$$
;
Still I would prefer a more intelligent way of handling optional SQL-like arguments.