I have the data like this :

+------+----------+---------+---------+-------------------+----+----------+
|  ID  | punchdate|punchtype|punchtime|      punchdatetime| uuu| feed_date|
+------+----------+---------+---------+-------------------+----+----------+
|  1000|2023-08-08|       IN| 05:40:00|2023-08-08 05:40:00| 002|2023-08-08|
|  1000|2023-08-08|      OUT| 06:24:00|2023-08-08 05:24:00| 002|2023-08-08|
| 10008|2023-08-08|       IN| 05:44:00|2023-08-08 05:44:00| 001|2023-08-08|
| 10008|2023-08-08|      OUT| 05:46:00|2023-08-08 05:46:00| 001|2023-08-08|
| 10008|2023-08-08|       IN| 05:54:00|2023-08-08 05:54:00| 001|2023-08-08|
| 10008|2023-08-08|      OUT| 06:00:00|2023-08-08 06:00:00| 001|2023-08-08|
|000003|2023-08-08|       IN| 05:44:00|2023-08-08 05:44:00| 001|2023-08-08|
|000003|2023-08-08|      OUT| 05:46:00|2023-08-08 05:46:00| 001|2023-08-08|
|000003|2023-08-08|       IN| 05:54:00|2023-08-08 05:54:00| 001|2023-08-08|
+------+----------+---------+---------+-------------------+----+----------+


Need to extract like this in hive
+-------+----------+-----------------------+---------------------+----+----------+
|   id  | punchdate|           punchINtime |       punchOUTtime  | uuu| feed_date|
+-------+----------+-----------------------+---------------------+----+----------+
|   1000|2023-08-08| 2023-08-08 05:40:00   | 2023-08-08 06:24:00 | 002|2023-08-08|
|  10008|2023-08-08| 2023-08-08 05:44:00   | 2023-08-08 05:46:00 | 001|2023-08-08|
|  10008|2023-08-08| 2023-08-08 05:54:00   | 2023-08-08 06:00:00 | 001|2023-08-08|
| 000003|2023-08-08| 2023-08-08 05:44:00   | 2023-08-08 05:46:00 | 001|2023-08-08|
| 000003|2023-08-08| 2023-08-08 05:54:00   | current_timestamp   | 001|2023-08-08|
+-------+----------+-----------------------+---------------------+----+----------+

I have tried few queries using MIN, MAX, Lead and lag. Can some help me in how to write this in Hive

1

There are 1 best solutions below

0
Asi Salo On

I think this should work for you:

SELECT
  id,
  punchdate,
  punchINtime,
  punchOUTtime,
  uuu,
  feed_date
FROM (
  SELECT
    id,
    punchdate,
    MIN(CASE WHEN punchtype = 'IN' THEN punchdatetime END) AS punchINtime,
    MAX(CASE WHEN punchtype = 'OUT' THEN punchdatetime END) AS punchOUTtime,
    uuu,
    feed_date,
    ROW_NUMBER() OVER (PARTITION BY id, feed_date ORDER BY punchdatetime) AS rn
  FROM table
  GROUP BY id, punchdate, uuu, feed_date
) subquery
WHERE punchINtime IS NOT NULL OR rn = 1;