Using Partition By to compare values based on date in Hive SQL

Question

Using Partition By to compare values based on date in Hive SQL

31 Views Asked by Nishad Gulvady At 14 August 2023 at 19:56

I have a table called referrals

CREATE TABLE [dataproduct].[referrals]
(
      [person_id] DOUBLE,
      [medical_service_cd] DOUBLE,
      [refer_from_organization_id] 
      [referral_org_name] STRING, 
      [refer_from_provider_id] DOUBLE, 
      [provider_from_name] STRING,  
      [refer_to_provider_id] DOUBLE,
      [provider_to_name] STRING,  
      [outbound_encntr_id DOUBLE,
      [order_id] DOUBLE,
      [referral_written_dt_tm] STRING,
      [requested_start_dt_tm] STRING,  
      [medical_service] STRING,  
      [referral_status] STRING, 
      [loc_code] STRING,  
      [service_type] STRING,
      [referral_reason] STRING,  
      [treatment_text] STRING

)

I want to find patients, by person_id, who were referred to Neurology FOLLOWED BY Medical Genetics (as listed under medical_service) using referral_written_dt_tm. I confirmed that some patients had both medical services, but want to find the ones who had them in the desired order. so I tried this:

  SELECT r.person_id, 
         COUNT(DISTINCT r.medical_service) as count,
         row_number() OVER (PARTITION BY person_id ORDER BY referral_written_dt_tm ASC) AS row_num 
    FROM matt_dataproduct.referrals r
   WHERE r.medical_service IN ('Medical Genetics',
                               'Neurology'
                               )
GROUP BY r.person_id,
  HAVING count > 1;

I got no results. I just want a list of patients who meet the criteria (medical genetics followed by neurology). Is there some way I should revise my query? Something about it seems off.

Original Q&A

There are 1 best solutions below

**EdmCoff** · Accepted Answer · 2023-08-14T22:01:33.210000

One way is probably something like the following:

SELECT r.person_id 
FROM matt_dataproduct.referrals rmg INNER JOIN matt_dataproduct.referrals rn
 ON rmg.person_id = rn.person_id
WHERE rmg.medical_service ='Medical Genetics'
 AND rn.medical_service = 'Neurology'
 AND rn.eferral_written_dt_tm < rmg.eferral_written_dt_tm
GROUP BY r.person_id

This simply joins the table to itself (once for Neurology and once for Medical Genetics) and finds rows where the same patient has Neurology before Medical Genetics. The group by is being used to get distinct values (could use distinct in the select part instead).

You said the type of eferral_written_dt_tm is a string (not an actual date). I'm assuming it's directly comparable without conversion (since you used it in your window function's order by), but if not you would replace rn.eferral_written_dt_tm < rmg.eferral_written_dt_tm with the appropriate functions or conversions so that it's comparable.

You could use the row_number window function as you tried (or maybe a variation with lag) by wrapping it in another query to compare values. But in this case I don't see any big advantage over the simple self-join.

Using Partition By to compare values based on date in Hive SQL

There are 1 best solutions below

Related Questions in SQL

Related Questions in HIVE

Related Questions in PARTITION-BY

Trending Questions

Popular # Hahtags

Popular Questions