How to use row_number and partition function in sqldf

3.4k Views Asked by At
Update

I can run below sql query in netezza database, but it goes wrong in sqldf package in R

> sqldf("SELECT TEXT, 
+ VEH_MAKE_NM, 
+ NEW_USED_CD, 
+ PRODUCT, 
+ OVERALL_SUBV_IND, 
+ AS_OF_DATE, 
+ CATEGORY,
+ ROW_NUMBER() OVER(PARTITION BY TEXT, VEH_MAKE_NM, NEW_USED_CD, PRODUCT, OVERALL_SUBV_IND, AS_OF_DATE ORDER BY CATEGORY DESC) RN_CATEGORY,
+  SUBCATEGORY,
+ ROW_NUMBER() OVER(PARTITION BY TEXT, VEH_MAKE_NM, NEW_USED_CD, PRODUCT, OVERALL_SUBV_IND, AS_OF_DATE ORDER BY SUBCATEGORY DESC) RN_SUBCATEGORY
+ FROM output
+ --GROUP BY 1,2,3,4,5,6")
Error in sqliteSendQuery(con, statement, bind.data) : 
  error in statement: near "(": syntax error

I think it might because sqldf package doesn't support netezza SQL. Is there a netezza sql package in R?

Thanks

1

There are 1 best solutions below

1
amit sinha On

Step 1. Add row number column into output dataframe:

output['RN_CATEGORY'] = output.sort_values(['CATEGORY'], 
    ascending=False).groupby(['TEXT', 'VEH_MAKE_NM', 'NEW_USED_CD', 'PRODUCT', 
    'OVERALL_SUBV_IND', 'AS_OF_DATE']).cumcount() + 1

output['RN_SUBCATEGORY'] =output.sort_values(['SUBCATEGORY'], 
    ascending=False).groupby(['TEXT', 'VEH_MAKE_NM', 'NEW_USED_CD', 'PRODUCT', 
    'OVERALL_SUBV_IND', 'AS_OF_DATE']).cumcount() + 1

Step 2.

sqldf("SELECT TEXT, 
    VEH_MAKE_NM, 
    NEW_USED_CD, 
    PRODUCT, 
    OVERALL_SUBV_IND, 
    AS_OF_DATE, 
    CATEGORY,
    RN_CATEGORY,
    SUBCATEGORY,
    RN_SUBCATEGORY
  FROM output
--GROUP BY 1,2,3,4,5,6")