Is there a way to load a partial table from Hive into a Pig relation?


I am currently loading a Hive table into a Pig relation using the code below.

a = LOAD 'hive_db.hive_table' using org.apache.hive.hcatalog.pig.HCatLoader();

This step pulls every record from the Hive table into Pig, but for my current scenario I don't need the whole table. Is there a way to filter out the unwanted records while reading the data from Hive?


There are 2 best solutions below

nobody On

No, you can't load a partial table in the LOAD statement itself. However, you can filter it right after the load statement, either on specific partitions or on column values in the loaded table.

Examples here

savagedata On

If your Hive table is partitioned, you can load only certain partitions by doing a FILTER statement immediately after your LOAD statement.

From the documentation:

If only some partitions of the specified table are needed, include a partition filter statement immediately following the load statement in the data flow. (In the script, however, a filter statement might not immediately follow its load statement.) The filter statement can include conditions on partition as well as non-partition columns.

A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- date is a partition column; age is not
B = filter A by date == '20100819' and age < 30;

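With this filter, only the partition where date == '20100819' is loaded. Partition pruning applies only to conditions on partition columns; the condition on age is evaluated against the rows read from that partition.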
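Applied to the table from the question, the same pattern might look like the sketch below. The column names load_date, status, and id are hypothetical, since the question does not show the table's schema; it also assumes the table is partitioned by load_date and that the script is started with pig -useHCatalog so the HCatalog jars are on the classpath.

```pig
-- Sketch only: load_date, status, and id are assumed column names,
-- and load_date is assumed to be the table's partition column.
a = LOAD 'hive_db.hive_table' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Put the filter directly after the LOAD in the data flow.
-- The load_date condition selects the partition; the status
-- condition is checked against the rows read from it.
b = FILTER a BY load_date == '20100819' AND status == 'ACTIVE';

-- Keep only the columns needed downstream.
c = FOREACH b GENERATE id, status;
```

If the table is not partitioned, the FILTER still returns the right rows, but the whole table is read first.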