Extract a field from a JSON file by matching its values against a plain-text file

I have a JSON file, file1.json, and a plain-text file, file2. For each value in file2, I want to find the case in file1.json whose CaseName matches it and output that case's CaseID. The files and the expected result are shown below.

I tried to do this with awk, but it does not give me the expected result:

 awk -F, 'FNR==NR {f2[$1];next} !($0 in f2)' file2 file1

file1.json

{
    "Cases": [{
            "CaseID": "100",
            "CaseUpdatedByUser": "XYZ",
            "Case": {
                "CaseName": "Apple",
                "ID": "1"
            }
        },
        {
            "CaseID": "350",
            "CaseUpdatedByUser": "ABC",
            "Case": {
                "CaseName": "Mango",
                "ID": "1"
            }
        },
        {
            "CaseID": "440",
            "CaseUpdatedByUser": "PQR",
            "Case": {
                "CaseName": "Strawberry",
                "ID": "1"
            }
        }
    ]
}

file2

Apple
Strawberry
Mango

Expected output:

100
350
440
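Why the awk attempt prints the whole JSON file: the first block runs only while reading file2 (FNR==NR) and stores each name as a key of f2. For file1, the pattern !($0 in f2) selects lines whose entire text is NOT one of those keys. A JSON line such as "CaseName": "Apple", never equals the bare word Apple, so every line passes the test. The short Python simulation below mirrors those two passes on a hand-copied excerpt of file1.json; it only illustrates awk's behavior and is not a fix.

```python
# Excerpt of file1.json, one entry per line, indentation kept as in the file.
file1_lines = [
    "{",
    '    "Cases": [{',
    '            "CaseID": "100",',
    '            "CaseUpdatedByUser": "XYZ",',
    '            "Case": {',
    '                "CaseName": "Apple",',
    '                "ID": "1"',
    "            }",
]
file2_lines = ["Apple", "Strawberry", "Mango"]

# Pass 1 (FNR==NR, reading file2): f2[$1] stores the first comma-separated field.
f2 = {line.split(",")[0] for line in file2_lines}

# Pass 2 (reading file1): !($0 in f2) keeps lines whose whole text is not a key.
printed = [line for line in file1_lines if line not in f2]

# No JSON line is exactly "Apple", "Mango" or "Strawberry", so nothing is dropped.
print(len(printed) == len(file1_lines))  # True
```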

Best answer, by Cyrus:

With jq, awk and sort:

jq -r '.Cases[] | "\(.Case.CaseName);\(.CaseID)"' file1.json \
  | awk -F ';' 'NR==FNR{array[$1]=$2; next} {print array[$1]}' - file2 \
  | sort -n

Output:

100
350
440
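For readers without jq, here is a rough Python equivalent of the three pipeline stages. The jq filter pairs each .Case.CaseName with its .CaseID, the awk program loads those pairs from stdin (NR==FNR) and then looks up each line of file2, and sort -n orders the IDs numerically; without it the IDs would come out in file2 order (100, 440, 350). This is a sketch with the question's data inlined as strings. One assumption differs from the shell version: a name in file2 that is missing from file1.json raises KeyError here, whereas awk would print an empty line.

```python
import json

# Inline copy of file1.json from the question (CaseUpdatedByUser omitted).
FILE1_JSON = """
{"Cases": [
  {"CaseID": "100", "Case": {"CaseName": "Apple", "ID": "1"}},
  {"CaseID": "350", "Case": {"CaseName": "Mango", "ID": "1"}},
  {"CaseID": "440", "Case": {"CaseName": "Strawberry", "ID": "1"}}
]}
"""
# Inline copy of file2 from the question.
FILE2_TEXT = "Apple\nStrawberry\nMango\n"


def case_ids_for_names(json_text, names_text):
    data = json.loads(json_text)
    # Stage 1 (jq): map each nested Case.CaseName to its top-level CaseID.
    name_to_id = {c["Case"]["CaseName"]: c["CaseID"] for c in data["Cases"]}
    # Stage 2 (awk): look up every non-empty line of file2, in file2 order.
    ids = [name_to_id[name] for name in names_text.splitlines() if name]
    # Stage 3 (sort -n): numeric rather than lexical ordering.
    return sorted(ids, key=int)


for case_id in case_ids_for_names(FILE1_JSON, FILE2_TEXT):
    print(case_id)
```

To run it on the real files, the inline strings could be replaced with the contents of open("file1.json").read() and open("file2").read().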

Answer by Pin90:

You could write an extract.py module that pulls out exactly the information you need.

The module is generic, so it can be imported into any project.

I've tried it on a long, complex JSON file and it worked fine.

Here is the module:

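# extract.py

def json_extract(obj, key):
    """Recursively collect every value stored under `key` in nested JSON data."""
    arr = []

    def extract(obj, arr, key):
        # Dictionaries: recurse into nested containers; collect scalar values of
        # matching keys (a matching key whose value is a dict or list is searched,
        # not collected).
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        # Lists: any element may itself contain the key.
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    values = extract(obj, arr, key)
    return values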

For a fuller explanation, see the original post, "Extract Nested Data From Complex JSON".
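On its own, json_extract returns a flat list of values for a single key, so it does not yet match against file2. One possible way to apply it to this question (not part of the original answer) is to extract CaseID and CaseName separately and pair them by position. That pairing is an assumption: it only holds when every case has exactly one CaseID and one CaseName, because both lists are built in document order.

```python
import json


# json_extract as defined in extract.py (comments trimmed).
def json_extract(obj, key):
    arr = []

    def extract(obj, arr, key):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    return extract(obj, arr, key)


# Inline copy of file1.json from the question (CaseUpdatedByUser omitted).
FILE1_JSON = """
{"Cases": [
  {"CaseID": "100", "Case": {"CaseName": "Apple", "ID": "1"}},
  {"CaseID": "350", "Case": {"CaseName": "Mango", "ID": "1"}},
  {"CaseID": "440", "Case": {"CaseName": "Strawberry", "ID": "1"}}
]}
"""

data = json.loads(FILE1_JSON)
wanted = {"Apple", "Strawberry", "Mango"}  # the lines of file2

case_ids = json_extract(data, "CaseID")      # ['100', '350', '440']
case_names = json_extract(data, "CaseName")  # ['Apple', 'Mango', 'Strawberry']

# Assumption: one CaseID and one CaseName per case, so the lists line up.
matches = sorted(
    (cid for cid, name in zip(case_ids, case_names) if name in wanted),
    key=int,
)
print(matches)  # ['100', '350', '440']
```

If some cases might lack a CaseName, walking data["Cases"] directly and reading each case's own fields is safer than zipping two independently extracted lists.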