How to use Pandas to normalize a nested JSON - nested dictionaries and lists flattened

108 Views Asked by At

I would like to flatten a json request from an API. I have been trying to use pandas to achieve this.

Here is a sample of the json I am working with:

response = 
{
    "data": {
        "getTestExecutions": {
            "total": 50,
            "start": 1,
            "limit": 100,
            "results": [
                {
                    "issueId": "12345",
                    "jira": {
                        "key": "test-1234",
                        "summary": "test- JSON",
                        "project": {
                            "self": "some value",
                            "id": "4321",
                            "key": "tst",
                            "name": "test",
                            "projectTypeKey": "software",
                            "simplified": false,
                            "avatarUrls": {
                                "key" : "value",
                                "key1" : "value1",
                                "key2" : "value1",
                                "key4" : "value4"
                            }
                        }
                    },
                    "tests": {
                        "total": 25,
                        "start": 0,
                        "limit": 100,
                        "results": [
                            {
                                "issueId": "1234",
                                "testType": {
                                    "name": "Manual"
                                },
                                "jira": {
                                    "key": "example-123",
                                    "summary": "sample example"
                                }
                            },
                            {
                                "issueId": "12345",
                                "testType": {
                                    "name": "Manual"
                                },
                                "jira": {
                                    "key": "example-124",
                                    "summary": "sample example 1"
                                }
                            }

                        ]
                    },
                    "testRuns": {
                        "results": [
                            {
                                "status": {
                                    "name": "PASSED"
                                },
                                "executedById": "user id",
                                "startedOn": "date here",
                                "finishedOn": "date here",
                                "assigneeId": "assigneeId"
                            },
                            {
                                "status": {
                                    "name": "PASSED"
                                },
                                "executedById": "executedBy id",
                                "startedOn": "start date",
                                "finishedOn": "end date",
                                "assigneeId": "assigneeId"
                            }

                        ]
                    }
                }
                
            ]
        }
    }
}
    

this is my code so far:

    df = pd.json_normalize(d['data']['getTestExecutions']['results'])
    df_test_results = df.explode('tests.results')
    df_test_results_Norm = pd.json_normalize(df_test_results['tests.results']).add_prefix('test.')
    df_testRun_results = df.explode('testRuns.results')
    df_testRun_results_Norm = pd.json_normalize(df_testRun_results['testRuns.results']).add_prefix('testRuns.')
    df = pd.concat([df.drop(['tests.results','testRuns.results'],axis=1),df_test_results_Norm, df_testRun_results_Norm],axis=1)

The problems I run into is I want every column to be the keys of the dictionaries (without losing any) while maintaining order.I want a single dataframe with the parent records repeating for each of the child records. What I mean by this is as follows:

  • Test Executions is the parent and tests/testRuns are the children
  • For every test execution I could have 100 tests, and 100 testRuns. I want the parent record in the dataframe to apply to each child record

Right now my code produces a dataframe with N/A values for the blanks because, I am concatenating the results of different sizes.

Anyone know the best approach to do this? Thanks for the help.

1

There are 1 best solutions below

1
Jason Baker On

Code:

Parse parent and children separately then merge all 3 with reduce.

import pandas as pd
from functools import reduce


data = response.get("data").get("getTestExecutions")
meta = ["total", "start", "limit"]

jira = pd.json_normalize(
    data=data,
    meta=meta,
    record_path="results"
).drop(columns=["tests.results", "testRuns.results"])

tests = pd.json_normalize(
    data=data,
    meta=meta,
    record_path=["results", ["tests", "results"]]
)

runs = pd.json_normalize(
    data=data,
    meta=meta,
    record_path=["results", ["testRuns", "results"]]
)

dfs = [jira, tests, runs]
df_final = reduce(lambda left, right: pd.merge(
    left=left,
    right=right,
    on=meta,
    suffixes=("_meta", "_tests")
), dfs)
df_final.columns = df_final.columns.str.split(pat=".", n=3).str[-1]
print(df_final)

Output:

  issueId_meta   key_meta summary_meta        self    id  key  name projectTypeKey  simplified    key    key1    key2    key4 total start limit total start limit issueId_tests    name    key_tests     summary_tests   executedById   startedOn finishedOn  assigneeId    name
0        12345  test-1234   test- JSON  some value  4321  tst  test       software       False  value  value1  value1  value4    25     0   100    50     1   100          1234  Manual  example-123    sample example        user id   date here  date here  assigneeId  PASSED
1        12345  test-1234   test- JSON  some value  4321  tst  test       software       False  value  value1  value1  value4    25     0   100    50     1   100          1234  Manual  example-123    sample example  executedBy id  start date   end date  assigneeId  PASSED
2        12345  test-1234   test- JSON  some value  4321  tst  test       software       False  value  value1  value1  value4    25     0   100    50     1   100         12345  Manual  example-124  sample example 1        user id   date here  date here  assigneeId  PASSED
3        12345  test-1234   test- JSON  some value  4321  tst  test       software       False  value  value1  value1  value4    25     0   100    50     1   100         12345  Manual  example-124  sample example 1  executedBy id  start date   end date  assigneeId  PASSED