Pyspark Json Extract Values

85 Views Asked by At

I am working with a PySpark DataFrame that has a JSON column from CSV file budgetthresholdaction. The JSON structure looks like this:

{
  "budgetThresholdAction0threshold0": {
    "thresholdValue": 80,
    "action": "promotionsetup.BudgetThresholdPercentageParameter"
  },
  "budgetThresholdAction1action1": {
    "thresholdName": "budgetThresholdAction1threshold0",
    "action": "promotionsetup.BudgetTerminateActionParameter"
  },
  "budgetThresholdAction1action2": {
    "thresholdName": "budgetThresholdAction1threshold0",
    "action": "promotionsetup.BudgetTurnOffMessagingActionParameter"
  },
  "budgetThresholdAction1threshold0": {
    "name": "budgetThresholdAction1threshold0",
    "thresholdValue": 95,
    "action": "promotionsetup.BudgetThresholdPercentageParameter"
  },
  "budgetThresholdAction1action0": {
    "thresholdName": "budgetThresholdAction1threshold0",
    "action": "promotionsetup.BudgetNotifyActionParameter"
  },
  "budgetThresholdAction0action0": {
    "thresholdName": "budgetThresholdAction0threshold0",
    "action": "promotionsetup.BudgetNotifyActionParameter"
  }
}

I need to create two separate columns, threshold and action, in my PySpark DataFrame with the following values:

threshold action
80 promotionsetup.BudgetThresholdPercentageParameter
95 promotionsetup.BudgetNotifyActionParameter
95 promotionsetup.BudgetTurnOffMessagingActionParameter
95 promotionsetup.BudgetTerminateActionParameter

I've attempted to use get_json_object and when functions, but I'm struggling to get the desired output. Could you please guide me on how to achieve this transformation in PySpark?

What I tried :

# To get threshold values as array and then explode it using :
df.select(array(get_json_object(col('budgetthresholdaction'),'$.budgetThresholdAction0threshold0.thresholdValue'),get_json_object(col('budgetthresholdaction'),'$.budgetThresholdAction1threshold0.thresholdValue')).alias('T')).show(1,False)

But I'm struggling to map Threshold and Action Values as for single threshold there can be multiple action.

We can have more than 2 Threshold and for every threshold it can have 1 or more action.

Mapping is : ActionNThreshold0 => ActionNActionX

For example :

Action0Threshold0 = [ Action0action0, Action0action1,Action0action2...]

Action1Threshold0 = [ Action1action0,Action1action1...]

0

There are 0 best solutions below