How to produce a list of all unique key path from a given JSON

88 Views Asked by At

I would like to produce a list of all unique key path from a given JSON, each one in a line.

For example, from this input JSON:

{
   "results":[
      {
         "id":306,
         "name":"First Company",
         "branches":[
            {
               "id":4191,
               "city":"San Francisco",
               "customers":[
                  {
                     "id":446,
                     "name":"Big Tech 1"
                  },
                  {
                     "id":447,
                     "name":"Big Tech 2"
                  }
               ]
            },
            {
               "id":4192,
               "city":"Miami",
               "customers":[
                  {
                     "id":448,
                     "name":"Insurance Tech 1"
                  },
                  {
                     "id":449,
                     "name":"Health Tech 2"
                  }
               ]
            }
         ]
      }
   ]
}

I would like to produce this output:

  results
  results.id
  results.name
  results.branches
  results.branches.id
  results.branches.city
  results.branches.customers
  results.branches.customers.id
  results.branches.customers.name

I am struggling with the command line below but it is not working.

jq '
def path(k):
  if k | type == "object":
    reduce(.keys[] | path(.), k)
  else:
    [k]
;

.[] | path(".") | flatten | sort' example.json

Any help, please?

3

There are 3 best solutions below

0
RomanPerekhrest On

With paths function, filtering out array indices of object keys and collecting unique paths:

jq '[paths(.) | map(select(type != "number")) | join(".")] | unique[]' test.json 

"results"
"results.branches"
"results.branches.city"
"results.branches.customers"
"results.branches.customers.id"
"results.branches.customers.name"
"results.branches.id"
"results.id"
"results.name"
0
pmf On

You could map the paths to only contain strings, then drop the duplicates using unique, and join the items by providing a delimiter:

jq -r '[paths | map(strings)] | unique[] | join(".")'
results
results.branches
results.branches.city
results.branches.customers
results.branches.customers.id
results.branches.customers.name
results.branches.id
results.id
results.name

Demo

To retain the document order (unique also sorts the array), you could deduplicate the list by using the joined paths as (unique-by-definition) object fields instead:

jq -r '[paths | map(strings) | join(".") | {(.):.}] | add[]'
results
results.id
results.name
results.branches
results.branches.id
results.branches.city
results.branches.customers
results.branches.customers.id
results.branches.customers.name

Demo

0
sudocracy On

Here's a solution that outputs the following to make it clear where the arrays are as well:

results[]
results[].branches[]
results[].branches[].city
results[].branches[].customers[]
results[].branches[].customers[].id
results[].branches[].customers[].name
results[].branches[].id
results[].id
results[].name

This is quite handy to see the "shape" of the JSON quickly before manipulating it.

If you have this code in a script called show-shape.jq (demo):

#! /usr/local/bin/jq -r -f 

def extract:
    [ paths 
    | [ .[] | if type == "number" then "[]" else . end ] 
    | join (".") ] ;

def longest:
    . | sort_by(length) | reverse| .[0] ; 

def clean:
    [ .[] | gsub("[.][[]]"; "[]") ]  
    | group_by(. | rtrimstr("[]")) 
    | [ .[] | longest ];

def show:
    extract | sort | unique | clean | .[];
    
show

You can use it as follows:

cat some.json | show-shape.jq

Example:

§ echo '{"x":{"a":1,"b":[{"c":1},{"c":2}],"d":{"e":1}}}' | show-shape.jq 

x
x.a
x.b[]
x.b[].c
x.d
x.d.e

Notice the distinction between how x.b[].c and x.d.e are displayed.

One shortcoming of the script, thuogh, is that it does not show a trailing [] for an empty array:

§ echo '{ "a" : [1] }' | show-json-shape.jq 
a[]
§ echo '{ "a" : [] }' | show-json-shape.jq 
a