How to recursively print the path of all keys in jq

I'd like to get a list of all paths available in a JSON document, just to get an idea of the layout of a large JSON document.

One-liners preferred, though of course a bigger jq/Python program would also do the trick.

Example:

echo '{"k1": {"k12": "v12"}, "k2": {"k21": {"k211": "v211"}}}' | jq -r 'not-so-long-magic-here'
k1
k1.k12
k2
k2.k21
k2.k21.k211

There are variations of this question (not exact duplicates), such as 'all paths matching a given pattern' or 'print paths plus values'.

There is also gron, which prints keys and values, colorized.

3
mmoya On BEST ANSWER

Use jq -r 'paths | map(numbers |= "[\(.)]") | join(".")'.

For example:

$ echo '{"k1": "v1", "k2": {"k21": "v21"}, "k3": {"k31": {"k311": "v311"}}, "k4": [{"k41": "v41"}, {"k42": "v42"}], "k5": {"0": "v50", "1": "v51"}}' | jq -r 'paths | map(numbers |= "[\(.)]") | join(".")'
k1
k2
k2.k21
k3
k3.k31
k3.k31.k311
k4
k4.[0]
k4.[0].k41
k4.[1]
k4.[1].k42
k5
k5.0
k5.1

A simpler solution using just paths | join(".") renders array indices and numeric object keys the same way, which might not be desirable:

$ echo '{"k1": "v1", "k2": {"k21": "v21"}, "k3": {"k31": {"k311": "v311"}}, "k4": [{"k41": "v41"}, {"k42": "v42"}], "k5": {"0": "v50", "1": "v51"}}' | jq -r 'paths | join(".")'
k1
k2
k2.k21
k3
k3.k31
k3.k31.k311
k4
k4.0
k4.0.k41
k4.1
k4.1.k42
k5
k5.0
k5.1

Credit for both solutions goes to @pmf.
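
Since the question also allows a Python program, here is a minimal sketch of the same idea outside jq. The helper names `paths` and `render` are my own (not a library API), and the input is a shortened version of the example document above. Array indices are bracketed so they stay distinguishable from numeric object keys.

```python
import json

def paths(node, prefix=()):
    """Yield every non-empty path in a parsed JSON value, parents before children."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return  # scalars have no children
    for key, child in items:
        path = prefix + (key,)
        yield path
        yield from paths(child, path)

def render(path):
    """Join with dots, bracketing array indices (ints) but not object keys (strings)."""
    return ".".join(f"[{p}]" if isinstance(p, int) else p for p in path)

doc = json.loads('{"k4": [{"k41": "v41"}], "k5": {"0": "v50"}}')
for p in paths(doc):
    print(render(p))
```

This should print k4, k4.[0], k4.[0].k41, k5 and k5.0, one per line, mirroring the bracketed jq output for those keys.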

0
Philippe On

This version handles keys that contain spaces, and its output can be fed back to jq as path expressions:

jq -nr '[{a:1},{"a b":[{c:"2"}]}]|
paths | (if (first|type)=="string" then "" else "." end) + (map(
        if type=="string"
        then (
             (if test("^[[:alpha:]_][\\w_]*$")
             then ""
             else "\""
             end) as $q |
             "."+$q+.+$q
             )
        else "[\(.)]"
        end) |
        join("")
      )'

.[0]
.[0].a
.[1]
.[1]."a b"
.[1]."a b"[0]
.[1]."a b"[0].c
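
For comparison, the same quoting rule can be sketched in Python. This is an illustration rather than a port of the filter above: `walk` and `jq_path` are hypothetical helper names, the identifier test is ASCII-only by assumption, and keys that need quoting go through `json.dumps`, which also escapes embedded quotes (something the jq version does not attempt).

```python
import json
import re

# Assumption: a key can follow a bare dot only if it looks like an ASCII identifier.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def walk(node, prefix=()):
    """Yield every non-empty path in a parsed JSON value, parents first."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return
    for key, child in items:
        yield prefix + (key,)
        yield from walk(child, prefix + (key,))

def jq_path(path):
    """Render a path as a jq expression such as .[1]."a b"[0].c"""
    out = ""
    for part in path:
        if isinstance(part, int):
            out += f"[{part}]"              # array index
        elif IDENT.fullmatch(part):
            out += "." + part               # plain key
        else:
            out += "." + json.dumps(part)   # key that needs quoting
    # A path that starts with an index still needs the leading dot.
    return out if out.startswith(".") else "." + out

data = json.loads('[{"a": 1}, {"a b": [{"c": "2"}]}]')
for p in walk(data):
    print(jq_path(p))
```

On this input the sketch should reproduce the six lines listed above.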

0
peak On

To get just the paths of the keys (including the key), consider starting with:

paths
| select(.[-1] | type == "string")

For example:

[{a:1},{"a b":[{c:"2"}]}]
| paths
| select(.[-1] | type == "string")

yields:

[0,"a"]
[1,"a b"]
[1,"a b",0,"c"]

If you want just the paths to the leaf keys (without their names), then tack on | .[:-1] to the filter.

If you want some other representation of the path, consider @csv as that is "lossless", unlike join(".").
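
To see why join(".") is lossy, here is a small Python illustration. It uses JSON encoding as a stand-in for a lossless representation (it is not @csv itself), and the two paths are made-up examples with dots inside the keys.

```python
import json

# Two different (hypothetical) paths whose keys happen to contain dots.
path_a = ["a.b", "c"]
path_b = ["a", "b.c"]

# Dot-joining collapses both to the same string, so the original
# components cannot be recovered.
dotted = [".".join(p) for p in (path_a, path_b)]
print(dotted[0] == dotted[1])   # True: both are "a.b.c"

# Encoding each path as a JSON array keeps the component boundaries,
# so decoding gives back exactly what went in.
encoded = [json.dumps(p) for p in (path_a, path_b)]
decoded = [json.loads(s) for s in encoded]
print(decoded == [path_a, path_b])   # True
```

Any representation that keeps component boundaries unambiguous would serve the same purpose; the JSON array form is simply the one jq already prints for paths.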

For very large JSON documents, the full list of paths will probably not be so helpful if there are arrays in the JSON document. In that case, you could change all the numeric components to "*" (for example), and then remove redundant lines, e.g.

def uniq(s):
  foreach s as $x (null;
    if . and $x == .[0] then .[1] = false
    else [$x, true]
    end;
    if .[1] then .[0] else empty end);

[{a: [1,2], x: {"a b":[{c:2}]} } ]
| paths, "",
  uniq(paths
   | select(.[-1] | type == "string")
   | map(if type == "number" then "*" else . end) )

This produces:

[0]
[0,"a"]
[0,"a",0]
[0,"a",1]
[0,"x"]
[0,"x","a b"]
[0,"x","a b",0]
[0,"x","a b",0,"c"]

["*","a"]
["*","x"]
["*","x","a b"]
["*","x","a b","*","c"]
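
The same "wildcard the indices, then drop repeated neighbours" idea can be sketched in Python. The names `paths` and `wildcard_key_paths` are my own, `itertools.groupby` plays the role of uniq (it groups consecutive equal items only), and the sample document is the one above with a second element added to the inner array so that a duplicate actually occurs.

```python
import json
from itertools import groupby

def paths(node, prefix=()):
    """Yield every non-empty path in a parsed JSON value, parents first."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return
    for key, child in items:
        yield prefix + (key,)
        yield from paths(child, prefix + (key,))

def wildcard_key_paths(doc):
    """Keep paths ending in an object key and replace array indices by "*"."""
    for path in paths(doc):
        if isinstance(path[-1], str):
            yield tuple("*" if isinstance(p, int) else p for p in path)

# Assumed variant of the example: the inner array has two elements.
doc = json.loads('[{"a": [1, 2], "x": {"a b": [{"c": 2}, {"c": 3}]}}]')

# groupby merges runs of identical neighbours; take one key per run.
collapsed = [list(key) for key, _run in groupby(wildcard_key_paths(doc))]
for p in collapsed:
    print(json.dumps(p))
```

Like uniq, this only removes adjacent repeats; if identical wildcard paths can appear far apart, collecting them in a set would be the alternative.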

To understand the structure of large JSON documents, an inferred structural schema can be handy. To give an idea of what this might mean, let's use the JSON corresponding to the example immediately above, using schema.jq:

jq 'include "schema" {search: "./"}; schema' <<< '[{"a": [1,2], "x": {"a b":[{"c":2}]} } ]'

yields

{
  "a": [
    "number"
  ],
  "x": {
    "a b": [
      {
        "c": "number"
      }
    ]
  }
}

0
nntrn On

jq '[paths|(map(if type == "number" then "[]" else ".\(.)" end)|join(""))]|unique'

Here's an example using stocks data from:
https://query1.finance.yahoo.com/v8/finance/chart/GTLB

[
  ".chart",
  ".chart.error",
  ".chart.result",
  ".chart.result[]",
  ".chart.result[].indicators",
  ".chart.result[].indicators.adjclose",
  ".chart.result[].indicators.adjclose[]",
  ".chart.result[].indicators.adjclose[].adjclose",
  ".chart.result[].indicators.adjclose[].adjclose[]",
  ".chart.result[].indicators.quote",
  ".chart.result[].indicators.quote[]",
  ".chart.result[].indicators.quote[].close",
  ".chart.result[].indicators.quote[].close[]",
  ".chart.result[].indicators.quote[].high",
  ".chart.result[].indicators.quote[].high[]",
  ".chart.result[].indicators.quote[].low",
  ".chart.result[].indicators.quote[].low[]",
  ".chart.result[].indicators.quote[].open",
  ".chart.result[].indicators.quote[].open[]",
  ".chart.result[].indicators.quote[].volume",
  ".chart.result[].indicators.quote[].volume[]",
  ".chart.result[].meta",
  ".chart.result[].meta.chartPreviousClose",
  ".chart.result[].meta.currency",
  ".chart.result[].meta.currentTradingPeriod",
  ".chart.result[].meta.currentTradingPeriod.post",
  ".chart.result[].meta.currentTradingPeriod.post.end",
  ".chart.result[].meta.currentTradingPeriod.post.gmtoffset",
  ".chart.result[].meta.currentTradingPeriod.post.start",
  ".chart.result[].meta.currentTradingPeriod.post.timezone",
  ".chart.result[].meta.currentTradingPeriod.pre",
  ".chart.result[].meta.currentTradingPeriod.pre.end",
  ".chart.result[].meta.currentTradingPeriod.pre.gmtoffset",
  ".chart.result[].meta.currentTradingPeriod.pre.start",
  ".chart.result[].meta.currentTradingPeriod.pre.timezone",
  ".chart.result[].meta.currentTradingPeriod.regular",
  ".chart.result[].meta.currentTradingPeriod.regular.end",
  ".chart.result[].meta.currentTradingPeriod.regular.gmtoffset",
  ".chart.result[].meta.currentTradingPeriod.regular.start",
  ".chart.result[].meta.currentTradingPeriod.regular.timezone",
  ".chart.result[].meta.dataGranularity",
  ".chart.result[].meta.exchangeName",
  ".chart.result[].meta.exchangeTimezoneName",
  ".chart.result[].meta.firstTradeDate",
  ".chart.result[].meta.gmtoffset",
  ".chart.result[].meta.hasPrePostMarketData",
  ".chart.result[].meta.instrumentType",
  ".chart.result[].meta.priceHint",
  ".chart.result[].meta.range",
  ".chart.result[].meta.regularMarketPrice",
  ".chart.result[].meta.regularMarketTime",
  ".chart.result[].meta.symbol",
  ".chart.result[].meta.timezone",
  ".chart.result[].meta.validRanges",
  ".chart.result[].meta.validRanges[]",
  ".chart.result[].timestamp",
  ".chart.result[].timestamp[]"
]
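
A rough Python counterpart of the filter, as a sketch: every array index becomes `[]`, every key becomes `.key`, and a set followed by `sorted` stands in for `unique`. The function names are hypothetical, and the sample document is a tiny made-up stand-in that only borrows a few key names from the listing above; it is not the real API response.

```python
import json

def paths(node, prefix=()):
    """Yield every non-empty path in a parsed JSON value, parents first."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return
    for key, child in items:
        yield prefix + (key,)
        yield from paths(child, prefix + (key,))

def shape(path):
    """Render indices as [] and keys as .key, e.g. .chart.result[].meta"""
    return "".join("[]" if isinstance(p, int) else "." + p for p in path)

# Placeholder data, not fetched from anywhere.
sample = json.loads(
    '{"chart": {"result": [{"meta": {"symbol": "X"}, "timestamp": [1, 2]}],'
    ' "error": null}}'
)

unique_shapes = sorted({shape(p) for p in paths(sample)})
print(json.dumps(unique_shapes, indent=2))
```

As with the jq filter, keys containing spaces or dots are not quoted here, so the resulting strings are only valid jq paths for simple key names.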

The paths created above can even be used in jq:

$ jq '.chart.result[].meta.exchangeName' gtlb.json 
"NMS"