How to reorder yaml keys and maintain comments according to a predefined template?


I would like to define a template for the key order and line spacing of a YAML file and apply it to a repository of hundreds of YAML files. In general, I would like to do the following:

  1. Load the existing yaml file
  2. Reorder the keys and values according to the template
  3. Delete any comments that consist only of a newline
  4. Apply the line spacing from the template
  5. Save the yaml file
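Steps 1 and 5, applied across a whole repository, do not depend on the reordering logic itself. A minimal sketch, assuming the files end in `.yaml` or `.yml`; `apply_to_repo` and `transform` are hypothetical names, with `transform` standing in for steps 2 to 4 (for example a function that loads the text with ruamel.yaml, reorders it and dumps it back to a string):

```python
from pathlib import Path
from typing import Callable, List


def apply_to_repo(root: Path, transform: Callable[[str], str]) -> List[Path]:
    """Apply `transform` to the text of every .yaml/.yml file under `root`.

    `transform` is a hypothetical placeholder for steps 2-4 (reorder keys,
    drop newline-only comments, apply the template's spacing).
    Only files whose text actually changes are rewritten; those are returned.
    """
    changed = []
    for pattern in ('*.yaml', '*.yml'):
        for path in sorted(root.rglob(pattern)):
            old = path.read_text(encoding='utf-8')
            new = transform(old)
            if new != old:
                path.write_text(new, encoding='utf-8')
                changed.append(path)
    return changed
```

Writing back only changed files keeps the version-control diff limited to files the template actually affected.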

I am using Python 3.10 and ruamel.yaml version. At a very basic level, I understand that a mapping loaded by ruamel.yaml is based on an ordered dictionary, and the accepted answer here seems like a simple way to ensure a specific order of a dictionary's keys, but I don't know how to apply that to the YAML object.

To maintain comments, I presume that the .ca attribute can be copied, although I don't know how to then apply the line spacing rules from the template.

Further complicating matters, some keys may themselves have multiple values (I think these would be a CommentedSeq in ruamel.yaml?), each of which should follow the templated order, and the last one will need a blank line after it.

Here is a basic version of the template that should provide an overview of the structure I'm talking about:

template='''
name:
region:
origin:
description:

go_live_date:
status:

governance:
  business_owner:
    am:
    eu:
    ap:
  technical_owner:
    am:
    eu:
    ap:

architecture:
  protocol:
  platform:

environments:
  - name:
    description:
    tier:
    locations:

'''

In the following example, the key order is wrong, some blank lines are missing while others are doubled, and there are some comments:

'''
name: MyApp
description: My wonderful application
origin: internal
governance:
  technical_owner:
    am:
      - Nico Ferrell
    ap:
      - Benedict Berger
      - Elsie Parsons
    eu:
      - Frances Case

  business_owner:
    eu:
      - Audrey Dalton
    am:
      - John Carpenter # to be updated

architecture:
  protocol: [TCP]
  platforms: [python_3_10, java_16]


status: in production
go_live_date: 2024-01-01
environments:
  - name: EU Prod
    description: production environment for EMEA
    tier: production
    locations: [ABC, XYZ]
  - name: EU UAT
    description: UAT environment for EMEA
    locations: [LMN]
    tier: uat
# further environmental details to be added
'''

After applying the template and the steps outlined above to this example, the resulting file should look like this:

'''
name: MyApp
origin: internal
description: My wonderful application

status: in production
go_live_date: 2024-01-01

governance:
  technical_owner:
    am:
      - Nico Ferrell
    eu:
      - Frances Case
    ap:
      - Benedict Berger
      - Elsie Parsons

  business_owner:
    am:
      - John Carpenter # to be updated
    eu:
      - Audrey Dalton

architecture:
  protocol: [TCP]
  platforms: [python_3_10, java_16]

environments:
  - name: EU Prod
    description: production environment for EMEA
    tier: production
    locations: [ABC, XYZ]
  - name: EU UAT
    description: UAT environment for EMEA
    tier: uat
    locations: [LMN]
# further environmental details to be added

'''

I don't know how to tackle this and would appreciate some help.

Accepted answer by Anthon:

You tackle this by writing a program that uses the key order of the template to re-insert the keys of the example in that order. You can use the .insert() method that is available on the CommentedMap instance ruamel.yaml creates when loading a YAML mapping, inserting at position 0 while going through the template keys in reverse order. Alternatively, you can go through the template keys in their normal order and pop and re-assign each key: every re-assigned key moves to the end of the mapping, so once the last template key has been handled, all template keys are at the back in template order (keys that are not in the template end up in front of them).
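The pop-and-assign approach can be tried on plain Python dicts, which keep insertion order and which CommentedMap subclasses. `reorder_keys` is a hypothetical helper name; the sample data is taken from the question, where the template says `platform` but the example uses `platforms`:

```python
def reorder_keys(d: dict, order: list) -> None:
    """Reorder d in place so the keys listed in `order` follow that order.

    Each popped and re-assigned key moves to the end of the dict, so after
    the loop the template keys are in template order; keys that are not in
    the template (or are spelled differently) stay in front of them.
    """
    for key in order:
        if key in d:  # skip template keys missing from the data, e.g. 'region'
            d[key] = d.pop(key)


architecture = {'protocol': ['TCP'], 'platforms': ['python_3_10', 'java_16']}
reorder_keys(architecture, ['protocol', 'platform'])
print(list(architecture))  # ['platforms', 'protocol']
```

On a loaded CommentedMap the same loop should work unchanged. The `.insert()` variant would instead walk `reversed(order)` and call `d.insert(0, key, d.pop(key))` for each key present.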

To execute that, you can use a function that keeps track of a path to find the corresponding data structure in the example, or recurse through template and example in parallel.
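The path-keeping option could be sketched as follows, again on plain dicts and lists (the program that follows uses parallel recursion instead). All names here (`template_paths`, `nodes_at`, `reorder_by_paths`, the `EACH` marker) are hypothetical, and the sketch assumes, like the recursive version, that a sequence in the template has a single element describing every element of the corresponding sequence in the example:

```python
from typing import Any, Iterator, List, Tuple

# hypothetical marker meaning "every element of a sequence" inside a path
EACH = object()


def template_paths(t: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, List[Any]]]:
    """Yield (path, key order) for every mapping found in the template."""
    if isinstance(t, dict):
        yield path, list(t)
        for key, value in t.items():
            yield from template_paths(value, path + (key,))
    elif isinstance(t, list) and t:
        # the single template element describes every element of the sequence
        yield from template_paths(t[0], path + (EACH,))


def nodes_at(d: Any, path: Tuple) -> Iterator[Any]:
    """Yield every node of the data that a template path points to."""
    if not path:
        yield d
        return
    head, rest = path[0], path[1:]
    if head is EACH:
        if isinstance(d, list):
            for elem in d:
                yield from nodes_at(elem, rest)
    elif isinstance(d, dict) and head in d:
        yield from nodes_at(d[head], rest)


def reorder_by_paths(template: Any, data: Any) -> None:
    """Reorder every mapping in data to follow the template, path by path."""
    for path, order in template_paths(template):
        for node in nodes_at(data, path):
            if isinstance(node, dict):
                for key in order:
                    if key in node:
                        node[key] = node.pop(key)  # move key to the end
```

Separating "where are the mappings" from "reorder one mapping" makes it easy to log or skip individual paths, at the cost of more code than parallel recursion.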

import sys
from pathlib import Path
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.preserve_quotes = True
template = yaml.load(Path('template.yaml'))
data = yaml.load(Path('example.yaml'))

def reorder(t, d):
    """Recursively move the keys of d into the key order of template t."""
    if isinstance(t, dict) and isinstance(d, dict):
        for k, v in t.items():
            if k not in d:
                # this handles e.g. the key 'region' that is missing from the example
                continue
            dv = d.pop(k)
            d[k] = dv  # re-assigning appends the key at the end of the mapping
            reorder(v, dv)
    elif isinstance(t, list) and t and isinstance(d, list):
        # assume the template has one element, the example multiple
        for elem in d:
            reorder(t[0], elem)

reorder(template, data)
yaml.dump(data, sys.stdout)

which gives:

name: MyApp
origin: internal
description: My wonderful application
go_live_date: 2024-01-01


status: in production
governance:
  business_owner:
    am:
      - John Carpenter # to be updated

    eu:
      - Audrey Dalton
  technical_owner:
    am:
      - Nico Ferrell
    eu:
      - Frances Case

    ap:
      - Benedict Berger
      - Elsie Parsons
architecture:
  platforms: [python_3_10, java_16]
  protocol: [TCP]
environments:
  - name: EU Prod
    description: production environment for EMEA
    tier: production
    locations: [ABC, XYZ]
  - name: EU UAT
    description: UAT environment for EMEA
    tier: uat
# further environmental details to be added
    locations: [LMN]

This gets your keys in the order of the template, but it doesn't handle the empty lines. That is on purpose: we only recurse into the template data structure, and the newline after "John Carpenter" is part of the sequence, which is not part of the template (as you can check with print(data['governance']['business_owner']['am'].ca)). Because of the way ruamel.yaml currently processes comments, attaching them to the last fully parsed node, the comment # further.. is associated with the key 'tier' and properly shifts position with reordering (although that might not be what you want).

Since ruamel.yaml was conceived to update values in existing YAML (config) files while preserving as much as possible (key order, comments, empty lines), and you are certainly not doing anything close to that, you'll have some work to do for the other steps.

I would first walk over the resulting example data and print the comments you find:

def remove_empty_lines(d):
    # for now this only prints the comments attached to each node,
    # so you can see where the empty lines are stored
    if isinstance(d, dict):
        if d.ca.comment:
            print('comment', d.ca.comment)
        for k, v in d.items():
            if (itemc := d.ca.items.get(k)) is not None:
                print('itemc', v, itemc)
            remove_empty_lines(v)
    elif isinstance(d, list):
        if d.ca.comment:
            print('lcomment', d.ca.comment)
        for idx, elem in enumerate(d):
            if (itemc := d.ca.items.get(idx)) is not None:
                print('litemc', elem, itemc)
            remove_empty_lines(elem)

remove_empty_lines(data)

which gives:

itemc in production [None, [CommentToken('\n\n', line: 22, col: 0)], None, None]
litemc John Carpenter [CommentToken('# to be updated\n\n', line: 17, col: 23), None, None, None]
litemc Frances Case [CommentToken('\n\n', line: 11, col: 8), None, None, None]
itemc uat [None, None, CommentToken('\n# further environmental details to be added\n', line: 35, col: 0), None]

So you will need to inspect those items and update the CommentTokens. If you run

print(dir(data['governance']['business_owner']['am'].ca.items[0][0]))
print(data['governance']['business_owner']['am'].ca.items[0][0].value)

you'll see that the .value attribute contains the actual comment text, from which you can strip the spurious newlines.
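A minimal sketch of that stripping at the string level, assuming (as the tokens printed above suggest) that the part of `.value` up to the first newline belongs to the line the comment is attached to, and that each further empty line is a blank line in the output. `drop_blank_lines` is a hypothetical helper; returning `None` marks a comment that consisted only of newlines (step 3 of the question). Writing the result back into the token, or removing the token, is not covered here:

```python
from typing import Optional


def drop_blank_lines(value: str) -> Optional[str]:
    """Remove blank lines from a comment token value (hypothetical helper).

    The first line (up to and including the first newline) is kept as-is,
    because it belongs to the line the comment is attached to; a leading
    newline there means "no end-of-line comment". Empty lines after it are
    dropped. Returns None if nothing but whitespace remains.
    """
    head, sep, rest = value.partition('\n')
    kept = [line for line in rest.splitlines(keepends=True) if line.strip()]
    result = head + sep + ''.join(kept)
    return None if not result.strip() else result


for v in ['# to be updated\n\n', '\n\n',
          '\n# further environmental details to be added\n']:
    print(repr(drop_blank_lines(v)))
```

Keeping the first line intact matters for the `uat` token: stripping its leading newline would turn the full-line comment into an end-of-line comment on `tier: uat`.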

Once that is done, walk over both template and data once more, check the template for comments, and insert or update the corresponding comments in the example. Make sure to create new CommentTokens rather than copying them from the template. You can find examples of that here
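As a simpler alternative for the top level only (a swapped-in text scan, not the comment-token approach described above), the places where the template has a blank line can be read from the template text directly. This sketch assumes block style with unindented top-level keys, as in the question's template; `keys_followed_by_blank_line` is a hypothetical helper returning the top-level keys whose block should be followed by one empty line:

```python
TEMPLATE = '''
name:
region:
origin:
description:

go_live_date:
status:

governance:
  business_owner:
    am:
    eu:
    ap:
  technical_owner:
    am:
    eu:
    ap:

architecture:
  protocol:
  platform:

environments:
  - name:
    description:
    tier:
    locations:

'''


def keys_followed_by_blank_line(text: str) -> list:
    """Return the top-level keys whose block is followed by a blank line."""
    result = []
    current = None         # most recent top-level key seen so far
    previous_blank = True  # collapse runs of blank lines into one
    for line in text.splitlines():
        if not line.strip():
            if current is not None and not previous_blank and current not in result:
                result.append(current)
            previous_blank = True
            continue
        previous_blank = False
        # an unindented line that is not a comment or sequence item is a top-level key
        if not line[0].isspace() and not line.startswith(('#', '-')):
            current = line.split(':', 1)[0].strip()
    return result


print(keys_followed_by_blank_line(TEMPLATE))
```

You would then make sure that exactly one empty line follows each of these blocks in the reordered data, for example through the comment tokens discussed above.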