import sys
import ruamel.yaml
yaml_str = """\
hello: world
foo: &core_foo
  s: 1
"""
yaml_str2 = """\
hello1 : world
foo:
  <<: *core_foo
"""
yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True
yaml.dump(data, sys.stdout)
data = yaml.load(yaml_str + yaml_str2)
I tried to concatenate the two strings and load the result with duplicate keys allowed. While the result of load is as I expected, dump does not preserve the merge key and the aliases.
Expected:
hello: world
foo:
  <<: *core_foo
hello1: world
Actual:
hello: world
foo:
  s: 1
hello1: world
Is this the expected behaviour?
First of all, it is unlikely that your program generates the output you show, because you set data by loading the concatenated strings only after you dump it. I am also not sure why you concatenate the strings, but that might be a remnant from experimenting with the code.
The behaviour is as expected. When allowing duplicate keys, ruamel.yaml drops any recurring instances. Some other parsers don't check for duplicate keys and silently overwrite the original entry (but by then they will have resolved the alias, so the merged mapping data will probably be there). In ruamel.yaml the second key-value pair for foo, including its merge, is parsed but then dropped. This leaves the value of the first key foo with an anchor, but with only one reference to that value. The anchor name (core_foo) is still attached to the loaded data structure.
During dump, ruamel.yaml tracks the nodes that are going to be dumped: if the same (Python) id is encountered more than once, the first occurrence gets an anchor and every following one an alias. So essentially you cannot dump any node until you know whether it needs an anchor (i.e. you have to walk over the data structure twice). Since the second occurrence of foo gets discarded, there is no second reference to the data structure, and the initial occurrence never needs an anchor. You can easily check this behaviour by changing foo in your yaml_str2 to a key that doesn't occur in that mapping.
It is however possible to force dumping a loaded anchor by setting its always_dump attribute. There is no global option on the YAML() instance to do that, so you either need to know where the anchor is located or recursively walk the data structure.
Keeping track of ids is necessary in any kind of data structure representation that might be self-referencing. Since this takes time, some "dumpers", like the json package in the standard library, allow you to speed things up by specifying that your data structure is not self-referencing (json.dump does this through its check_circular argument: passing check_circular=False skips the check). Even your average __repr__ should do this, as became clear when ordereddict was originally added to Python 2: it would crash on self-referential structures (even though the author of that change was aware of a test suite for ordereddict implementations that included tests for this).
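The id bookkeeping mentioned above can be observed with the standard library alone. This small sketch (variable names made up for the example) builds a list that contains itself: the built-in repr detects the cycle, json.dumps with its default check_circular=True raises ValueError, and for acyclic data check_circular=False produces the same output while skipping the check.

```python
import json

# A list that contains itself: a self-referencing structure.
loop = []
loop.append(loop)

# The built-in list repr remembers which containers it is already
# printing and emits [...] instead of recursing forever.
print(repr(loop))  # [[...]]

# json.dumps tracks container ids by default (check_circular=True)
# and refuses to serialize the cycle.
try:
    json.dumps(loop)
except ValueError as exc:
    print("json:", exc)  # json: Circular reference detected

# For data known to be acyclic the id tracking can be switched off;
# the output is identical, only the check is skipped.
tree = {"hello": "world", "foo": {"s": 1}}
assert json.dumps(tree, check_circular=False) == json.dumps(tree)
print(json.dumps(tree, check_circular=False))
```

Passing check_circular=False for data that does contain a cycle is not safe: the encoder then recurses until Python's recursion limit is hit instead of reporting the cycle cleanly.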