I am dealing with an XML file like this one, with many nested subnodes. The same attribute name may be used at different node levels.
<reporte>
<reporte_cabezal fecha_generado="08/05/2023 19:02" cant_compras="1539"/>
<reporte_dato>
<compra id_compra="1022855" id_ucc="65" num_compra="3">
<items>
<item nro_item="1" cant_pedida="1200.00" id_articulo="26058">
<atributos_item>
<atributo_item id_prop_atributo="4" desc_prop_atributo="TIPO">
<atributo_valores>
<atributo_valor valor_texto="DOBLE ENVOLTORIO"/>
</atributo_valores>
<atributo_item id_prop_atributo="5" desc_prop_atributo="MARCA">
</atributos_item>
<item nro_item="2" cant_pedida="1300.00" id_articulo="26048">
</items>
<aclaraciones_lla>
<aclaracion texto="PARA"/>
<aclaracion texto="Acta de Apertura" fecha="21/04/2023 12:31"/>
</aclaraciones_lla>
</compra>
<compra...
...
</reporte_dato>
</reporte>
I am trying to get a data frame for each node type, with a key (or keys) that would allow matching rows across the data frames.
I wrote this function, which works and returns each node type as a data frame:
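library(xml2)     # read_xml(), xml_find_all(), xml_name(), xml_attrs()
library(purrr)    # map(), set_names(), keep()
library(dplyr)    # bind_rows(), %>%
library(janitor)  # clean_names(), make_clean_names()

xml_to_tibbles <- function(xml) {
  xml_object <- read_xml(xml)

  # Unique element names found anywhere in the document
  nodes <- xml_object %>%
    xml_find_all("//*") %>%
    xml_name() %>%
    unique()

  # One data frame per element name: one row per element, one column per attribute
  nodes %>%
    map(function(node) {
      xml_object %>%
        xml_find_all(paste0("//", node)) %>%
        map(xml_attrs) %>%
        bind_rows() %>%
        clean_names()
    }) %>%
    set_names(make_clean_names(nodes)) %>%
    keep(~ ncol(.x) > 0)  # drop elements that carry no attributes
}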
However, I don't have any way to match the rows of each data frame to the rows of its parent/child node data frames, because the data frames lack keys.
The first attribute of each node works as a key, so I am looking for a way to add the first attribute of every ancestor node to each data frame. Or perhaps there is a much better way of achieving what I am looking for.
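To make the request concrete, here is a minimal sketch of the "first attribute of every ancestor" idea. It assumes the first attribute really is a usable key at every level and that element names are not nested inside themselves. The helper names `attrs_with_ancestor_keys` and `xml_to_keyed_tibbles`, and the `<ancestor>__<attribute>` column naming, are made up for illustration. It relies on the XPath `ancestor::*` axis and on `bind_rows()` treating named vectors as rows, as the function above already does.

```r
library(xml2)
library(dplyr)

# Hypothetical helper: the element's own attributes, preceded by the first
# attribute of each ancestor that has one, named "<ancestor>__<attribute>".
attrs_with_ancestor_keys <- function(node) {
  ancestors <- xml_find_all(node, "ancestor::*")
  anc_attrs <- xml_attrs(ancestors)          # list of named character vectors
  has_attr  <- lengths(anc_attrs) > 0        # skip ancestors without attributes
  keys <- vapply(anc_attrs[has_attr], function(a) a[[1]], character(1))
  names(keys) <- paste0(
    xml_name(ancestors)[has_attr], "__",
    vapply(anc_attrs[has_attr], function(a) names(a)[1], character(1))
  )
  c(keys, xml_attrs(node))
}

# Hypothetical wrapper: one tibble per element name, with ancestor key columns.
xml_to_keyed_tibbles <- function(xml_object) {
  all_nodes  <- xml_find_all(xml_object, "//*")
  node_names <- xml_name(all_nodes)
  rows <- lapply(all_nodes, attrs_with_ancestor_keys)
  # Drop elements with neither own attributes nor keyed ancestors
  keep <- lengths(rows) > 0
  lapply(split(rows[keep], node_names[keep]), bind_rows)
}
```

With something like this, each `item` row would carry a `compra__id_compra` column next to its own `nro_item`, and each `atributo_item` row would carry both `compra__id_compra` and `item__nro_item`, so the tables could be joined on those columns. Visiting the ancestors of every node is probably slow on a large file, so this is meant to illustrate the desired output rather than to be an efficient solution.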