Difficulty formatting an XML file in Perl

I'm not really a Perl developer, but I have to parse and modify several files, so I figured Perl would be a good choice for this ad hoc script. I apologize in advance: I'm probably missing an important fundamental concept (I had ChatGPT generate the initial script for me).

For some reason, my script can't get xmllint, or seemingly any other library or CLI executable, to format my XML file concisely (that is, without my having to specify the XML's schema). I'm wondering whether this has anything to do with IPC and piping input/output data (concepts I don't have to worry about in TypeScript, C#, etc.)?

Here is what I have tried so far:

#!/usr/bin/perl

use strict;
use warnings;

# Check if the command-line argument is provided
if (@ARGV != 1) {
    die "Usage: $0 <csproj_file>\n";
}

my $csproj_file = $ARGV[0];

# Format the entire XML file using xmllint
open my $xmllint_pipe, '|-', 'xmllint --format --recover - ' or die "Cannot open pipe to xmllint: $!";
print $xmllint_pipe $content;  # Send the original XML content to xmllint for formatting
close $xmllint_pipe;  # Close the pipe

# Write the updated and formatted content back to the file
open my $output_fh, '>encoding(utf8)', $csproj_file or die "Could not open file '$csproj_file' for writing: $!";
print $output_fh $content;  # Write the formatted content to the file
close $output_fh;  # Close the filehandle

my $formatted_content = `xmllint --format $csproj_file`;

# Write the updated and formatted content back to the file
open my $output_fh, '>encoding(utf8)', $csproj_file or die "Could not open file '$csproj_file' for writing: $!";
print $output_fh $content;  # Write the formatted content to the file
close $output_fh;  # Close the filehandle

Basic use case: I just want to fix up the spacing/indentation in some XML files that I'm editing via script (it's not super important, but I figure I may as well if it's easy).

There are 3 best solutions below

1
user3773048 On

So I took @Gilles Quénot's hint and just used the underlying shell on my platform (either cmd or PowerShell, as I'm currently on Windows).

I went for a simple solution to the XML formatting. It's not the most efficient, but it's very easy to read. Below are the relevant modifications I made to get it to work:

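my $tmp_filepath = $csproj_file . '_tmp';

# Format the XML into a temporary file (the shell handles the redirection)
print "xmllint --format $csproj_file > $tmp_filepath\n";
system(qq{xmllint --format "$csproj_file" > "$tmp_filepath"}) == 0
    or die "xmllint failed (wait status $?)\n";

# Replace the original with the formatted copy (cmd's move; /Y overwrites without prompting)
print "move $tmp_filepath $csproj_file\n";
system(qq{move /Y "$tmp_filepath" "$csproj_file"}) == 0
    or die "move failed (wait status $?)\n";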

Basically: use a temporary file and run two basic CMD commands. The first formats the XML and saves the result to the temp file; the second renames the temp file to the original filename.
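The same temp-file idea can also be done without going through the shell at all. This is only a minimal sketch: it assumes xmllint is on the PATH and that your build supports the --output option, and the helper names xmllint_command and format_in_place (and the file name in the usage comment) are made up for illustration. Passing the command to system as a list avoids any quoting problems with paths that contain spaces, and Perl's built-in rename replaces the cmd-specific move.

```perl
use strict;
use warnings;

# Build the xmllint command as a list, so no shell quoting is needed.
# --output makes xmllint write to a file instead of stdout.
sub xmllint_command {
    my ($in, $out) = @_;
    return ('xmllint', '--format', '--output', $out, $in);
}

# Run a command that reads $file and writes a temp file, then move the
# temp file over the original. $build (optional) returns the command list.
sub format_in_place {
    my ($file, $build) = @_;
    $build ||= \&xmllint_command;
    my $tmp = $file . '_tmp';

    my @cmd = $build->($file, $tmp);
    if (system(@cmd) != 0) {
        unlink $tmp;    # do not leave a half-written temp file behind
        die "'$cmd[0]' failed for '$file' (wait status $?)\n";
    }
    rename $tmp, $file
        or die "Cannot rename '$tmp' to '$file': $!\n";
    return 1;
}

# Usage (hypothetical file name):
#   format_in_place('MyProject.csproj');
```

Because the original file is only replaced after the formatter exits successfully, a failed xmllint run should leave the original untouched.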

0
Gilles Quénot On

What I would do (no need to use Perl at all):

#!/bin/sh

csproj_file=$1  # path to the XML file, taken from the first argument
set -v # for verbosity: print each command line as the shell reads it
xmllint --format "$csproj_file" | sponge "$csproj_file"

sponge, from moreutils, soaks up all of its input before writing the output file, so the file can be edited in place.
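For anyone who does want to stay in Perl, the same "read everything first, then overwrite" pattern can be sketched with a pipe opened for reading. Note the direction: the '|-' mode used in the question writes to xmllint's stdin, so the formatted output goes to the terminal and never comes back into the script, whereas '-|' reads the command's stdout. The helper names capture_stdout and write_file are hypothetical, and as far as I know the list form of a piped open needs Perl 5.22 or later on Windows.

```perl
use strict;
use warnings;

# Run a command (list form, no shell) and return everything it prints to stdout.
sub capture_stdout {
    my @cmd = @_;
    open my $pipe, '-|', @cmd
        or die "Cannot start '$cmd[0]': $!\n";
    binmode $pipe;                          # pass bytes through untouched
    my $out = do { local $/; <$pipe> };     # slurp all output
    close $pipe
        or die "'$cmd[0]' failed (wait status $?)\n";
    return $out;
}

# Overwrite $file with $content; call this only after the command has finished.
sub write_file {
    my ($file, $content) = @_;
    open my $fh, '>', $file
        or die "Cannot open '$file' for writing: $!\n";
    binmode $fh;
    print {$fh} $content;
    close $fh or die "Cannot close '$file': $!\n";
    return 1;
}

# Usage:
#   my $formatted = capture_stdout('xmllint', '--format', $csproj_file);
#   write_file($csproj_file, $formatted);
```

The file is opened for writing only after xmllint has exited, which is the job sponge does in the shell version.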

6
Boyd On

This does what you want using XML::LibXML, the standard Perl binding for libxml2. Try as I might, I couldn't get XML::LibXML's toString method to output indentation, so I opted for XML::LibXML::PrettyPrint:

#!/usr/bin/perl

use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::PrettyPrint;

my $csproj_file = shift @ARGV;
die "Usage: $0 <csproj_file>\n" unless $csproj_file;

my $doc = XML::LibXML->new->parse_file($csproj_file);
my $pp  = XML::LibXML::PrettyPrint->new(indent_string => '  ');

$pp->pretty_print($doc); # modified in-place

$doc->toFile($csproj_file);
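For what it's worth, toString and toFile do accept a format argument, and the usual reason it appears to do nothing is that whitespace-only text nodes already in the document stop libxml2 from adding its own indentation. A minimal sketch, assuming the file has no mixed content where whitespace matters; reindent_xml_file and the file name in the usage comment are hypothetical:

```perl
use strict;
use warnings;
use XML::LibXML;

# Parse with no_blanks so whitespace-only text nodes are dropped, then let
# libxml2 re-indent on output.
sub reindent_xml_file {
    my ($file) = @_;
    my $doc = XML::LibXML->load_xml(location => $file, no_blanks => 1);
    $doc->toFile($file, 1);    # second argument 1 = formatted output
    return 1;
}

# Usage (hypothetical file name):
#   reindent_xml_file('MyProject.csproj');
```

This gives libxml2's fixed indentation; XML::LibXML::PrettyPrint remains the better fit when you need control over the indent string.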