Clear content in xml brackets in all files in directory tree on Windows using Strawberry Perl and twig

Question

I want to clear everything inside the <loot> </loot> elements of all XML files in a directory tree. I am using 64-bit Strawberry Perl on Windows.

For example this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
<item id="1"/>
  <item id="3"/>
      <inside>
        <item id="6"/>
      </inside>
  </item>
</loot>

The changed file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<monster name="Dragon"/>
<health="10000"/>
<immunities>
   <immunity fire="1"/>
</immunities>
<loot>
</loot>

I have this code:

#!/usr/bin/perl
use warnings;
use strict;

use File::Find::Rule;
use XML::Twig;

sub delete_loot {
   my ( $twig, $loot ) = @_;
   foreach my $loot_entry ( $loot -> children ) {
      $loot_entry -> delete;
   }
   $twig -> flush;
}

my $twig = XML::Twig -> new ( pretty_print => 'indented', 
                              twig_handlers => { 'loot' => \&delete_loot } ); 

foreach my $file ( File::Find::Rule  -> file()
                                     -> name ( '*.xml' )
                                     -> in ( 'C:\Users\PIO\Documents\serv\monsters' ) ) {

    print "Processing $file\n";
    $twig -> parsefile_inplace($file); 
}

But it correctly edits only the first file it encounters and leaves the remaining files empty (0 KB).


| xml   | perl   | xml-twig   2017-01-02 12:01 2 Answers

Answers to Clear content in xml brackets in all files in directory tree on Windows using Strawberry Perl and twig ( 2 )

  1. 2017-01-05 12:01

    UPDATE   With just flush changed to print the posted code works for me (with valid XML).

    However, I still recommend the code below, either version.

    Note   Both versions below were tested with two groups of valid XML files. In both, flush is replaced with print, and emptying loot is simplified: the element is removed altogether and then added back, empty.


    When the XML::Twig->new(...) object is created once and the files are then looped over and processed, I get the same behavior: the first file is processed correctly, the others are completely blanked.

    The reason may have something to do with new being a class method. However, I don't see why this needs to affect handling of multiple files. The callback is installed outside of the loop, but I've tested with it being re-installed for each file and it doesn't help.

    Finally, flushing isn't needed and may well hurt here, by clearing the state (which was created by the class method new). This doesn't affect the code below, but flush is still replaced by print there.

    Then just do everything in the loop. A simple version:

    use strict;
    use warnings;
    use File::Find::Rule;
    use XML::Twig;
    
    my @files = File::Find::Rule->file->name('*.xml')->in('...');
    
    foreach my $file (@files)
    {
        print "Processing $file\n";
        my $t = XML::Twig->new( 
            pretty_print => 'indented', 
            twig_handlers => { 
                loot => sub { 
                    my $parent = $_->parent;              # fetch parent
                    $_->delete;                           # delete element
                    $parent->insert_new_elt('loot', '');  # create new, empty
                }, 
            },
        );
        $t->parsefile_inplace($file)->print;
    }
    

    The callback code is simplified, to remove the element altogether and then add it back, empty.

    We can avoid calling new in the loop by using another class method, nparse. Also, the code for emptying the element is now moved into a sub, and does not need its name hardcoded.

    my $t = XML::Twig->new( pretty_print => 'indented' );
    
    foreach my $file (@files) 
    {
        print "Processing $file\n";
        my $tobj = XML::Twig->nparse( 
            twig_handlers => { loot => \&clear_elt }, $file
        );
        $tobj->parsefile_inplace($file)->print;
    }
    
    sub clear_elt {
        my ($t, $elt) = @_; 
        my $elt_name = $elt->name;                # get the name
        my $parent = $elt->parent;                # fetch the parent
        $elt->delete;                             # remove altogether
        $parent->insert_new_elt($elt_name, '');   # add it back
    }
    

    We do have to first call the new constructor, even as it isn't directly used in the loop.


    Note that calling new before the loop without twig_handlers and then setting the handlers inside the loop with

    $t->setTwigHandlers(loot => sub { ... });
    

    does not help. We still only get the first file processed correctly.
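
    One side effect of the delete-and-reinsert callbacks above: insert_new_elt pastes at the 'first_child' position by default, so the new empty loot can end up ahead of its former siblings. If the element order matters, a minimal sketch along the following lines may help. It assumes that cut_children is acceptable for dropping the content, and it parses a string so that it runs on its own; the helper name empty_loot_in_string and the sample document are made up for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: returns the XML text with every <loot> emptied in place.
sub empty_loot_in_string {
    my ($xml) = @_;

    # A fresh twig per document, as recommended above.
    my $twig = XML::Twig->new(
        twig_handlers => {
            loot => sub {
                my ($t, $elt) = @_;
                $elt->cut_children;    # drop the content, keep the element where it is
            },
        },
    );
    $twig->parse($xml);
    return $twig->sprint;
}

# Made-up sample in which loot is NOT the last child, to show the order is kept.
my $sample = '<monster name="Dragon"><health value="10000"/>'
           . '<loot><item id="1"/><item id="3"><inside><item id="6"/></inside></item></loot>'
           . '<immunities><immunity fire="1"/></immunities></monster>';

print empty_loot_in_string($sample), "\n";
```

    For files, the same handler could be dropped into either loop above in place of the delete-and-reinsert callback.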

  2. 2017-01-05 14:01

    The XML::Twig doc says that "Multiple twigs are not well supported" (http://search.cpan.org/dist/XML-Twig/Twig.pm#TODO). If you look at the state of the twig object (using Data::Dumper, for example), you see a marked difference between the first and subsequent runs. It looks like it considers that it has been totally flushed already (which is true, as there was a complete flush during the first run). It probably has nothing more to print for the subsequent files, so those files end up empty.

    Recreating the twig object at each loop worked for me:

    #!/usr/bin/perl
    use warnings;
    use strict;
    
    use File::Find::Rule;
    use XML::Twig;
    
    sub delete_loot {
       my ( $twig, $loot ) = @_;
       foreach my $loot_entry ( $loot -> children ) {
            $loot_entry -> delete;
        }
    }
    
    foreach my $file ( File::Find::Rule  -> file()
                                         -> name ( '*.xml' )
                                         -> in ( '/home/dabi/tmp' ) ) {
    
        print "Processing $file\n";
        my $twig = XML::Twig -> new ( pretty_print => 'indented', 
                                      twig_handlers => { loot => \&delete_loot, } ); 
        $twig -> parsefile($file); 
        $twig -> print_to_file($file);
    }
    

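    Also, I had to change the XML file structure to have it processed. The sample in the question is not well-formed: it has no single root element, <health="10000"/> has no attribute name, and <item id="3"/> is self-closed yet followed by a closing </item>. The corrected file: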

    <?xml version="1.0" encoding="UTF-8"?>
    <monster name="Dragon">
    <health value="10000"/>
    <immunities>
       <immunity fire="1"/>
    </immunities>
    <loot>
    <item id="1"/>
      <item id="3">
          <inside>
            <item id="6"/>
          </inside>
      </item>
    </loot>
    </monster>
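
    To check on a throwaway copy that no file ends up at 0 KB before running this on the real tree, something like the following sketch could be used. It assumes the same fresh-twig-per-file approach as above; the helper name empty_loot_in_file, the temporary directory, and the generated sample files are all made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Twig;

# Hypothetical helper: empties <loot> in one file and returns the new file size.
sub empty_loot_in_file {
    my ($file) = @_;

    # New twig object for every file, so no state carries over between files.
    my $twig = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => {
            loot => sub {
                my ($t, $loot) = @_;
                $_->delete for $loot->children;
            },
        },
    );
    $twig->parsefile($file);
    $twig->print_to_file($file);
    return -s $file;
}

# Build three made-up, well-formed sample files in a temporary directory.
my $dir   = tempdir( CLEANUP => 1 );
my @files = map { "$dir/monster$_.xml" } 1 .. 3;
for my $file (@files) {
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} '<monster name="Dragon"><health value="10000"/>'
              . '<loot><item id="1"/></loot></monster>';
    close $fh;
}

# Process every file and report its size afterwards.
my %size;
for my $file (@files) {
    $size{$file} = empty_loot_in_file($file);
    print "$file: $size{$file} bytes\n";
}
```

    If any of the reported sizes is 0, the twig object is probably being reused or flushed somewhere.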
    
