Sucking XML in Perl

I'm working on some XML stuff in Perl, and took quite some time to explore the available XML parsers. There are quite a lot of them, so choosing the right one isn't always easy:


* XML::Twig looks quite interesting and powerful, but has quite a steep learning curve due to the (imo) cumbersome interface, especially the twig handler stuff. Not the right choice if you want to get something done quickly in Perl, unless you already have Twig experience.


* XML::Simple seemed a good tool, so I started implementing with this module, but the performance is horrible: it took 14 seconds to parse a document of around 3500 lines, which took me straight back to the drawing board.


* XML::Parser was close to what I wanted, but the output was too cluttered, certainly when you're working with more complicated XML files.


* XML::LibXML is a module that addresses some of the performance issues of XML::Simple, but it is built around the Gnome XML library (libxml2), which wasn't available on the HP-UX 11.11i server.


* XML::Smart seemed a great and intuitive interface for my problem. Unfortunately, it isn't among the Perl modules installed by default, so it could not be used as-is. And since it has quite a few dependencies, installing it on the server wasn't an option.


There are also the SAX modules, but they seem more like something you want to use when you're working inside a framework, and the SAX machinery seemed too much of a burden to carry around (the program wasn't that big after all...).


So what did I do then? As far as speed goes, the reality seems to be: a regexp-based, non-XML parser is going to be faster than a "close to the metal" parser (XML::Parser or XML::LibXML), which is going to be faster than a more convenient parser (XML::Simple, XML::Twig), which is going to be faster than a pipeline that passes events through various object-oriented layers (XML::SAX). So with that in mind, I implemented my own regexp-based XML parser. To my surprise, it took only half an hour to get it working, and it was about 1500% faster than XML::Simple.
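
To give an idea of what such a regexp-based approach can look like, here is a minimal sketch. It is not the parser I actually wrote: the `<record>` layout, the sample data, and the helper names `decode_entities` and `parse_records` are made up for illustration. It also assumes a flat, predictable document (no CDATA, no comments, no nested elements with the same name, attributes in double quotes), which is exactly the kind of shortcut a real XML parser can't take.

```perl
use strict;
use warnings;

# Hypothetical sample input: flat <record> elements whose children
# contain plain text only. This layout is an assumption for the sketch.
my $xml = <<'XML';
<records>
  <record id="1"><name>alpha</name><size>10</size></record>
  <record id="2"><name>beta &amp; co</name><size>20</size></record>
</records>
XML

# Decode only the five predefined XML entities, in a single pass.
sub decode_entities {
    my ($text) = @_;
    my %entity = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");
    $text =~ s/&(lt|gt|amp|quot|apos);/$entity{$1}/g;
    return $text;
}

# Return a reference to an array of hashes, one hash per <record>.
sub parse_records {
    my ($doc) = @_;
    my @records;

    # /s lets "." match newlines, so a record may span several lines.
    while ($doc =~ m{<record\s+id="([^"]*)"\s*>(.*?)</record>}gs) {
        my ($id, $body) = ($1, $2);
        my %fields = (id => $id);

        # Pick up every simple <tag>text</tag> pair inside the record.
        while ($body =~ m{<(\w+)>([^<]*)</\1>}g) {
            my ($tag, $value) = ($1, $2);
            $fields{$tag} = decode_entities($value);
        }
        push @records, \%fields;
    }
    return \@records;
}

my $records = parse_records($xml);
printf "%s: %s (%s)\n", @{$_}{qw(id name size)} for @$records;
```

The speed comes from skipping everything a conforming parser has to do (well-formedness checks, namespaces, encodings, arbitrary nesting), so this only makes sense when you control or at least trust the shape of the input.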