I'm a novice Perl programmer, trying to convert a simple XML string to a tab-delimited text file. I struggled with XML::Parser (and XML::Twig, XML::Simple, and XSLT), but couldn't figure out how to get the main data parts and the column headings.

Then I started trying XSLT, but I can't figure out how to get a separator between the elements (to then use split and/or join?); they all run together in one string.

I printed the column headings manually. Is there an easy way to do that with a template?

  1. What's the easiest way to do this, generally, and should I be using XSLT (which I've been trying to understand)?

  2. How can I fix the code below to do this?

It seems that I'm close, but I need a delimiter in the XSLT output string so that I can split it and join with "\t" to output a tab-delimited text file.

This is the XML (SMS logs from Twilio):

    <?xml version="1.0" encoding="utf-8"?>
    <twilioresponse>
      <smsmessages end="49"
          firstpageuri="/2010-04-01/accounts/accbaa0/sms/messages?page=0&amp;pagesize=50"
          lastpageuri="/2010-04-01/accounts/accbaa/sms/messages?page=54&amp;pagesize=50"
          nextpageuri="/2010-04-01/accounts/accbaa0103c/sms/messages?page=1&amp;pagesize=50&amp;aftersid=smc20cf7"
          numpages="55" page="0" pagesize="50" previouspageuri="" start="0" total="2703"
          uri="/2010-04-01/accounts/accbaa0103cf/sms/messages">
        <smsmessage>
          <sid>sme24eb108b7eb6a3b</sid>
          <datecreated>fri, 09 aug 2013 00:07:59 +0000</datecreated>
          <dateupdated>fri, 09 aug 2013 00:07:59 +0000</dateupdated>
          <datesent>fri, 09 aug 2013 00:07:59 +0000</datesent>
          <accountsid>accbaa0103c4141e5cd754042cb424d4ff</accountsid>
          <to>+14444444444</to>
          <from>+15555555555</from>
          <body>hi there!</body>
          <status>sent</status>
          <direction>outbound-api</direction>
          <price>-0.01000</price>
          <priceunit>usd</priceunit>
          <apiversion>2010-04-01</apiversion>
          <uri>/2010-04-01/accounts/accbaa01/sms/messages/sme24eb108b</uri>
        </smsmessage>
        <smsmessage>
          ... etc. ...
        </smsmessage>
      </smsmessages>
    </twilioresponse>

This is the XSLT I'm trying to use:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        exclude-result-prefixes="xs">
      <xsl:template match="//twilioresponse">
        <xsl:for-each select="smsmessage">
          <xsl:value-of select="sid"/>
          <!-- tried these, too: &#x20   &#x9;  &#xa;   -->
          <xsl:text>&#09;</xsl:text>
          <!-- tried this from another question -->
          <xsl:if test="position() != last()">, </xsl:if>
          <xsl:value-of select="datecreated"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="dateupdated"/>
          <xsl:text>&#09;</xsl:text>
          <xsl:value-of select="datesent"/>
          <xsl:text>&#xa;</xsl:text>
          <xsl:value-of select="accountsid"/>
          <xsl:text>&#09;</xsl:text>
          <xsl:text>&#xa;</xsl:text>
          <xsl:text>&#x20;</xsl:text>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="to"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="from"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="body"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="status"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="direction"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="price"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="priceunit"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="apiversion"/>
          <xsl:text>&#x9;</xsl:text>
          <xsl:value-of select="uri"/>
          <!-- tried both of these: line feed char -->
          <xsl:text>&#xa;</xsl:text>
          <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>

And the relevant part of the Perl code:

    use XML::XSLT;

    $logs = $twilio->GET('sms/messages');
    $string = $logs->{content};

    $xsl = 'xsl.txt';
    $xslt = XML::XSLT->new ($xsl);
    $xslt->transform ($string);
    $xslttostring = $xslt->toString;

    print $xslttostring;

    $columnheadings = "sid\tdatecreated\tdateupdated\tdatesent\taccountsid\tto\tfrom\tbody\tstatus\tdirection\tprice\tpriceunit\tapiversion\turi\n";

    open(my $fh, '>', 'textfile.txt') || die("unable to open file. $!");
    print $fh $columnheadings;
    foreach $k (@split) {
        print $fh join("\t", $xslttostring) . "\t";
    }
    #print $fh split("\t", $val). "\t"; ;
    close($fh);
    $xslt->dispose();

    # P.S. I'm sure there's a better way to check and see how many lines were saved.

    $xmllines = 0;
    open $fh, '<', 'textfile.txt' or die "could not open file. $!";
    while (<$fh>) {
        $xmllines++;
    }
    print ("\n" . $xmllines . " lines saved to tab-delimited logs textfile. \n");
    close $fh;

I'd think that XSLT is the wrong tool for this problem: it is awesome for XML→XML transformations, but verbose for an XML→CSV transformation. Instead of applying an XSLT stylesheet, you can use Perl's XML::LibXML module (or a comparable one) to parse the XML and apply XPath queries, and Text::CSV to emit the data to a file.

    use strict; use warnings; use autodie;
    use XML::LibXML;
    use Text::CSV;

    # Parse the XML
    my $xml = XML::LibXML->load_xml(string => ...);

    # Prepare the CSV
    open my $csv_fh, ">:utf8", "textfile.csv";
    my $csv = Text::CSV->new({
      binary => 1,
      eol => "\n",
      # sep_char => "\t",  # for tab separation. The default is a comma
      # quote_space => 0,  # makes tab-separated data look better.
    });

    my @columns = qw/
      sid
      datecreated  dateupdated  datesent
      accountsid
      to  from
      body
      status
      direction
      price  priceunit
      apiversion
      uri
    /;

    $csv->print($csv_fh, \@columns);  # print the header

    # Loop through the messages. Note that `print` wants an arrayref.
    for my $sms ($xml->findnodes('//smsmessage')) {
      $csv->print($csv_fh, [ map { $sms->findvalue("./$_") } @columns ]);
    }


Output:

    sid,datecreated,dateupdated,datesent,accountsid,to,from,body,status,direction,price,priceunit,apiversion,uri
    sme24eb108b7eb6a3b,"fri, 09 aug 2013 00:07:59 +0000","fri, 09 aug 2013 00:07:59 +0000","fri, 09 aug 2013 00:07:59 +0000",accbaa0103c4141e5cd754042cb424d4ff,+14444444444,+15555555555,"hi there!",sent,outbound-api,-0.01000,usd,2010-04-01,/2010-04-01/accounts/accbaa01/sms/messages/sme24eb108b
    ,,,,,,,,,,,,,

Or the tab-separated version:

    sid     datecreated     dateupdated     datesent        accountsid      to      from    body    status  direction       price   priceunit       apiversion      uri
    sme24eb108b7eb6a3b      fri, 09 aug 2013 00:07:59 +0000 fri, 09 aug 2013 00:07:59 +0000 fri, 09 aug 2013 00:07:59 +0000 accbaa0103c4141e5cd754042cb424d4ff      +14444444444    +15555555555   hi there!        sent    outbound-api    -0.01000        usd     2010-04-01      /2010-04-01/accounts/accbaa01/sms/messages/sme24eb108b

(The last line is not shown.)

Note that using a bare separator char instead of proper CSV quoting is a bad idea: what happens when a message contains newlines or tabs? The basic GSM 03.38 charset includes at least the LF and CR characters.
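To see why this matters, here is a small sketch using only core Perl. The record is made up for illustration; it joins three fields with tabs and no quoting, then reads the result back the way a naive consumer would.

```perl
use strict;
use warnings;

# One made-up record (sid, body, status); the body contains tabs and a newline.
my @fields = ('sme24eb108b', "hi\tthere,\tyou!\nbye", 'sent');

# Naive tab-delimited output: no quoting or escaping at all.
my $line = join("\t", @fields) . "\n";

# A reader that splits on newlines now sees two records instead of one ...
my @records = split /\n/, $line;

# ... and neither of them has the three fields we wrote.
my @counts = map { my @f = split /\t/, $_; scalar @f } @records;

print scalar(@records), " records; field counts: @counts\n";
```

Text::CSV avoids this by quoting such fields, which is why the code above passes `binary => 1` and lets the module build each line.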

Edit: Further explanations

The \ is the reference operator, so \@columns is an array reference pointing to the @columns array.

The map function takes a block of code and a list. Like a foreach loop, it executes the block for each value in the list. In each iteration, the $_ variable is set to the current element. Unlike a foreach loop, map returns a list of values. This makes it suitable for transformations. E.g. to double some numbers:

    my @doubles = map { $_ * 2 } 1 .. 5; #=> 2, 4, 6, 8, 10
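To make the comparison with foreach concrete, this sketch builds the same list both ways. The variable names are only illustrative.

```perl
use strict;
use warnings;

my @numbers = (1 .. 5);

# With map: the block's value for each element is collected into a new list.
my @doubles_map = map { $_ * 2 } @numbers;

# The same with foreach: we have to collect the results ourselves.
my @doubles_loop;
foreach my $n (@numbers) {
    push @doubles_loop, $n * 2;
}

print "@doubles_map\n";   # 2 4 6 8 10
print "@doubles_loop\n";  # 2 4 6 8 10
```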

The findvalue method of DOM nodes applies an XPath expression in the context of that node and returns the text value of the found element. The XPath expression ./foo is equivalent to foo, and searches for a child element called foo. I use the $_ variable to denote the column name/tag name. The map expression

    map { $sms->findvalue("./$_") } @columns

transforms the list of columns into a list of text values. I used the form ./foo for the XPath expression because I think it better conveys the meaning "give me the immediate child (/) with tag name foo of this SMS (.)", when one is used to the notation of file paths.
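As a quick check of the two spellings, here is a minimal sketch against a made-up miniature of the Twilio response (the document string is an assumption, not the real data). It also shows that findvalue yields an empty string when the child is missing, which would explain the row of bare commas in the output above.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up miniature of the response; the second message has no children.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<smsmessages>
  <smsmessage><sid>sme24eb108b</sid><body>hi there!</body></smsmessage>
  <smsmessage/>
</smsmessages>
XML

my ($first, $second) = $doc->findnodes('//smsmessage');

my $with_dot    = $first->findvalue('./sid');   # child "sid" of this node
my $without_dot = $first->findvalue('sid');     # same query, shorter spelling
my $missing     = $second->findvalue('./sid');  # no such child: empty string

print "$with_dot\n$without_dot\n[$missing]\n";
```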

The [ ... ] operator is a way to create an array reference with the list inside. E.g. [1, 2, 3] is a shortcut for

    my @temp = (1, 2, 3);
    \@temp;

(note the \ operator again).
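A short sketch of both ways to get an array reference, and how to read through one. The variable names and the shortened column list are only for illustration.

```perl
use strict;
use warnings;

my @columns = qw/sid to from body/;

my $named_ref = \@columns;                 # reference to an existing, named array
my $anon_ref  = [ 'sid', 'to', 'from' ];   # reference to a new, anonymous array

# Dereference with @{ ... } for the whole list, or ->[i] for one element.
print scalar(@{$named_ref}), "\n";   # 4
print $anon_ref->[1], "\n";          # to

# $named_ref points at @columns itself, so a push through it changes @columns.
push @{$named_ref}, 'status';
print "@columns\n";                  # sid to from body status
```

This is why `$csv->print` can be given either `\@columns` or `[ map { ... } @columns ]`: both are array references.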


