JENNIE: Regular expression to extract text from XML-ish data using GNU sed

Wednesday, 7 August 2013

Regular expression to extract text from XML-ish data using GNU sed

I have a file full of lines extracted from an XML file using "gsed regexp
-i FILENAME". Each line in the file has one of these two formats:
<field number='1' name='Account' type='STRING'W/>
<field number='2' name='AdvId' type='STRING'W>
I've inserted a 'W' at the end to represent optional whitespace. The
order and number of attributes are not necessarily the same on every line
of the file, although "number" always comes before "type".
What I'm looking for is a regular expression "regexp" that I can give to
GNU sed so that this command:
gsed regexp -i FILENAME
gives me a file with lines looking like this:
1 STRING
2 STRING
I don't care about the amount of whitespace in the result as long as there
is some after the number and a newline at the end of each line.
I'm sure it is possible, but I just can't figure out how in a reasonable
amount of time. Can anyone help?
Thanks a lot, jules
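
One possible approach, offered as a sketch rather than a definitive answer: capture the quoted values of "number" and "type" with a single substitution and print only the lines that matched. The helper name extract_fields is hypothetical, and the expression assumes that attribute values are single-quoted, that "number" comes before "type" (as stated above), and that no other attribute name ends in "number" or "type".

```shell
# Hypothetical helper: print "NUMBER TYPE" for each <field ...> line.
# Reads the files given as arguments, or standard input if there are none.
extract_fields() {
    # -n plus the trailing p prints only lines where the substitution matched.
    # -E enables extended regular expressions, so ( ) group without backslashes.
    # [^']* captures everything up to the closing single quote of the value.
    sed -n -E "s/.*number='([^']*)'.*type='([^']*)'.*/\1 \2/p" "$@"
}

# Sample input covering both line formats from the question.
printf '%s\n' \
    "<field number='1' name='Account' type='STRING' />" \
    "<field number='2' name='AdvId' type='STRING'>" |
    extract_fields
```

To rewrite the file in place with GNU sed, the same script can be passed along with the -i option, for example: gsed -i -n -E "SCRIPT" FILENAME, where SCRIPT is the substitution shown in the function. Older GNU sed releases spell the extended-regex option -r.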
