[LUNI] Scripting help (bash/perl)
Joe Frost
joe at the-frosts.org
Sun Oct 22 14:46:20 CDT 2006
On Sun, 2006-10-22 at 12:17 -0500, Branko Kotur wrote:
> Basically, what I need from the HTML file is the URL located in the IMG SRC
> and then the ALT text. All I need is the actual URL, (well, just the picture
> name), not the whole IMG tag or the quotes around it. Same with the ALT
> text.
Beware. This isn't pretty, but maybe it can get you started.
$ more test.html
<html>
<head><title>test</title></head>
<body>
some text with some pictures<br>
<img src="images/joe.jpg" alt="a picture of me, joe"><br>
<p>more text</p>
and another picture <img src="images/joe2.jpg" alt="i'm so handsome">
</body>
</html>
$ grep img test.html |\
sed -e 's/.*src="\(.*\)".*alt="\(.*\)".*/\1 \2/'|\
awk -F/ '{print $NF}'
joe.jpg a picture of me, joe
joe2.jpg i'm so handsome
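First I grab the lines from the file with img in them (hopefully, that's
only the <img src="blah"> stuff). Then sed pulls out the text between
the quotes after src and alt and separates the two with a space. Finally,
awk splits the line on / and prints the last field (everything after the
last /), which drops the directory from the picture name. That last step
only works as long as the alt text itself has no / in it.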
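If the alt text might contain a / or the tag carries extra attributes, a
slightly stricter variant of the pipeline above may hold up better. This is
only a sketch under some assumptions: one <img> per line, src appearing
before alt, double-quoted values, and no | character in the src value. The
function name extract_imgs and the file sample.html are made up for the
example.

```shell
#!/bin/sh
# Hypothetical helper: print "<picture name><TAB><alt text>" for each <img>.
# Assumes one <img> per line, src before alt, and double-quoted values.
extract_imgs() {
    # sed -n with the p flag prints only lines where the substitution
    # matched, so no separate grep is needed. [^"]* stops at the closing
    # quote instead of relying on greedy .* to land in the right place.
    sed -n 's/.*<img[^>]*src="\([^"]*\)"[^>]*alt="\([^"]*\)".*/\1|\2/p' "$1" |
    # Strip the directory from the src field only, so a / inside the alt
    # text is left alone. Assumes the src value contains no | character.
    awk -F'|' '{
        n = split($1, parts, "/")
        printf "%s\t%s\n", parts[n], substr($0, length($1) + 2)
    }'
}

# Sample input, copied from the test file earlier in this message.
cat > sample.html <<'EOF'
<html>
<head><title>test</title></head>
<body>
some text with some pictures<br>
<img src="images/joe.jpg" alt="a picture of me, joe"><br>
<p>more text</p>
and another picture <img src="images/joe2.jpg" alt="i'm so handsome">
</body>
</html>
EOF

extract_imgs sample.html
```

For the sample file this should print the same two picture names and alt
texts as before, separated by a tab rather than a space. For anything
messier than this (tags split across lines, single-quoted attributes, alt
before src) a real HTML parser is probably the safer route.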
Good luck,
JOe