[LUNI] Scripting help (bash/perl)

Joe Frost joe at the-frosts.org
Sun Oct 22 14:46:20 CDT 2006


On Sun, 2006-10-22 at 12:17 -0500, Branko Kotur wrote:

> Basically, what I need from the HTML file is the URL located in the IMG SRC 
> and then the ALT text.  All I need is the actual URL, (well, just the picture 
> name), not the whole IMG tag or the quotes around it.  Same with the ALT 
> text.  

Beware.  This isn't pretty, but maybe it can get you started.


$ more test.html
<html>
<head><title>test</title></head>
<body>
some text with some pictures<br>
<img src="images/joe.jpg" alt="a picture of me, joe"><br>
<p>more text</p>
and another picture <img src="images/joe2.jpg" alt="i'm so handsome">
</body>
</html>


$ grep img test.html |\
sed -n -e 's/.*src="\([^"]*\)".*alt="\([^"]*\)".*/\1\t\2/p' |\
awk -F/ '{print $NF}'
joe.jpg a picture of me, joe
joe2.jpg        i'm so handsome



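First I grab the lines from the file with img in them (hopefully, those
are only the <img src="blah"> tags).  Then sed pulls out the text
between the quotes after src= and after alt= and separates the two with
a tab (GNU sed understands \t in the replacement; with other seds, type
a literal tab).  Finally, awk splits each line on / and prints the last
field, i.e. everything after the last /, which leaves just the picture
name.  One catch: if an alt text contains a /, awk will print only what
follows that / and the picture name gets lost.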
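If your real pages have more than one picture on a line, use uppercase
IMG SRC/ALT like in your mail, or put alt before src, the pipeline above
will miss or mangle those.  Here's a bash-only sketch that copes with
those cases, under some assumptions: attribute values are in double
quotes, no tag is split across lines, and no alt text contains a ">".
The function name extract_img_alt is just something I made up, and this
is still a regex hack, not an HTML parser.

```shell
#!/usr/bin/env bash
# Print "<picture name><TAB><alt text>" for every <img> tag in an HTML file.
# Assumes double-quoted attribute values and tags that don't span lines.
# extract_img_alt is a made-up name; this is a regex hack, not an HTML parser.

extract_img_alt() (
  # The body runs in a subshell, so this option doesn't leak to the caller.
  # nocasematch makes [[ =~ ]] ignore case, so SRC= and ALT= match too.
  shopt -s nocasematch
  src_re='src="([^"]*)"'
  alt_re='alt="([^"]*)"'

  # grep -o prints each <img ...> tag on its own line, even when one
  # source line holds several; -i also catches <IMG ...>.
  grep -oi '<img[^>]*>' "$1" | while IFS= read -r tag; do
    src='' alt=''
    # Look for src and alt separately, so their order in the tag doesn't matter.
    [[ $tag =~ $src_re ]] && src=${BASH_REMATCH[1]}
    [[ $tag =~ $alt_re ]] && alt=${BASH_REMATCH[1]}
    # ${src##*/} drops everything up to the last "/", like awk's $NF above.
    printf '%s\t%s\n' "${src##*/}" "$alt"
  done
)

# Only run when given a file name, e.g.: ./imgalt.sh test.html
if [[ -n ${1:-} ]]; then
  extract_img_alt "$1"
fi
```

Saved as, say, imgalt.sh (any name will do) and run on the test.html
above, it should print the same two lines as the pipeline.  Since the
picture name comes from ${src##*/} rather than splitting the whole line
on /, a / inside the alt text no longer matters.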

Good luck,
Joe