Blag o' dkam

p0wning tubez


Consistency


Well, apparently I haven’t been consistent enough, and consistency is an important quality when one bags out others for being inconsistent. So I’ve updated all references to “Bookie” to the now-correct “Booko”.

While I was at it, I updated the scraping code for Fishpond, who have changed their site yet again. At least this time they’ve changed it for the better. Compare the old and new Hpricot selectors for grabbing the book title and author. The old ones walk a fragile chain of nested table cells, and the new ones go straight to elements by their id.

The Old:

book.title     = (doc/"table/tr/td/div/h1").inner_html
book.author    = (doc/"table/tr/td/p[2]/a/font/u").inner_html

The New:

book.title     = (doc/"h1#product_title").first.inner_html
book.author    = (doc/"p#product_author/a").first.inner_html

Much nicer!
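To see why id-based selectors hold up better, here's a rough sketch. It swaps Hpricot for REXML from Ruby's standard library so it runs without extra gems. OLD_PAGE and NEW_PAGE are hypothetical, simplified stand-ins and are not Fishpond's real markup. The helper names (first_text, scrape, POSITIONAL, BY_ID) are also made up for illustration. The positional path only matches the old table layout. The id-based path matches the new page and doesn't care where the element sits. Because the helper returns nil on a miss, it also avoids the NoMethodError that `.first.inner_html` raises when nothing matches.

```ruby
require "rexml/document"

# Hypothetical, simplified pages in the style of the old and new layouts.
OLD_PAGE = <<~HTML
  <html><body>
    <table><tr><td>
      <div><h1>Dune</h1></div>
      <p>Paperback</p>
      <p><a href="#"><font><u>Frank Herbert</u></font></a></p>
    </td></tr></table>
  </body></html>
HTML

NEW_PAGE = <<~HTML
  <html><body>
    <div class="main">
      <h1 id="product_title">Dune</h1>
      <p id="product_author"><a href="#">Frank Herbert</a></p>
    </div>
  </body></html>
HTML

# Positional paths mirror the old selectors; id paths mirror the new ones.
POSITIONAL = { title: "//table/tr/td/div/h1", author: "//table/tr/td/p[2]/a/font/u" }
BY_ID      = { title: "//h1[@id='product_title']", author: "//p[@id='product_author']/a" }

# Text of the first node matching xpath, or nil if nothing matches
# (instead of raising NoMethodError on nil).
def first_text(doc, xpath)
  REXML::XPath.first(doc, xpath)&.text
end

# Parse the page and look up each field with its selector.
def scrape(html, selectors)
  doc = REXML::Document.new(html)
  selectors.transform_values { |xpath| first_text(doc, xpath) }
end

scrape(OLD_PAGE, POSITIONAL) # title "Dune", author "Frank Herbert"
scrape(NEW_PAGE, POSITIONAL) # both nil: the table layout is gone
scrape(NEW_PAGE, BY_ID)      # title "Dune", author "Frank Herbert"
```

If the site gets reshuffled again, nil values are easy to check for and log. A crash halfway through a scrape is harder to deal with.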

Written by dkam

May 28th, 2008 at 11:20 pm