Archive for the ‘hpricot’ Category
Well, apparently I’ve not been consistent enough, which matters when one bags out others for being inconsistent. So I’ve updated all references to “Bookie” to the now-correct “Booko”.
While I was at it, I updated the scraping code for Fishpond, who have again updated their site. At least they’ve updated it for the better. Compare and contrast the old and new Hpricot selectors for grabbing the book title and author.
    # Old: positional paths through the table layout
    book.title = (doc/"table/tr/td/div/h1").inner_html
    book.author = (doc/"table/tr/td/p/a/font/u").inner_html

    # New: select by element id
    book.title = (doc/"h1#product_title").first.inner_html
    book.author = (doc/"p#product_author/a").first.inner_html
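
To see why the id-based version holds up better, here is a rough sketch. It is not the Booko code: it swaps Hpricot for REXML from Ruby's standard library so it runs without extra gems, and the two page snippets are made-up stand-ins, not Fishpond's real markup. The `first_text` helper is also hypothetical; it returns nil instead of raising when nothing matches, which a bare `.first.inner_html` would not do.

```ruby
require "rexml/document"

# Hypothetical markup standing in for the old, table-based layout.
OLD_PAGE = <<~HTML
  <html><body><table><tr><td>
    <div><h1>Example Title</h1></div>
    <p><a href="/author"><font><u>Example Author</u></font></a></p>
  </td></tr></table></body></html>
HTML

# Hypothetical markup standing in for the new layout with ids.
NEW_PAGE = <<~HTML
  <html><body><div class="wrapper">
    <h1 id="product_title">Example Title</h1>
    <p id="product_author"><a href="/author">Example Author</a></p>
  </div></body></html>
HTML

# Returns the stripped text of the first node matching xpath,
# or nil when nothing matches.
def first_text(doc, xpath)
  node = REXML::XPath.first(doc, xpath)
  node && node.text.to_s.strip
end

old_doc = REXML::Document.new(OLD_PAGE)
new_doc = REXML::Document.new(NEW_PAGE)

# Positional path: depends on every ancestor staying where it is.
old_title = first_text(old_doc, "//table/tr/td/div/h1")

# id-based paths: keep working as long as the ids are kept.
new_title  = first_text(new_doc, "//h1[@id='product_title']")
new_author = first_text(new_doc, "//p[@id='product_author']/a")

# The old positional path finds nothing once the table layout is gone.
stale_title = first_text(new_doc, "//table/tr/td/div/h1")

puts old_title.inspect
puts new_title.inspect
puts new_author.inspect
puts stale_title.inspect
```

The nil return is the useful part for a scraper: when a site changes its layout again, a missing title can be logged and skipped rather than blowing up with a NoMethodError.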