Hpricot <text>sometext</text> workaround

As noted in the open trouble ticket here, the most awesome Hpricot seems to have come down with a bug: it’s not able to access “sometext” inside “<text>sometext</text>”. It parses the document OK (puts doc.inspect definitely shows the proper {elem}); you just can’t get to it. So here’s my ugly little hack/workaround until the issue is resolved: rename the <text> tags to <mtext> before handing the document to Hpricot. (I’m posting it here since I can’t seem to sign up to comment on the bug report on the Hpricot home page, and someone might find this useful.) This hack is specifically for web documents, but it would also work for strings or files with only minor tweaks.

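## Begin hack
require 'open-uri'
require 'hpricot'

## url is assumed to be set already; read the whole page into a string
doc = ""
open(url) do |f|
  doc = doc + f.read
end
## Rename the tags so Hpricot treats them as ordinary elements
doc = doc.gsub(/<text>/, "<mtext>")
doc = doc.gsub(/<\/text>/, "</mtext>")
doc = Hpricot(doc)
## The elements are now reachable as "mtext", e.g. (doc/"mtext").inner_html
## Should be one line
## doc = Hpricot(open(url))
## End hack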
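
For strings or files, the same renaming can be pulled out into a small helper. This is only a sketch: rename_text_tags is a name I made up (it is not part of Hpricot), and it assumes the tags are written in lowercase. Unlike the two gsub calls in the hack, it also catches <text> tags that carry attributes, and it leaves tags such as <textarea> alone.

```ruby
# Hypothetical helper, not part of Hpricot: renames <text> tags to <mtext>
# so the parser treats them as ordinary elements.
def rename_text_tags(markup)
  # Matches "<text" and "</text" only when "text" is the whole tag name;
  # the \b keeps longer names such as <textarea> untouched.
  markup.gsub(%r{<(/?)text\b}) { "<#{$1}mtext" }
end

sample = "<doc><text>sometext</text><textarea>x</textarea></doc>"
puts rename_text_tags(sample)
# prints <doc><mtext>sometext</mtext><textarea>x</textarea></doc>
```

For a file, something like rename_text_tags(File.read(path)) should do, with path being whatever file you want to parse; the result can then be passed to Hpricot(...) just like in the hack.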
