Worked around an annoying little bug in Plucker

I noticed that occasionally, when I "distilled" pages using Plucker, I would see some ugly characters interspersed within the text, such as the copyright symbol next to the Euro symbol. These characters were obviously not part of the original text and did not show up when I viewed the page in a web browser. They mostly appeared where the original had single and double quotes.
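One plausible mechanism, consistent with the "UTF-8 incorrectly distilled" bug mentioned below, is that UTF-8 typographic quotes end up being read as a single-byte Windows-1252-style charset. That charset is an assumption here, not something confirmed for Plucker or the Palm. Each curly quote is three bytes in UTF-8, so it turns into three junk characters. The sketch below targets modern Python 3, not the Python 2.4 on that system.

```python
# Sketch: what UTF-8 curly quotes look like when their bytes are misread as
# cp1252 (assumption: the reader uses cp1252 or something close to it).

def misread_utf8_as_cp1252(text):
    """Encode text as UTF-8, then (wrongly) decode those bytes as cp1252."""
    raw = text.encode("utf-8")
    # errors="replace" because a few bytes (e.g. 0x9D) are undefined in cp1252
    return raw.decode("cp1252", errors="replace")


if __name__ == "__main__":
    # left/right single quotes, left/right double quotes
    for ch in ["\u2018", "\u2019", "\u201c", "\u201d"]:
        print(repr(ch), "->", repr(misread_utf8_as_cp1252(ch)))
```

Every curly quote starts with the same two bytes (0xE2 0x80), which read as "â€". That would explain why the same odd symbols kept turning up wherever the text had quotes.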

Did some digging, and found that even though I forced the charset in plucker-build thusly:

plucker-build --doc-name="Overclocked-Cory Doctorow" --doc-file=plkr-overclocked \
  --pluckerdir=palmos/toinstall \
  --home-url="~/overclocked.html" --bpp=0 --charset=iso8859-1

I would still see those nasty characters.

Did some more digging, and found a workaround in Plucker bug 1382, "UTF-8 incorrectly distilled (causes problems with ESR's pages)".

I applied that workaround by editing my system's copy of the TextParser.py module:

sudo joe /usr/lib/python2.4/site-packages/PyPlucker/TextParser.py

On the next plucker-build run, I got normal quotation marks. The file looked good!
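The actual change from bug 1382 isn't reproduced here. As a rough, hypothetical illustration of the kind of transformation that produces plain quotes, here is a helper of my own naming (`utf8_to_latin1_plain_quotes`, not part of PyPlucker). It decodes the page as UTF-8 and swaps typographic quotes for ASCII ones. It then encodes for iso8859-1, the charset forced on the command line above. The sketch targets Python 3.

```python
# Hypothetical helper, NOT the bug-1382 patch and not a PyPlucker API:
# decode UTF-8 input properly, replace typographic quotes with ASCII quotes,
# then encode for the target charset (iso8859-1, as passed via --charset).

QUOTE_MAP = {
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark / apostrophe
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
}


def utf8_to_latin1_plain_quotes(raw):
    """Convert UTF-8 bytes to iso8859-1 bytes with plain ASCII quotes."""
    text = raw.decode("utf-8")
    text = text.translate(QUOTE_MAP)
    # Anything else outside Latin-1 (e.g. the Euro sign) becomes "?"
    return text.encode("iso8859-1", errors="replace")


if __name__ == "__main__":
    page = "\u201cHello,\u201d she said. It\u2019s fine.".encode("utf-8")
    print(utf8_to_latin1_plain_quotes(page))
```

Mapping to plain ASCII quotes is a deliberately blunt choice. It loses the typographic quotes, but the result displays the same under any Latin-1-compatible charset.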

As the workaround's author noted, the patch is an ugly hack, but it works. So here's to someone coding a reasonable solution!


Written by Andrew Ittner in misc on Wed 07 February 2007. Tags: palmos, python