Worked around an annoying little bug in Plucker
I noticed that occasionally, when I "distilled" pages using Plucker, I would see some ugly characters interspersed within the text - things like the copyright symbol next to the Euro symbol. These characters were very obviously not part of the original text, did not show up when I viewed the page in a web browser, and mostly appeared where single and double quotes should have been.
Did some digging, and found that even though I forced the charset in plucker-build thusly:
plucker-build --doc-name="Overclocked-Cory Doctorow" --doc-file=plkr-overclocked \
    --pluckerdir=palmos/toinstall --home-url="~/overclocked.html" \
    --bpp=0 --charset=iso8859-1
I would still see those nasty characters.
Did some more digging, and found a workaround in Plucker bug 1382, "UTF-8 incorrectly distilled (causes problems with ESR's pages)".
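For the curious, here is a minimal sketch of what is probably going on, assuming (going only by the bug title) that the page is UTF-8 and its bytes end up being read as a single-byte charset. This is plain Python 3 for illustration, not Plucker code: the helper names misdecode and repair are made up, and cp1252 is just a stand-in for whatever charset the device actually uses.

```python
# Illustration only: not part of Plucker. The helper names and the choice
# of cp1252 as the "wrong" charset are assumptions for this sketch.

def misdecode(text, wrong_charset="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    raw = text.encode("utf-8")
    # errors="replace" because some bytes are undefined in cp1252
    return raw.decode(wrong_charset, errors="replace")


def repair(garbled, wrong_charset="cp1252"):
    """Undo misdecode(): recover the bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_charset).decode("utf-8")


original = "it\u2019s"           # "it's" with a curly apostrophe (U+2019)
garbled = misdecode(original)    # apostrophe becomes three junk characters

print(repr(original.encode("utf-8")))  # the apostrophe is 3 bytes in UTF-8
print(garbled)
print(repair(garbled) == original)
```

One curly apostrophe is three bytes in UTF-8, so a single-byte charset shows it as three unrelated characters, one of them the Euro sign. That would fit quotes being the main casualties, since plain ASCII text is identical in both encodings.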
I applied that workaround by editing my system's TextParser.py module:
sudo joe /usr/lib/python2.4/site-packages/PyPlucker/TextParser.py
and on the next plucker-build call, I got normal quotation marks. The file looked good!
As the workaround author noted, it is an ugly hack, but it also worked. So here's to someone coding a reasonable solution!