Ian ([personal profile] lovingboth) wrote2004-03-30 04:00 pm

Google & PDFs, partly a note to myself

When Google converts these to HTML, it doesn't look like it can cope with some ligatures - special characters used to represent particular combinations of two letters. '&' is the most common example (it's 'et', the Latin for 'and') and 'æ' would be another. Google is OK with these.

But some programs use the ability in some fonts to do this for combinations of letters that otherwise clash visually, like fi, fl, ft, tt etc. And Google isn't OK with these.

So "This will be the best chance in fifty years to change things for the better" can become, in Google's eyes, "This will be the best chance in fi y years to change things for the be er"!
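For what it's worth, here is a rough Python sketch of how ligature characters could be turned back into plain letters before the text goes anywhere near a converter. It assumes the text really contains the Unicode ligature code points (U+FB00 to U+FB04), which is only a guess: many PDFs use font-specific glyphs with no Unicode mapping at all, and as far as I know 'ft' and 'tt' have no standard code point, so this would not have rescued the example above. The name `expand_ligatures` is made up for illustration.

```python
import unicodedata

# Unicode "Alphabetic Presentation Forms" for the common Latin f-ligatures.
# Assumption: the extracted text uses these code points rather than
# font-private glyphs.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def expand_ligatures(text: str) -> str:
    """Replace each ligature character with the plain letters it stands for."""
    for ligature, letters in LIGATURES.items():
        text = text.replace(ligature, letters)
    return text


if __name__ == "__main__":
    sample = "the best chance in \ufb01fty years"
    print(expand_ligatures(sample))
    # NFKC normalisation also expands these compatibility characters,
    # along with many other characters, so it is a blunter tool.
    print(unicodedata.normalize("NFKC", sample))
```

The explicit table only touches the five listed characters, whereas NFKC also rewrites things like superscripts and full-width letters, which may or may not be wanted.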

It certainly can't cope with images in PDFs, which is odd. And text at an angle produces some interesting effects...

How often do people use Google's HTML view of PDFs? Is it worth setting up the PDF to be usable in this way, or do people look at / print PDFs directly?
