lovingboth: (Default)
[personal profile] lovingboth
When Google converts PDFs to HTML, it doesn't look like it can cope with some ligatures - special characters that represent particular combinations of two letters. '&' is the most common example (it's 'et', the Latin for 'and') and 'æ' would be another. Google is OK with these.

But some programs use the ligature glyphs available in some fonts for combinations of letters that would otherwise clash visually, like fi, fl, ft, tt etc. And Google isn't OK with these.

So "This will be the best chance in fifty years to change things for the better" can become, in Google's eyes, "This will be the best chance in fi y years to change things for the be er"!
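
For what it's worth, here is a minimal sketch of how a converter could put the letters back, assuming the text extracted from the PDF still contains the Unicode ligature characters (U+FB00 to U+FB06). The name `expand_ligatures` is made up for illustration, and this is not a claim about what Google actually does. Unicode has no ligature characters for ft or tt, so a font's private glyphs for those would still need their own mapping.

```python
import unicodedata

# Latin ligatures in the Unicode "Alphabetic Presentation Forms" block.
LIGATURE_FIRST = 0xFB00  # ff ligature
LIGATURE_LAST = 0xFB06   # st ligature


def expand_ligatures(text: str) -> str:
    """Replace each Latin ligature character with its component letters.

    Only U+FB00..U+FB06 are touched, so characters such as '&' and 'æ'
    pass through unchanged.
    """
    out = []
    for ch in text:
        if LIGATURE_FIRST <= ord(ch) <= LIGATURE_LAST:
            # Compatibility decomposition splits the ligature into letters.
            out.append(unicodedata.normalize("NFKD", ch))
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical extracted text containing the single-character fi ligature.
extracted = "the best chance in \ufb01fty years"
print(expand_ligatures(extracted))
```

Restricting the expansion to that one block, rather than normalising the whole string, leaves everything else in the text alone.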

It certainly can't cope with images in PDFs, which is odd. And text at an angle produces some interesting effects...

How often do people use this feature? Is it worth setting up the PDF to be usable in this way, or do people look at / print PDFs directly?

(no subject)

Date: 2004-03-30 02:00 pm (UTC)
From: [identity profile] a-musing-amazon.livejournal.com
I'd assume the main purpose of Google HTMLing a PDF is to make the text readable to its word-indexing software (it also, sometimes, allows you to read something in Google's cache that has otherwise been deleted in the original).
