
Exported PDF is not searchable

Posted: 2012-01-13 06:55:51
by chazzo
I've just exported a NWP document to PDF. It's wonderful to see Comments converted to sticky notes in the PDF, plus more clickable links than I can shake the proverbial stick at.

BUT ... the text of the PDF seems to be in binary format: it looks fine on screen, but if I copy a section and paste it into NWP or a text editor, I get garbage. As a result, the PDF is not searchable, and this is a pain.
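
One way to make the symptom concrete, rather than eyeballing pasted text, is to check whether words known to be on the page survive a copy and paste. What follows is only a minimal Python sketch, not something NWP or Preview provides; the function name `missing_words` and the sample strings are made up for illustration.

```python
def missing_words(pasted_text, expected_words):
    """Return the expected words that cannot be found in the pasted text.

    An empty result suggests the PDF text layer is searchable for those
    words; a long result suggests the text came out as garbage.
    """
    haystack = pasted_text.casefold()  # compare case-insensitively
    return [word for word in expected_words if word.casefold() not in haystack]


if __name__ == "__main__":
    # Hypothetical samples: one readable paste, one garbled paste.
    readable = "The quick brown fox jumps over the lazy dog"
    garbled = "\x03\x11\x1f\x07\x02\x15\x0b"
    words = ["quick", "lazy"]
    print(missing_words(readable, words))
    print(missing_words(garbled, words))
```

Paste a passage copied from the PDF into `pasted_text` and list a few words you can see on that page; any word reported as missing would also fail a search in the PDF viewer.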

The document was converted from Word, so I guess that may be the answer. Another PDF from a document created from scratch in NWP has ASCII text and is searchable.

Can anyone confirm this? Is there a workaround? Where can I read up on this issue, which I've encountered before in other contexts, such as when removing pages from PDFs and re-saving? I'm using NWP 2.0.1 on Mac PPC, MacOS 10.5.8, and viewing PDFs in Preview. Is this a Mac thing?

Charles

Re: Exported PDF is not searchable

Posted: 2012-01-13 14:26:42
by martin
The way text is encoded in a PDF depends on OS X and the fonts you're using; NWP itself doesn't really handle that aspect of saving a PDF.

Are you using fonts with non-standard text encodings, e.g. a transliteration font from the Classic Mac days? If you send us an example file, we can take a look and see what might be going wrong.

Re: Exported PDF is not searchable

Posted: 2012-01-16 03:00:28
by chazzo
Thank you Martin. The problem seems to be the Microsoft font Calibri.

I understand that this may not be Nisus's problem, but since NWP is behaving differently from TextEdit, I'd be grateful if you can shed any more light on the issue.

If I use Calibri in a newly-created Nisus document I get a PDF in which the text set in Calibri has binary encoding, so is not searchable. This happens when I use the "Save as PDF..." command and also when I choose "Print > Save as PDF..." (I'm not sure if there is any difference as far as NWP is concerned).

If I open the same document in TextEdit and print to PDF, I get plain-text encoding for the Calibri text, though the spaces seem to have disappeared.

I have v. 2.00 of Calibri, dated 2005, installed on my iMac G5 running Leopard and NWP 2.0.1. I did a clean system install just a month ago, so Calibri must have been installed by Office 2008.

I haven't checked any other MS fonts except Tahoma, which is included in the attached sample files. Unlike Calibri, Tahoma works fine. My version of Tahoma is a TT (Mac) font, whereas Calibri is "OT (TT flavored)", so I suppose that might account for the difference...

Any ideas? The document I'm working on is set in Calibri and will need to be distributed as PDF. Obviously the PDF needs to be searchable, and I don't think my client will be keen to change fonts.

Re: Exported PDF is not searchable

Posted: 2012-01-16 06:14:22
by Hamid
I cannot reproduce the problem.
There is no version number on my Calibri but it is ttf and dated 2009.
Here is the output:
fonts test.pdf.zip (16.7 KiB)

Re: Exported PDF is not searchable

Posted: 2012-01-16 07:11:42
by chazzo
Thanks Hamid. I confirm that your PDF is all-ASCII, though for Calibri I get $ signs instead of spaces when I paste into TextWrangler.

I tried again with some more random fonts (list below). Provisional conclusion is that OT fonts from Microsoft cause this problem. OT fonts from other sources seem OK, as do TT fonts from MS and Type 1 fonts.

Again, TextEdit behaves differently from NWP.

I tried to find a TT version of Calibri but without success. http://www.fontonic.com/download.asp?id=1142 has a version (1.05) called "CALIBRI.TTF", but FontExplorer X says it's actually OT and it behaves identically to the other version.

Charles

Cambria (OTF, MS) - binary
Consolas (OTF, MS) - binary
Candara (OTF, MS) - binary
Corbel (OTF, MS) - binary
Modern No. 20 (TT) - ASCII
Desdemona (TT) - ASCII
Adobe Caslon Pro (OTF) - ASCII
Adobe Garamond Pro (OTF) - ASCII
Andale Mono (OTF) - ASCII
Meta Plus Medium (PS Type 1) - ASCII
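
The results reported in this thread can be tabulated to test the provisional conclusion. This is a rough Python sketch: the Calibri and Tahoma rows come from the earlier posts, a font counts as Microsoft only where the thread says so, and the name `outcomes_by_group` is invented here.

```python
from collections import defaultdict

# (font, format, marked as Microsoft in this thread, result of pasting the PDF text)
# Calibri is reported as "OT (TT flavored)"; it is recorded as OTF here.
RESULTS = [
    ("Calibri", "OTF", True, "binary"),
    ("Tahoma", "TT", True, "ASCII"),
    ("Cambria", "OTF", True, "binary"),
    ("Consolas", "OTF", True, "binary"),
    ("Candara", "OTF", True, "binary"),
    ("Corbel", "OTF", True, "binary"),
    ("Modern No. 20", "TT", False, "ASCII"),
    ("Desdemona", "TT", False, "ASCII"),
    ("Adobe Caslon Pro", "OTF", False, "ASCII"),
    ("Adobe Garamond Pro", "OTF", False, "ASCII"),
    ("Andale Mono", "OTF", False, "ASCII"),
    ("Meta Plus Medium", "PS Type 1", False, "ASCII"),
]


def outcomes_by_group(results):
    """Collect the set of outcomes seen for each (format, is_microsoft) pair."""
    groups = defaultdict(set)
    for _font, fmt, is_ms, outcome in results:
        groups[(fmt, is_ms)].add(outcome)
    return dict(groups)


if __name__ == "__main__":
    for (fmt, is_ms), outcomes in sorted(outcomes_by_group(RESULTS).items()):
        vendor = "MS" if is_ms else "other"
        print(f"{fmt:10} {vendor:6} {sorted(outcomes)}")
```

If the conclusion holds, the only group containing "binary" is OTF fonts marked MS, and no group mixes both outcomes. A dozen fonts is a small sample, so this only summarises what was tested.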

Re: Exported PDF is not searchable

Posted: 2012-01-16 17:06:22
by martin
chazzo wrote: "I understand that this may not be Nisus's problem, but since NWP is behaving differently from TextEdit, I'd be grateful if you can shed any more light on the issue."
It looks like the underlying encoding/characters used by Calibri are changing when NWP adds document metadata (e.g. your author name, date created, etc.) to the PDF. To do that, the PDF goes through a second pass in Apple's PDF interpreter, which is somehow affecting the character encoding used by Calibri.

I can't explain the underlying reason for that, except that it's probably a bug in OS X or Calibri that was fixed at some point. I can reproduce the issue on OS X 10.6.8 with Calibri 2005, but the issue does not occur on OS X 10.7 (which Hamid is also using).
chazzo wrote: "This happens when I use the "Save as PDF..." command and also when I choose "Print > Save as PDF..." (I'm not sure if there is any difference as far as NWP is concerned)."
Theoretically there shouldn't be any difference between the commands, though such a thing is possible, so it was good to check.

Re: Exported PDF is not searchable

Posted: 2012-01-17 02:18:19
by chazzo
Thanks very much, Martin and Hamid. I couldn't easily find anything about this on the web, so I hope this thread will be useful to others. Time for Lion!

Re: Exported PDF is not searchable

Posted: 2012-01-17 13:01:31
by martin
chazzo wrote: "Time for Lion!"
I hope that will do it. I didn't check the version number of Calibri that was installed on the Lion machine I tested with yesterday, and I don't have access to that machine today (I can look tomorrow). It's possible that updating to Lion will not fix this, and you might need to get a later version of Calibri, though that's not my intuition.

Re: Exported PDF is not searchable

Posted: 2012-01-18 15:19:05
by martin
I placed the older version of Calibri on Lion, and the PDF's text is searchable, so upgrading to Lion should do the trick. I should mention, though, that spaces still turn into funny characters (for me it's "%", not "$" as with Hamid's PDF).

Re: Exported PDF is not searchable

Posted: 2012-01-19 00:55:17
by chazzo
Thanks for your careful work, Martin. I'm a fossil, and that's official!