Strange things happen when opening a PDF document into NWP 

Joined: 2014-12-02 12:29:50
Posts: 4
I have thousands of documents from previous word processors that I converted into PDFs so that I can search the entire text with DEVONThink, or Spotlight. When I open an old PDF document into NWP, some very strange things happen:

---about one in ten lines, but not all, have lost the spacing between words. Other lines are fine, but I spend hours reinserting spaces between words in random lines. Very odd.
---even odder: NWP will not fully display words that have a "ff" or an "fi" in them, such as "first" or "effect." I have to go through the document and type in the missing "ff" and "fi". This seems like a terribly specialized bug.

I didn't expect to preserve paragraph formatting or other niceties when going from PDF to NWP, but these bugs seem like weirdnesses beyond the pale of weird. Has anyone else run into this? Anyone found a solution? Exorcism, perhaps?


2016-01-23 21:18:57

Joined: 2006-12-08 00:46:44
Posts: 416
Location: London or Exeter, UK
kkatzmar wrote:
I have thousands of documents from previous word processors that I converted into PDFs so that I can search the entire text with DEVONThink, or Spotlight. When I open an old PDF document into NWP, some very strange things happen:

---about one in ten lines, but not all, have lost the spacing between words. Other lines are fine, but I spend hours reinserting spaces between words in random lines. Very odd.
---even odder: NWP will not fully display words that have a "ff" or an "fi" in them, such as "first" or "effect." I have to go through the document and type in the missing "ff" and "fi". This seems like a terribly specialized bug.

I didn't expect to preserve paragraph formatting or other niceties when going from PDF to NWP, but these bugs seem like weirdnesses beyond the pale of weird. Has anyone else run into this? Anyone found a solution? Exorcism, perhaps?

It might help diagnosis if you said how you "open an old PDF document" in NWP: do you open it with the "Open" command in the 'File' menu, or do you copy the text in the PDF and then paste it into an NWP document? If the latter, do you just use "Paste" or do you use "Paste Text Only"? Do any of those methods make a difference?

On the second problem (the missing 'ff' and 'fi'), I would suspect that, in NWP, you are using a font that doesn't have the glyphs for ligatures like 'ff' and 'fi'. Each ligature has its own code point, totally different from those of 'f' and 'i', so, if I'm right, they are not displaying because the font in use in NWP has no glyph at that code point. Try highlighting the whole text and changing the font to one that does have the ligatures.
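If changing the font doesn't help, another option is to turn the ligature characters back into plain letters before the text goes into NWP. Here is a minimal Python sketch of that idea, assuming you have the extracted text as a plain string; the function name `expand_ligatures` is just made up for illustration, and this is not something NWP does itself.

```python
import unicodedata

# Single-character Latin ligatures that PDF text extraction often emits
# in place of the plain letters: ff, fi, fl, ffi, ffl.
LIGATURES = ["\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04"]

def expand_ligatures(text: str) -> str:
    """Replace each ligature code point with its plain-letter equivalent."""
    for lig in LIGATURES:
        # NFKC compatibility normalization maps e.g. U+FB01 to "fi".
        text = text.replace(lig, unicodedata.normalize("NFKC", lig))
    return text

print(expand_ligatures("\ufb01rst e\ufb00ect"))
```

Only the five listed ligatures are touched, so the rest of the text is left exactly as it was.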

Mark


2016-01-24 04:37:46
Official Nisus Person

Joined: 2002-07-11 17:14:10
Posts: 4251
Location: San Diego, CA
Mark's suggestion to try different ways of getting your PDF content into NWP is a good one. Opening the PDF in Preview and using copy-paste might prove to be more effective than opening the PDF directly in NWP. That's because opening a PDF file directly in NWP extracts just the text that the system (OS X) provides "for free" to all applications.

It's very nice of Apple to provide some PDF import to all apps, but the quality can be lacking. It really varies greatly depending on the PDF. The text from some PDFs comes through very nicely, while others leave a lot to be desired, especially PDFs created by a scan or OCR. I've also seen text encoding troubles depending on the fonts used in the PDF, or if the PDF was ever resaved by Apple's Preview. For example, adding just a simple red circle or comment to a PDF can disrupt some of the encoded characters. The text might appear correctly on screen, but the underlying character codes are no longer valid for operations like copy-paste.
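With thousands of documents, it may be quicker to flag the damaged lines than to hunt for them by eye. The following is only a rough Python sketch, assuming the extracted text is available as a plain string. The function name `suspect_lines` and the 30-character word limit are arbitrary choices for illustration, and the check is a heuristic that can miss problems or raise false alarms.

```python
import unicodedata

def suspect_lines(text: str, max_word_len: int = 30):
    """Yield (line_number, reason) for lines that look damaged."""
    for number, line in enumerate(text.splitlines(), start=1):
        # Private-use code points ("Co") and U+FFFD usually mean the
        # PDF's character codes did not map to real Unicode characters.
        if any(unicodedata.category(ch) == "Co" or ch == "\ufffd" for ch in line):
            yield number, "unmapped character"
        # A very long "word" suggests the spaces between words were lost.
        elif any(len(word) > max_word_len for word in line.split()):
            yield number, "possible missing spaces"

sample = "A normal line of text\nThislinehaslostallofitsspacesduringextraction"
for number, reason in suspect_lines(sample):
    print(number, reason)
```

This only points at lines worth checking; it does not try to put the spaces back.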


2016-01-25 15:25:04