Rtf encoding for non-ASCII characters

Posted: 2013-10-08 14:14:18
by alanterra
I am confused about why Nisus uses certain character codes to indicate non-ASCII characters in RTF documents. (No, I haven't read the RTF spec, and that might answer my question.)

If I type "Cardón" into Scrivener, it represents it as "Card\'f3n", and "Vázquez" as "V\'e1zquez". However, Nisus represents these words as "Card\u243 ?n" and "V\u225 ?zquez". And sometimes Nisus inserts a carriage return right into the middle of the word before the question mark.

As far as I can tell, both are valid RTF. My problem is that Nisus' encoding breaks things when I process a file with Sente (to generate a bibliography): the question marks trip up Sente's RTF processing engine. Yes, I know that this is (probably) Sente's fault, but it would be wonderful if there were a switch in Nisus that would let me control how non-ASCII characters are represented. Is there one?

Re: Rtf encoding for non-ASCII characters

Posted: 2013-10-09 13:44:56
by martin
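Those are indeed both valid ways of representing characters in RTF. An escape sequence like "\'f3" specifies a byte whose meaning depends on a particular text encoding (e.g. Windows-1252, Mac Roman, Shift JIS), while a sequence like "\u243 ?" encodes the (globally unique) Unicode code point for the character as a decimal number. The "?" that follows is a fallback character for applications that don't understand Unicode escapes; an application that does understand them is supposed to skip it.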
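To make the difference concrete, here is a minimal Python sketch that decodes both escape forms to the same text. The function name `decode_rtf_escapes` is hypothetical, and the sketch assumes a Windows-1252 code page, the default of one fallback character per Unicode escape, and characters inside the Basic Multilingual Plane. It is not a full RTF parser.

```python
import re

# Either \'hh (one byte in the document's code page), or \uN followed by an
# optional delimiter space, optional raw line breaks (which RTF ignores),
# and a single fallback character that Unicode-aware readers skip.
ESCAPE = re.compile(
    r"\\'([0-9a-fA-F]{2})"
    r"|\\u(-?\d+) ?[\r\n]*(?:\\'[0-9a-fA-F]{2}|[^\\])"
)

def decode_rtf_escapes(text, codepage="cp1252"):
    """Replace \\'hh and \\uN escapes in an RTF fragment with real characters."""
    def replace(match):
        if match.group(1) is not None:
            # \'hh: the byte only has meaning relative to a code page.
            return bytes([int(match.group(1), 16)]).decode(codepage)
        # \uN: N is a signed 16-bit decimal, so negative values wrap around.
        value = int(match.group(2))
        if value < 0:
            value += 65536
        return chr(value)
    return ESCAPE.sub(replace, text)

print(decode_rtf_escapes(r"Card\'f3n"))      # code-page escape
print(decode_rtf_escapes(r"Card\u243 ?n"))   # Unicode escape with "?" fallback
```

Both calls should yield "Cardón": 243 in decimal is F3 in hexadecimal, which is also the Windows-1252 byte for "ó". A reader that does not skip the fallback character would show "Cardó?n" instead.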

These days Unicode is the canonical method for exchanging text characters. It's more robust because there's no chance for the receiving application to misunderstand (or simply not implement) a particular text encoding. For that reason NWP prefers to use Unicode when saving RTF files.

If you want to try to fiddle with how NWP emits special characters, you can adjust the font that's applied to your text. RTF essentially links the text encoding to the RTF font table. If the applied font's "most compatible text encoding" is different, it can affect which characters are emitted using Unicode escapes. This, incidentally, is another reason NWP prefers to use Unicode escapes: if an app reading an RTF file determines the font incorrectly, it can pick the wrong text encoding and garble your text.

I hope that helps, though I should say trying to affect this may be futile. The process for deciding how to emit a particular sequence of characters is relatively complex, and you might get Unicode escapes no matter what you try.
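If the Unicode escapes can't be avoided inside NWP, one conceivable workaround is to post-process a copy of the saved file. The Python sketch below rewrites "\uN ?" sequences as "\'hh" bytes whenever the character fits in a single byte of the chosen code page, and leaves everything else alone. The function name `unicode_to_hex_escapes` is hypothetical, and the sketch assumes the file really is meant to be read as Windows-1252 and uses one fallback character per escape. Whether the result suits any particular application is untested.

```python
import re

# \uN, an optional delimiter space, optional raw line breaks, then the
# single fallback character (either a literal character or a \'hh escape).
UNICODE_ESCAPE = re.compile(
    r"\\u(-?\d+) ?[\r\n]*(?:\\'[0-9a-fA-F]{2}|[^\\])"
)

def unicode_to_hex_escapes(rtf, codepage="cp1252"):
    """Rewrite \\uN escapes as \\'hh where the code page has that character."""
    def replace(match):
        value = int(match.group(1))
        if value < 0:              # \uN is a signed 16-bit value
            value += 65536
        try:
            encoded = chr(value).encode(codepage)
        except UnicodeEncodeError:
            return match.group(0)  # not in this code page: keep the escape
        if len(encoded) != 1:
            return match.group(0)  # multi-byte encodings are out of scope
        return "\\'%02x" % encoded[0]
    return UNICODE_ESCAPE.sub(replace, rtf)

print(unicode_to_hex_escapes(r"Card\u243 ?n"))
print(unicode_to_hex_escapes(r"V\u225 ?zquez"))
```

The code page passed in has to match the one the RTF header and font table declare for that text. Otherwise the rewritten bytes decode to the wrong characters, which is exactly the garbling described above.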

Re: Rtf encoding for non-ASCII characters

Posted: 2013-10-09 13:46:59
by alanterra
Thanks, Martin. I guess I'll take this up with Sente and see if they can get it to process Unicode escape sequences.