converting Classic fonts with non-standard encodings

Get help using and writing Nisus Writer Pro macros.
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA


Post by martin »

One issue that can come up for Classic users is converting documents that use fonts with non-standard encodings. These fonts essentially co-opt regular Mac Roman characters and display something else instead. This is far from ideal, since losing or changing the font changes the meaning of your text, most likely producing gibberish.

The best approach is for every character to be encoded in absolute terms (Unicode), independent of the font applied to it. The trouble is converting old documents. Nisus Writer Pro automatically converts the fonts we know of that use standard encodings (e.g. the "AB Geeza" font uses the Mac Arabic encoding). But this doesn't cover fonts with non-standard encodings, like the popular "METimes" (Mid-East Times).
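To make the distinction concrete, here is a minimal Python sketch (not Nisus macro code). For a font with a standard encoding, the stored byte can be decoded with that encoding's codec; Mac Roman is used here only because Python ships a `mac_roman` codec. A font with a non-standard encoding has no codec, so the same byte has to go through a hand-made table. The table `DEMO_FONT_MAP`, its glyph assignment, and the helper `convert_nonstandard` are hypothetical.

```python
# Byte 0x8E as stored in an old Classic document.
RAW = b"\x8e"

# Standard encoding: a codec recovers the intended character directly.
standard = RAW.decode("mac_roman")  # Mac Roman 0x8E is 'é'

# Non-standard encoding: the font draws some other glyph at that position,
# so a hand-made table is needed. This assignment is hypothetical:
# pretend the font shows Armenian small ayb (U+0561) where Mac Roman has 'é'.
DEMO_FONT_MAP = {"é": "\u0561"}


def convert_nonstandard(text: str, table: dict[str, str]) -> str:
    """Replace each character that has a table entry; keep all others."""
    return "".join(table.get(ch, ch) for ch in text)


nonstandard = convert_nonstandard(RAW.decode("mac_roman"), DEMO_FONT_MAP)
print(standard, nonstandard)
```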

Now that NWP 1.1 preserves unavailable fonts when opening files, it's possible to write macros that convert these old fonts automatically. To that end, we thought it would be helpful to provide templates for macro authors who want to write font transliteration macros. If you start from one of the template files, you need to do two things:

1. Fill in the mapping table for the font. For each ASCII character that displays as something else, add a line to the macro like:


$map{'*'} = '〇'
$map{'!'} = '〄'

2. Apply the old font (the one to be converted) to the Find command in the macro.

Both of these steps are highlighted in the template macro file comments.
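The two steps can be modeled roughly as follows. This is an illustrative Python sketch, not the template macro itself: the document is simplified to a list of (font, text) runs, and `Run`, `convert_runs`, and the font names "OldFont" and "UnicodeFont" are hypothetical. The dictionary plays the role of step 1; converting only runs in the old font plays the role of step 2, which is what keeps text in other fonts intact.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A stretch of text with one font applied (simplified document model)."""
    font: str
    text: str


def convert_runs(runs: list[Run], old_font: str, new_font: str,
                 table: dict[str, str]) -> list[Run]:
    """Transliterate only the runs set in old_font and switch them to new_font.

    Runs in any other font (e.g. English text in Times) pass through untouched.
    """
    out = []
    for run in runs:
        if run.font == old_font:
            mapped = "".join(table.get(ch, ch) for ch in run.text)
            out.append(Run(new_font, mapped))
        else:
            out.append(run)
    return out


# Step 1: the mapping table (same entries as the $map lines above).
table = {"*": "〇", "!": "〄"}

# Step 2: only text in "OldFont" is converted; the font names are hypothetical.
doc = [Run("Times", "Note: "), Run("OldFont", "*!"), Run("Times", " end")]
result = convert_runs(doc, "OldFont", "UnicodeFont", table)
print(result)
```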

Also included with the templates is an example macro "NSAramian to Unicode" which converts text in the Armenian "NSAramian" font to Unicode. Thanks to Nerses Boyadjian for providing the character mapping.

If anyone writes macros covering additional fonts that others may be using, it would be nice to add them to our macro repository once 1.1 is released. We'd appreciate receiving copies of such macros, sent via the usual menu command Help > Send Feedback.
Nobumi Iyanaga
Posts: 158
Joined: 2007-01-17 05:46:17
Location: Tokyo, Japan

Post by Nobumi Iyanaga »

Hello Martin,

This is a very good idea! I have done a lot of work in this area; see my web pages "East Asian Diacritical Fonts and Unicode" http://www.bekkoame.ne.jp/~n-iyanag/res ... icode.html and "Classic Nisus Writer to Nisus Writer Express/Pro" http://www.bekkoame.ne.jp/~n-iyanag/res ... tonwe.html. With the new macro command, the same thing can be done more easily.

I will send you, in a separate email, a Classic Nisus file, "Times_keyboard_map", which displays all 255 characters in Times (especially those from 128 to 255). If you change the font of this file, you will see all the glyphs of that font, which should help you build your transliteration map.
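A related aid, as a minimal Python sketch: it prints a skeleton of `$map` lines for bytes 0x80 to 0xFF, keyed by the Mac Roman character each byte becomes, with the right-hand side left empty to be filled in from a glyph chart like the one described above. `skeleton_map` is a hypothetical helper, and the sketch assumes that text in the old font shows up in Nisus Writer Pro as ordinary Mac Roman characters, as described in the first post.

```python
def skeleton_map(first: int = 0x80, last: int = 0xFF) -> list[str]:
    """Return one $map line per byte, keyed by its Mac Roman character.

    The right-hand side is left empty, to be filled in after checking which
    glyph the old font actually draws at that position.
    """
    lines = []
    for byte in range(first, last + 1):
        ch = bytes([byte]).decode("mac_roman")
        # Doubled braces in the f-string produce literal { and }.
        lines.append(f"$map{{'{ch}'}} = ''")
    return lines


for line in skeleton_map():
    print(line)
```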
Best regards,

Nobumi Iyanaga
Tokyo,
Japan

Post by Nobumi Iyanaga »

Hello Martin,

I am trying to make a mapping table for a transliteration font for East Asian languages, but I found a problem: many glyphs can only be obtained using combining characters, for example U+0072 + U+0325 + U+0304. I think that with the new "$text.transliterateInRange($range, $map)" command, you cannot use a mapping table with such values.

In Perl, we used to use a formula like the following:


# For each match of $char, use $map{$1} if that key exists, otherwise keep the match
s/($char)/exists $map{$1} ? $map{$1} : $1/geo;

(extracted from code by Kino)

Without a command like this, it would be impossible to handle these cases.
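For comparison, a minimal Python sketch of the same idea as Kino's Perl line, assuming `$char` there holds a character class built from the table's keys. The table entries, including mapping "R" to r + U+0325 + U+0304, are hypothetical; the point is that each key is one character while its value can be a whole combining sequence.

```python
import re

# Keys are single characters; values may be any string, including a base
# letter followed by combining marks. Both entries are hypothetical.
table = {
    "R": "r\u0325\u0304",  # r + combining ring below + combining macron
    "*": "\u3007",
}

# Character class built from the keys, escaped for use inside [...].
pattern = re.compile("([" + "".join(re.escape(k) for k in table) + "])")


def transliterate(text: str) -> str:
    # Same idea as Perl's  s/($char)/exists $map{$1} ? $map{$1} : $1/ge
    return pattern.sub(lambda m: table.get(m.group(1), m.group(1)), text)


print(transliterate("aR*"))
```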

This raises another important issue: characters that require surrogate pairs. I know these are Unicode problems and not yours, but you will have to deal with them.
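For illustration, a short Python sketch of why such characters need care. In UTF-16 (the form in which Cocoa's NSString exposes text), a code point above U+FFFF is stored as two code units, and a base letter with combining marks is several code points that display as one glyph. `utf16_units` is a hypothetical helper that applies the standard UTF-16 surrogate formula.

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code unit(s) for a Unicode code point."""
    if cp < 0x10000:
        return [cp]                 # BMP: a single code unit
    v = cp - 0x10000                # 20-bit value
    high = 0xD800 + (v >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)      # bottom 10 bits -> low surrogate
    return [high, low]


# U+2000B (a CJK Extension B ideograph) needs two code units.
units = utf16_units(0x2000B)
print([f"{u:04X}" for u in units])

# Cross-check against Python's own UTF-16 encoder.
assert "\U0002000B".encode("utf-16-be") == b"".join(u.to_bytes(2, "big") for u in units)

# One visible glyph, three code points.
print(len("r\u0325\u0304"))
```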
Best regards,

Nobumi Iyanaga
Tokyo,
Japan

Post by martin »

You're quite right, Nobumi. I'll see about changing the "transliterateInRange" command so that map values (not keys) can be arbitrary strings, to allow for both composed character sequences and code points that require surrogate pairs. Thanks for pointing that out.

Post by Nobumi Iyanaga »

Hello Martin,

I tried to write a conversion map for the font "Norman" using your "NSAramian to Unicode.nwm" as a template. It seems to work, more or less, although I have not yet checked the result very closely. In any case, I will rewrite this macro once you update the "transliterateInRange" command, and then check whether it really works.

Before that, though, I have a question: with your macro template, how is the resulting font set? Using it, I got a file in which the transliterated text is in the default font of Nisus New File.dot. Is this intended? I am not sure this is the best choice; perhaps we should ask the user to choose a font for the conversion.

Also, when you run "Find '.+', 'Eua'", which parts of the document does the Find command search? For example, are comments also selected by this command?
Best regards,

Nobumi Iyanaga
Tokyo,
Japan

Post by martin »

Nobumi Iyanaga wrote: However, I have a question before that: with your macro template, how is the resulting font set?
The font may be changed in the macro via this command:


Menu ':Format:Font:Remove Font Attribute'
This removes any font overrides, leaving the text to display in whatever font the file's styles define. Usually this means the text displays in the font of the document's Normal style. If Normal (or another paragraph style) is not applied, then yes, the font from Nisus New File will be used.
On the other hand, when you do "Find '.+', 'Eua'", what is the range of document which is the object of the Find command?
This command should include all text everywhere in the document: main body, footnotes, tables, comments, etc. However, it currently omits headers and footers, which is a bug.
Windsor
Posts: 46
Joined: 2008-04-28 22:10:11

NSAramian to Unicode macro - Template

Post by Windsor »

Martin:
I'd like to suggest something. When I saved the macro with NSAramian as the source font, any English words in the file were preserved. So the Armenian text in the file must be in the NSAramian font, NOT NSTimes (for example); otherwise the English words are converted to the target font too, in this case the Unicode font. To change the target font to the desired Unicode font, this is what I did:

# apply new font
Menu ':Format:Font:UNork'

I chose UNork as my Unicode font, and the English words are kept intact.

I just wanted to share this. I thought it might be helpful.

Post by martin »

Thanks for the tips, Windsor (and thanks again for the Armenian mappings). We'll add the Armenian and transliteration macros to the macro repository once NWP 1.1 is released.