sorting arrays

Get help using and writing Nisus Writer Pro macros.
phspaelti
Posts: 1313
Joined: 2007-02-07 00:58:12
Location: Japan

sorting arrays

Post by phspaelti »

I've been having trouble sorting arrays. It seems that the sort order for arrays is different from the sort that can be used in documents (from the Menu). Is that correct? :(

The Macro reference mentions sort "options" and refers one to "compare". There it mentions the following possibilities: n, l, s. (Below the table, instead of 's', it says 'u'. That is apparently a typo.) For my purposes it seems that 'n', 'l' and nothing all give the same result, and 's' is a Unicode sort. More specifically here is the problem. I am trying to sort words that include diacritics. With sort from the menu I get the following order, which is what I want.
  • i
  • î
  • it
  • ko
With array sorting using the 'n' option I get:
  • i
  • it
  • î
  • ko
With the 's' option I get:
  • i
  • it
  • ko
  • î
Neither is really useful. The truth is I can't even understand the 'n' order. Has the alphabet suddenly added some new members? So that looks like a bug to me.
philip
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: sorting arrays

Post by Kino »

I reproduced the problem in OS X 10.6.2 when using the sort command with the ‘l’ option and no language argument. I don’t know if it is a Snow Leopard issue, though. A workaround would be to specify the language.

Code:

$a = Array.new 'it', 'ko', 'i', 'î'
$lang = Language.systemLanguage
$a.sort 'l', $lang
exit "$lang: $a"
This returns what you expect, at least when the system language is English (US) or French.

A feature request: I’d like to have something like Language.systemSortOrder returning the language for “Order for sorted lists” which may not be the same as the system UI language.

Re: sorting arrays

Post by Kino »

This is a Snow Leopard (SL) issue. The code for English US has changed from “en” to “en_US”, which seems to confuse NWP. The macro below works without problems if you set the system UI language to a language which is not English, e.g. French or British English ;-)

Code:

$a = Array.new 'it', 'ko', 'i', 'î', 'I'
$a.sort 'li'
exit $a
Edit: That was not accurate. The code for English (defaults read -g AppleLanguages) remains ‘en’, except that, if I’m not mistaken, SL added a new language, U.S. English, whose code is ‘en-US’. I’m confused too.

Re: sorting arrays

Post by phspaelti »

Thank you Kino.
I had completely disregarded the language setting, since it didn't appear relevant. But now it makes sense.

I can now also understand the following from the macro reference on the 'n' sort option:
In this mode logically equivalent character sequences are considered equal. For example, pre-composed sequences like ‘é’ (U+00E9) will be equal to counterparts that use combining marks, eg: ‘e’ + Acute Accent (U+0301).
I guess that by "considered equal" they mean something like "are treated as". So in normal mode the diacritics are treated as separate characters. So 'î' becomes "i^" and will therefore follow all 'i's, but precede all 'j's.
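This reasoning can be checked outside Nisus. The following is a minimal Python sketch, not Nisus Writer Pro's implementation. It assumes the 'n' order behaves like a code-point comparison of canonically decomposed (NFD) strings, and the 's' order like a code-point comparison of the raw strings. Under those assumptions it reproduces both orders reported in this thread.

```python
import unicodedata

# "\u00ee" is the precomposed character î (U+00EE)
words = ["it", "ko", "i", "\u00ee"]

def decomposed(word):
    # NFD splits î into "i" + COMBINING CIRCUMFLEX ACCENT (U+0302)
    return unicodedata.normalize("NFD", word)

# Raw code-point comparison: U+00EE sorts after every ASCII letter.
by_code_point = sorted(words)

# Decomposed comparison: "i" + U+0302 sorts after "it" but before "ko",
# because U+0302 is greater than "t" while "i" is less than "k".
by_decomposed = sorted(words, key=decomposed)

print(by_code_point)   # i, it, ko, î  (the order reported for 's')
print(by_decomposed)   # i, it, î, ko  (the order reported for 'n')
```

Neither order matches the menu sort (i, î, it, ko), which needs language-aware collation rather than code-point comparison.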
philip

Re: sorting arrays

Post by Kino »

Then I misunderstood your problem. I believed you had tried the ‘n’ and ‘s’ options because the ‘l’ option had not worked as expected.

It is somewhat confusing that an extended ASCII sort is labelled as normal, since it does not feel natural to us. It is presumably called that because it was the only sort order available in Nisus Writer Express 2.x. Perhaps it is Cocoa’s default sort order? … but now I’m just speculating.

Anyway, if you give an appropriate language object as the language argument, the sort command will return the expected results in most cases. However, it is not as intelligent as the Sort Paragraphs commands when numbers are involved. From the macro below you will not get “1<tab>abc, 2<tab>def, 13<tab>ghi” but “1<tab>abc, 13<tab>ghi, 2<tab>def”, as the tab is not treated as a field separator.

Code:

$a = Array.new "1\tabc", "2\tdef", "13\tghi"
$lang = Language.languageWithCode 'en_US'
$a.sort 'l', $lang
exit $a
In that case, you have to use "01\tabc", "02\tdef", "13\tghi" or something similar.
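The difference between the two orders comes down to whether the leading number is compared as text or as a value. A minimal Python sketch follows, as an illustration only: `by_leading_number` is a hypothetical helper and nothing here is a Nisus macro command. It shows the plain string order, the zero-padding workaround, and a sort key that treats the tab as a field separator.

```python
rows = ["1\tabc", "2\tdef", "13\tghi"]

# Plain string comparison: "13..." sorts before "2..." because "1" < "2".
as_text = sorted(rows)

# Workaround from the post: pad the numbers so text order matches numeric order.
padded = sorted(["13\tghi", "02\tdef", "01\tabc"])

def by_leading_number(row):
    # Hypothetical helper: split at the first tab and compare the
    # first field as an integer, the rest as text.
    number, _, rest = row.partition("\t")
    return (int(number), rest)

as_fields = sorted(rows, key=by_leading_number)

print(as_text)    # 1, 13, 2
print(padded)     # 01, 02, 13
print(as_fields)  # 1, 2, 13
```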

Elements consisting of just numbers are sorted by their numerical value and you will get, for example, “1, 2, 13” even with ‘s’ option. I have never met a situation in which this behaviour would cause a problem but I tend to think ‘s’ option should be more strict.
martin
Official Nisus Person
Posts: 5227
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Re: sorting arrays

Post by martin »

phspaelti wrote: It seems that the sort order for arrays is different from the sort that can be used in documents (from the Menu). Is that correct?
Yes, that is correct, although as Kino has explained, perhaps not in as big a way as you originally thought. The sort menu commands take into account numbers and list bullets, but otherwise would be like using the macro's "l" sort option with the default language, eg:

Code:

$array.sort 'l'
phspaelti wrote: The Macro reference mentions sort "options" and refers one to "compare". There it mentions the following possibilities: n, l, s. (Below the table, instead of 's', it says 'u'. That is apparently a typo.)
The 'u' option is indeed a typo and I'll have it fixed at some point, thanks.
Kino wrote: This is a SL issue. The code for English US has changed from “en” to “en_US” which confuses NWP, it seems.
All of the sorting eventually uses the system for comparisons. Things seem to have changed a bit for Snow Leopard. If you want to use a particular language's sorting rules, it would be best to provide that language to the macro command as a parameter instead of relying on the default sort, whatever that may mean to Cocoa/OSX. Exactly how one's system preferences will be taken into account when sorting I don't know. Speaking of which:
Kino wrote: Elements consisting of just numbers are sorted by their numerical value and you will get, for example, “1, 2, 13” even with ‘s’ option.
This is something that changed with 10.6. On 10.5 the 's' option won't do that kind of sorting. Perhaps we could provide another option that does strict comparisons of Unicode code point values (eg: an "ASCII" sort) and bypasses the system altogether.
Kino wrote: It is somewhat confusing that an extended ASCII sort is labelled as normal
The normal 'n' sort actually isn't an "ASCII like" (numerical character code) sort, but merely prevents fancy character comparisons (eg: consider precomposed characters equal to their counterparts with combining characters).
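As an illustration of what treating a precomposed character as equal to its combining-character counterpart means, here is a short Python sketch. It is not Nisus code, and the use of NFC normalization is an assumption about how such equivalence is typically tested, not a description of what the 'n' option does internally.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point (U+00E9)
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT (U+0301)

# A strict code-point comparison sees two different strings.
strictly_equal = precomposed == combining

# Normalized to the same form, the two sequences compare equal.
canonically_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", combining)
)

print(strictly_equal, canonically_equal)  # False True
```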