sorting arrays 
Joined: 2007-02-07 00:58:12
Posts: 876
Location: Japan
I've been having trouble sorting arrays. It seems that the sort order for arrays is different from the sort that can be used in documents (from the Menu). Is that correct? :(

The Macro reference mentions sort "options" and refers one to "compare". There it mentions the following possibilities: n, l, s. (Below the table, instead of 's', it says 'u'. That is apparently a typo.) For my purposes it seems that 'n', 'l' and nothing all give the same result, and 's' is a Unicode sort.

More specifically, here is the problem: I am trying to sort words that include diacritics. With sort from the menu I get the following order, which is what I want.
  • i
  • î
  • it
  • ko
With array sorting using the 'n' option I get:
  • i
  • it
  • î
  • ko
With the 's' option I get:
  • i
  • it
  • ko
  • î
Neither is really useful. The truth is I can't even understand the 'n' order. Has the alphabet suddenly added some new members? So that looks like a bug to me.

_________________
philip


2010-01-10 01:01:58

Joined: 2008-05-17 04:02:32
Posts: 400
I reproduced the problem in OS X 10.6.2 when using the sort command with the ‘l’ option and no language argument. I don’t know if it is a Snow Leopard issue, though. A workaround would be to specify the language.
Code:
$a = Array.new 'it', 'ko', 'i', 'î'
$lang = Language.systemLanguage
$a.sort 'l', $lang
exit "$lang: $a"
which returns what you expect, at least when the system language is U.S. English or French.

A feature request: I’d like to have something like Language.systemSortOrder returning the language for “Order for sorted lists” which may not be the same as the system UI language.


2010-01-10 04:44:31

Joined: 2008-05-17 04:02:32
Posts: 400
This is a SL issue. The code for English US has changed from “en” to “en_US” which confuses NWP, it seems. The macro below works without problem if you set the System UI language to a language which is not English, e.g. French or British English ;-)
Code:
$a = Array.new 'it', 'ko', 'i', 'î', 'I'
$a.sort 'li'
exit $a


Edit: That was not accurate. The code for English (as reported by defaults read -g AppleLanguages) remains ‘en’, except that, if I’m not mistaken, SL added a new language, U.S. English, whose code is ‘en-US’. I’m confused too.


2010-01-10 05:11:04

Joined: 2007-02-07 00:58:12
Posts: 876
Location: Japan
Thank you Kino.
I had completely disregarded the language setting, since it didn't appear relevant. But now it makes sense.

I can now also understand the following from the macro reference on the 'n' sort option:
Quote:
In this mode logically equivalent character sequences are considered equal. For example, pre-composed sequences like ‘é’ (U+00E9) will be equal to counterparts that use combining marks, eg: ‘e’ + Acute Accent (U+0301).

I guess that by "considered equal" they mean something like "are treated as". So in normal mode the diacritics are treated as separate characters: 'î' becomes "i^" and will therefore follow all 'i's, but precede all 'j's.
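
This reading can be checked outside Nisus. Below is a minimal Python sketch, not Nisus macro code and not a statement of how Nisus implements its options. It assumes that the 'n' order behaves like comparing fully decomposed (NFD) strings and that the 's' order behaves like comparing raw code points. The helper name decomposed_key is made up for illustration.

```python
import unicodedata

# "\u00ee" is the precomposed character î (i with circumflex).
words = ["it", "ko", "i", "\u00ee"]

def decomposed_key(word):
    # Assumption: split î into 'i' + U+0302 (combining circumflex) before comparing.
    return unicodedata.normalize("NFD", word)

# Plain string comparison in Python goes code point by code point.
by_code_point = sorted(words)

# Compare the decomposed forms instead of the precomposed ones.
by_decomposition = sorted(words, key=decomposed_key)

print(by_code_point)     # i, it, ko, î  (the order seen with 's')
print(by_decomposition)  # i, it, î, ko  (the order seen with 'n')
```

Under these assumptions 'î' sorts after "it" because the combining circumflex (U+0302) has a higher code point than 't', and before "ko" because its first character is still a plain 'i'.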

_________________
philip


2010-01-10 10:56:22

Joined: 2008-05-17 04:02:32
Posts: 400
Then I misunderstood your problem. I thought you had tried the ‘n’ and ‘s’ options because the ‘l’ option had not worked as expected.

It is somewhat confusing that an extended ASCII sort is labelled as normal, since it does not feel natural to us. It is presumably called so because that was the only sort order available in Nisus Writer Express 2.x. Perhaps it is Cocoa’s default sort order? … but now I’m just speculating.

Anyway, if you give an appropriate language object as the language argument, the sort command will return the expected results in most cases. However, it is not as intelligent as the Sort Paragraphs commands when numbers are involved. From the macro below you will not get “1<tab>abc, 2<tab>def, 13<tab>ghi” but “1<tab>abc, 13<tab>ghi, 2<tab>def”, as tab is not treated as a field separator.
Code:
$a = Array.new "1\tabc", "2\tdef", "13\tghi"
$lang = Language.languageWithCode 'en_US'
$a.sort 'l', $lang
exit $a
In that case, you have to use "01\tabc", "02\tdef", "13\tghi" or something similar.
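
To make the difference concrete, here is a small Python sketch (not Nisus macro code) contrasting a plain string sort with a sort that treats tab as a field separator. The helper field_key is hypothetical and only illustrates the idea. It is not how Sort Paragraphs is known to work.

```python
def field_key(line):
    # Hypothetical helper: split on tabs and compare all-digit fields as integers.
    key = []
    for field in line.split("\t"):
        if field.isdigit():
            key.append((0, int(field), ""))   # numeric fields sort by value
        else:
            key.append((1, 0, field))         # other fields sort as text
    return key

rows = ["1\tabc", "2\tdef", "13\tghi"]

# Plain comparison: the tab (U+0009) sorts before '3', so "1<tab>" < "13" < "2".
plain = sorted(rows)

# Field-aware comparison: the leading numbers are compared as 1, 2, 13.
field_aware = sorted(rows, key=field_key)

# The zero-padding workaround makes the plain comparison agree with the numeric one.
padded = sorted(["13\tghi", "02\tdef", "01\tabc"])

print(plain)
print(field_aware)
print(padded)
```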

Elements consisting of just numbers are sorted by their numerical value and you will get, for example, “1, 2, 13” even with ‘s’ option. I have never met a situation in which this behaviour would cause a problem, but I tend to think the ‘s’ option should be stricter.


2010-01-10 21:54:34
Official Nisus Person

Joined: 2002-07-11 17:14:10
Posts: 4251
Location: San Diego, CA
phspaelti wrote:
It seems that the sort order for arrays is different from the sort that can be used in documents (from the Menu). Is that correct?

Yes, that is correct, although as Kino has explained, perhaps not in as big a way as you originally thought. The sort menu commands take into account numbers and list bullets, but otherwise would be like using the macro's "l" sort option with the default language, eg:
Code:
$array.sort 'l'


Quote:
The Macro reference mentions sort "options" and refers one to "compare". There it mentions the following possibilities: n, l, s. (Below the table, instead of 's', it says 'u'. That is apparently a typo.)

The 'u' option is indeed a typo and I'll have it fixed at some point, thanks.

Kino wrote:
This is a SL issue. The code for English US has changed from “en” to “en_US” which confuses NWP, it seems.

All of the sorting eventually uses the system for comparisons. Things seem to have changed a bit for Snow Leopard. If you want to use a particular language's sorting rules, it would be best to provide that language to the macro command as a parameter instead of relying on the default sort, whatever that may mean to Cocoa/OSX. Exactly how one's system preferences will be taken into account when sorting I don't know. Speaking of which:
Quote:
Elements consisting of just numbers are sorted by their numerical value and you will get, for example, “1, 2, 13” even with ‘s’ option.

This is something that changed with 10.6. On 10.5 the 's' option won't do that kind of sorting. Perhaps we could provide another option that does strict comparisons of Unicode code point values (eg: an "ASCII" sort) and bypasses the system altogether.
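
As a rough illustration of what a strict code point comparison would mean, here is a Python sketch. It is not an existing Nisus option, and the helper name code_points is made up. Each string is compared purely by the numeric values of its characters, so digit strings are not ordered by numeric value.

```python
def code_points(text):
    # Compare strings purely by the numeric value of each character.
    return [ord(ch) for ch in text]

items = ["13", "2", "1", "B", "a"]

# Strict order: "13" precedes "2" because '1' (U+0031) < '2' (U+0032),
# and "B" (U+0042) precedes "a" (U+0061).
strict = sorted(items, key=code_points)

# For comparison, ordering the digit strings by numeric value.
numeric = sorted(["13", "2", "1"], key=int)

print(strict)   # 1, 13, 2, B, a
print(numeric)  # 1, 2, 13
```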

Kino wrote:
It is somewhat confusing that an extended ASCII sort is labelled as normal

The normal 'n' sort actually isn't an "ASCII like" (numerical character code) sort, but merely prevents fancy character comparisons (eg: consider precomposed characters equal to their counterparts with combining characters).


2010-01-11 19:13:09