Change a set of letters for a set of others 

Joined: 2007-04-12 14:59:36
Posts: 229
Is it possible with a Nisus macro to change one set of characters in a selection to another set, e.g. changing every a to n, every b to m and every c to l?
Is the same possible for words, e.g. every cat to dog, and every rat to mouse?


2010-10-02 05:36:28

Joined: 2008-05-17 04:02:32
Posts: 400
Quote:
Is it possible with a Nisus macro to change one set of characters in a selection to another set, e.g. changing every a to n, every b to m and every c to l?
Yes, it is possible. See these threads:
http://nisus.com/forum/viewtopic.php?f=17&t=2932
http://nisus.com/forum/viewtopic.php?p=16612#p16612
Quote:
Is the same possible for words, e.g. every cat to dog, and every rat to mouse?
Code:
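# Conversion table: key = word to find, value = its replacement
$conv = Hash.new
$conv{'cat'} = Cast to String 'dog'
$conv{'rat'} = Cast to String 'mouse'

$doc = Document.active
if $doc == undefined
   exit # no open document
end
# Build a single pattern that matches any key: cat|rat
$find = $conv.keys
$find = $find.join '|'
$sels = $doc.text.findAll $find, 'E-iw' # w: whole word
# Work backwards so that the ranges found earlier in the text stay valid
foreach $sel in reversed $sels
   $sel.text.replaceInRange $sel.range, $conv{$sel.substring}
end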


2010-10-02 09:31:40

Joined: 2007-04-12 14:59:36
Posts: 229
Thank you. Both work fine.


2010-10-02 13:40:11

Joined: 2008-05-17 04:02:32
Posts: 400
I seem to have taken your problems to be much more complicated than they actually are. If your conversion tables are that simple and contain no overlaps (no replacement is itself a search term), you can just use Replace All for each pair. That is faster, at least for the second problem.

What I had in mind is a way to transform, for example, “Mary is younger than Lucy. Lucy is older than Mary” into “Lucy is older than Mary. Mary is younger than Lucy”.


2010-10-02 18:07:03
Official Nisus Person

Joined: 2002-07-11 17:14:10
Posts: 4251
Location: San Diego, CA
In case it wasn't clear, Kino was referring to the problem where a sequence of replacements done one after the other yields unwanted results. Looking at his example:
Quote:
Mary is younger than Lucy. Lucy is older than Mary.

If you first replace "Mary" with "Lucy", you have:
Quote:
Lucy is younger than Lucy. Lucy is older than Lucy.

And then replace "Lucy" with "Mary", you have:
Quote:
Mary is younger than Mary. Mary is older than Mary.

That's not the desired result if one wanted to swap Mary and Lucy at the same time. Yet the code that produces this unwanted result is the obvious one to write:
Code:
Replace All "Mary", "Lucy", "-iw"
Replace All "Lucy", "Mary", "-iw"

Kino's more complicated code solves the problem by doing all the replacements in a single pass.
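
To make the difference concrete outside of Nisus, here is a minimal Python sketch (not Nisus Macro Language, and not Kino's macro itself) of the same single-pass idea: all keys go into one alternation, and every match is looked up in the table, so text that has just been inserted is never scanned again. The function name `replace_all_at_once` is made up for this illustration.

```python
import re

def replace_all_at_once(text, table):
    """Replace every whole-word key of `table` in a single pass over `text`."""
    # Longest keys first, so a longer key is never shadowed by a shorter one.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    # Each match is looked up in the table; replacements are not rescanned.
    return pattern.sub(lambda m: table[m.group(0)], text)

swap = {"Mary": "Lucy", "Lucy": "Mary"}
sentence = "Mary is younger than Lucy. Lucy is older than Mary."

swapped = replace_all_at_once(sentence, swap)

# The two sequential Replace All steps, for comparison.
sequential = sentence.replace("Mary", "Lucy").replace("Lucy", "Mary")

print(swapped)     # Lucy is younger than Mary. Mary is older than Lucy.
print(sequential)  # Mary is younger than Mary. Mary is older than Mary.
```

The sequential version ends up with "Mary" everywhere, exactly as in the walkthrough above, while the single-pass version swaps the two names.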


2010-10-04 13:50:57

Joined: 2007-04-12 14:59:36
Posts: 229
Quote:
I seem to have taken your problems to be much more complicated than they actually are.


Well, it depends. (Thanks to Martin for demonstrating the possible conflicts.) Some problems I can solve now are really simple. But another one which I would like to solve is in fact more complicated: it is to convert between different transcription systems for Asiatic texts.

Let's say the name of China in Chinese is officially transcribed as "Zhongguo". In the so-called Wade-Giles system used in Taiwan this is "Chung-kuo". Or the name of China's biggest city is "Chongqing", or respectively "Ch'ung-ch'ing". (Here you have the nasty apostrophes that have to be dealt with.)

Now the problem is not only that the syllables are separated by a "-" sign in one system but not in the other, where the two syllables first have to be identified as such. There is also a problem with syllabic units that contain each other. For example, within one system you have a unit "ta" but you also have a unit "tang". The problem Martin describes is frequent as well: "ch'ang" in system 1 is "chang" in system 2, but "chang" also exists in system 1, with a different value. Final problem: some foreign words might look the same as some English words appearing in a mixed text. E.g. "an" (meaning "peace") looks like the English article in "an artist".

Sometimes I wonder if all these can be solved, or whether one should look for a solution that handles most of the problems and marks possible ambiguities. I don't know whether anyone has found solutions for Japanese transcription systems, which must suffer from similar problems.
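
The overlap cases described here ("ta" inside "tang", and "chang" being both a source and a target) can be illustrated with a small Python sketch. It is not Nisus macro code, and the five-entry table is only a tiny sample of standard Wade-Giles to Hanyu Pinyin correspondences; a real table is much larger. The name `convert_syllables` is made up, and the sketch assumes lower-case input with straight apostrophes.

```python
import re

# Tiny sample table (Wade-Giles -> Hanyu Pinyin), for illustration only.
WG_TO_PINYIN = {
    "ta": "da",
    "tang": "dang",
    "t'ang": "tang",
    "chang": "zhang",
    "ch'ang": "chang",
}

def convert_syllables(text, table):
    """Convert stand-alone syllables in one pass, longest key first."""
    keys = sorted(table, key=len, reverse=True)
    # A syllable must not touch another letter or apostrophe on either side,
    # so "ta" is never matched inside "tang" or "t'ang".
    pattern = re.compile(
        r"(?<![A-Za-z'])(?:" + "|".join(map(re.escape, keys)) + r")(?![A-Za-z'])"
    )
    # One pass only: the "chang" produced from "ch'ang" is not converted again.
    return pattern.sub(lambda m: table[m.group(0)], text)

result = convert_syllables("ta tang t'ang chang ch'ang", WG_TO_PINYIN)
print(result)  # da dang tang zhang chang
```

Doing the same with one Replace All per pair would turn "ch'ang" into "chang" and then into "zhang", which is the conflict Martin demonstrated.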


2010-10-07 10:17:17

Joined: 2008-05-17 04:02:32
Posts: 400
js wrote:
Some problems I can solve now are really simple. But another one which I would like to solve is in fact more complicated: it is to convert between different transcription systems for Asiatic texts.
A Perl module for that kind of conversion is available as
http://search.cpan.org/~xern/Lingua-ZH-PinyinConvert-0.05/
http://search.cpan.org/~xern/Lingua-ZH-PinyinConvert-0.05/PinyinConvert.pm (description)
As I’m not familiar with the language, I have never used it. So I cannot tell if it is accurate enough. But with the module installed (Developer Tools might be required though the install script does not run a C compiler), you should be able to use it in a NWP macro by calling it in a Perl block.
Quote:
Final problem: some foreign words might look the same as some English words appearing in a mixed text. E.g. "an" (meaning "peace") looks like the English article in "an artist".
Have you not applied a custom language to the romanized Chinese text? That is what you should do before anything else. If the text in your document does not have correct language attributes, you cannot make the selections to be processed by a macro.
Quote:
I don't know whether anyone has found solutions for Japanese transcription systems, which must suffer from similar problems.
I don’t know. I never romanize Japanese except when I place an order with a foreign bookseller and give my name and address in Latin characters.

Edit: But perhaps you may still need Developer Tools to install the Perl module if you don’t have /usr/bin/make. I don’t know if make belongs to the OS X default installation.


2010-10-08 20:26:01

Joined: 2008-05-17 04:02:32
Posts: 400
You don’t need Xcode Tools to install the Perl module. I remembered a trick I learnt from Nobumi years ago.

1. Download Lingua-ZH-PinyinConvert-0.05.tar.gz from
http://search.cpan.org/~xern/Lingua-ZH-PinyinConvert-0.05/
and unzip it;

2. Create folders as /Users/you/Library/Perl/Lingua/ZH/;

3. Put PinyinConvert.pm in /Users/you/Library/Perl/Lingua/ZH/.

To use the macro below, select text in the Wade-Giles system (non-contiguous selections are supported) and run it.

As you will notice, the PinyinConvert module does not insert hyphens when converting into the Wade-Giles system. You have to do that manually.

You can set $from and $to to any romanization system supported by the module.

If you don’t want the macro to convert straight single quotes into curly quotes, remove the routine near the end of the macro.

Code:
$from = 'Wade-Giles'
$to = 'hanyu'
$removeAccent = true

$doc = Document.active
if $doc == undefined
   exit # no open document
end

$range = TextSelection.activeRange
if ! $range.length
   exit 'Nothing selected, exiting...'
end

Find All in Selection '\S+', 'E'
$sels = $doc.textSelections

$str = $doc.selectedSubstrings
$sep = Text.newWithCodepoint 0x20
$str = $str.join $sep

if $from == 'Wade-Giles'
   $str.replaceAll '(?<=\p{Latin})\’(?=\p{Latin})', '\'', 'E'
end

Set Exported Perl Variables 'from', 'to', 'str', 'removeAccent'
begin Perl
   BEGIN {
      unshift @INC, "$ENV{'HOME'}/Library/Perl";
   };
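   eval { require Lingua::ZH::PinyinConvert };
   if ($@) {
      print STDERR "PinyinConvert.pm not found. Cannot continue...\n";
      exit;
   }
   # Import at run time: a compile-time "use" here would fail before
   # the check above could print its message.
   Lingua::ZH::PinyinConvert->import('convert');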
   if ($removeAccent) {
      use Unicode::Normalize;
      $str = NFD ($str);
      $str =~ s/\p{InCombiningDiacriticalMarks}//g;
   }
   $str = convert ($from, $to, $str);
end

if $from == 'Wade-Giles'
   $str.replaceAll '(?<=\p{Latin})-(?=\p{Latin})', '', 'E'
end
if $to == 'Wade-Giles'
   $str.replaceAll '(?<=\p{Latin})\'(?=\p{Latin})', '\’', 'E'
end

$str = $str.split $sep

foreach $sel in reversed $sels
   $sel.text.replaceInRange $sel.range, $str.pop
end


Edit: The PinyinConvert.pm you have to put in /Users/you/Library/Perl/Lingua/ZH/ is Lingua-ZH-PinyinConvert-0.05/PinyinConvert.pm, not Lingua-ZH-PinyinConvert-0.05/PerlIO-via-PinyinConvert/PinyinConvert.pm.

Edit: Cleaned up the code and modified it a bit so that character attributes will be retained per word.

Edit: Added $removeAccent option. When it is set to true, “Ch’u Tz’ŭ” and “Ta Ch’ê”, for example, are treated as “Ch’u Tz’u” and “Ta Ch’e”.
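
For readers who want to see what the $removeAccent step does on its own, here is a rough Python equivalent of the NFD-and-strip idea used in the Perl block (decompose, drop the combining marks, recompose). It is a sketch, not part of the macro: `remove_accents` is a made-up name, and `unicodedata.combining` covers somewhat more than the Combining Diacritical Marks block that the Perl regex targets.

```python
import unicodedata

def remove_accents(text):
    """Strip combining marks: decompose, drop the marks, recompose."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

print(remove_accents("Ch’u Tz’ŭ"))  # Ch’u Tz’u
print(remove_accents("Ta Ch’ê"))    # Ta Ch’e
print(remove_accents("chüeh"))      # chueh
```

The last line shows a side effect worth knowing about: the umlaut on "ü" is a combining mark after decomposition too, so it is removed along with the breve and the circumflex.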


2010-10-09 08:23:35

Joined: 2007-04-12 14:59:36
Posts: 229
Thanks for showing how a Perl macro can be used and modified in "collaboration" with Nisus macros.
As to the Perl module it is unfortunately no real help for me, for several reasons:
1. It cannot deal with hyphens, which should be there in Wade-Giles and should not be there in Pinyin (which joins the hyphenated components into words).
2. It does not know how to deal with umlauts: instead of "chüeh" it outputs "chueh".
3. These two are merely amateurish. But there are real mistakes too: "jue" is transformed to "chueh", which is half correct. But if you try to transform the result back, instead of getting "jue" you get "zhueh", which is absurd.

Apart from these: the module asks you to identify in your text the chunks you want it to transcribe. In other words: the macro asks you to do half of the work yourself instead of doing it for you. The basic situation is having an English text with lots of Chinese transcriptions in it. If the whole text consists of transcribed characters there is no need for a macro.

Last but not least: my basic need would be to input the tables with the corresponding syllables myself. This is because sinologists very often seem to feel that the official transcription systems are not as good as what they could have done themselves, and they invent new ones, which they expect the world to adopt. And of course there are also many historical attempts, in different languages. Many of them are perfectly consistent, but are not used anymore. These texts can now be scanned, and some of us would love to transcribe them into the modern standard system.
For this one would need a macro that can deal with a table of correspondences that the user can modify. Such a macro should also be able to identify, within a given text, the syllables that appear in this table. But I realize that this is maybe not so simple.
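
A minimal Python sketch of such a user-editable table, assuming a plain-text format of one "source target" pair per line (the format, the function names and the four sample entries are all invented for this illustration; the entries are just the syllables from the two place names mentioned earlier in the thread). A word is converted only if every hyphen-separated part is in the table; everything else is left untouched.

```python
import re

# User-editable table: one "source target" pair per line, # starts a comment.
TABLE_TEXT = """\
# Wade-Giles   Hanyu Pinyin
chung   zhong
kuo     guo
ch'ung  chong
ch'ing  qing
"""

def load_table(text):
    """Parse the table text into a dict, skipping blanks and comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()
        table[source] = target
    return table

# A candidate word: letters/apostrophes, optionally joined by hyphens.
WORD = re.compile(r"[A-Za-z']+(?:-[A-Za-z']+)*")

def convert_word(word, table):
    parts = word.lower().split("-")
    if not all(part in table for part in parts):
        return word  # at least one unknown syllable: leave the word alone
    joined = "".join(table[part] for part in parts)  # hyphens are dropped
    return joined.capitalize() if word[0].isupper() else joined

def convert_text(text, table):
    return WORD.sub(lambda m: convert_word(m.group(0), table), text)

table = load_table(TABLE_TEXT)
converted = convert_text("Chung-kuo and Ch'ung-ch'ing are names.", table)
print(converted)  # Zhongguo and Chongqing are names.
```

English words such as "and" survive only because they are not in this small table; with a full table the look-alike problem discussed elsewhere in the thread still applies.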


2010-10-10 03:53:00

Joined: 2008-05-17 04:02:32
Posts: 400
As you’ll see in PinyinConvert.pm, it supports just ASCII characters. As it is a Perl script of very simple structure, it would be quite easy for those who are familiar with Chinese romanization systems to improve it so that it supports accented letters, I guess. Just add conversion tables and modify regular expressions at lines 463 and 468 accordingly. After modification, you should save the file as plain text in UTF-8.

And if you know what sequences of characters constitute boundaries, it should be a piece of cake to make a regular expression(s) for inserting hyphens.
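
As an illustration of the boundary problem, here is a Python sketch that uses table-driven segmentation rather than a single regular expression, which is a different technique from the one suggested above. It splits an unhyphenated word into known syllables (longest candidate first, backtracking if the rest cannot be split) and joins the converted parts with hyphens. The four-entry table and the names `segment` and `to_wade_giles` are assumptions made for the example.

```python
# Tiny sample table (Hanyu Pinyin -> Wade-Giles), for illustration only.
PINYIN_TO_WG = {"zhong": "chung", "guo": "kuo", "chong": "ch'ung", "qing": "ch'ing"}

def segment(word, syllables):
    """Return one split of `word` into known syllables, or None if impossible."""
    if not word:
        return []
    # Try the longest head first and backtrack when the remainder fails.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in syllables:
            rest = segment(word[end:], syllables)
            if rest is not None:
                return [head] + rest
    return None

def to_wade_giles(word):
    parts = segment(word.lower(), PINYIN_TO_WG)
    if not parts:
        return word  # not fully covered by the table: leave unchanged
    result = "-".join(PINYIN_TO_WG[part] for part in parts)
    return result.capitalize() if word[0].isupper() else result

print(to_wade_giles("Zhongguo"))   # Chung-kuo
print(to_wade_giles("Chongqing"))  # Ch'ung-ch'ing
```

With a complete syllable table some words can be split in more than one way, and this sketch simply returns the first split it finds, so such cases would still need a manual check.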

js wrote:
Apart from these: the module asks you to identify in your text the chunks you want it to transcribe. In other words: the macro asks you to do half of the work yourself instead of doing it for you.
What? It’s you who said that it is impossible to differentiate Chinese ‘an’ from English ‘an’, no? Perhaps you could create a regular expression matching any Chinese character/word represented by a given romanization system. However, you can never use it safely for an isolated word. Personally I’m using macros for translating between two major Arabic romanization systems, namely the German one and that invented by LC which is quite bad. But I have never ever dreamt a macro could distinguish a romanized Arabic word from a European word. It is very, very surprising that you seem to have done nothing in order to make romanized Chinese in your files recognisable as such by a computer program.

As I wrote before, you should apply a custom language or a special character style to them, or several custom languages or character styles if your documents contain Chinese text in multiple romanization systems (what a mess???).


2010-10-10 09:24:10

Joined: 2007-04-12 14:59:36
Posts: 229
Maybe there is a bit of a misunderstanding about the purpose of what I was looking for. I don't want to translate my own documents into different transcription systems. I use whatever a government declares to be its global standard, just as when I write in English: I don't feel an urge to write New York in a way that might please my native German ears. I am talking of texts written by people who did not yet have an official system, or who think the world has been waiting for theirs. Quite often those systems turn out to be good for native speakers of English only, whereas some boring clerks in some boring ministries had at least heard that there are other languages around and that no system on earth can serve everybody's ears. Now the better of such texts with strange transcriptions are not a mess. Most of them follow a consistent method, so that comparative tables are possible. But sometimes one system overlaps with another.

There is also the case of overlap with English: the English articles "a" and "an" are possible Chinese syllables. Of course, logically this means that a macro cannot always decide whether it is one or the other, but in most cases it can. A macro that can deal with most cases is also a good thing to have. Moreover it might be possible to mark whatever is ambiguous, to let you adjust those cases manually. I am aware that this is probably not an issue for the majority on this forum. But I thought I might ask the question anyway. You never know. But from what I see I might have neither the time nor the expertise to do better than what seems not so good to me. Thanks for your help anyway.
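
The "convert most cases, mark the ambiguous ones" idea could look roughly like this in Python. This is only a sketch: the function name, the `[?...]` marker and the list of English look-alikes are all hypothetical, and the look-alike list would have to be maintained by the user.

```python
import re

def convert_and_flag(text, table, ambiguous, marker="[?{}]"):
    """Convert known syllables, but only mark words that may be English."""
    flagged = []

    def repl(match):
        word = match.group(0)
        key = word.lower()
        if key in ambiguous:
            # Could be English or romanized Chinese: mark it for manual review.
            flagged.append((match.start(), word))
            return marker.format(word)
        # Unknown words are returned unchanged; case is not preserved here.
        return table.get(key, word)

    return re.sub(r"[A-Za-z']+", repl, text), flagged

table = {"ta": "da", "an": "an"}       # sample entries only
english_lookalikes = {"a", "an"}       # hypothetical user-maintained list

marked, flagged = convert_and_flag(
    "an artist wrote ta and an", table, english_lookalikes
)
print(marked)   # [?an] artist wrote da and [?an]
print(flagged)  # [(0, 'an'), (23, 'an')]
```

The returned positions refer to the original text, so a later step could jump to each marked word and let the user decide.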


2010-10-10 14:07:35

Joined: 2008-05-17 04:02:32
Posts: 400
js wrote:
Maybe there is a bit of a misunderstanding about the purpose of what I was looking for.
Finally you disclosed your secret purpose. Sigh. You seem to expect someone other than you to be able to figure out your real purpose from your very poor description of your problem. Reread your postings in this thread one by one, chronologically, as attentively as you can, and try to imagine what someone else could understand from such unnecessarily poor information. You should type EVERYTHING you know about your problem in the very first posting. You should not omit anything, because you don’t know what can be omitted and what cannot. You ask the forum precisely because you don’t know the true nature of your problem.

You are a dedicated consumer of others’ time.

You should not take this as an insult. I'm a realist and nothing else.


2010-10-11 09:44:29

Joined: 2007-04-12 14:59:36
Posts: 229
Quote:
Reread your postings in this thread one by one, chronologically, attentively as far as you can ...
You are a dedicated consumer of others’ time.
You should not take this as an insult. I'm a realist and nothing else.

As a matter of fact what you say _is_ quite insulting. But this forum is no place for me to discuss that any further. I am sure nobody here would like to trade my arguments, good or bad, for your ever helpful expertise, myself included. So you see: I'm a realist as well. Though I would not pretend to be nothing else.


2010-10-12 05:14:14

Joined: 2008-05-17 04:02:32
Posts: 400
I apologize to you and to others. Sorry, I should not have directed my irritation at you but at myself, who spent the time trying to solve a macro problem that looked interesting (my hobby) instead of spending it on a headache of a task: I have to index over 4,000 romanized words or sequences of words in someone’s book I was asked to clean up. The nature of the book makes it difficult to do it automatically. All my macro attempts turned out to be unreliable, including a spell checking method (e.g. an may be an Arabic conjunction or ‘an which is a preposition). As it is a paid job, I cannot abandon it, however tedious.


2010-10-12 17:03:00