Change a set of letters for a set of others

js · Post by js » 2010-10-02 05:36:28

Is it possible, with a Nisus macro, to replace a set of characters in a selection with another set, e.g. changing every a to n, every b to m, and every c to l?
Is the same possible for words, e.g. every cat to dog and every rat to mouse?

Kino · Post by **Kino** » 2010-10-02 09:31:40

js wrote: Is it possible, with a Nisus macro, to replace a set of characters in a selection with another set, e.g. changing every a to n, every b to m, and every c to l?

Possible. See these threads.
http://nisus.com/forum/viewtopic.php?f=17&t=2932
http://nisus.com/forum/viewtopic.php?p=16612#p16612
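For comparison outside Nisus, here is a minimal Python sketch of the same character-for-character substitution. It is an illustration of the idea, not of the macros in those threads, and the name `translate_chars` is made up. `str.maketrans` pairs each character of the first string with the one at the same position in the second, and `str.translate` applies the whole table in one pass, so a mapping such as a to b plus b to a swaps the letters instead of collapsing them.

```python
def translate_chars(text: str, src: str, dst: str) -> str:
    """Replace each character of src with the character at the same index in dst.

    str.maketrans builds the mapping; str.translate applies it in a single
    pass, so every character is looked up in the original text only once.
    """
    if len(src) != len(dst):
        raise ValueError("src and dst must have the same length")
    return text.translate(str.maketrans(src, dst))


print(translate_chars("abc cab", "abc", "nml"))  # nml lnm
print(translate_chars("ab", "ab", "ba"))          # ba (a swap, not "aa")
```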

js wrote: Is the same possible for words, e.g. every cat to dog and every rat to mouse?

Code:

$conv = Hash.new
$conv{'cat'} = Cast to String 'dog'
$conv{'rat'} = Cast to String 'mouse'

$doc = Document.active
if $doc == undefined
	exit
end
# Build a single pattern such as cat|rat from the hash keys
$find = $conv.keys
$find = $find.join '|'
$sels = $doc.text.findAll $find, 'E-iw' # w: whole word
# Replace from the last match to the first so earlier ranges stay valid
foreach $sel in reversed $sels
	$sel.text.replaceInRange $sel.range, $conv{$sel.substring}
end
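The same single-pass idea in Python, as a rough sketch rather than a translation of Kino's macro: all keys go into one alternation, `\b` stands in for the whole-word (`w`) option, this sketch matches case-sensitively, and each match is looked up in the table, so replaced text is never searched again. Unlike the macro, the keys go through `re.escape`, which only matters if a key contains regex metacharacters; sorting them longest first is a precaution for tables where one key is a prefix of another. `replace_words` is a hypothetical name.

```python
import re


def replace_words(text: str, conv: dict) -> str:
    """Replace whole words using conv in one pass (case-sensitive)."""
    if not conv:
        return text
    # Longest keys first, escaped, joined into one alternation.
    keys = sorted(conv, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    # Each match is looked up in the table; replaced text is never rescanned.
    return pattern.sub(lambda m: conv[m.group(0)], text)


conv = {"cat": "dog", "rat": "mouse"}
print(replace_words("The cat saw a rat. Category, rats.", conv))
# The dog saw a mouse. Category, rats.
```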

js · Post by js » 2010-10-02 13:40:11

Thank you. Both work fine.

Kino · Post by **Kino** » 2010-10-02 18:07:03

I seem to have taken your problems to be much more complicated than they actually are. If your conversion tables are that simple, with no overlapping entries, you can just use Replace All for each pair. That is faster, at least for the second problem.

What I had in mind is a way to transform, for example, “Mary is younger than Lucy. Lucy is older than Mary” into “Lucy is older than Mary. Mary is younger than Lucy”.

Post by **martin** » 2010-10-04 13:50:57

In case it wasn't clear, Kino was referring to the problem where a sequence of replacements done one after the other yields unwanted results. Looking at his example:

Mary is younger than Lucy. Lucy is older than Mary.

If you first replace "Mary" with "Lucy", you have:

Lucy is younger than Lucy. Lucy is older than Lucy.

And then replace "Lucy" with "Mary", you have:

Mary is younger than Mary. Mary is older than Mary.

That's not the desired result if one wanted to swap Mary and Lucy at the same time. Yet the code that produces it is quite easy to write:

Code:

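# Each command runs on the result of the previous one,
# so the second also changes the names the first just produced
Replace All "Mary", "Lucy", "-iw"
Replace All "Lucy", "Mary", "-iw"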

Kino's more complicated code solves the problem by doing all the replacements in a single pass.
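To make the difference concrete, here is a small Python sketch (not Nisus code; the function names are made up) that runs Kino's sentence through both approaches. `sequential` mimics consecutive Replace All commands; `single_pass` matches all keys at once and looks each match up in the table. Adding the younger/older pair reproduces exactly the transformation Kino described.

```python
import re


def sequential(text, pairs):
    """Apply each whole-word replacement in turn, like consecutive Replace All commands."""
    for old, new in pairs:
        text = re.sub(r"\b" + re.escape(old) + r"\b", lambda _m, new=new: new, text)
    return text


def single_pass(text, pairs):
    """Apply all replacements at once; replacement text is never searched again."""
    table = dict(pairs)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(0)], text)


s = "Mary is younger than Lucy. Lucy is older than Mary."
pairs = [("Mary", "Lucy"), ("Lucy", "Mary"), ("younger", "older"), ("older", "younger")]
print(sequential(s, pairs))   # Mary is younger than Mary. Mary is younger than Mary.
print(single_pass(s, pairs))  # Lucy is older than Mary. Mary is younger than Lucy.
```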

js · Post by js » 2010-10-07 10:17:17

Kino wrote: I seem to have taken your problems to be much more complicated than they actually are.

Well, it depends. (Thanks to Martin for demonstrating the possible conflicts.) Some problems I can now solve are really simple. But another one I would like to solve is in fact more complicated: converting between different transcription systems for Asian texts. Let's say the name of China in Chinese is officially transcribed as "Zhongguo". In the so-called Wade-Giles system, used in Taiwan, it is "Chung-kuo". Likewise, the name of China's biggest city is "Chongqing", or "Ch'ung-ch'ing" respectively. (Here you have the nasty apostrophes that have to be dealt with.)

The problem is not only that syllables are separated by a "-" in one system but not in the other, where they first have to be identified as such. There is also the problem of syllabic units that contain each other: within one system, for example, you have a unit "ta" but also a unit "tang". The problem Martin describes is frequent as well: "ch'ang" in system 1 is "chang" in system 2, but "chang" also exists as a unit in its own right, with a different value. Final problem: some foreign words might look the same as English words appearing in a mixed text, e.g. "an" (meaning "peace") looks like the English article in "an artist".

Sometimes I wonder whether all of these can be solved, or whether one should look for a solution that handles most of the problems and marks possible ambiguities. I don't know whether anyone has found solutions for Japanese transcription systems, which must suffer from similar problems.
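The "ta"/"tang" problem comes down to match order. A minimal Python sketch, assuming lowercase input and a toy two-entry Wade-Giles to Pinyin table (Wade-Giles "ti" = Pinyin "di", "tien" = "dian"): Python's `re` tries alternatives left to right and takes the first one that matches, so sorting the table keys longest first lets "tien" win over its prefix "ti". Doing all replacements in one pass also avoids the chained-replacement problem Martin showed. No word boundaries are used, because in an unhyphenated system syllables sit inside longer words, which is exactly why the order matters.

```python
import re


def convert(text, table, longest_first=True):
    """Convert syllables via table in one left-to-right pass.

    Python's re tries alternatives in order, so sorting the keys longest
    first makes "tien" win over its prefix "ti".
    """
    if not table:
        return text
    keys = sorted(table, key=len, reverse=longest_first)
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: table[m.group(0)], text)


# Toy Wade-Giles -> Pinyin table, for illustration only
table = {"ti": "di", "tien": "dian"}
print(convert("tien", table))                       # dian
print(convert("tien", table, longest_first=False))  # dien (wrong: "ti" matched first)
```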

Kino · Post by **Kino** » 2010-10-08 20:26:01

js wrote: Some problems I can now solve are really simple. But another one I would like to solve is in fact more complicated: converting between different transcription systems for Asian texts.

A Perl module for that kind of conversion is available at
http://search.cpan.org/~xern/Lingua-ZH- ... vert-0.05/
http://search.cpan.org/~xern/Lingua-ZH- ... Convert.pm (description)
As I’m not familiar with the language, I have never used it, so I cannot tell whether it is accurate enough. But once the module is installed (Developer Tools might be required, although the install script does not run a C compiler), you should be able to use it in an NWP macro by calling it from a Perl block.

js wrote: Final problem: some foreign words might look the same as English words appearing in a mixed text, e.g. "an" (meaning "peace") looks like the English article in "an artist".

Haven’t you applied a custom language to the romanized Chinese text? That is what you should do before anything else. If the text in your document does not have correct language attributes, a macro cannot select just the romanized parts for processing.

js wrote: I don't know whether anyone has found solutions for Japanese transcription systems, which must suffer from similar problems.

I don’t know. I never romanize Japanese except when I place an order with a foreign bookseller and have to give my name and address in Latin characters.

Edit: Perhaps you still need Developer Tools to install the Perl module if you don’t have /usr/bin/make. I don’t know whether make is part of the default OS X installation.

Kino · Post by **Kino** » 2010-10-09 08:23:35

You don’t need to install the Perl module using Xcode Tools. I remembered a trick I learnt from Nobumi years ago:

1. Download Lingua-ZH-PinyinConvert-0.05.tar.gz from
http://search.cpan.org/~xern/Lingua-ZH- ... vert-0.05/
and unpack it;

2. Create the folders /Users/you/Library/Perl/Lingua/ZH/;

3. Put PinyinConvert.pm in /Users/you/Library/Perl/Lingua/ZH/.

Then select text in the Wade-Giles system (non-contiguous selections are supported) and run the macro.

As you will notice, the PinyinConvert module does not insert hyphens when converting into the Wade-Giles system. You have to add them manually.

You can set $from and $to to any romanization system supported by the module.

If you don’t want the macro to convert straight single quotes into curly quotes, remove the "if $to == 'Wade-Giles'" routine near the end of the macro.

Code:

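$from = 'Wade-Giles'   # source romanization
$to = 'hanyu'          # target romanization (must be a name the module supports)
$removeAccent = true   # strip diacritics before converting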

$doc = Document.active
if $doc == undefined
	exit # no open document
end

$range = TextSelection.activeRange
if ! $range.length
	exit 'Nothing selected, exiting...'
end

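# Select every run of non-space characters, i.e. one selection per word
Find All in Selection '\S+', 'E'
$sels = $doc.textSelections

# Join the words with single spaces so Perl receives one string
$str = $doc.selectedSubstrings
$sep = Text.newWithCodepoint 0x20
$str = $str.join $sep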

if $from == 'Wade-Giles'
	# Turn curly apostrophes between letters into straight ones for the module
	$str.replaceAll '(?<=\p{Latin})\’(?=\p{Latin})', '\'', 'E'
end

Set Exported Perl Variables 'from', 'to', 'str', 'removeAccent'
begin Perl
	BEGIN {
		unshift @INC, "$ENV{'HOME'}/Library/Perl";
	};
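	eval { require Lingua::ZH::PinyinConvert };
	if ($@) {
		print STDERR "PinyinConvert.pm not found. Cannot continue...\n";
		exit;
	}
	# A 'use' statement runs at compile time, before the eval above,
	# so a missing module would abort compilation; import at run time instead
	Lingua::ZH::PinyinConvert->import('convert');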
	if ($removeAccent) {
		use Unicode::Normalize;
		$str = NFD ($str);
		$str =~ s/\p{InCombiningDiacriticalMarks}//g;
	}
	$str = convert ($from, $to, $str);
end

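if $from == 'Wade-Giles'
	# Remove hyphens between letters (Wade-Giles joins syllables with them)
	$str.replaceAll '(?<=\p{Latin})-(?=\p{Latin})', '', 'E'
end
if $to == 'Wade-Giles'
	# Turn straight apostrophes between letters into curly ones
	$str.replaceAll '(?<=\p{Latin})\'(?=\p{Latin})', '\’', 'E'
end

# Split the converted string back into words
$str = $str.split $sep

# Put each word into its own selection, last to first, so that earlier
# ranges are not shifted and character attributes are kept per word
foreach $sel in reversed $sels
	$sel.text.replaceInRange $sel.range, $str.pop
end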
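The macro collects every `\S+` run, joins the words with spaces, and writes them back with `reversed $sels`, presumably because each replacement can change the length of the text, which shifts the positions of everything after it but nothing before it. Here is a rough Python model of that round trip using plain strings, so character attributes (which the macro keeps per word) are not modelled; `convert_per_word` is a made-up name, and `convert` is any string function, here a hyphen remover standing in for PinyinConvert.

```python
import re


def convert_per_word(text, convert):
    """Convert each whitespace-delimited word in place, keeping the spacing.

    Find every run of non-space characters, join the words with single
    spaces, convert the joined string once, split it again, and write each
    result back into its original span.
    """
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    if not spans:
        return text
    joined = " ".join(text[start:end] for start, end in spans)
    converted = convert(joined).split(" ")
    if len(converted) != len(spans):
        raise ValueError("the conversion changed the number of words")
    # Work from the last span to the first: replacing a span with text of a
    # different length shifts every later offset, but never an earlier one.
    for (start, end), word in zip(reversed(spans), reversed(converted)):
        text = text[:start] + word + text[end:]
    return text


print(repr(convert_per_word("Chung-kuo  and\tPei-ching", lambda s: s.replace("-", ""))))
# 'Chungkuo  and\tPeiching'  (double space and tab kept)
```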

Edit: The PinyinConvert.pm you have to put in /Users/you/Library/Perl/Lingua/ZH/ is Lingua-ZH-PinyinConvert-0.05/PinyinConvert.pm, not Lingua-ZH-PinyinConvert-0.05/PerlIO-via-PinyinConvert/PinyinConvert.pm.

Edit: Cleaned up the code and modified it a bit so that character attributes will be retained per word.

Edit: Added the $removeAccent option. When it is set to true, “Ch’u Tz’ŭ” and “Ta Ch’ê”, for example, are treated as “Ch’u Tz’u” and “Ta Ch’e”.
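What $removeAccent does in the Perl block can be sketched in Python with the standard `unicodedata` module: decompose to NFD, then drop every character in the Combining Diacritical Marks block (U+0300 to U+036F), the range Perl's InCombiningDiacriticalMarks property covers. The function name is made up. The last line also shows how "chüeh" ends up as "chueh" once accents are stripped.

```python
import unicodedata


def remove_accents(s):
    """Decompose to NFD, then drop characters in the Combining Diacritical
    Marks block (U+0300 to U+036F), mirroring the Perl block."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not 0x0300 <= ord(ch) <= 0x036F)


print(remove_accents("Ch’u Tz’ŭ"))  # Ch’u Tz’u
print(remove_accents("Ta Ch’ê"))    # Ta Ch’e
print(remove_accents("chüeh"))      # chueh
```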

js · Post by js » 2010-10-10 03:53:00

Thanks for showing how Perl code can be used and modified in "collaboration" with Nisus macros.
As for the Perl module, it is unfortunately no real help to me, for several reasons:
1. It cannot deal with hyphens, which belong in Wade-Giles but not in Pinyin (which writes the hyphenated components together as one word).
2. It does not know how to deal with umlauts: instead of "chüeh" it produces "chueh".
3. These two are amateurish, but there are real mistakes too: "jue" is transformed to "chueh", which is half correct. But if you try to transform the result back, instead of getting "jue" again you get "zhueh", which is absurd.

Apart from these, the module requires you to identify the chunks of text you want it to transcribe. In other words, the macro asks you to do half of the work yourself instead of doing it for you. The basic situation is an English text with lots of Chinese transcriptions in it. If the whole text consisted of transcribed characters, there would be no need for a macro.

Last but not least: my basic need would be to enter the tables of corresponding syllables myself. This is because sinologists very often seem to feel that the official transcription systems are not as good as those they could have devised themselves, so they invent new ones, which they expect the world to adopt. And of course there are also many historical attempts, in different languages. Many of them are perfectly consistent but no longer used. These texts can now be scanned, and some of us would love to transcribe them into the modern standard system.
For this, one would need a macro that works from a table of correspondences that the user can modify. Such a macro should also be able to identify, within a given text, the syllables that appear in this table. But I realize that this may not be so simple.
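A user-editable table is not hard to model. A minimal Python sketch, assuming a made-up file format (one source, tab, target pair per line; `#` starts a comment line) and a toy Wade-Giles to Pinyin excerpt: `load_table` reads it, and `find_romanized` reports the words that consist only of table syllables, optionally joined by hyphens, so "an artist" is left alone unless "an" itself is in the table. Both function names are hypothetical.

```python
import re


def load_table(text):
    """Parse a user-editable table: one 'source<TAB>target' pair per line.

    Blank lines and lines starting with '#' are ignored. (This format is an
    assumption for illustration.)
    """
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split("\t", 1)
        table[source.lower()] = target
    return table


def find_romanized(text, table):
    """Return words made only of table syllables, optionally joined by hyphens."""
    if not table:
        return []
    syl = "|".join(map(re.escape, sorted(table, key=len, reverse=True)))
    pattern = re.compile(r"\b(?:%s)(?:-(?:%s))*\b" % (syl, syl), re.IGNORECASE)
    return [m.group(0) for m in pattern.finditer(text)]


TABLE_TEXT = "# Wade-Giles\tPinyin\nchung\tzhong\nkuo\tguo\nch'ung\tchong\nch'ing\tqing\n"
table = load_table(TABLE_TEXT)
print(find_romanized("China (Chung-kuo) and the city Ch'ung-ch'ing, an artist", table))
# ['Chung-kuo', "Ch'ung-ch'ing"]
```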

Kino · Post by **Kino** » 2010-10-10 09:24:10

As you’ll see in PinyinConvert.pm, the module supports only ASCII characters. As it is a Perl script of very simple structure, I guess it would be quite easy for those familiar with Chinese romanization systems to improve it so that it supports accented letters: just add conversion tables and modify the regular expressions at lines 463 and 468 accordingly. After modifying it, save the file as plain text in UTF-8.

And if you know what sequences of characters constitute syllable boundaries, it should be a piece of cake to write a regular expression (or several) for inserting hyphens.
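For unhyphenated input, the boundaries can also be found from a list of valid syllables instead of one hand-written regular expression; that is a different technique from the one Kino suggests. A Python sketch, assuming ASCII input (so lowercasing keeps the length) and a toy syllable list with two deliberately misleading entries: `segment` tries the longest syllable first and backtracks when the rest of the word cannot be split, and `hyphenate` returns the word unchanged when no split exists. When two splits are both valid, longest-first simply picks one of them, so real ambiguities would still need a human. Both names are hypothetical.

```python
from functools import lru_cache


def segment(word, syllables):
    """Split word into known syllables, longest first; None if impossible."""
    if not word:
        return []
    if not syllables:
        return None
    max_len = max(len(s) for s in syllables)

    @lru_cache(maxsize=None)
    def split_from(i):
        # Returns a tuple of syllables covering word[i:], or None.
        if i == len(word):
            return ()
        for j in range(min(len(word), i + max_len), i, -1):
            piece = word[i:j]
            if piece in syllables:
                rest = split_from(j)
                if rest is not None:
                    return (piece,) + rest
        return None

    result = split_from(0)
    return None if result is None else list(result)


def hyphenate(word, syllables):
    """Insert hyphens between syllables, keeping the word's original case."""
    pieces = segment(word.lower(), syllables)  # assumes lower() keeps length (ASCII)
    if not pieces:
        return word
    parts, pos = [], 0
    for piece in pieces:
        parts.append(word[pos:pos + len(piece)])
        pos += len(piece)
    return "-".join(parts)


SYLLABLES = {"zhong", "zhon", "g", "guo", "chong", "qing"}  # toy list
print(hyphenate("Zhongguo", SYLLABLES))   # Zhong-guo
print(hyphenate("chongqing", SYLLABLES))  # chong-qing
print(hyphenate("artist", SYLLABLES))     # artist (no split found)
```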

js wrote: Apart from these, the module requires you to identify the chunks of text you want it to transcribe. In other words, the macro asks you to do half of the work yourself instead of doing it for you.

What? It’s you who said that it is impossible to differentiate Chinese ‘an’ from English ‘an’, no? Perhaps you could create a regular expression matching any Chinese character/word represented by a given romanization system. However, you can never use it safely on an isolated word. Personally, I use macros to convert between two major Arabic romanization systems, namely the German one and the one invented by LC (the Library of Congress), which is quite bad. But I have never ever dreamt that a macro could distinguish a romanized Arabic word from a European word. It is very, very surprising that you seem to have done nothing to make the romanized Chinese in your files recognisable as such by a computer program.

As I wrote before, you should apply a custom language or a special character style to them, or several custom languages or character styles if your documents contain Chinese text represented in multiple romanization systems (what a mess!).

js · Post by js » 2010-10-10 14:07:35

Maybe there is a bit of a misunderstanding about the purpose of what I was looking for. I don't want to translate my own documents into different transcription systems. I use whatever a government declares to be its global standard, just as when I write in English: I don't feel an urge to write New York in a way that might please my native German ears. I am talking of texts written by people who did not yet have an official system, or who think the world has been waiting for theirs. Quite often those systems turn out to be good only for native speakers of English, whereas some boring clerks in some boring ministries had at least heard that there are other languages around and that no system on earth can serve everybody's ears. Now, the better of such texts with strange transcriptions are not a mess. Most of them follow a consistent method, so comparative tables are possible. But sometimes one system overlaps with another.

There is also overlap with English: the English articles "a" and "an" are possible Chinese syllables. Logically, this means a macro cannot always decide which one it is, but in most cases it can, and a macro that deals with most cases is still a good thing to have. Moreover, it might be possible to mark whatever is ambiguous so that you can adjust those cases manually. I am aware that this is probably not an issue for most people on this forum, but I thought I might ask the question anyway. You never know. From what I see, though, I might have neither the time nor the expertise to do better than what seems not so good to me. Thanks for your help anyway.
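js's idea of marking ambiguities instead of guessing could look like this in Python. It is a sketch under assumptions: words are runs of ASCII letters and apostrophes, hyphens are left untouched, lookups ignore case, an initial capital is carried over, and anything listed in `ambiguous` is wrapped as {{word?}} for manual review rather than converted. The marker syntax, the function name, and the tiny table are invented for the example.

```python
import re

WORD = re.compile(r"[A-Za-z']+")


def convert_with_flags(text, table, ambiguous):
    """Convert table words; wrap ambiguous ones as {{word?}} for manual review."""
    def replace(m):
        word = m.group(0)
        key = word.lower()
        if key in ambiguous:
            return "{{" + word + "?}}"
        if key not in table:
            return word
        out = table[key]
        if word[0].isupper():
            out = out[:1].upper() + out[1:]
        return out

    return WORD.sub(replace, text)


table = {"ch'ang": "chang", "an": "an"}  # toy Wade-Giles -> Pinyin entries
print(convert_with_flags("Ch'ang-an was an old capital", table, {"an"}))
# Chang-{{an?}} was {{an?}} old capital
```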

Kino · Post by **Kino** » 2010-10-11 09:44:29

js wrote: Maybe there is a bit of a misunderstanding about the purpose of what I was looking for.

Finally you disclosed your secret purpose. Sigh. You seem to expect someone other than you to be able to figure out your real purpose from your very poor description of your problem. Reread your postings in this thread one by one, chronologically and as attentively as you can, and try to imagine what someone else could infer from such unnecessarily poor information. You should type EVERYTHING you know about your problem in the very first posting. You should not omit anything, because you don’t know what can be omitted and what cannot. You ask the forum precisely because you don’t know the true nature of your problem.

You are a dedicated consumer of others’ time.

You should not take this as an insult. I'm a realist and nothing else.

js · Post by js » 2010-10-12 05:14:14

Kino wrote: Reread your postings in this thread one by one, chronologically and as attentively as you can ...
You are a dedicated consumer of others’ time.
You should not take this as an insult. I'm a realist and nothing else.

As a matter of fact, what you say _is_ quite insulting. But this forum is no place for me to discuss that any further. I am sure nobody here, myself included, would want to give up your ever-helpful expertise for the sake of my arguments, good or bad. So you see: I'm a realist as well. Though I would not claim to be nothing else.

Kino · Post by **Kino** » 2010-10-12 17:03:00

I apologize to you and to others. Sorry, I should have directed my irritation not at you but at myself, for spending the time trying to solve an interesting-looking macro problem (my hobby) instead of spending it on a headache-inducing task: I have to index over 4,000 romanized words or sequences of words in someone’s book I was asked to clean up. The nature of the book makes it difficult to do this automatically. All my macro attempts turned out to be unreliable, including a spell-checking method (e.g. an may be an Arabic conjunction or ‘an, which is a preposition). As it is a paid job, I cannot abandon it, however tedious it is.

nisus.com
