
modify Create Word List macro

Posted: 2006-12-12 09:56:45
by derekroff
I need to modify the Create Word List macro that comes with Nisus Writer Express. This macro extracts all unique words from a document. However, any punctuation attached to a word makes it count as a different word, so a list might contain:

está.
está,
está!
está;

as four separate entries. Underscores and numbers also show up as separate words. I would like to get just the words, with no numbers and no punctuation.
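To see why this happens, here is a minimal Perl sketch of a word list that splits text only on whitespace and keeps each distinct token. It is a hypothetical stand-in, not the actual code of the Create Word List macro; the sub name `naive_unique_words` is made up for illustration.

```perl
#!/usr/bin/perl
# Hypothetical sketch; NOT the code of the Nisus "Create Word List" macro.
use strict;
use warnings;
use utf8;                               # treat literals such as "está" as Unicode text
binmode(STDOUT, ':encoding(UTF-8)');

# Split each line on whitespace only and record every token as a hash key,
# so tokens that differ only in trailing punctuation stay distinct.
sub naive_unique_words {
    my @lines = @_;
    my %seen;
    for my $line (@lines) {
        $seen{$_} = 1 for split ' ', $line;   # ' ' = awk-style whitespace split
    }
    return sort keys %seen;
}

my @words = naive_unique_words('está. está, está! está;');
print "$_\n" for @words;    # four separate entries, one per punctuation mark
```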

Uncommenting a line in the Create Word List macro makes it remove punctuation. However, it also removes diacritics, which I need to keep, and it leaves numbers in, which I want removed. I tried to modify the macro by trial and error, but since I don't know Perl, I didn't succeed.
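Assuming the commented-out line is the `s/[^a-z0-9']+/ /gi` substitution quoted in the reply, the behavior Derek describes follows from its character class: it keeps only ASCII letters a to z, the digits 0 to 9 and the apostrophe, and turns everything else into a space. An accented letter such as á falls outside that set, so it is treated like punctuation and splits the word, while digits are explicitly kept. The sketch below (sub name hypothetical) shows this on a sample string.

```perl
#!/usr/bin/perl
# Sketch of the effect of s/[^a-z0-9']+/ /gi on accented text (sub name hypothetical).
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');

# Keep ASCII letters, digits and apostrophes; replace every other run with one space.
sub ascii_clean {
    my ($line) = @_;
    $line =~ s/[^a-z0-9']+/ /gi;   # 'á' is not in a-z, so it is replaced like punctuation
    return $line;
}

my $cleaned = ascii_clean('Está 2 veces.');
my @tokens  = split ' ', $cleaned;
print join('|', @tokens), "\n";    # "Está" is cut to "Est", and the digit 2 survives
```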

Can anyone give me guidance on how to modify this macro so that I get a simple word list with no numbers or punctuation, but with diacritics and other foreign-language characters intact?

Thanks,

Derek

Re: modify Create Word List macro

Posted: 2006-12-12 14:16:15
by martin
derekroff wrote: "Can anyone give me guidance on how to modify this macro, so that I could get a simple word list, with no numbers nor punctuation, but with diacritics and foreign language characters?"
You'll want to replace this line:

Code:

$line =~ s/[^a-z0-9']+/ /gi;
With this:

Code:

$line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"]+/ /gi;
If you find any other punctuation that you'd like removed, just add it inside the square brackets, after that last quotation mark. Just be sure to prefix each character with a backslash.

As an example, let's add angle brackets to the list. The result would be:

Code:

$line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"\<\>]+/ /gi;
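For illustration, here is a self-contained Perl sketch that applies the final substitution above to some sample text and collects the unique words. It assumes the text reaches Perl as decoded Unicode characters (as it does here via `use utf8`); how the Nisus macro actually hands text to Perl is not shown in this thread. The sub names and sample text are hypothetical, and `clean_letters_only` is an alternative technique not taken from the thread: instead of listing punctuation to remove, it keeps only Unicode letters (`\p{L}`) and combining marks (`\p{M}`), so digits, underscores and any unlisted punctuation are all dropped.

```perl
#!/usr/bin/perl
# Hypothetical sketch; sub names, sample text and the \p{L} variant are illustrative.
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');

# The substitution from this thread (with angle brackets added):
# digits and the listed punctuation become spaces; accented letters are untouched.
sub clean_listed {
    my ($line) = @_;
    $line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"\<\>]+/ /gi;
    return $line;
}

# Alternative (not from the thread): keep letters of any script plus combining
# marks; every other run (digits, punctuation, underscores) becomes one space.
sub clean_letters_only {
    my ($line) = @_;
    $line =~ s/[^\p{L}\p{M}]+/ /g;
    return $line;
}

# Clean each line with the given sub, split on whitespace, return sorted unique words.
sub unique_words {
    my ($cleaner, @lines) = @_;
    my %seen;
    for my $line (@lines) {
        $seen{$_} = 1 for split ' ', $cleaner->($line);
    }
    return sort keys %seen;
}

my @text = ('¿Dónde está? Está aquí, en la página 12.', 'está; <está> está!');
print join(' ', unique_words(\&clean_listed, @text)), "\n";
print join(' ', unique_words(\&clean_letters_only, @text)), "\n";
```

With this sample, the first list still contains ¿Dónde, because ¿ is not among the bracketed characters; adding ¿ and ¡ (escaped) to the brackets should strip Spanish inverted marks as well, provided the macro handles non-ASCII text as Unicode characters. If the text instead arrives as raw UTF-8 bytes, the `\p{L}` variant would misclassify the bytes of accented letters, so it is only a fit for decoded text. Both lists are case-sensitive, so Está and está stay separate; applying `lc` to each word before storing it would merge them.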