modify Create Word List macro

derekroff
Posts: 11
Joined: 2006-12-12 09:39:15

modify Create Word List macro

Post by derekroff »

I need to modify the Create Word List macro that comes with Nisus Writer Express. This macro extracts all unique words from a document. However, any punctuation attached to a word makes it count as a different word, so a list might contain:

está.
está,
está!
está;

as four separate entries. Underlines and numbers will also show up as separate words. I would like to get just the words, with no numbers and no punctuation.

Un-commenting a line in the Create Word List macro will modify it to remove punctuation. However, this will also remove diacritics, which I need to retain, and it will leave in numbers, which I want to remove. I attempted to guess my way to modifying the macro, but not knowing Perl, I failed.

Can anyone give me guidance on how to modify this macro, so that I could get a simple word list, with no numbers or punctuation, but with diacritics and foreign language characters?

Thanks,

Derek
Derek Roff
Language Learning Center
University of New Mexico
derek@unm.edu
martin
Official Nisus Person
Posts: 5230
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Re: modify Create Word List macro

Post by martin »

derekroff wrote: Can anyone give me guidance on how to modify this macro, so that I could get a simple word list, with no numbers or punctuation, but with diacritics and foreign language characters?
You'll want to replace this line:

$line =~ s/[^a-z0-9']+/ /gi;

With this:

$line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"]+/ /gi;

If you find any other punctuation that you'd like to have removed, just add it inside the square brackets, after that last quotation mark. Prefixing the character with a backslash is the safe habit: most punctuation doesn't strictly need it inside brackets, but the backslash never hurts.

As an example let's add angle brackets to the list. This would be the result:

$line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"\<\>]+/ /gi;
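
To try a pattern before editing the macro, a small stand-alone script can help. This is only a sketch, not the macro's actual code: the helper names (strip_listed, keep_letters, unique_words) and the sample text are made up for illustration. The second helper is an alternative that I'm assuming would suit the goal: instead of listing punctuation, it keeps only Unicode letters and combining marks (\p{L} and \p{M}), so underscores and any unlisted punctuation are removed too. It assumes the text has been decoded into Unicode characters, which may or may not be how the macro hands lines to Perl.

```perl
use strict;
use warnings;
use utf8;                               # this script contains UTF-8 string literals
binmode(STDOUT, ':encoding(UTF-8)');

# Hypothetical helper: the replacement substitution from this post, applied to one line.
sub strip_listed {
    my ($line) = @_;
    $line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"]+/ /g;
    return $line;
}

# Hypothetical alternative (an assumption, not part of the macro):
# replace every run of characters that are not letters or combining marks.
sub keep_letters {
    my ($line) = @_;
    $line =~ s/[^\p{L}\p{M}]+/ /g;
    return $line;
}

# Hypothetical helper: collect the unique words of a text, filtering each line first.
sub unique_words {
    my ($text, $filter) = @_;
    my %seen;
    for my $line (split /\n/, $text) {
        $seen{$_}++ for grep { length } split /\s+/, $filter->($line);
    }
    return sort keys %seen;
}

my $sample = "está. está, está! está; 42 casa_1";
print join(",", unique_words($sample, \&strip_listed)), "\n";   # casa_,está
print join(",", unique_words($sample, \&keep_letters)), "\n";   # casa,está
```

With the listed-punctuation pattern the underscore survives (casa_), because it isn't in the brackets. The letters-only pattern drops it without having to list it.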