I need to modify the Create Word List macro that comes with Nisus Writer Express. This macro will extract all unique words from a document. However, punctuation creates unique words, so a list might contain:
está.
está,
está!
está;
as four separate entries. Underlines and numbers will also show up as separate words. I would like to get just the words, with no numbers and no punctuation.
Un-commenting a line in the Create Word List macro will modify it to remove punctuation. However, this will also remove diacritics, which I need to retain, and it will leave in numbers, which I want to remove. I attempted to guess my way to modifying the macro, but not knowing Perl, I failed.
Can anyone give me guidance on how to modify this macro so that I get a simple word list, with no numbers or punctuation, but with diacritics and foreign-language characters intact?
Thanks,
Derek
modify Create Word List macro
- martin
- Official Nisus Person
Re: modify Create Word List macro
derekroff wrote: Can anyone give me guidance on how to modify this macro, so that I could get a simple word list, with no numbers nor punctuation, but with diacritics and foreign language characters?
You'll want to replace this line:
Code:
$line =~ s/[^a-z0-9']+/ /gi;
with this one:
Code:
$line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"]+/ /gi;
To strip additional characters, add them inside the square brackets. As an example, let's add angle brackets to the list. This would be the result:
Code:
$line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"\<\>]+/ /gi;
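For anyone who wants to try the substitution outside the macro first, here is a minimal standalone sketch. It assumes nothing about how the shipped macro reads the document: the helper name `words_in_line` and the sample lines are made up for illustration, and only the substitution itself comes from the reply above.

```perl
use strict;
use warnings;
use utf8;    # the sample strings below contain non-ASCII letters

# Hypothetical helper (not part of the shipped macro): applies the
# replacement substitution to one line and returns the words left over.
sub words_in_line {
    my ($line) = @_;
    # Same character class as in the reply: digits and the listed
    # punctuation marks are turned into spaces; everything else,
    # including accented letters, is left alone.
    $line =~ s/[0-9\.\,\!\?\(\)\:\;\*\'"]+/ /g;
    # Split on whitespace and drop the empty field a leading space produces.
    return grep { length } split /\s+/, $line;
}

# Made-up sample input standing in for the document text.
my @lines = ("está. está, está! está; 42 casa", "(casa) niño?");

# Keep each word the first time it is seen, preserving order.
my %seen;
my @unique = grep { !$seen{$_}++ } map { words_in_line($_) } @lines;

binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for @unique;
```

With these sample lines the list comes out as three entries (está, casa, niño) instead of one entry per punctuation variant. Note that the character class does not include the underscore, so underlines would still have to be added to the brackets in the same way as the angle brackets above.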