Regexp with style definition

Groucho
Posts: 497
Joined: 2007-03-03 09:55:06
Location: Europe

Regexp with style definition

Post by Groucho »

I use this regular expression

\<[A-Z\s’‘-]{2,}\>
to find one or more all-caps words separated by space, quote or dash.
Why doesn't it work when Attribute Sensitive is checked along with a style definition?

Thanks a bunch!
Henry.
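Here is a minimal sketch in Python, not PowerFind Pro, of what this pattern is meant to match in a plain search that is not attribute-sensitive. Python's `re` has no `\<` or `\>`. The sketch therefore assumes they are start-of-word and end-of-word anchors and emulates them with `\b` plus a lookahead or lookbehind. `find_caps_runs` is a hypothetical helper name.

```python
import re

# Assumption: \< = start of word, \> = end of word (GNU-style semantics).
# Python's re lacks both, so emulate them with \b plus a lookaround.
START_OF_WORD = r"\b(?=\w)"
END_OF_WORD = r"\b(?<=\w)"

# Henry's class: capitals, whitespace, curly quotes and hyphen, 2+ times.
ALL_CAPS_RUN = re.compile(START_OF_WORD + r"[A-Z\s’‘-]{2,}" + END_OF_WORD)


def find_caps_runs(text: str) -> list:
    """Return every all-caps run matched by the emulated pattern."""
    return ALL_CAPS_RUN.findall(text)


print(find_caps_runs("he said MY WORLD is LIKE-THESE ok"))
# ['MY WORLD', 'LIKE-THESE']
```

Because `\s` is part of the class, a run that spans blank lines, such as "ABC\n\nDEF", is returned as a single match.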
Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Regexp with style definition

Post by Kino »

As I don't have your document ;-), I tried the following on Nisus Macro Reference.rtf, and it seems to work. But I seldom use attribute-sensitive search, so I may be overlooking something obvious...

1. Type "\b\p{Upper}\p{Lower}+\b" in the Find field;

2. Apply "Object Name" (Character Style) or "Code Block" (Paragraph Style) to the find expression;

3. Apply "Remove Attributes Except Styles" to the find expression;

4. Hit Next.

Step 3 is unnecessary if the find expression had no attributes to begin with.
Groucho

Re: Regexp with style definition

Post by Groucho »

Thanks, Kino. I tried setting a style and then choosing Remove Attributes Except Styles, but it doesn't work. Note that the all-caps regexp plus style did work in version 1.0. By the way, I think your regexp will search single capitalized words (Like These), whereas mine will search all-caps words separated by space, hyphen or quote (LIKE-THESE).
Apparently, something is wrong with the word-boundary expressions (\< and \>) when Attribute Sensitive is checked. Without them, the expression works:

[A-Z\s’‘-]{2,}
Many thanks
Henry.
Kino

Re: Regexp with style definition

Post by Kino »

Groucho wrote: By the way, I think your regexp will search single capitalized words (Like These)
Because it suits the test file I used. That file also has the advantage that every NW Pro user has it, so anyone can easily double-check.
whereas mine will search all-caps words separated by space, hyphen or quote (LIKE-THESE).
and...

ABC

DEF

That is, it will match the whole of "ABC\n\nDEF", for example.
there is something wrong with the word boundary regular expressions (\< and \>)
Yeah, there seems to be a bug. You'd better report it to Nisus Software if you want your expression to work. While

\<[A-Z'’\s-]{2,}\>
\<[A-Z'’ -]{2,}\>
\<[A-Z'’\x20-]{2,}\>

do not work, the following do:

\<[A-Z'’\t-]{2,}\>
\<[A-Z'’\n-]{2,}\>
\<[A-Z'’\xA0-]{2,}\>

So it seems that it is not \s itself but any character or metacharacter matching \x20 (the space character) that prevents it from working.

But why are you using such a tricky expression? (Some people say I'm tricky, but I have never gone that far ;-) In your expression, only \< and \> ensure that the first and last characters are [A-Z] and not ['’\s-]. So...

[A-Z\s’‘-]{2,}
will also match "\n\n", "--", "<tab><tab>", etc.

A common expression for that purpose would be

\b[A-Z][A-Z'’\x20-]*[A-Z]\b

or \<[A-Z][A-Z'’\x20-]*[A-Z]\> if you prefer. The latter won't work in other apps/programs, though.
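Kino's two points about the pattern itself can be checked with Python's `re` module. This only illustrates standard regex behavior and says nothing about the PowerFind Pro bug. Without the boundary anchors, the class also matches runs made only of whitespace or hyphens. His common expression must start and end with a capital, and it uses `\x20` instead of `\s`, so it does not cross line breaks.

```python
import re

# Henry's class with the \< and \> anchors dropped.
NO_ANCHORS = re.compile(r"[A-Z\s’‘-]{2,}")

# Kino's "common expression": a capital at each end, with capitals,
# quotes, spaces (\x20) or hyphens allowed in between.
COMMON = re.compile(r"\b[A-Z][A-Z'’\x20-]*[A-Z]\b")

sample = "intro\n\nsee -- here\t\tMY WORLD, LIKE-THESE."

print(NO_ANCHORS.findall(sample))
# ['\n\n', ' -- ', '\t\tMY WORLD', ' LIKE-THESE']
print(COMMON.findall(sample))
# ['MY WORLD', 'LIKE-THESE']
```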
Groucho

Re: Regexp with style definition

Post by Groucho »

Thanks for your help, Kino.
That confirms my suspicion, though you went deeper than I did. The word-boundary expressions somehow conflict with a style definition. My usual workaround is:

1) Select all text with the same style.
2) Run the search pattern with Search In Selection on.

As for my tricky patterns, I think it has to do with my nature. I am pretty sure there are no occurrences of \t\t, \s\s or -- in my documents when I run this search pattern. And anyway, I search step by step.
I use this pattern simply to convert old ASCII text files from the early 1970s or so. Back then, some people capitalized words to simulate italics. In those documents, headings are capitalized too, hence the need for a style-defined search. By the way, before NWP (when I used BBEdit for this purpose), I used a formula like yours, but I thought it was too tricky (just so!).
The phrase was then converted into lowercase and delimited by underscores, like this:

MY WORLD: _my world_

Greetings, Henry.
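Henry's conversion can be sketched as a regex replacement that uses a function, shown here in Python rather than in a Nisus macro. The sketch reuses Kino's common expression as the search pattern. `caps_to_underscored` is a hypothetical helper. Its `confirm` callback is a hypothetical stand-in for a step-by-step review: it is asked about each hit, and any rejected hit is left unchanged.

```python
import re
from typing import Callable

# Kino's common expression: an all-caps run with a capital at each end.
CAPS_RUN = re.compile(r"\b[A-Z][A-Z'’\x20-]*[A-Z]\b")


def caps_to_underscored(text: str,
                        confirm: Callable[[str], bool] = lambda hit: True) -> str:
    """Replace each all-caps run with _lowercase_ when confirm(hit) is True.

    `confirm` is a hypothetical hook for reviewing hits one at a time;
    by default every hit is converted.
    """
    def replace(match):
        hit = match.group(0)
        return f"_{hit.lower()}_" if confirm(hit) else hit

    return CAPS_RUN.sub(replace, text)


print(caps_to_underscored("see MY WORLD and LIKE-THESE."))
# see _my world_ and _like-these_.
```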
martin
Official Nisus Person
Posts: 5230
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Re: Regexp with style definition

Post by martin »

Would one of you be able to send in a document where the search fails? I just gave Henry's original expression a try with the Emphatic character style and didn't have any issues.
Groucho

Re: Regexp with style definition

Post by Groucho »

My document is on the way.
Note, though, that my search pattern works with character styles, but not with paragraph styles. When I said style, I meant paragraph style.
Sorry for the inconvenience.

Thanks, Henry.
Groucho

Re: Regexp with style definition

Post by Groucho »

Sorry to harp on, Martin, but there's still the shadow of a bug, I guess.
Words are now selected correctly, but only when followed by a punctuation mark or a double quote (”); they are skipped when followed by a single quote (’), a space or a return. See the image below.
[Attached image: Picture 2.gif]
Thanks, Henry.
martin

Re: Regexp with style definition

Post by martin »

Thanks Henry, we'll revisit this; sorry for any trouble it's caused.
Groucho

Re: Regexp with style definition

Post by Groucho »

Thank you, Martin,
By the way, any chance of having \l, \L, \U and \u in a future version of PowerFind Pro?
Henry
martin

Re: Regexp with style definition

Post by martin »

Groucho wrote: Thank you, Martin,
By the way, any chance of having \l, \L, \U and \u in a future version of PowerFind Pro?
Henry
We actually already have "\u" and "\U", which allow Unicode hexadecimal representations, taking 4 and 8 hex digits respectively. E.g., to search for the letter e (U+0065) you could use either "\u0065" or "\U00000065".

What would "\l" and "\L" do?
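Python's `re` module (3.3 and later) accepts the same two escapes in str patterns, so martin's example can be tried there. This illustrates the notation only and does not describe PowerFind Pro's implementation.

```python
import re

# Both patterns denote U+0065, LATIN SMALL LETTER E.
four_digit = re.compile(r"\u0065")        # \u followed by exactly 4 hex digits
eight_digit = re.compile(r"\U00000065")   # \U followed by exactly 8 hex digits

text = "Nisus Writer Pro"
print(four_digit.findall(text))   # ['e']
print(eight_digit.findall(text))  # ['e']
```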
Groucho

Re: Regexp with style definition

Post by Groucho »

My fault, Martin. I meant the upper-/lowercase escapes. You can use \u, \l and so on, but only in a macro. Macros, though, usually perform batch conversions, whereas I need a step-by-step one, at least in this case.

Cheers, Henry.
martin

Re: Regexp with style definition

Post by martin »

Thanks for the clarification, Henry. We do have that feature filed, but I couldn't say when it will be added.

I'm not sure I understand about using those options in macros but not elsewhere. Do you mean using the case-change menu commands? E.g.:

Find All 'i', 'a-iw'
To Uppercase
Could you elaborate on what you need to do? Perhaps we can put together some kind of solution for you.
Groucho

Re: Regexp with style definition

Post by Groucho »

You are right. For some reason, I believed that \U, \u, \l and \L could change the next captured text to upper- or lowercase in a macro. That is how it works in TextWrangler. I admit I was biased.

Greetings, Henry.
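For comparison, Python's `re.sub` also lacks replacement-string case escapes like the ones Henry describes. The usual substitute is a replacement function. The sketch below assumes the common Perl-style meanings: \U and \L change the case of the whole text, and \u and \l change only its first character. Each escape is applied to one capture group, so it behaves like a replacement string of the form "\U\1". `case_replace` and its `mode` argument are hypothetical.

```python
import re

# Assumed Perl-style case escapes, each applied to capture group 1,
# i.e. the equivalent of a replacement string such as "\U\1".
CASE_OPS = {
    "U": str.upper,                          # whole group uppercase
    "L": str.lower,                          # whole group lowercase
    "u": lambda s: s[:1].upper() + s[1:],    # first character uppercase
    "l": lambda s: s[:1].lower() + s[1:],    # first character lowercase
}


def case_replace(pattern: str, mode: str, text: str) -> str:
    """Hypothetical helper: replace every match of `pattern` with its
    first capture group, case-changed according to `mode`."""
    op = CASE_OPS[mode]
    return re.sub(pattern, lambda m: op(m.group(1)), text)


print(case_replace(r"(\w+)", "u", "my world"))                     # My World
print(case_replace(r"\b([A-Z][A-Z ]*[A-Z])\b", "L", "MY WORLD"))   # my world
```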