PowerFind question

Post by **writerhoward** » 2013-02-19 13:58:46

I've been trying, unsuccessfully, to do the following. I occasionally work with an RTF in which a space is missing either before or after (or both) an em dash, as in "people —it." I would like to set up PowerFind to check whether there's a letter of the alphabet immediately to the left or right of the em dash and, if there is, to insert a space between the letter and the dash. Thus, for my previous example, the result would be "people — it."

Any help would be much appreciated.
Howard

Post by **phspaelti** » 2013-02-19 15:47:53

The trick to doing this in one go is to turn the problem upside down: replace every em dash, together with any preceding or following spaces, with the pattern <space>—<space>. This needlessly replaces the cases that are already correct, but it straightens out all the wrong ones. To do this, make sure you are using PowerFind (or PowerFind Pro), then type the following into the Find box:

Code: Select all

 <0+>— <0+>

The <0+> must be chosen from the "Repeat" menu.

The replace box should just have " — ".
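For readers more at home with regular expressions, the same idea can be sketched in Python. This is only an illustration, not PowerFind itself: the function name `normalize_dash_spacing` is made up for the example, and ` *` (zero or more spaces) stands in for the (Space)(0+) bubbles.

```python
import re

EM_DASH = "\u2014"  # the em dash character


def normalize_dash_spacing(text: str) -> str:
    """Replace every em dash, plus any spaces around it, with ' <em dash> '."""
    # " *" means "zero or more spaces", so dashes with missing spaces match too.
    return re.sub(f" *{EM_DASH} *", f" {EM_DASH} ", text)


if __name__ == "__main__":
    # The example from the question: the space after the dash is missing.
    print(normalize_dash_spacing(f"people {EM_DASH}it."))
```

Like the PowerFind version, this also rewrites dashes that were already spaced correctly, which is harmless.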

If you have em-dashes in your document that are not between text, and you don't want to have spaces surrounding them, then you need to make the above a bit more elaborate.

Code: Select all

<PrecededBy(><AnyWordCharacter><)> <0+>— <0+><FollowedBy(><AnyWordCharacter><)>

<AnyWordCharacter> is from the "Wildcard" menu, and <PrecededBy(>…<)> is from the "Match" menu.

In PowerFind Pro the complete expression is:

Code: Select all

(?<=\w) *— *(?=\w)
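The lookbehind and lookahead syntax in that pattern is also accepted by Python's `re` module, so the behaviour can be tried outside Nisus. The sketch below is an approximation under that assumption; the helper name `space_dashes_between_words` is hypothetical, and the two engines may differ in what `\w` counts as a word character.

```python
import re

EM_DASH = "\u2014"

# (?<=\w) requires a word character just before the match,
# (?=\w) requires one just after; neither is part of the replaced text.
PATTERN = re.compile(rf"(?<=\w) *{EM_DASH} *(?=\w)")


def space_dashes_between_words(text: str) -> str:
    """Put single spaces around em dashes that sit between word characters."""
    return PATTERN.sub(f" {EM_DASH} ", text)


if __name__ == "__main__":
    print(space_dashes_between_words(f"people {EM_DASH}it."))
    # A dash at the start of a line is not preceded by a word character,
    # so it is left alone.
    print(space_dashes_between_words(f"{EM_DASH} Anonymous"))
```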

Post by **martin** » 2013-02-19 18:33:13

Thanks for that Philip, very instructive. Also, just to be clear, those <0+> entries Philip was mentioning will look like this once inserted:

[Attachment: powerf.png]

We call those PowerFind "bubbles". You can see that I've also used such a bubble for the space character, just to make it more visible.

Post by **writerhoward** » 2013-02-20 11:52:13

martin wrote:Thanks for that Philip, very instructive. Also, just to be clear, those <0+> entries Philip was mentioning will look like this once inserted:
[Attachment: powerf.png]
We call those PowerFind "bubbles". You can see that I've also used such a bubble for the space character, just to make it more visible.

Martin,
Thank you for what you wrote. I was confused by the meaning of <0+>.

What specifically do the <0+> in your attachment do?

Also, is there any way to save entries made in the Find/Replace popup?

Howard

Post by **phspaelti** » 2013-02-20 17:01:48

writerhoward wrote:Martin, Thank you for what you wrote. I was confused by the meaning of <0+>.

Sorry. I guess my explanation was a bit too short.
The real power of Nisus' Find/Replace lies in those pop-up menus to the left of the Find Box.

writerhoward wrote:What specifically do the <0+> in your attachment do?

The Repeat commands apply to the immediately preceding character. If you want them to apply to several characters at once, you need to use Match brackets.
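The difference between repeating one character and repeating a bracketed group can be seen in ordinary regular expressions as well. The Python sketch below assumes that a regex group plays the same role as PowerFind's Match brackets; the sample text is invented.

```python
import re

text = "ababab abbb"

# Without grouping, "+" repeats only the character right before it (the "b").
single_char = re.findall(r"ab+", text)

# With a group, "+" repeats the whole "ab" sequence.
grouped = re.findall(r"(?:ab)+", text)

print(single_char)  # ['ab', 'ab', 'ab', 'abbb']
print(grouped)      # ['ababab', 'ab']
```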

writerhoward wrote:Also, is there any way to save entries made in the Find/Replace popup?

The pop-up has a feature to save expressions. Another option is the Macroize… feature, which saves the Find and the Replace together as a single command in a Nisus macro in the Macro menu.

[Attachment: Nisus_PowerFind.jpg]

PS: With the above expression I just found out that the Nisus Macro Reference has 6 instances of a doubled "the". They all seem to be typos.

Post by **martin** » 2013-02-21 19:22:07

phspaelti wrote:
writerhoward wrote:What specifically do the <0+> in your attachment do?
The Repeat commands apply to the immediately preceding character. If you want them to apply to several characters at once, you need to use Match brackets.

That's exactly right. The easiest one to understand is the repetition bubble (1+), which means to match one or more of something. So if we had the expression:

[Attachment: oneplus.png]

You can see that "1+" follows the "AnyDigit" bubble, so we are trying to find any bit of text that consists of one or more consecutive digits, eg: match one digit, two digits, three digits, etc.

So looking at the find expression:

[Attachment: powerf.png]

We can see that (Space) is followed by (0+), which means to match zero or more spaces. That means we'll match not only 1 space, 2 spaces, etc, but also *zero* spaces, ie: the situation where no space character exists. This is exactly the situation we're trying to fix: where the spaces are missing from either side of the dash.
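As a rough regex analogue (an assumption, since the bubbles themselves are only visible in the attachments), `+` corresponds to the (1+) bubble and `*` to the (0+) bubble. The Python sketch below shows why "one or more" would miss the very case being fixed; the sample strings are invented.

```python
import re

EM_DASH = "\u2014"

# "One or more digits", like (AnyDigit)(1+).
digit_runs = re.findall(r"\d+", "room 7, floor 12, year 2013")
print(digit_runs)  # ['7', '12', '2013']

broken = f"people {EM_DASH}it."  # no space after the dash

# Requiring one or more spaces on each side finds nothing to fix here.
with_one_or_more = re.sub(f" +{EM_DASH} +", f" {EM_DASH} ", broken)

# Allowing zero or more spaces also matches the missing-space case.
with_zero_or_more = re.sub(f" *{EM_DASH} *", f" {EM_DASH} ", broken)

print(with_one_or_more == broken)  # True: the text is unchanged
print(with_zero_or_more)
```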

phspaelti wrote:PS: With the above expression I just found out that the Nisus Macro Reference has 6 instances of a doubled "the". They all seem to be typos.

Oops, we'll have to get those fixed. I also found some "of of" typos using the PowerFind expression:

[Attachment: words.png]

..with the "whole word" option enabled.

Post by **writerhoward** » 2013-02-22 05:59:47

Martin,

Your latest explanation is quite helpful. What effect does the code in "capture( any word ) captured1" have?

Howard

Post by **phspaelti** » 2013-02-22 06:39:04

writerhoward wrote:Your latest explanation is quite helpful. What effect does the code in "capture( any word ) captured1" have?

Captured1 refers to the first capture group. So this has the effect of asking for "any word" and then a space followed by the same "any word". So the effect is of looking for places with the same word twice in a row (with a space between them).

This will match "the the", "of of", "Nisus Nisus", "Howard Howard", etc.
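In regular-expression terms this is a capture group followed by a backreference. A minimal Python sketch, assuming `(\w+)` approximates "capture( any word )", `\1` approximates Captured1, and `\b` approximates the "whole word" option; the function name `find_doubled_words` is invented for the example:

```python
import re

# (\w+) captures a word, then " \1" demands a space and the same word again.
# The \b anchors keep "the theory" from counting as a doubled "the".
DOUBLED = re.compile(r"\b(\w+) \1\b")


def find_doubled_words(text: str) -> list[str]:
    """Return every 'word word' repetition found in the text."""
    return [match.group(0) for match in DOUBLED.finditer(text)]


if __name__ == "__main__":
    print(find_doubled_words("This is the the end of of it"))
    print(find_doubled_words("the theory"))
```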

Post by **phspaelti** » 2013-02-22 06:51:31

Just to be clear here:
The Capture() brackets create capture groups. You can have as many such groups as you want, and they can even be nested, but they cannot overlap.

The Captured1, Captured2, etc. are variables. They refer to the capture groups that you create with the capture brackets. You can use the "Captured" variables in either the Find box or the Replace box. When you use them in the Find box, you match patterns that have identical/repeated bits. If you use them in the Replace box, you can rearrange the order of the bits you match.
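Using captured groups on the replace side to rearrange text might look like this in Python, where `\1` and `\2` in the replacement play the role of Captured1 and Captured2. The "Surname, Given" input and the function name `swap_name_order` are invented for illustration.

```python
import re


def swap_name_order(text: str) -> str:
    """Turn 'Surname, Given' into 'Given Surname' using two capture groups."""
    # Group 1 is the word before the comma, group 2 the word after it.
    return re.sub(r"(\w+), (\w+)", r"\2 \1", text)


if __name__ == "__main__":
    print(swap_name_order("Doe, Jane"))  # Jane Doe
```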

Post by **writerhoward** » 2013-02-22 10:41:21

phspaelti wrote:
writerhoward wrote:Your latest explanation is quite helpful. What effect does the code in "capture( any word ) captured1" have?
Captured1 refers to the first capture group. So this has the effect of asking for "any word" and then a space followed by the same "any word". So the effect is of looking for places with the same word twice in a row (with a space between them).

This will match "the the", "of of", "Nisus Nisus", "Howard Howard", etc.

So with regard to the previous sentence, the capture code will "grab" the word "This" as it's followed by a space, add the word "This" to "This<space>" as "This" is now stored in Capture1, and search through the sentence for "This<space>This" -- If it finds it, it will replace "This<space>This" with whatever is in the Replace box. Then, Nisus will proceed to the second word in the sentence, which is "will" and search for "will<space>will" -- if found, that will be replaced by what's in the Replace box. This process continues until the last word in the sentence (in this case, "etc") is checked. Is that correct?

Howard

Post by **martin** » 2013-02-22 11:57:24

writerhoward wrote:So with regard to the previous sentence, the capture code will "grab" the word "This" as it's followed by a space, add the word "This" to "This<space>" as "This" is now stored in Capture1, and search through the sentence for "This<space>This" -- If it finds it, it will replace "This<space>This" with whatever is in the Replace box. Then, Nisus will proceed to the second word in the sentence, which is "will" and search for "will<space>will" -- if found, that will be replaced by what's in the Replace box. This process continues until the last word in the sentence (in this case, "etc") is checked. Is that correct?

You basically have the right of it. Technically speaking, what you've laid out is not entirely how it works inside the code, but it's a correct enough way to think about the process that it makes no difference.

The key difference is that at no point will the search engine "search through the sentence" (or rest of the text) for some double-word pair that you're imagining was constructed. Instead, once a single word has been matched (and captured), followed by a space, it's sufficient to merely continue the comparison at that single point to see if the captured word occurs immediately afterwards. If it does you have a full match, but if not, the search moves on and simply "forgets" the word that was captured. But that's all esoteric and essentially irrelevant to understanding the matches that will be found.
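That "compare at a single point, then forget" behaviour can be mimicked with a simple loop. This is a deliberately simplified model, not how Nisus or any real regex engine is implemented: it splits on single spaces, ignores punctuation, and the function name `find_doubled_words_naive` is hypothetical.

```python
def find_doubled_words_naive(text: str) -> list[str]:
    """Report words that are immediately followed by an identical word."""
    words = text.split(" ")
    hits = []
    for i in range(len(words) - 1):
        captured = words[i]  # "capture" the current word
        # Look only at the very next word; never scan the rest of the text.
        if captured and words[i + 1] == captured:
            hits.append(f"{captured} {captured}")
        # Otherwise the captured word is simply forgotten and we move on.
    return hits


if __name__ == "__main__":
    print(find_doubled_words_naive("This will match the the and of of"))
```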

Post by **writerhoward** » 2013-02-22 12:33:50

Could "Found" have been used in the code example that found text duplication and replaced the duplicates with one copy? If it could, what would Found replace and where would Found appear in the code?

Howard

Post by **martin** » 2013-02-22 12:56:35

writerhoward wrote:Could "Found" have been used in the code example that found text duplication and replaced the duplicates with one copy? If it could, what would Found replace and where would Found appear in the code?

Not in this double-word matching example, because the (Found) bubble is a stand-in for the entire match. So in this case using (Found) in the replacement pattern would reinsert both matched words and the space, eg: "this this". If you wanted to use a replace pattern to fix these double-word typos you'd just replace the whole match with (Captured1), eg: replace "this this" with just "this".

Unfortunately that strategy doesn't actually work so well in practice, at least not with the macro guide, because there are lots of cases where the double words are valid, eg:

Set Text Color color
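The difference between (Found) and (Captured1) in a replacement, and the false positive just mentioned, can be reproduced with Python's `re` module. This is a sketch under assumptions: `\g<0>` is treated as the counterpart of (Found), `\1` as the counterpart of (Captured1), and the search is assumed to ignore case, which the "Color color" example suggests but the thread does not state.

```python
import re

DOUBLED = re.compile(r"\b(\w+) \1\b", re.IGNORECASE)

typo = "fix this this typo"

# \g<0> is the entire match, so the text comes back unchanged.
with_found = DOUBLED.sub(r"\g<0>", typo)

# \1 is only the first captured word, so the duplicate disappears.
with_captured = DOUBLED.sub(r"\1", typo)

# A legitimate doubled word gets "fixed" too, which is the problem.
false_positive = DOUBLED.sub(r"\1", "Set Text Color color")

print(with_found)      # fix this this typo
print(with_captured)   # fix this typo
print(false_positive)  # Set Text Color
```

Because of cases like the last one, reviewing each match by hand is safer than a blanket Replace All.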

Post by **writerhoward** » 2013-02-22 14:42:27

martin wrote:
writerhoward wrote:Could "Found" have been used in the code example that found text duplication and replaced the duplicates with one copy? If it could, what would Found replace and where would Found appear in the code?
Not in this double-word matching example, because the (Found) bubble is a stand-in for the entire match. So in this case using (Found) in the replacement pattern would reinsert both matched words and the space, eg: "this this". If you wanted to use a replace pattern to fix these double-word typos you'd just replace the whole match with (Captured1), eg: replace "this this" with just "this".

Unfortunately that strategy doesn't actually work so well in practice, at least not with the macro guide, because there are lots of cases where the double words are valid, eg:
Set Text Color color

Could you provide an example that includes the (Found) bubble?

Post by **martin** » 2013-02-22 14:55:45

writerhoward wrote:Could you provide an example that includes the (Found) bubble?

Sure thing. As a contrived example let's add quotation marks around all words that start with the letter "A":

[Attachment: found.png]
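The attachment's exact expression is not reproduced in this text, so the following Python sketch is only a guess at an equivalent: it matches whole words beginning with a capital "A" and uses `\g<0>` (assumed here to correspond to the (Found) bubble) to reinsert each match between quotation marks. The function name `quote_a_words` and the sample sentence are invented.

```python
import re


def quote_a_words(text: str) -> str:
    """Wrap every word that starts with a capital 'A' in double quotes."""
    # \bA\w* matches "A" at a word boundary plus the rest of the word;
    # \g<0> in the replacement stands for the whole matched word.
    return re.sub(r"\bA\w*", r'"\g<0>"', text)


if __name__ == "__main__":
    print(quote_a_words("Apples and Avocados are Amazing"))
```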

nisus.com
