PowerFind question
- writerhoward
- Posts: 50
- Joined: 2013-02-03 05:10:26
PowerFind question
I've been trying, unsuccessfully, to do the following. I occasionally work with an RTF file in which a space is missing before or after an em dash (or both), as in "people —it." I would like to set up PowerFind to check whether there's a letter immediately to the left or right of the em dash and, if there is, to insert a space between that letter and the dash. Thus, for my previous example, the result would be "people — it."
Any help would be much appreciated.
Howard
Re: PowerFind question
The trick to doing this in one go is to turn the problem upside down: replace every em dash, together with any spaces before or after it, with the pattern <space>—<space>. This will pointlessly replace all the correct cases, but it will straighten out all the wrong ones. To do this, make sure you are using PowerFind (or PowerFind Pro). Then type the following into the Find box (note that it begins with a space):
Code:
 <0+>— <0+>
The <0+> must be chosen from the "Repeat" menu.
The Replace box should just have " — ".
If you have em dashes in your document that are not between text, and you don't want spaces around them, then you need to make the above a bit more elaborate:
Code:
<PrecededBy(><AnyWordCharacter><)> <0+>— <0+><FollowedBy(><AnyWordCharacter><)>
<AnyWordCharacter> is from the "Wildcard" menu, and <PrecededBy(>…<)> is from the "Match" menu.
In PowerFind Pro the complete expression is:
Code:
(?<=\w) *— *(?=\w)
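For anyone who wants to experiment outside Nisus, here is a minimal sketch of the same two replacements using Python's `re` module. It assumes Python's `\w`, ` *` and lookaround syntax behave like PowerFind Pro's for these simple patterns (they are standard regex constructs, but this hasn't been checked against Nisus's engine); the function names and sample strings are made up for illustration. The em dash is written as `\u2014` so it can't be confused with a hyphen.

```python
import re

EM_DASH = "\u2014"
REPLACEMENT = " " + EM_DASH + " "

# Simple version: an em dash with zero or more spaces on either side.
SIMPLE = re.compile(" *" + EM_DASH + " *")

# Stricter version: only dashes between word characters. The lookbehind (?<=\w)
# and lookahead (?=\w) check the neighbours without consuming them.
BETWEEN_WORDS = re.compile(r"(?<=\w) *" + EM_DASH + r" *(?=\w)")


def fix_simple(text: str) -> str:
    """Normalize every em dash to exactly one space on each side."""
    return SIMPLE.sub(REPLACEMENT, text)


def fix_between_words(text: str) -> str:
    """Normalize spacing only around em dashes that sit between words."""
    return BETWEEN_WORDS.sub(REPLACEMENT, text)


if __name__ == "__main__":
    print(fix_simple("people " + EM_DASH + "it."))
    print(fix_between_words(EM_DASH + " Quoted line"))
```

The difference shows up with a dash at the start of a line: the simple version puts a space in front of it, while the lookaround version leaves it alone because there is no word character before it.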
philip
- martin
- Official Nisus Person
- Posts: 5230
- Joined: 2002-07-11 17:14:10
- Location: San Diego, CA
- Contact:
Re: PowerFind question
Thanks for that, Philip, very instructive. Also, just to be clear, those <0+> entries Philip was mentioning will look like this once inserted:
We call those PowerFind "bubbles". You can see that I've also used such a bubble for the space character, just to make it more visible.
Re: PowerFind question
martin wrote: "Thanks for that Philip, very instructive. Also, just to be clear, those <0+> entries Philip was mentioning will look like this once inserted: We call those PowerFind 'bubbles'. You can see that I've also used such a bubble for the space character, just to make it more visible."
Martin,
Thank you for what you wrote. I was confused by the meaning of <0+>.
What specifically do the <0+> in your attachment do?
Also, is there any way to save entries made in the Find/Replace popup?
Howard
Re: PowerFind question
writerhoward wrote: "Martin, Thank you for what you wrote. I was confused by the meaning of <0+>."
Sorry, I guess my explanation was a bit too short.
The real power of Nisus' Find/Replace lies in those pop-up menus to the left of the Find box.
writerhoward wrote: "What specifically do the <0+> in your attachment do?"
The Repeat commands apply to the immediately preceding character. If you want them to apply to several characters at once, you need to use Match brackets.
writerhoward wrote: "Also, is there any way to save entries made in the Find/Replace popup?"
The pop-up has a feature to save expressions. Another option is to use the Macroize… feature, which will save the Find and the Replace together as a single command in a Nisus macro in the Macro menu.
PS: With the above expression I just found out that the Nisus Macro Reference has 6 instances of a doubled "the". They all seem to be typos.
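The same rule holds in ordinary regular expressions, so here is a small Python sketch of it. It assumes PowerFind's Match brackets behave like a regex group for this purpose; `(?:...)` is Python's non-capturing group, and `whole_match` is a hypothetical helper name.

```python
import re


def whole_match(pattern: str, text: str) -> bool:
    """Return True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    # "+" repeats only the character right before it: "a" followed by one or more "b".
    print(whole_match("ab+", "abbb"))
    # So "ab+" does not repeat the pair "ab".
    print(whole_match("ab+", "abab"))
    # Grouping the two characters makes the repeat apply to the whole group.
    print(whole_match("(?:ab)+", "abab"))
```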

philip
- martin
Re: PowerFind question
writerhoward wrote: "What specifically do the <0+> in your attachment do?"
phspaelti wrote: "The Repeat commands apply to the immediately preceding character. If you want them to apply to several characters at once, you need to use Match brackets."
That's exactly right. The easiest one to understand is the repetition bubble (1+), which means to match one or more of something. So if we had the expression (AnyDigit)(1+), you can see that "1+" follows the "AnyDigit" bubble, so we are trying to find any bit of text that consists of one or more consecutive digits, e.g. one digit, two digits, three digits, etc.
So, looking at the find expression (Space)(0+)—(Space)(0+), we can see that (Space) is followed by (0+), which means to match zero or more spaces. That means we'll match not only 1 space, 2 spaces, etc., but also *zero* spaces, i.e. the situation where no space character exists. This is exactly the situation we're trying to fix: where the spaces are missing from either side of the dash.
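A quick way to see the difference between (1+) and (0+) is to try their regex equivalents in Python. The mapping is an assumption: (AnyDigit)(1+) as `\d+` and (Space)(0+) as ` *`; the helper names and sample text are illustrative only.

```python
import re

# (AnyDigit)(1+) ~ r"\d+": one or more consecutive digits.
DIGIT_RUN = re.compile(r"\d+")

# (Space)(0+), em dash, (Space)(0+) ~ " *\u2014 *": the spaces are optional,
# so a dash with no space on one or both sides still matches.
DASH_WITH_SPACES = re.compile(" *\u2014 *")


def digit_runs(text: str) -> list:
    return DIGIT_RUN.findall(text)


def dash_matches(text: str) -> list:
    return DASH_WITH_SPACES.findall(text)


if __name__ == "__main__":
    print(digit_runs("7 dwarfs, 42 thieves, 1001 nights"))
    # No spaces, a space on one side, and spaces on both sides are all matched.
    print(dash_matches("a\u2014b c \u2014d e \u2014 f"))
```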
phspaelti wrote: "PS: With the above expression I just found out that the Nisus Macro Reference has 6 instances of a doubled 'the'. They all seem to be typos."
Oops, we'll have to get those fixed. I also found some "of of" typos using the PowerFind expression: ..with the "whole word" option enabled.
Re: PowerFind question
Martin,
Your latest explanation is quite helpful. What effect does the code in "capture( any word ) captured1" have?
Howard
Re: PowerFind question
writerhoward wrote: "Your latest explanation is quite helpful. What effect does the code in 'capture( any word ) captured1' have?"
Captured1 refers to the first capture group. So this has the effect of asking for "any word", then a space, followed by the same "any word". The effect is to look for places where the same word occurs twice in a row (with a space between them).
This will match "the the", "of of", "Nisus Nisus", "Howard Howard", etc.
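In regex terms this is a capture group followed by a backreference. Here is a minimal Python sketch, assuming "any word" behaves like `\w+` and that the "whole word" option roughly corresponds to the `\b` word boundaries; `doubled_words` is a hypothetical helper.

```python
import re

# (\w+) captures one word; " \1" requires a space and then the same text again.
# The \b boundaries keep "the theory" from counting as a doubled "the".
DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")


def doubled_words(text: str) -> list:
    """Return every 'word word' pair found in text, left to right."""
    return [m.group(0) for m in DOUBLED_WORD.finditer(text)]


if __name__ == "__main__":
    print(doubled_words("This is the the end of of it, not the theory."))
```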
philip
Re: PowerFind question
Just to be clear here:
The Capture() brackets create capture groups. You can have as many such groups as you want, and they can even be nested, but they cannot overlap.
Captured1, Captured2, etc. are variables that refer to the capture groups you create with the Capture brackets. You can use the "Captured" variables in either the Find box or the Replace box. When you use them in the Find box, you match patterns that contain identical, repeated pieces. When you use them in the Replace box, you can rearrange the order of the pieces you matched.
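Both uses can be sketched with Python's `re` module, where `\1` and `\2` in a replacement string play roughly the role of Captured1 and Captured2. This assumes the Nisus variables number groups the way regex engines do (by the order of their opening brackets); the patterns, names, and sample strings are hypothetical examples.

```python
import re

# Two capture groups: a surname, then a first name (hypothetical format).
NAME = re.compile(r"(\w+), (\w+)")


def swap_names(text: str) -> str:
    # In the replacement, \2 and \1 refer to the second and first groups,
    # so the captured pieces come out in the opposite order.
    return NAME.sub(r"\2 \1", text)


# Nested groups are numbered by the position of their opening parenthesis:
# group 1 is the whole year-month, groups 2 and 3 are the parts inside it.
YEAR_MONTH = re.compile(r"((\d{4})-(\d{2}))")


def year_month_parts(text: str):
    m = YEAR_MONTH.search(text)
    return m.groups() if m else None


if __name__ == "__main__":
    print(swap_names("Doe, Jane"))
    print(year_month_parts("Joined: 2013-02-03"))
```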
philip
Re: PowerFind question
writerhoward wrote: "Your latest explanation is quite helpful. What effect does the code in 'capture( any word ) captured1' have?"
phspaelti wrote: "Captured1 refers to the first capture group. So this has the effect of asking for 'any word' and then a space followed by the same 'any word'. So the effect is of looking for places with the same word twice in a row (with a space between them). This will match 'the the', 'of of', 'Nisus Nisus', 'Howard Howard', etc."
So, with regard to the previous sentence: the capture code will "grab" the word "This" because it's followed by a space, append "This" to "This<space>" (since "This" is now stored in Captured1), and search through the sentence for "This<space>This". If it finds it, it will replace "This<space>This" with whatever is in the Replace box. Then Nisus will proceed to the second word in the sentence, "will", and search for "will<space>will"; if found, that will be replaced by what's in the Replace box. This process continues until the last word in the sentence (in this case, "etc") is checked. Is that correct?
Howard
- martin
Re: PowerFind question
writerhoward wrote: "So with regard to the previous sentence, the capture code will 'grab' the word 'This' as it's followed by a space, add the word 'This' to 'This<space>' as 'This' is now stored in Capture1, and search through the sentence for 'This<space>This' -- If it finds it, it will replace 'This<space>This' with whatever is in the Replace box. Then, Nisus will proceed to the second word in the sentence, which is 'will' and search for 'will<space>will' -- if found, that will be replaced by what's in the Replace box. This process continues until the last word in the sentence (in this case, 'etc') is checked. Is that correct?"
You basically have the right of it. Technically speaking, what you've outlined is not entirely how it works inside the code, but it's a correct enough way to think about the process that it makes no difference.
The key difference is that at no point will the search engine "search through the sentence" (or the rest of the text) for some double-word pair that you're imagining was constructed. Instead, once a single word has been matched (and captured), followed by a space, it's sufficient to merely continue the comparison at that single point to see if the captured word occurs immediately afterwards. If it does, you have a full match; if not, the search moves on and simply "forgets" the word that was captured. But that's all esoteric and essentially irrelevant to understanding the matches that will be found.
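To make the "compare at that single point, then forget" idea concrete, here is a toy Python model of the matching. It is not how Nisus's engine is implemented (Martin notes the real code differs); it treats a word as a run of letters or digits, which is a simplification, and `find_doubled_words` is a hypothetical name.

```python
def find_doubled_words(text: str) -> list:
    """Toy model of matching (word)(space)(same word), scanning left to right."""
    matches = []
    n = len(text)
    i = 0
    while i < n:
        # Only try a match at the start of a word.
        if text[i].isalnum() and (i == 0 or not text[i - 1].isalnum()):
            j = i
            while j < n and text[j].isalnum():
                j += 1
            word = text[i:j]              # the "captured" word
            k = j + 1 + len(word)         # where a repeated word would end
            # Compare right here: a space, the same word, then a word boundary.
            if (text[j:j + 1] == " " and text[j + 1:k] == word
                    and (k == n or not text[k].isalnum())):
                matches.append(text[i:k])
                i = k                     # continue after the whole match
                continue
            i = j                         # no match: forget the word, move on
        else:
            i += 1
    return matches


if __name__ == "__main__":
    print(find_doubled_words("the the end of of it"))
```

Nothing is searched for further along: each captured word is checked only against the text immediately after it and then discarded.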
Re: PowerFind question
Could "Found" have been used in the code example that found text duplication and replaced the duplicates with one copy? If it could, what would Found replace and where would Found appear in the code?
Howard
- martin
Re: PowerFind question
writerhoward wrote: "Could 'Found' have been used in the code example that found text duplication and replaced the duplicates with one copy? If it could, what would Found replace and where would Found appear in the code?"
Not in this double-word matching example, because the (Found) bubble is a stand-in for the entire match. So in this case using (Found) in the replacement pattern would reinsert both matched words and the space, e.g. "this this". If you wanted to use a replace pattern to fix these double-word typos, you'd just replace the whole match with (Captured1), e.g. replace "this this" with just "this".
Unfortunately, that strategy doesn't actually work so well in practice, at least not with the macro guide, because there are lots of cases where the doubled words are valid, e.g.:
Set Text Color color
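In Python's `re.sub`, `\g<0>` stands for the whole match, which seems the closest analogue of (Found), and `\1` for the first group, like (Captured1). A minimal sketch, assuming that correspondence and a `\w+` definition of "word"; the function names are hypothetical.

```python
import re

DOUBLED_WORD = re.compile(r"\b(\w+) \1\b")


def replace_with_found(text: str) -> str:
    # \g<0> is the entire match, so both words and the space are put back unchanged.
    return DOUBLED_WORD.sub(r"\g<0>", text)


def replace_with_captured1(text: str) -> str:
    # \1 is just the first captured word, so "this this" collapses to "this".
    return DOUBLED_WORD.sub(r"\1", text)


if __name__ == "__main__":
    sample = "Fix this this typo."
    print(replace_with_found(sample))
    print(replace_with_captured1(sample))
```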
Re: PowerFind question
writerhoward wrote: "Could 'Found' have been used in the code example that found text duplication and replaced the duplicates with one copy? If it could, what would Found replace and where would Found appear in the code?"
martin wrote: "Not in this double-word matching example, because the (Found) bubble is a stand-in for the entire match. So in this case using (Found) in the replacement pattern would reinsert both matched words and the space, eg: 'this this'. If you wanted to use a replace pattern to fix these double-word typos you'd just replace the whole match with (Captured1), eg: replace 'this this' with just 'this'. Unfortunately that strategy doesn't actually work so well in practice, at least not with the macro guide, because there are lots of cases where the double words are valid, eg: Set Text Color color"
Could you provide an example that includes the (Found) bubble?
- martin
Re: PowerFind question
writerhoward wrote: "Could you provide an example that includes the (Found) bubble?"
Sure thing. As a contrived example, let's add quotation marks around all words that start with the letter "A":
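As a rough Python analogue (an assumption, not Martin's PowerFind expression), the whole match `\g<0>` can be wrapped in quotation marks, the same way (Found) re-inserts the matched text. This sketch assumes a case-sensitive capital "A" at the start of a word; `quote_a_words` is a hypothetical name.

```python
import re

# Assumption: a word starting with a capital "A", i.e. a word boundary,
# the letter A, then any further word characters (case-sensitive).
A_WORD = re.compile(r"\bA\w*")


def quote_a_words(text: str) -> str:
    # \g<0> re-inserts the whole match between the added quotation marks.
    return A_WORD.sub(r'"\g<0>"', text)


if __name__ == "__main__":
    print(quote_a_words("Alice and Adam ate apples"))
```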