PowerFind Q - Contains a semicolon in dialog

Joined: 2013-03-19 16:22:50
Posts: 68
Hi,
I've spent an hour with the manual examples and read the Capture threads on the forum, but no luck so far figuring out my problem.

Simply put, I want to find all dialog that contains a semicolon. Should be simple but so far isn't.

A -- "This is dialog."
B -- "This is dialog also; but it contains a semicolon."

I want to make a PowerFind expression that selects B but not A. (I don't need replace, because I'll be moving through them one by one to decide how the replacement will be done).

So far, I can successfully find A using a PowerFind bubble expression that reads (in bubbles, of course)
“ (AnyTextCharacter) (1+Shortest) ”
where the open and close double quotes are part of the expression also.

But I can't figure out how to get these selected only when they are B, the one that contains a semicolon.

I've tried various attempts inserting the semicolon, with Capture, and with Preceded and Followed by, and with CharacterInSet -- no joy.

It seems like CharacterInSet should be the simplest -- In other words, I want my existing Capture or Found to only be a valid result if it contains a semicolon. So the 'Found' would be the set, and it should contain a semicolon.

But I can't figure out how to do this.

Any help and further explanation appreciated. CharacterInSet only had two brief mentions in the manual and no examples, so I'm in the dark about that.

WF


2015-10-24 10:31:16

Joined: 2013-03-19 16:22:50
Posts: 68
withoutFeathers wrote:
Hi,
Simply put, I want to find all dialog that contains a semicolon. Should be simple but so far isn't.

A -- "This is dialog."
B -- "This is dialog also; but it contains a semicolon."

I want to make a PowerFind expression that selects B but not A. (I don't need replace, because I'll be moving through them one by one to decide how the replacement will be done).


Addition to my own post after thinking about it further:

Isn't what I'm looking for the simple Boolean 'AND' function?

I want to search for (PowerFind for A) AND (semicolon).

But there's no AND under the Match menu -- why is that? There's an OR, but no AND.

How is such an AND done?

Thanks...

WF


2015-10-24 11:50:21

Joined: 2007-11-09 15:27:25
Posts: 86
This works for me, but I got tripped up because I had plain quotes in the text being searched and smart quotes in my Find expression...
Note Semicolon with space following.
I checked by copying/pasting what's below into the search box. The bubbles get rendered here as parentheses.
I put period, question mark, and exclamation into the Character Set. Maybe you need other things?

“(AnyTextCharacter)(1+ Shortest); (AnyTextCharacter)(1+ Shortest)(CharacterInSet[).?!(])”

HTH


2015-10-24 16:13:32

Joined: 2013-03-19 16:22:50
Posts: 68
jb wrote:
This works for me, but I got tripped up because I had plain quotes in the text being searched and smart quotes in my Find expression...
Note Semicolon with space following.
I checked by copying/pasting what's below into the search box. The bubbles get rendered here as parentheses.
I put period, question mark, and exclamation into the Character Set. Maybe you need other things?

“(AnyTextCharacter)(1+ Shortest); (AnyTextCharacter)(1+ Shortest)(CharacterInSet[).?!(])”

HTH


Thank you very much for doing this. It's good, but has the same problem one of my attempts did, which I'll explain below.

But first: so you've defined your own set for CharacterInSet? And I can put anything in there I like? If so, where did you learn this? (It's not in the manual AFAIK.) Second, could I put other things in there, like 'Capture1'?

Now, here's the problem with your search (and the one I tried):
When there's split dialog -- and this is common -- it flags everything even if the semicolon is in the description in between, or only in the second half.

Example A, semicolon in the description between:

"A nasty bit of work," he said; and scratched himself, "we'll have to go out there and find out more about this."

This entire sentence gets Found, but the semicolon isn't in the dialog part.

Example B:

"A nasty bit of work," he said, "we'll have to go out there; we need to find out more about this."

This entire sentence also gets Found, even though the semicolon is only in the second part of the dialog.

In short sentences it's not a problem because I can see quickly where the semicolon is or isn't, but sometimes there are long and complex paragraphs with the split occurring at a less obvious place -- and this makes it tricky to find where the semicolon is.
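
A rough way to see why this happens, sketched in Python rather than in Nisus. The pattern is only an assumed regex translation of jb's bubble expression (AnyTextCharacter 1+ Shortest written as `.+?`, the character set as `[.?!]`), and Example A is retyped with curly quotes. Nothing in the pattern stops the match from running past the first closing quote.

```python
import re

# Assumed translation of jb's bubbles; not taken from the Nisus manual.
LOOSE = re.compile(r"“.+?; .+?[.?!]”")

# Example A from above, retyped with curly quotes.
example_a = (
    "“A nasty bit of work,” he said; and scratched himself, "
    "“we’ll have to go out there and find out more about this.”"
)

match = LOOSE.search(example_a)

# True when the match runs from the first opening quote to the last closing quote.
spans_whole_sentence = match is not None and match.group(0) == example_a
print(spans_whole_sentence)
```

The semicolon in the description is enough to satisfy the middle of the pattern, so the whole sentence is reported as one hit.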

So, do you (or anyone) know if there's any way to limit the search so that only semicolons between a single matched pair of opening and closing quotes are Found?

As I said in my other post, it seems like the Boolean 'AND' function would be nice here -- because I've been able to set up a search that finds dialog between pairs of quotes:
“(AnyTextCharacter) (1+Shortest)”

So it seems I just need a way to say "and this string contains a semicolon".

In other words, I'm thinking that if I defined that expression as a Capture, maybe there's a way to then say Found = Capture1 AND semicolon (as a Boolean AND). Is there a way to do that?
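
In general regular expressions, one common way to write this kind of AND is a lookahead: first check that a semicolon occurs before the next quote mark, then match the quoted block itself. The sketch below is Python, not Nisus. Whether PowerFind Pro accepts the same `(?=...)` syntax is an assumption to check against the User Guide, and the names are illustrative only.

```python
import re

# "a quoted block" AND "a semicolon somewhere before its closing quote"
QUOTED_WITH_SEMICOLON = re.compile(r"“(?=[^“”]*;)[^“”]+”")

# Examples A and B from this post, retyped with curly quotes.
example_a = (
    "“A nasty bit of work,” he said; and scratched himself, "
    "“we’ll have to go out there and find out more about this.”"
)
example_b = (
    "“A nasty bit of work,” he said, "
    "“we’ll have to go out there; we need to find out more about this.”"
)

hits_a = QUOTED_WITH_SEMICOLON.findall(example_a)
hits_b = QUOTED_WITH_SEMICOLON.findall(example_b)
print(hits_a)
print(hits_b)
```

With this pattern the semicolon in the description of Example A is ignored, and only the second half of the split dialog in Example B is reported.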

WF


2015-10-24 20:37:34

Joined: 2014-02-08 12:57:03
Posts: 169
Location: Australia
G’day, WF et al

First of all, in my opinion it’s generally not a good idea to use straight quotes, particularly if you wish to manipulate text in any way. You can use Kino’s Straight to Curly Quotes macro to do the required conversion. (Always do these things on a duplicate file!)

Then the following is a PowerFind Pro Find expression that seems to do what you want:–

“[^;“”]+?;[^;“”]+?”
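
For anyone who wants to try the expression outside Nisus, here is a minimal Python sketch. It assumes Python's `re` module treats these constructs the same way PowerFind Pro does, and the sample strings are the examples from earlier in the thread retyped with curly quotes.

```python
import re

# The Find expression above, with the curly quotes as literal characters.
PATTERN = re.compile(r"“[^;“”]+?;[^;“”]+?”")

dialog_a = "“This is dialog.”"
dialog_b = "“This is dialog also; but it contains a semicolon.”"
# Split dialog with the semicolon only in the description between the quotes.
split_dialog = (
    "“A nasty bit of work,” he said; and scratched himself, "
    "“we’ll have to go out there and find out more about this.”"
)

found_a = PATTERN.findall(dialog_a)
found_b = PATTERN.findall(dialog_b)
found_split = PATTERN.findall(split_dialog)
print(found_a, found_b, found_split)
```

Because the quote marks are excluded from both character sets, a match can never cross from one quoted block into the next.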

Hope this helps.

Cheers,
Adrian

_________________
MacBook Pro (mid-2014)
macOS Sierra 10.12.6
Nisus Writer user since 1996


2015-10-24 23:18:30

Joined: 2007-11-09 15:27:25
Posts: 86
Adrian’s solution is the best.


But since I don’t know how to write regex very well, I amused myself by trying to accomplish the same thing with mere PowerFind. This seems to work:

“(AnyTextCharacter)(CharacterNotInSet[)“(])(1+ Shortest); (AnyTextCharacter)(CharacterNotInSet[)“(])(1+ Shortest)”

As for CharacterInSet: I just poked at it. You can type in anything you want. Not sure about “Capture,” though.

For converting plain quotes to smart quotes, you can use the menu item Edit>Convert: Plain Quotes to Smart Quotes. Make sure to select text first.

I think I understand what Adrian did, aside from the double quote marks. And g**gling for this sort of thing is not my idea of fun.
Adrian, can you recommend a good source for this stuff? (The Nisus Macro reference isn’t a lot of help.)


2015-10-25 06:46:39

Joined: 2013-03-19 16:22:50
Posts: 68
Hi Adrian and jb

I thank both of you for your efforts, and we're getting there fast. :)

I wrote a too-long reply to Adrian before seeing jb's new contribution and luckily there was a power failure for three hours here and my reply was lost, so I can catch up to both of you. :roll:

And I'm very happy that jb has made the PowerFind because I find regex hard to get into.

Here's what happens when I compare the two solutions:

Both seem to correctly find a block of dialog -- even one side of split dialogs -- with a semicolon inside. Yay!

But there's one quibble -- here jb's PowerFind solution seems superior: it also flags blocks of text that have two semicolons in them. But Adrian's skips those!

Test I ran: I searched sequentially through the first 100 pages or so of the book (600 page book) and this was the result:

Adrian's regex flagged 12 dialog blocks on these pages:
28, 29, 37, 40, 44, 55, 66, 67, 75, 92, 94, 95

jb's PowerFind flagged 14 dialog blocks on these pages:
28, 29, 37, 38, 40, 44, 55, 65, 66, 67, 75, 92, 94, 95

Identical except that pages 38 and 65 are extra in jb's version. Both of these blocks contain two semicolons. The other twelve contain only one semicolon.

So that seems to be the difference.
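
The skipped blocks can be reproduced in a small Python sketch (an illustration only, assuming Python's `re` behaves like PowerFind Pro here; the two sentences are made up). The regex excludes the semicolon from both character sets, so it allows exactly one semicolon between the quotes.

```python
import re

# Adrian's first regex: exactly one semicolon between the quotes.
SINGLE = re.compile(r"“[^;“”]+?;[^;“”]+?”")

one_semicolon = "“We’ll go out there; we need to know more.”"
two_semicolons = "“We’ll go out there; we need to know more; much more.”"

found_one = SINGLE.search(one_semicolon) is not None
found_two = SINGLE.search(two_semicolons) is not None
print(found_one, found_two)
```

After the first semicolon, the second character set has to reach the closing quote without meeting another semicolon, which is why a block with two semicolons is not found.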

jb wrote:
Adrian’s solution is the best.


It was, but I do need to find paragraphs that have 1 or more semicolons, not just one, so it looks like the jb PowerFind is the one.

@jb I agree wholeheartedly that it would be nice to have a smooth, well-rounded explanation of how to use PowerFind Pro, for those of us not versed in technical terminology.

@Adrian, I agree about curlies, and have been translating everything into them for decades. :D . Perhaps you were misled by my use of straight quotes here in the forum post illustrative sentences. I'm sloppy about using them here sometimes because there's no automatic translation and it's cumbersome getting them direct from the keyboard -- and I can't even see the difference on my monitor at small text sizes that I read this forum at. My bad, sorry. Yes! ...“Let's all use curly quotes!” All the time! :wink:

@Adrian, I also apologize if my original question was misleading -- I only specified 'contains a semicolon'. And that's what your search finds. I ought to have said 'contains semicolons' or 'at least one semicolon'. My bad again!

WF


2015-10-25 09:57:22

Joined: 2007-11-09 15:27:25
Posts: 86
Glad you got it working and that the PowerFind version is helpful.
I know (?) there are things regular expressions can do that PowerFind can't, but I have no idea how to know what those things might be.

I should have said that Adrian's solution is more elegant ;-)


2015-10-25 13:21:22

Joined: 2014-02-08 12:57:03
Posts: 169
Location: Australia
G’day, WF, jb et al

If you increase the font size in your browser, you will see that the double quote marks in my Find expression are actually one set of smart opening double quotes and one set of smart closing double quotes.

Try one of these:–

PowerFind Pro
“([^;“”]+?;)+[^;“”]+?”

PowerFind
Note: The required expression looks like the following except that each occurrence of “1+ Shortest” has a “balloon” around it. I’m afraid I don’t know how to reproduce here the appearance of PowerFind expressions as they actually appear in the dialog box. Hopefully, you get the idea.
“Capture(CharacterNotInSet[;“”]1+ Shortest;)1+CharacterNotInSet[;“”]1+ Shortest”

These seem to do what you want, even with text such as:–

“A nasty bit of work; a very nasty bit of work; a very nasty bit of work indeed,” he said; and scratched himself; and scratched himself again, “we’ll have to go out there; we need to find out more about this; yes, even more.”
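
As a cross-check outside Nisus, the PowerFind Pro expression can be run over the same test text in Python. This is a sketch that assumes Python's `re` treats the pattern the same way; `finditer` is used so that the whole match is collected rather than the contents of the parentheses.

```python
import re

# One or more "text;" chunks, then trailing text, all inside one pair of quotes.
PATTERN = re.compile(r"“([^;“”]+?;)+[^;“”]+?”")

text = (
    "“A nasty bit of work; a very nasty bit of work; a very nasty bit of work "
    "indeed,” he said; and scratched himself; and scratched himself again, "
    "“we’ll have to go out there; we need to find out more about this; yes, "
    "even more.”"
)

blocks = [m.group(0) for m in PATTERN.finditer(text)]
for block in blocks:
    print(block)
```

Each half of the split dialog comes back as its own block, and the semicolons in the description between them are left out.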

As far as regular expressions are concerned, you can use the PowerFind browser to construct PowerFind Pro expressions. Read them in conjunction with the table of “Characters with special meaning” on Page 325 of the Nisus Writer Pro User Guide.

The treatment of regular expressions in the Nisus Writer Pro User Guide is not bad, really. The BBEdit User Manual also has a good introduction. And there are other introductions on the Web. Not all implementations are exactly the same, though.

Complicated regular expressions can do funny things to your brain: it’s best to build them up bit by bit, testing as you go. If you don’t get the result you were expecting, simplify the expression until you do, then build up piecemeal from there. Pay attention to which checkboxes (eg, Ignore Case, Attribute Sensitive) are ticked in the Find/Replace dialog box. Remember that expressions can contain normal-looking words and phrases as well as the special characters. Parentheses are useful for delineating chunks of code, either to aid digestion or for future reference in Replace expressions; to this extent, they are treated like other “special characters”. To find actual occurrences of a special character (such as an opening parenthesis or a period, for example), you need to “escape” it by preceding it with a backslash in the Find expression.
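
The point about escaping can be seen in a short Python sketch. Python's `re` module is used here only as a stand-in; the Nisus dialect may differ in details, but the backslash convention described above is the same.

```python
import re

sample = "f(x) = a.b"

# Unescaped, the period is special and matches any character.
any_char = re.findall(r".", "a.b")

# Escaped, the period and the parenthesis match only themselves.
literal_period = re.findall(r"\.", sample)
literal_paren = re.findall(r"\(", sample)

print(any_char, literal_period, literal_paren)
```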

Those are the basics. It’s really a matter of experimenting (on disposable text!) to become familiar with what is happening. If you get stuck, ask the Forum. Once you’ve got the hang of regular expressions, you’ll use PowerFind Pro (almost) all the time.

Cheers,
Adrian



2015-10-25 14:07:16

Joined: 2013-03-19 16:22:50
Posts: 68
adryan wrote:
G’day, WF, jb et al

If you increase the font size in your browser, you will see that the double quote marks in my Find expression are actually one set of smart opening double quotes and one set of smart closing double quotes.


Hi Adrian,

Thanks again! (I had noticed your curly quotes, yes.)

I tested both your new searches (regex and bubble), and both now find all 14 instances.
Thank you.

In terms of me figuring out how it works, one question at this point:
Comparing your two new versions -- regex and bubble -- it seems I now understand what 'Capture' means, but I'd like to check with you: when you 'Capture', then the next term after the end of the capture refers to the whole Capture, correct?

For example, taking the first half of the bubbl-ized search you provided:

Capture(CharacterNotInSet[;“”]1+ Shortest;)1+

Then what is 'captured' is now all repeated 1+ times, because that repeat falls after the final bracket. Correct?

If so, then Capture is a way of grouping a string of terms and saying that the next term refers to the whole group -- sort of like Boolean parentheses: (Some expression) AND/OR (Some other expression). Plus, with Capture the group can be numbered and saved, to be used again later, or elsewhere. Yes?

WF


2015-10-25 15:13:14

Joined: 2014-02-08 12:57:03
Posts: 169
Location: Australia
G’day, WF et al

What you say is not quite right, I’m afraid. What follows the Capture expression is not necessarily an operator: it may be an independent expression in its own right.

Essentially, the Capture operator acts a bit like a mathematical function of the form f(x). In this case, we have Capture(x), where x is any expression. All that the Capture “function” says is: package up x and keep it for (possible) future reference.

Here are some expressions to consider:–
Capture(x)
Capture(x)Capture(y)
heliumCapture(x)
heliumCapture(x)gas

x and y can be anything you are interested in searching for. The point is that the Capture expressions here are all standalone expressions that are not followed by operators. In the second of the above expressions, the Capture(x) part can be referenced subsequently by Captured1 and the Capture(y) part by Captured2.
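
The same idea can be tried in Python, where a pair of parentheses plays the role of Capture and the numbered groups play the role of Captured1 and Captured2. This is an analogy under that assumption, not Nisus syntax, and the sample strings are made up.

```python
import re

# Literal text around one capture, like heliumCapture(x)gas with x = digits.
m = re.search(r"helium([0-9]+)gas", "a helium42gas sample")
first = m.group(1)

# Two captures in a row, like Capture(x)Capture(y), reused in a replacement.
# \1 and \2 stand in for Captured1 and Captured2.
swapped = re.sub(r"([a-z]+)([0-9]+)", r"\2\1", "abc123")

print(first, swapped)
```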

“1+” is a reserved expression (unlike “ab” or “56”) which acts like an operator in Reverse Polish Notation or a Forth-like language. That is, it acts a bit like a mathematical function of the form (x)f, where x is any expression and f is now the 1+ “function”. The 1+ “function” says: concatenate multiple instances of the expression x. The Find routine would look for x, then xx, then xxx, and so on.

Now consider the expression: Capture(x)1+

The interpreter evaluates the Capture(x) part before offering it to the 1+ operator. It is not the case that 1+ operates on the x before offering it to the Capture operator.
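
A small Python sketch of the same ordering, with parentheses standing in for Capture and `+` standing in for 1+ (an analogy, assuming Python's `re` groups behave like Capture here). The repeat applies to the whole packaged group, and without the parentheses it applies only to the last character.

```python
import re

# The + repeats the whole group "ab".
grouped = re.fullmatch(r"(ab)+", "ababab")

# Without the group, the + repeats only "b".
ungrouped = re.fullmatch(r"ab+", "ababab")
ungrouped_b = re.fullmatch(r"ab+", "abbb")

# In Python, a repeated group keeps its last repetition.
last_repetition = grouped.group(1)

print(grouped is not None, ungrouped is not None, ungrouped_b is not None)
```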

To my mind, the PowerFind Pro syntax is in fact more straightforward in such cases than the PowerFind syntax. PowerFind’s Capture “function” is really just a (special) set of parentheses, but the “natural language” formulation tends to obscure this somewhat.

I hope this helps to explain things a bit for you.

By way of encouragement for you to become a PowerFind Pro convert, here is a quick explanation of my Find expression:–

“([^;“”]+?;)+[^;“”]+?”
That’s the whole thing. We’ll now build it up, step by step.

[^;“”]
Match any single character other than those listed after the ^ (up to but not including the closing ]). You need to use the square brackets here. We exclude the semicolon because we need to deal with it in a special way a little later. We exclude the quotation marks because we want to ensure the found text is enclosed by a set of opening and closing quotation marks. Note that these are smart quotation marks.

[^;“”]+
We are looking for a sequence of one or more of these characters. The + means “one or more of the preceding”.

[^;“”]+?;
We are looking for the shortest such sequence that is followed by a semicolon. The ? means “the shortest example of the preceding”.

([^;“”]+?;)
Package the whole expression. This could be for ease of reading, for subsequent reference in a Replace expression, or (as in the present case) to ensure all of it is acted on by an operator (which in this case comes after the expression).

([^;“”]+?;)+
Recall that the + means “one or more of the preceding”. If a sequence of characters followed by a semicolon is not followed immediately by another such sequence, we will take just this one (this is the “one” part); if it is followed by any number of such sequences (before we get to any quotation marks, remember), we want the lot (this is the “or more” part).

([^;“”]+?;)+[^;“”]
There will be at least some other character following the last semicolon before we get to the concluding quotation marks.

([^;“”]+?;)+[^;“”]+
We want all these characters.

([^;“”]+?;)+[^;“”]+?”
But we don’t want to go beyond the first set of closing quotation marks we encounter, so we’ll stop with the shortest such sequence.

“([^;“”]+?;)+[^;“”]+?”
And of course we only wanted those character sequences that were immediately preceded by opening quotation marks.

QED.
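
The build-up above can be followed stage by stage in Python. This is a sketch that assumes Python's `re` module handles these constructs as PowerFind Pro does; the sample sentence is made up, and each stage shows the first match it finds in that sample.

```python
import re

sample = "“one; two; three.”"

# The stages of the walkthrough, in order.
stages = [
    r"[^;“”]",
    r"[^;“”]+",
    r"[^;“”]+?;",
    r"([^;“”]+?;)+",
    r"([^;“”]+?;)+[^;“”]+",
    r"([^;“”]+?;)+[^;“”]+?”",
    r"“([^;“”]+?;)+[^;“”]+?”",
]

# First match of each stage in the sample.
results = [re.search(stage, sample).group(0) for stage in stages]

for stage, result in zip(stages, results):
    print(f"{stage!r:32} -> {result!r}")
```

Running the stages one at a time like this is the "build up bit by bit, testing as you go" approach applied to a single expression.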

Cheers,
Adrian



2015-10-25 19:47:27

Joined: 2013-03-19 16:22:50
Posts: 68
Hi Adrian,
Thanks again, and I will study your Pro example explanation in more detail.

The one spot where I'm fuzzy is the differentiation between Capture in PowerFind bubbles and Capture in Pro.
My attempt at explaining Capture was based on the bubbles usage -- that they're like a glorified parenthesis there. But I don't (didn't) know how they work in Pro.

adryan wrote:
G’day, WF et al
What you say is not quite right, I’m afraid. What follows the Capture expression is not necessarily an operator: it may be an independent expression in its own right.
[snip...]
To my mind, the PowerFind Pro syntax is in fact more straightforward in such cases than the PowerFind syntax. PowerFind’s Capture “function” is really just a (special) set of parentheses, but the “natural language” formulation tends to obscure this somewhat.


Your reply seems split on this point, which is where I'm fuzzy. On the one hand, you explain in detail how Capture can be like f(x). Good...but does this explanation apply to the Pro version more fully than to the bubble version?

OTOH, you say about the PowerFind version that it's "just a (special) set of parentheses", -- which does sound like my explanation.

So what I'm taking from this is that Capture works very differently in PowerFind Pro compared to the regular bubble version. Is this correct?

WF


2015-10-25 20:51:51

Joined: 2014-02-08 12:57:03
Posts: 169
Location: Australia
G’day, WF et al

My description of Capture related to PowerFind. PowerFind Pro does not have Capture as such. A simple pair of parentheses does the same job. That is to say, the following are equivalent:–

PowerFind
Capture(x)

PowerFind Pro
(x)

So they work in the same way. What I was trying to alert you to is that a PowerFind expression such as Capture(x)1+ has to be dealt with very carefully. The Capture here has finished its job by the time the interpreter gets to the 1+. Furthermore, the 1+ acts on the whole Capture(x) expression, not on the (x). The equivalent PowerFind Pro expression, (x)+, is less prone to misinterpretation, I think.

Cheers,
Adrian



2015-10-25 22:38:52