Hi,
I have got a text like this:
blabla blablaTABnumberTABblabla blabla bla?TABbla blaTABbla blaRETURN
I want to replace the last TAB, the text after it, and the RETURN with a single RETURN.
So I search for \t.+?\n
But this finds TABnumberTABblabla blabla bla?TABbla blaTABbla blaRETURN (without the first blabla blabla)
Why? The shortest text between a tab and the return is only the last blabla!
So what do I have to write for „text“? The text contains letters, numbers, punctuation and URL-relevant letters like . , ! „“ ? / : %
Please help.
Finding Shortest +1 doesn’t work as I think it should
Re: Finding Shortest +1 doesn’t work as I think it should
Hello, Matze.
PowerFind Pro is OK. Your regexp finds a string starting with a tab (\t), followed by a string which does not contain a return (.+?), followed by a return (\n). By the way, you can do without the non-greedy question mark, as the period (.) means any character except return.
So, the correct regexp should be:
Code: Select all
.+\t.+\n
In other words: a string of characters (.+) followed by a tab (\t) followed by another string (.+) followed by return (\n).
Best, Henry.
Re: Finding Shortest +1 doesn’t work as I think it should
Matze wrote:
So I search for \t.+?\n
But this finds TABnumberTABblabla blabla bla?TABbla blaTABbla blaRETURN (without the first blabla blabla)
Why? The shortest text between a tab and the return is only the last blabla!
So what do I have to write for „text“? The text contains letters, numbers, punctuation and URL-relevant letters like . , ! „“ ? / : %

The problem is that the "shortest" only becomes relevant after it has found the tab. Since it finds the first tab, the remaining stuff is the shortest amount of text after that tab.
In my opinion the best option in cases like this is to use the 'not'-set. So you can do the following:
Code: Select all
\t[^\t]*\n
This approach has the advantage that you can use it to select any tab. So the following will pick the last three tabs (and any "blahblah" in-between).
Code: Select all
(?:\t[^\t]*){3}\n
philip
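For anyone who wants to check the behaviour described above outside Nisus, here is a minimal sketch in Python. Python's re module is only a stand-in (an assumption, it is not the engine Nisus uses), and the sample line is made up to have the same shape as the one in the question.

```python
import re

# Made-up sample line: text, tab, number, tab, text, tab, text, tab, text, return.
line = "blabla blabla\t42\tblabla blabla bla?\tbla bla\tbla bla\n"

# Non-greedy quantifier: the match still starts at the FIRST tab, because the
# engine tries start positions from left to right and keeps the first one that
# can be completed. "Shortest" only applies to what follows that start.
lazy = re.search(r"\t.+?\n", line).group()

# Negated set: no tab may appear between the starting tab and the return,
# so only the last tab can start a successful match.
last_field = re.search(r"\t[^\t]*\n", line).group()

# The last three tab-delimited fields plus the return.
last_three = re.search(r"(?:\t[^\t]*){3}\n", line).group()

# The replacement asked for: last tab and the text after it become one return.
trimmed = re.sub(r"\t[^\t]*\n", "\n", line)

print(repr(lazy))
print(repr(last_field))
print(repr(last_three))
print(repr(trimmed))
```

Note that [^\t] also matches a return, so in a multi-line document a line without any tab could be swallowed into a match that starts on the line before it; [^\t\n] would rule that out.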
Re: Finding Shortest +1 doesn’t work as I think it should
Groucho wrote:
Hello, Matze.
PowerFind Pro is OK. Your regexp finds a string starting with a tab (\t), followed by a string which does not contain a return (.+?), followed by a return (\n). By the way, you can do without the non-greedy question mark, as the period (.) means any character except return.
So, the correct regexp should be:
Code: Select all
.+\t.+\n
In other words: a string of characters (.+) followed by a tab (\t) followed by another string (.+) followed by return (\n).
Best, Henry.

This string selects/finds a whole paragraph, Henry.
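Why the whole paragraph gets selected can be seen in a small Python sketch (Python's re module is an assumption here, not Nisus's engine; the sample line and the capture groups are added only for illustration). The first .+ is greedy, so it takes everything from the start of the line up to the last tab.

```python
import re

# Made-up sample line with the same shape as the one in the question.
line = "blabla blabla\t42\tblabla blabla bla?\tbla bla\tbla bla\n"

# Same pattern as above, with capture groups added to show what each .+ takes.
m = re.search(r"(.+)\t(.+)\n", line)
whole = m.group()
before_last_tab, after_last_tab = m.groups()

# If the first part is captured, it can be put back in the replacement,
# which leaves only the last tab and the text after it removed.
trimmed = re.sub(r"(.+)\t.+\n", r"\1\n", line)

print(whole == line)
print(repr(before_last_tab))
print(repr(after_last_tab))
print(repr(trimmed))
```

So the pattern does locate the last tab, but the match itself covers the entire line unless the first part is captured and reinserted.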
Re: Finding Shortest +1 doesn’t work as I think it should
Matze wrote:
So I search for \t.+?\n
But this finds TABnumberTABblabla blabla bla?TABbla blaTABbla blaRETURN (without the first blabla blabla)
Why? The shortest text between a tab and the return is only the last blabla!
So what do I have to write for „text“? The text contains letters, numbers, punctuation and URL-relevant letters like . , ! „“ ? / : %

phspaelti wrote:
The problem is that the "shortest" only becomes relevant after it has found the tab. Since it finds the first tab, the remaining stuff is the shortest amount of text after that tab.
In my opinion the best option in cases like this is to use the 'not'-set. So you can do the following:
Code: Select all
\t[^\t]*\n
This approach has the advantage that you can use it to select any tab. So the following will pick the last three tabs (and any "blahblah" in-between).
Code: Select all
(?:\t[^\t]*){3}\n

Cool! Thanks a lot, Philip. I seem to understand the meaning of "." as it was in NWC.
Btw, dear Nisus team: may I recommend that you rethink the expressions/strings in the dropdown menu of search?
I mean strings as a common writer would use them.
- I can search for upper or lower characters, but why can't I search for upper AND lower? I know I can, but I have to know the expression for it.
Why not give this option [[:alpha:]] as well?
- I’d like to have a string for the old NWC "." which found letters, numbers, punctuation marks and spaces, so all signs which are usually in a sentence, and only those.
- regarding a "word": the current "word" finds single numbers, too. A number is not a word, is it?
- why not give a "number" string in the dropdown menu, below "digit" (Ziffer), which finds every number: 1 13 0,234 1 Mio 2 Billion -12 3/4 square expressions Pi and what not.
- regarding "sentences": the current one finds "sentences" that contain tabs. It even finds parts of a URL as a sentence. Couldn't we have a sentence string which finds everything between two sentence endings but has no tabs or returns? So just a sentence, be it descriptive or dialogue.
- regarding "paragraphs": a paragraph contains all its sentences AND the return that separates it from the text below. The current paragraph string (?:^.+$) finds it without the return. When I copy such a paragraph and then paste it into some text/paragraph, it becomes part of that text/paragraph and I have to hit return to make it the paragraph it was before.
Edit: One more major request: In NWC there was a button in find/replace that created a context list of everything that was found. Can we have that back, please?
Best regards, Matze
Re: Finding Shortest +1 doesn’t work as I think it should
Matze wrote:
btw, dear Nisus team: may I recommend that you rethink the expressions/strings in the dropdown menu of search?
Strings as a common writer would use them.
- I can search for upper or lower characters, but why can't I search for upper AND lower? I know I can, but I have to know the expression for it.
Why not give this option [[:alpha:]] as well?

I totally agree with this one. This should be added to the wildcard menu.

Matze wrote:
- the old NWC "." which found letters, numbers, punctuation marks and spaces, so all signs which are usually in a sentence.

This one is there. It's called "AnyTextCharacter".

Matze wrote:
- regarding a "word": the current "word" finds single numbers, too. A number is not a word, is it?

This is a well-known technical issue. "Word" here is used in the computer technical sense, which is a string of "word characters". From the computer technician's point of view word characters stand in opposition to "white-space", punctuation, and new-line/line-feed, so yes, they include numbers. Nisus really can't do anything about this, since they don't really write the find/replace engine themselves (I believe). And it would break more things than it would fix.
But for yourself perhaps [[:alpha:]]+ would work as a definition of word? The real problem however is that 'word', as you think of it, is a linguistic notion, which really can't be defined in a generally satisfying way. Is "there's" one word or two? If two, is the second [s] or ['s]? In a list, A. B. C., etc. probably shouldn't be words, but at the beginning of a sentence "A" is of course a word, and on and on.
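The difference between the two notions of "word" can be tried out with a short Python sketch. This is only an approximation under stated assumptions: Python's re module has no [[:alpha:]], so [^\W\d_] (a word character that is neither a digit nor an underscore) is used as a stand-in, and the sample text is invented.

```python
import re

# Invented sample text with a list label, a contraction and two numbers.
text = "A. There's 1 cat and 23 dogs."

# "Word" in the technical sense: runs of letters, digits and underscores.
technical_words = re.findall(r"\w+", text)

# Letters only, as a rough equivalent of [[:alpha:]]+.
letter_words = re.findall(r"[^\W\d_]+", text)

print(technical_words)
print(letter_words)
```

The letters-only version drops "1" and "23", but it still splits "There's" in two and still reports the list label "A", which is the linguistic problem described above.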

Matze wrote:
- why not give a "number" string in the dropdown menu, below "digit" (Ziffer), which finds every number: 1 13 0,234 1 Mio 2 Billion -12 3/4 square expressions Pi and what not.

Sounds like a good idea. Nisus should probably consider editing and expanding the list of predefined wild-cards. But remember that you can also save your own expressions, and you can even give them names. They will then be listed under "Saved expressions". (I keep forgetting this feature myself.)

Matze wrote:
- regarding "sentences": the current one finds "sentences" that contain tabs. It even finds parts of a URL as a sentence. Couldn't we have a sentence string which finds everything between two sentence endings but has no tabs or returns? So just a sentence, be it descriptive or dialogue.

Sentences are a well-known nightmare. Again they are a linguistic notion, so there is no satisfactory solution. But the idea of excluding tabs strikes me as a good one. Here is today's attempt at a better definition of sentence:
Code: Select all
(?:["“„'‘‚]?[[:upper:]][^\t]+?[\.\?\!]["”‟'’‛]?)(?= |$)
But of course it will still catch things that aren't sentences.
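A rough way to experiment with this sentence pattern outside Nisus is the Python sketch below. It is an approximation, not the same expression: Python's re module does not support [[:upper:]], so [A-Z] is substituted (an assumption that limits it to unaccented capitals), and the sample strings are invented.

```python
import re

# Adapted pattern: [[:upper:]] replaced by [A-Z] because Python's re module
# has no POSIX bracket classes. Everything else follows the pattern above.
SENTENCE = re.compile(r"""(?:["“„'‘‚]?[A-Z][^\t]+?[.?!]["”‟'’‛]?)(?= |$)""")

# Two ordinary sentences.
plain = SENTENCE.findall("It rains. We stay in!")

# A match cannot run across a tab, so "Name" is skipped entirely.
tabbed = SENTENCE.findall("Name\tIt rains. Done.")

# An abbreviation is reported as a sentence of its own.
abbreviation = SENTENCE.findall("Dr. Smith left.")

print(plain)
print(tabbed)
print(abbreviation)
```

The last case is one example of the pattern catching something that isn't a sentence.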

Matze wrote:
- regarding "paragraphs": a paragraph contains all its sentences AND the return that separates it from the text below. The current paragraph string (?:^.+$) finds it without the return. When I copy such a paragraph and then paste it into some text/paragraph, it becomes part of that text/paragraph and I have to hit return to make it the paragraph it was before.

This opens another can of worms. Maybe they did it this way because of the change to non-contiguous copy (which now adds newlines)? Again, define your own (?:^.+\n) and save it as an expression. But note that this one will not catch the last paragraph in a document (unless it has an actual newline).
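The caveat about the last paragraph can be illustrated with a small Python sketch. Python's re module with the MULTILINE flag is assumed as a stand-in for Nisus's engine, the sample document is invented, and the third pattern is a hypothetical variant whose behaviour in Nisus has not been verified here.

```python
import re

# Invented document: three paragraphs, the last one without a trailing return.
doc = "First paragraph.\nSecond paragraph.\nLast paragraph without return"

# Built-in style definition: the paragraph text without its return.
without_return = re.findall(r"^.+$", doc, flags=re.MULTILINE)

# Paragraph including its return: misses the final, unterminated paragraph.
with_return = re.findall(r"^.+\n", doc, flags=re.MULTILINE)

# Hypothetical variant: accept either a return or the end of the text.
with_return_or_end = re.findall(r"^.+(?:\n|\Z)", doc, flags=re.MULTILINE)

print(without_return)
print(with_return)
print(with_return_or_end)
```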

Matze wrote:
One more major request: In NWC there was a button in find/replace that created a context list of everything that was found. Can we have that back, please?

Didn't we just go over this one recently? (Did you ever try my macro?)

philip
Re: Finding Shortest +1 doesn’t work as I think it should
Dear Philip,
I am just leaving the office. So only this for now: thanks for your thoughts, I will answer them tomorrow.
And: yes, I am using your list macro intensively! Thanks again. But I’d find it nice to have the option to activate a list when I have already found numerous hits. If there are, for example, 23 hits and I thought there should have been only two, I would like to see them all in a list, just by clicking on a button or something.
Until tomorrow! Matze
Re: Finding Shortest +1 doesn’t work as I think it should
Matze wrote:
- the old NWC "." which found letters, numbers, punctuation marks and spaces, so all signs which are usually in a sentence.

phspaelti wrote:
This one is there. It's called "AnyTextCharacter".

Did "." in NWC find a tab as well? I thought it didn't.

Matze wrote:
- regarding a "word": the current "word" finds single numbers, too. A number is not a word, is it?

phspaelti wrote:
This is a well-known technical issue. "Word" here is used in the computer technical sense, which is a string of "word characters". From the computer technician's point of view word characters stand in opposition to "white-space", punctuation, and new-line/line-feed, so yes, they include numbers. Nisus really can't do anything about this, since they don't really write the find/replace engine themselves (I believe). And it would break more things than it would fix.
But for yourself perhaps [[:alpha:]]+ would work as a definition of word?

Definitely! I will save that as my personal "word" search term. But maybe it is a good idea to add that to the wildcard menu, too. I guess most people just write text with a word processor, and with that in mind NWP's find/replace should be very easy to understand yet very powerful to use. So if the wildcard menu contains the word "word" and it doesn't mean what most people, like me, would presume, then the real NWP meaning of "word" and of all the other terms should be given, ideally in find/replace itself. (In Nisus' help file the meaning of Any Word is "any word" ...)
Moreover, an AnyRealWord or something similar would help:

phspaelti wrote:
The real problem however is that 'word', as you think of it, is a linguistic notion, which really can't be defined in a generally satisfying way. Is "there's" one word or two? If two, is the second [s] or ['s]? In a list, A. B. C., etc. probably shouldn't be words, but at the beginning of a sentence "A" is of course a word, and on and on.

NWP's current Any Word finds "there" and "s", and this is fine with me. But it shouldn't find "1" and "A.".

Matze wrote:
- why not give a "number" string in the dropdown menu, below "digit" (Ziffer), which finds every number: 1 13 0,234 1 Mio 2 Billion -12 3/4 square expressions Pi and what not.

phspaelti wrote:
Sounds like a good idea. Nisus should probably consider editing and expanding the list of predefined wild-cards. But remember that you can also save your own expressions, and you can even give them names. They will then be listed under "Saved expressions". (I keep forgetting this feature myself.)

Yes, I do remember that, and I am saving

phspaelti wrote:
Sentences are a well-known nightmare. Again they are a linguistic notion, so there is no satisfactory solution. But the idea of excluding tabs strikes me as a good one. Here is today's attempt at a better definition of sentence:
Code: Select all
(?:["“„'‘‚]?[[:upper:]][^\t]+?[\.\?\!]["”‟'’‛]?)(?= |$)
But of course it will still catch things that aren't sentences.

Well, if one can't define a sentence, then one shouldn't name a wildcard "sentence". But what should it be called instead?
Kind of tricky. Sigh.