Finding lines with coloured text in it

Posted: 2014-04-12 22:41:12
by useeger
Hello everyone,
I am looking for a Power Find Pro search expression that finds, in a very large document, contiguous text containing one or more of the following colours: blue, green, cyan, orange, purple, grey. There may also be black text in it, or there may not. I want those lines which are preceded by a Tab or a Soft Return (the preceding Tab or Soft Return should not be found). The text ends with the first Soft Return or Hard Return that occurs (these final returns should be found).
Afterwards the found text lines should be copied into a new document (but that is not the problem).
I made several attempts to build a search expression, but every time Nisus ended up in the never-ending spinning wheel (MacOS 10.9.2 on a MacBook Air 1.7 GHz i7, 8 GB RAM). So perhaps my attempts were too complicated. Can anybody help?
Greetings
Ulrich
Ramallah, West Bank

Re: Finding lines with coloured text in it

Posted: 2014-04-13 02:02:28
by phspaelti
Hello Ulrich,
I'm going to have to admit that, despite your careful description, I'm a bit confused by what you mean. Especially confusing to me is the term 'line'. Many people use 'line' to mean characters delimited by the newline (or 'return', but let's not get into that controversy). The newline is '\n', and the Powerfind Pro '.' finds all characters except the newline. That means it does find 'soft-returns', and of course tabs too. So the following finds all lines in your document, but it will 'find lines' right across any 'soft-returns'.

Code:

Find All '.+\n', 'Ea'
Now I get the impression that you are thinking of lines as characters delimited not only by newlines but also by 'soft-returns'. In that case you would want to use the following:

Code:

Find All '[^\u2028|\n]+(\u2028|\n)', 'Ea'
Just to clarify again. Both of these expressions will select everything in your document (except for empty lines, that is). But the first will find fewer 'lines'.
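
To make the difference between the two expressions concrete, here is a small sketch in Python rather than in the Nisus macro language. That choice is an assumption made purely for illustration: Python's `re` module is not Powerfind Pro, although its `.` likewise matches everything except `\n`. The sample text is made up. The sketch writes the character class as `[^\u2028\n]`, because inside square brackets `|` is a literal character rather than "or".

```python
import re

# Made-up sample: one paragraph containing a soft return (U+2028),
# followed by a second paragraph.
text = "first\u2028second\nthird\n"

# '.' matches everything except '\n', so a soft return stays inside a match.
by_newline = re.findall(r".+\n", text)

# Treat both the soft return and the newline as line delimiters.
by_soft_return = re.findall(r"[^\u2028\n]+[\u2028\n]", text)

print(len(by_newline), "lines when only newlines delimit")
print(len(by_soft_return), "lines when soft returns delimit too")
```

Both searches cover the same characters; the second one simply cuts them into more pieces.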

And then you write:
useeger wrote: I want those lines which are preceded by a Tab or a Soft Return (preceding Tab and Soft Return should not be found).
Now I'm confused again. Do you mean lines beginning with a tab? Or are you treating tab as a line delimiter as well? Do you have lines that include tabs as characters, or are you using tabs only as a form of indentation? As long as the latter is true, you can adjust the previous find expression like this:

Code:

Find All '[^\u2028|\n\t]+(\u2028|\n)', 'Ea'
Now comes the color issue. Let's say you want only lines that include some blue text on them. The above could be adjusted like this:

Code:

Find All '[^\u2028|\n\t]*[^\u2028|\n\t]+[^\u2028|\n\t]*(\u2028|\n)', 'Eau'
And of course you would need to apply blue color to the middle [^\u2028|\n\t]+ part, but also make sure that the rest has 'any color' and no other styles applied. To get multiple colours you would need to repeat that middle part for each color and combine them with OR.

Now having explained all of this, I have to say that I myself wouldn't try to do it this way. (I actually tried it out, and it didn't work. The OR expression with different styles on the pieces is probably at issue.) Attribute-sensitive find is just too 'finicky' for my taste.

Instead I would have to say that I would write a macro for this. Basically I would locate the color bits (create a color 'map') and then use the map to select the lines I want. This is unfortunately not really easy, but it strikes me as the only reliable approach. I could provide some assistance, but it would help if you could provide a short snippet that shows what you are trying to find.
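
The colour-map idea can be sketched roughly as follows. This is not a Nisus macro and not the code of any macro attached to this thread; it is a Python illustration under stated assumptions. The colour map is a hypothetical list of `(start, end, colour)` runs, runs are assumed not to start on a delimiter, and the names `lines_with_colours`, `WANTED` and `DELIMS` are invented for the example.

```python
from typing import List, Tuple

DELIMS = "\u2028\n\t"  # soft return, newline, tab
WANTED = {"blue", "green", "cyan", "orange", "purple", "grey"}

# Hypothetical colour map entry: (start, end, colour), end exclusive.
Run = Tuple[int, int, str]


def lines_with_colours(text: str, runs: List[Run]) -> List[Tuple[int, int]]:
    """Return (start, end) ranges of the 'lines' touched by a wanted colour."""
    found: List[Tuple[int, int]] = []
    for start, end, colour in runs:
        if colour not in WANTED or start >= end:
            continue
        # Walk left until just after the preceding tab or return.
        a = start
        while a > 0 and text[a - 1] not in DELIMS:
            a -= 1
        # Walk right to the next delimiter.
        b = start
        while b < len(text) and text[b] not in DELIMS:
            b += 1
        # Include the closing soft return or newline, but not a tab.
        if b < len(text) and text[b] in "\u2028\n":
            b += 1
        if (a, b) not in found:
            found.append((a, b))
    return found


# Made-up document and colour map.
text = "plain\n\tsky\u2028grass and more\nrest\n"
runs = [(7, 10, "blue"), (11, 16, "green"), (26, 30, "red")]
selected = [text[a:b] for a, b in lines_with_colours(text, runs)]
print(selected)
```

Each coloured run is visited once and expanded outward, so no pattern has to be retried from every character of the document.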

Re: Finding lines with coloured text in it

Posted: 2014-04-13 02:50:24
by useeger
Thanks, philip,

the term

Code:

[^\u2028\n\t]*[^\u2028\n\t]+[^\u2028\n\t]*(\u2028|\n)
finds exactly what I called a "line". (Note: I had to remove the | ("or") in your expression, because it is part of the text). The search expression works well and fast.

But if I apply to the middle term

Code:

[^\u2028\n\t]+
the attribute "blue", I get my never-ending spinning wheel again. (There are lines with blue text in them.)

But anyway, it would not be the final solution, because I have many colours and I want to find not only blue text but all occurrences of the mentioned colours (there are even some more colours, but I don't want them to be found).

Thanks for thinking about my problem!

Re: Finding lines with coloured text in it

Posted: 2014-04-13 03:08:30
by phspaelti
Hi again,
here is a macro that does what I was trying to say. At this point it just locates (should locate?) bits with the relevant colours. Try it and tell me how fast it finishes (if it does).
Find by Attributes.nwm
(17.36 KiB)

Re: Finding lines with coloured text in it

Posted: 2014-04-13 03:20:00
by phspaelti
Anyhow, assuming the other macro works, the following should (hopefully) select the lines you want.
UseegerFindByAttributes.nwm
(17.97 KiB)

Re: Finding lines with coloured text in it

Posted: 2014-04-13 03:25:48
by useeger
It does exactly what I was looking for and it is fast.

You are great, thank you very much!!

Re: Finding lines with coloured text in it

Posted: 2014-04-13 03:50:11
by useeger
Tried to understand the code — no chance.

Therefore, can you make a second macro for me that finds all "lines" which have a { (curly bracket) in them?

Re: Finding lines with coloured text in it

Posted: 2014-04-13 03:59:29
by phspaelti
useeger wrote:Tried to understand the code — no chance.

Therefore, can you make a second macro for me that finds all "lines" which have a { (curly bracket) in them?
Well that isn't so difficult since you don't need attribute sensitive find. So Powerfind (Pro) should work.

Code:

Find All '[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)', 'Ea'
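
For readers who want to try the pattern outside Nisus, here is a hedged Python equivalent. Python's `re` is only a stand-in for Powerfind Pro, and the sample text is made up. The closing `(\u2028|\n)` is written as the class `[\u2028\n]` so that `findall` returns whole matches.

```python
import re

# Optional non-delimiter text, a literal '{', more non-delimiter text,
# then the closing soft return or newline.
BRACE_LINE = re.compile(r"[^\u2028\n\t]*\{[^\u2028\n\t]*[\u2028\n]")

# Made-up sample: a tab-indented line with braces, a line without,
# a line that is only braces, and a final line without.
text = "a\tb {x}\u2028c\n{d}\ne\n"

brace_lines = BRACE_LINE.findall(text)
print(brace_lines)
```

The leading tab and the text before it are left out of the match, while the closing return is included.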

Re: Finding lines with coloured text in it

Posted: 2014-04-13 04:15:42
by useeger
hmmm ... if I copy

Code:

[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)
into my Find/Replace window, it takes a very long time to find the next line with { in it.

If I copy

Code:

Find All '[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)', 'Ea'
into a macro document, the macro results in a never-ending spinning wheel.

I am now out of office and will be back in about 4-5 hours.

Re: Finding lines with coloured text in it

Posted: 2014-04-13 04:43:42
by phspaelti
useeger wrote: hmmm ... if I copy

Code:

[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)
into my Find/Replace window, it takes a very long time to find the next line with { in it.
Does it improve the speed if you change it to this?

Code:

(?<=^|\t|\u2028)[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)
One general problem with Powerfind is that if you use an expression that starts with 'something*', the amount of searching that has to be done grows steeply, at least if there is a lot of the something. Come to think of it, that was the problem with the previous attribute-sensitive search too. So starting the search from some "fixed" position should improve the search. If it still doesn't work, let me know. The other method can be adapted too.
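
A deliberately simplified cost model may help show why the fixed starting position matters. It is a Python sketch of a naive backtracking matcher, not a measurement of Nisus or of any real regex engine, which may apply optimisations this model ignores. The function names are invented. The model counts character tests for a pattern of the form 'something*' followed by '{' over a run of "something" characters that contains no '{'. In this model the work grows with the square of the run length when every position is tried, and only linearly when a single fixed position is tried.

```python
def unanchored_steps(run_length: int) -> int:
    """Character tests when the match is retried from every position."""
    steps = 0
    for start in range(run_length):
        remaining = run_length - start
        steps += remaining      # greedy 'something*' swallows the rest of the run
        steps += remaining + 1  # then backs off one character at a time, testing for '{'
    return steps


def anchored_steps(run_length: int) -> int:
    """Character tests when only one fixed starting position is tried."""
    return run_length + (run_length + 1)


for n in (100, 1_000, 10_000):
    print(n, unanchored_steps(n), anchored_steps(n))
```

A run ten times longer costs about a hundred times more in the unanchored case, which fits the observation that distant matches take a long time to reach.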

Re: Finding lines with coloured text in it

Posted: 2014-04-13 08:07:23
by useeger
phspaelti wrote: Does it improve the speed if you change it to this?

Code:

(?<=^|\t|\u2028)[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)
Yes, it improves the speed. But if the next occurrence is more than 10 pages away, it still takes a really long time. I do not dare to try it in a macro. Nisus crashed very often today.

Re: Finding lines with coloured text in it

Posted: 2014-04-13 08:18:47
by phspaelti
Okay, well here is an adaptation of the other approach for the braces.
UseegerFindBraces.nwm
(16.89 KiB)
If you look in the code you'll see that the first line says:

Code:

$findExpression = '{'
So you can change it to look for other things.
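
The general shape of such a "find the thing first, then widen to the line" approach might look like the following. This is a Python sketch, not the contents of the attached macro. The function name `find_lines` and the sample text are made up, and the search string is assumed to be non-empty and free of tabs and returns.

```python
from typing import List

DELIMS = "\u2028\n\t"  # soft return, newline, tab


def find_lines(text: str, needle: str) -> List[str]:
    """Return every 'line' that contains the literal string needle."""
    results: List[str] = []
    pos = text.find(needle)
    while pos != -1:
        # Widen left until just after the preceding tab or return.
        a = pos
        while a > 0 and text[a - 1] not in DELIMS:
            a -= 1
        # Widen right to the next delimiter.
        b = pos + len(needle)
        while b < len(text) and text[b] not in DELIMS:
            b += 1
        # Keep the closing soft return or newline, but not a tab.
        if b < len(text) and text[b] in "\u2028\n":
            b += 1
        results.append(text[a:b])
        # Resume after this line so that it is reported only once.
        pos = text.find(needle, max(b, pos + 1))
    return results


sample = "a\tb {x}\u2028c\n{d}\ne\n"
print(find_lines(sample, "{"))
print(find_lines("x=1\ny\n", "="))
```

Because the literal search jumps straight to each hit, nothing is retried character by character across the pages in between.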

Re: Finding lines with coloured text in it

Posted: 2014-04-13 08:27:36
by useeger
Perfect! And really very fast!

Thank you very, very much. This saves me a lot of time and work.

Greetings from Ramallah

Ulrich