Reply to topic  [ 13 posts ] 
Finding lines with coloured text in it 

Joined: 2004-06-28 00:03:01
Posts: 72
Location: Germany
Hello everyone,
I am looking for a PowerFind Pro search expression that finds, in a very large document, coherent text containing one or more of the following colours: blue, green, cyan, orange, purple, grey. There may also be black text in it, or not. I want those lines that are preceded by a Tab or a Soft Return (the preceding Tab or Soft Return should not be found). The text ends at the first Soft Return or Hard Return that occurs (this final return should be found).
Afterwards the found lines should be copied into a new document (but that is not the problem).
I made several attempts to build a search expression, but every time Nisus ended up in the never-ending spinning wheel (MacOS 10.9.2 on a MacBook Air, 1.7 GHz i7, 8 GB RAM). So perhaps my attempts were too complicated. Can anybody help?
Greetings
Ulrich
Ramallah, West Bank


2014-04-12 22:41:12

Joined: 2007-02-07 00:58:12
Posts: 876
Location: Japan
Hello Ulrich,
I have to admit that, despite your careful description, I'm a bit confused by what you mean. Especially confusing to me is the term 'line'. Many people use 'line' to mean the characters delimited by a newline (or 'return', but let's not get into that controversy). The newline is '\n'. And the PowerFind Pro '.' finds all characters except the newline. That means it does find 'soft returns', and of course tabs too. So the following finds all lines in your document, but it will 'find lines' right across any 'soft returns'.
Code:
Find All '.+\n', 'Ea'


Now I get the impression that you are thinking of lines as characters delimited not only by newlines but also by 'soft returns'. In that case you would want to use the following:
Code:
Find All '[^\u2028|\n]+(\u2028|\n)', 'Ea'


Just to clarify again: both of these expressions will select everything in your document (except for empty lines, that is). But the first will find fewer 'lines', since it does not stop at soft returns.
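For anyone who wants to check the difference outside Nisus, here is a small sketch using Python's `re` module, where `.` likewise matches everything except `\n`. The sample text is made up. The group is written as `(?:...)` so that `findall` returns whole matches, and the `|` is left out of the brackets because inside `[...]` it would be a literal character rather than "or".

```python
import re

# Made-up sample: two paragraphs (ending in \n); the first one contains
# a soft return (U+2028, LINE SEPARATOR).
doc = "alpha\u2028beta\ngamma\n"

# Counterpart of '.+\n': '.' stops only at the newline, so it runs across \u2028.
by_newline = re.findall(r".+\n", doc)

# Counterpart of '[^\u2028\n]+(\u2028|\n)': a soft return also ends a "line".
by_newline_or_soft = re.findall(r"[^\u2028\n]+(?:\u2028|\n)", doc)

print(len(by_newline))          # 2
print(len(by_newline_or_soft))  # 3
```

Both lists together cover the whole sample text; only the number of pieces differs.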

And then you write:
useeger wrote:
I want those lines which are preceded by a Tab or a Soft Return (preceding Tab and Soft Return should not be found).

Now I'm confused again. Do you mean lines beginning with a tab? Or are you treating the tab as a line delimiter as well? Do you have lines that include tabs as characters, or are you using tabs only as a form of indentation? As long as the latter is true, you can adjust the previous find expression like this:
Code:
Find All '[^\u2028|\n\t]+(\u2028|\n)', 'Ea'
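
A quick check of the tab-excluding version with Python's `re` (an invented sample, and again with the `|` dropped from the brackets). It also shows the caveat about tabs used as characters: a tab in the middle of a line cuts the line, and only the part after the tab is found.

```python
import re

# Tab, soft return (U+2028) and newline are all excluded from the body,
# so a leading tab is never part of a found "line".
LINE = re.compile(r"[^\u2028\n\t]+(?:\u2028|\n)")

doc = "\tfirst part\u2028second part\nplain line\n"
lines = LINE.findall(doc)  # ['first part\u2028', 'second part\n', 'plain line\n']

# A tab inside a line acts as a delimiter too, so only "b\n" is found here.
cut = LINE.findall("a\tb\n")
```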


Now comes the color issue. Let's say you want only lines that include some blue text. The above could be adjusted like this:
Code:
Find All '[^\u2028|\n\t]*[^\u2028|\n\t]+[^\u2028|\n\t]*(\u2028|\n)', 'Eau'


Of course, you would need to apply blue color to the middle [^\u2028|\n\t]+ part, and also make sure that the rest has 'any color' and no other styles applied. To find multiple colours, you would need to repeat that middle part once for each color and combine the copies with OR.

Now, having explained all of this, I have to say that I myself wouldn't do it this way. (I actually tried it out, and it didn't work. The OR expression with different styles on its pieces is probably the issue.) Attribute-sensitive find is just too 'finicky' for my taste.

Instead, I would write a macro for this. Basically, I would locate the coloured bits (create a colour 'map') and then use the map to select the lines I want. Unfortunately this is not really easy, but it strikes me as the only reliable approach. I could provide some assistance, but it would help if you could provide a short snippet that shows what you are trying to find.
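The colour-map idea can be sketched outside Nisus. The following Python sketch is not the attached macro and does not use the Nisus macro language. It assumes, hypothetically, that the document has already been read into a list of `(text, colour)` runs with plain colour names; `colour_map` and `select_lines` are illustrative names. `colour_map` records where the wanted colours are, and `select_lines` uses that map to pick the "lines" as defined in the question: the text after a tab, soft return or newline, up to and including the next soft or hard return.

```python
# Sketch of the "colour map" approach. Hypothetical model: the document is a
# list of (text, colour) runs; this stands in for what a macro would read from
# Nisus and is not the Nisus macro API.

WANTED = {"blue", "green", "cyan", "orange", "purple", "grey"}


def colour_map(runs: list[tuple[str, str]]) -> tuple[str, list[tuple[int, int]]]:
    """Join the runs into plain text and record (start, end) offsets of wanted colours."""
    parts: list[str] = []
    ranges: list[tuple[int, int]] = []
    pos = 0
    for text, colour in runs:
        if colour in WANTED and text:
            ranges.append((pos, pos + len(text)))
        parts.append(text)
        pos += len(text)
    return "".join(parts), ranges


def select_lines(runs: list[tuple[str, str]]) -> list[str]:
    """Return every "line" (text after a tab, soft return or newline, up to and
    including the next soft or hard return) that overlaps a wanted colour."""
    text, ranges = colour_map(runs)
    found: list[str] = []
    start = 0
    for i, ch in enumerate(text):
        if ch not in "\t\u2028\n":
            continue
        # A segment closed by a tab is not a "line" here, so it is never reported.
        if ch != "\t" and any(s < i and e > start for s, e in ranges):
            found.append(text[start:i + 1])  # keep the closing return
        start = i + 1
    return found


runs = [
    ("Intro", "blue"), ("\t", "black"),
    ("plain ", "black"), ("blue bit", "blue"), ("\u2028", "black"),
    ("all black\n", "black"),
    ("x", "black"), ("grey", "grey"), ("\n", "black"),
]
selected = select_lines(runs)  # ['plain blue bit\u2028', 'xgrey\n']
```

The blue "Intro" is not reported because it is closed by a tab rather than a return, and the all-black line is skipped because it overlaps no wanted colour. Checking offsets against a precomputed list of ranges avoids attribute-sensitive pattern matching entirely.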

_________________
philip


2014-04-13 02:02:28

Joined: 2004-06-28 00:03:01
Posts: 72
Location: Germany
Thanks, philip,

the expression

Code:
[^\u2028\n\t]*[^\u2028\n\t]+[^\u2028\n\t]*(\u2028|\n)


finds exactly what I called a "line". (Note: I had to remove the | ("or") from the brackets in your expression, because inside [...] it is treated as an ordinary character to exclude, not as "or".) The search expression works well and is fast.
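This is standard regex behaviour: inside a character class, `|` has no special meaning. Python's `re` shows it on a made-up sample:

```python
import re

text = "a|b\n"

# Inside [...] the "|" is an ordinary character, so this class also excludes it
# and the text is split at the pipe.
with_pipe = re.findall(r"[^\u2028|\n]+", text)     # ['a', 'b']

# Without it, "|" is matched like any other character.
without_pipe = re.findall(r"[^\u2028\n]+", text)   # ['a|b']
```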

But if I apply the attribute "blue" to the middle part

Code:
[^\u2028\n\t]+


I get the never-ending spinning wheel again (and there are lines with blue text in the document).

In any case, it would not be the final solution, because I have many colours, and I want to find not only blue text but all occurrences of the colours mentioned (there are even some more colours, but I don't want those to be found).

Thanks for thinking about my problem!


2014-04-13 02:50:24

Joined: 2007-02-07 00:58:12
Posts: 876
Location: Japan
Hi again,
here is a macro that does what I was trying to describe. At this point it just locates (should locate?) the bits with the relevant colours. Try it and tell me how fast it finishes (if it does).

Attachment:
Find by Attributes.nwm [17.36 KiB]
Downloaded 170 times

_________________
philip


2014-04-13 03:08:30

Joined: 2007-02-07 00:58:12
Posts: 876
Location: Japan
Anyhow, assuming the other macro works, the following should (hopefully) select the lines you want.

Attachment:
UseegerFindByAttributes.nwm [17.97 KiB]
Downloaded 171 times

_________________
philip


2014-04-13 03:20:00

Joined: 2004-06-28 00:03:01
Posts: 72
Location: Germany
It does exactly what I was looking for and it is fast.

You are great, thank you very much!!


2014-04-13 03:25:48

Joined: 2004-06-28 00:03:01
Posts: 72
Location: Germany
Tried to understand the code — no chance.

Therefore, could you make a second macro for me that finds all "lines" that have a { (curly bracket) in them?


2014-04-13 03:50:11

Joined: 2007-02-07 00:58:12
Posts: 876
Location: Japan
useeger wrote:
Tried to understand the code — no chance.

Therefore, could you make a second macro for me that finds all "lines" that have a { (curly bracket) in them?


Well, that isn't so difficult, since you don't need attribute-sensitive find, so PowerFind Pro should work.

Code:
Find All '[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)', 'Ea'
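
The same expression can be tried in Python's `re` for comparison. This is a sketch with an invented sample; `(?:...)` replaces the group so that `findall` returns whole matches.

```python
import re

# Body characters exclude tab, soft return and newline; the match must
# contain a "{" and end in a soft return or a newline.
BRACE_LINE = re.compile(r"[^\u2028\n\t]*\{[^\u2028\n\t]*(?:\u2028|\n)")

doc = "\tsee {note}\u2028no brace here\nplain\n"
hits = BRACE_LINE.findall(doc)  # ['see {note}\u2028']
```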

_________________
philip


2014-04-13 03:59:29

Joined: 2004-06-28 00:03:01
Posts: 72
Location: Germany
hmmm ... if I copy

Code:
[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)


into my Find/Replace window, it takes a very long time to find the next line with a { in it.

If I copy

Code:
Find All '[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)', 'Ea'


into a macro document, the macro ends up in a never-ending spinning wheel.

I am out of the office now and will be back in about 4 to 5 hours.


2014-04-13 04:15:42

Joined: 2007-02-07 00:58:12
Posts: 876
Location: Japan
useeger wrote:
hmmm ... if I copy
Code:
[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)

into my Find/Replace window, it takes a very long time to find the next line with a { in it.


Does it improve the speed if you change it to this?

Code:
(?<=^|\t|\u2028)[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)


One general problem with PowerFind is that if you use an expression that starts with 'something*', the amount of searching grows very quickly, at least if the text contains a lot of that something: the search is tried at every character position, and from each one the 'something*' part runs ahead and then has to back off. Come to think of it, that was the problem with the previous attribute-sensitive search too. So starting the search from some "fixed" position (the start of a paragraph, a tab or a soft return) should improve the search. If it still doesn't work, let me know. The other method can be adapted too.
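Python's `re` cannot compile the lookbehind exactly as written, because its alternatives have different widths (`^` is zero characters, `\t` is one). A sketch of an equivalent anchor in Python is a negative lookbehind for "any character other than tab, soft return or newline", which also succeeds at the very start of the text. On the invented sample below, the anchored and unanchored versions return the same matches; the anchor only cuts down the starting positions the engine has to try, since a start in the middle of a line now fails at once.

```python
import re

# Anchored: the match may only start at the beginning of the text or right
# after a tab, soft return (U+2028) or newline.
ANCHORED = re.compile(
    r"(?<![^\t\u2028\n])[^\u2028\n\t]*\{[^\u2028\n\t]*(?:\u2028|\n)"
)
# Unanchored: the engine may try a match at every character position.
UNANCHORED = re.compile(r"[^\u2028\n\t]*\{[^\u2028\n\t]*(?:\u2028|\n)")

doc = "\tsee {note}\u2028no brace here\nplain {x}\n"
anchored_hits = ANCHORED.findall(doc)      # ['see {note}\u2028', 'plain {x}\n']
unanchored_hits = UNANCHORED.findall(doc)  # same matches
```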

_________________
philip


2014-04-13 04:43:42

Joined: 2004-06-28 00:03:01
Posts: 72
Location: Germany
phspaelti wrote:
Does it improve the speed if you change it to this?

Code:
(?<=^|\t|\u2028)[^\u2028\n\t]*\{[^\u2028\n\t]*(\u2028|\n)




Yes, it improves the speed. But if the next occurrence is more than 10 pages away, it still takes a really long time. I don't dare to try it in a macro; Nisus has crashed very often today.


2014-04-13 08:07:23

Joined: 2007-02-07 00:58:12
Posts: 876
Location: Japan
Okay, well here is an adaptation of the other approach for the braces.
Attachment:
UseegerFindBraces.nwm [16.89 KiB]
Downloaded 166 times


If you look in the code you'll see that the first line says:
Code:
$findExpression = '{'

So you can change it to look for other things.
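The "find the hits first, then expand them to their lines" idea can be sketched in Python. This is an illustration, not the attached macro: `lines_containing` and its `find_expression` parameter are hypothetical names (the parameter merely echoes `$findExpression`), and it searches for a literal string rather than a PowerFind expression. Each hit is located with a plain substring search, which does not backtrack, and is then widened to the nearest delimiters around it.

```python
DELIMS = "\t\u2028\n"   # characters that end a "line" segment
RETURNS = "\u2028\n"    # only these may close a reported line


def lines_containing(text: str, find_expression: str = "{") -> list[str]:
    """Return each tab/soft-return/newline-delimited line that contains
    find_expression and ends in a soft or hard return (the return is kept)."""
    found: list[str] = []
    pos = text.find(find_expression)
    while pos != -1:
        # Line start: just after the nearest delimiter before the hit (or 0).
        start = max(text.rfind(d, 0, pos) for d in DELIMS) + 1
        # Line end: the nearest delimiter after the hit (or the end of the text).
        ends = [i for i in (text.find(d, pos) for d in DELIMS) if i != -1]
        end = min(ends) if ends else len(text)
        if end < len(text) and text[end] in RETURNS:
            found.append(text[start:end + 1])
        # Resume after this line so a line with several hits is reported once.
        pos = text.find(find_expression, end + 1)
    return found


doc = "\tsee {note}\u2028no brace here\nplain {x} {y}\nend{"
result = lines_containing(doc)  # ['see {note}\u2028', 'plain {x} {y}\n']
```

The final "end{" is not reported because it is not closed by a return, which matches what the brace expression earlier in the thread would do.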

_________________
philip


2014-04-13 08:18:47

Joined: 2004-06-28 00:03:01
Posts: 72
Location: Germany
Perfect! And really very fast!

Thank you very, very much. This saves me a lot of time and work.

Greetings from Ramallah

Ulrich


2014-04-13 08:27:36