extracting numerical data from a text file 

Joined: 2008-12-28 01:14:28
Posts: 1
Hi, would there be a simple way of selecting the numbers at the far right of each line in the data below (i.e. 45.20, 128.39, 200.00, etc.) and either outputting those numbers to a new file or reformatting the text so that they appear in their own column, which could then be totalled? (I'd be doing this on much larger, but similarly structured, files, all extracted from PDFs.)
Thanks for any suggestions for solving this!

11/07/2008 AMAZON.COM AMZN.COM/BILL WA 45.20
11/07/2008 OTHER WORLD COMPUTINWOODSTOCK IL 128.39
11/06/2008 DIRECT RELIEF INTL OGOLETA CA 200.00
11/04/2008 COSTCO GAS #00479 94CULVER CITY CA 36.93
11/04/2008 COSTCO WHSE #00479 9LOS ANGELES CA 262.88
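For readers working outside Nisus Writer, the same job (take the amount at the end of each line, list the amounts one per line, and total them) can be sketched in Python. This is a minimal sketch, assuming every amount is the last thing on its line and looks like digits, a dot, and digits; `extract_amounts` and `total` are hypothetical names, and `Decimal` is used so the sum is not subject to floating-point rounding.

```python
import re
from decimal import Decimal

# Sample lines copied from the question.
STATEMENT = (
    "11/07/2008 AMAZON.COM AMZN.COM/BILL WA 45.20\n"
    "11/07/2008 OTHER WORLD COMPUTINWOODSTOCK IL 128.39\n"
    "11/06/2008 DIRECT RELIEF INTL OGOLETA CA 200.00\n"
    "11/04/2008 COSTCO GAS #00479 94CULVER CITY CA 36.93\n"
    "11/04/2008 COSTCO WHSE #00479 9LOS ANGELES CA 262.88\n"
)

# An amount such as 45.20 at the very end of a line. Trailing blanks or a
# carriage return (common in text extracted from PDFs) are tolerated.
# MULTILINE lets $ match at the end of every line, not only the whole text.
AMOUNT_AT_EOL = re.compile(r"(\d+\.\d+)[ \t\r]*$", re.MULTILINE)


def extract_amounts(text: str) -> list[str]:
    """Return the trailing amount of every line that has one."""
    return AMOUNT_AT_EOL.findall(text)


def total(amounts: list[str]) -> Decimal:
    """Add the amounts exactly, using Decimal instead of float."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


if __name__ == "__main__":
    amounts = extract_amounts(STATEMENT)
    # One amount per line, ready to be saved as a new file.
    print("\n".join(amounts))
    print("Total:", total(amounts))
```

For a real statement, `STATEMENT` would be replaced by the text read from the exported file.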


2008-12-28 01:40:08

Joined: 2007-03-03 09:55:06
Posts: 494
Location: Europe
Hi,
This macro extracts the data you need and pastes them into a new document. (Note: shamelessly adapted from a macro of Martin's).
Code:
# gather all words
Find All '(\d+\.\d+$)', 'E'
$doc = Document.active
$sels = $doc.textSelections

# create new document with all words
New
ForEach $sel in $sels
   $word = $sel.subtext
   Type Text $word
End

And this converts the text into a table and puts your data in a column.
Code:
Find and Replace '(\s)(\d+\.\d+$)', '\t\2', 'Ea'
Select All
Convert to Table
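The Find and Replace step swaps the whitespace just before each trailing amount for a tab, so the amount becomes its own field and Convert to Table can put it in a separate column. A rough Python equivalent is sketched below, assuming the same "amount at end of line" pattern; it uses `[ \t]` rather than `\s` so a line break is never consumed, and `to_columns` and `column_total` are hypothetical names. It also totals the amount column, which the question asked for.

```python
import re
from decimal import Decimal

# First three sample lines from the question.
STATEMENT = (
    "11/07/2008 AMAZON.COM AMZN.COM/BILL WA 45.20\n"
    "11/07/2008 OTHER WORLD COMPUTINWOODSTOCK IL 128.39\n"
    "11/06/2008 DIRECT RELIEF INTL OGOLETA CA 200.00\n"
)

# A space or tab immediately before an amount that ends the line.
BEFORE_AMOUNT = re.compile(r"[ \t](\d+\.\d+)$", re.MULTILINE)


def to_columns(text: str) -> list[list[str]]:
    """Split each line into [description, amount] by putting a tab before the amount."""
    tabbed = BEFORE_AMOUNT.sub(r"\t\1", text)
    return [line.split("\t") for line in tabbed.splitlines() if line]


def column_total(rows: list[list[str]]) -> Decimal:
    """Sum the amount column (the last field of each row)."""
    return sum((Decimal(row[-1]) for row in rows), Decimal("0"))


if __name__ == "__main__":
    rows = to_columns(STATEMENT)
    for description, amount in rows:
        print(f"{description}\t{amount}")
    print("Total:", column_total(rows))
```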


If you run into any trouble, let me know.

Greetings, Henry.


2008-12-28 02:40:31

Joined: 2007-03-03 09:55:06
Posts: 494
Location: Europe
Oops… there was an error in the first macro, sorry. This should work:
Code:
# gather all words
Find All '(\d+\.\d+\n)', 'E'
$doc = Document.active
$sels = $doc.textSelections

# create new document with all words
New
ForEach $sel in $sels
   $word = $sel.subtext
   Type Text $word
End

Best Regards, Henry.


2008-12-28 03:45:11

Joined: 2008-05-17 04:02:32
Posts: 400
For that purpose, you don't need a ForEach loop, since NW Pro supports non-contiguous selections. The following is sufficient:
Code:
Find All '\d+\.\d+(?:\n|$)', 'E'
Copy
New
Paste
The find expression includes $ so that it also matches the last number, even when that number is not followed by a newline character.
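The difference between the `\n`-only pattern in Henry's corrected macro and the `(?:\n|$)` pattern can be checked outside Nisus Writer with Python's `re` module. This is only an approximation: in Python, `$` without MULTILINE matches at the end of the text, and Nisus Writer's own regex rules may differ in detail. The sample is the first three lines of the question with the final newline deliberately left off; the last step assumes the pasted pieces are simply concatenated.

```python
import re

# Three sample lines; note there is no newline after the last one.
TEXT = (
    "11/07/2008 AMAZON.COM AMZN.COM/BILL WA 45.20\n"
    "11/07/2008 OTHER WORLD COMPUTINWOODSTOCK IL 128.39\n"
    "11/06/2008 DIRECT RELIEF INTL OGOLETA CA 200.00"
)

# Amount must be followed by a newline (as in Henry's corrected macro).
NEWLINE_ONLY = re.compile(r"\d+\.\d+\n")
# Amount followed by a newline OR the end of the text.
NEWLINE_OR_END = re.compile(r"\d+\.\d+(?:\n|$)")

if __name__ == "__main__":
    print(NEWLINE_ONLY.findall(TEXT))    # the last amount is missed
    print(NEWLINE_OR_END.findall(TEXT))  # all three amounts are found
    # Each match keeps its newline, so joining the matches yields
    # one amount per line.
    print("".join(NEWLINE_OR_END.findall(TEXT)))
```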


2008-12-28 05:43:21

Joined: 2007-03-03 09:55:06
Posts: 494
Location: Europe
That's what happens when I think to myself, "it can't be that easy, can it?"

Thank you
Henry


2008-12-28 06:47:13