Well, the overall task is more or less the same as the one we discussed in this thread:
http://www.nisus.com/forum/viewtopic.php?f=17&t=5329
Here is how I go about it.
I start with the Find/Replace dialog. I create a find expression that matches each part of the data and checks it for consistency. In your case the overall structure is:
$$ [Book abbreviation] [Chapter number] : [Verse number] <new line> [paragraph]
So the find expression will look like this:
Find all '\$\$[A-Z][a-z]+ \d+\:\d+\n.+', 'Ea-i'
I build this kind of expression in the Find box, testing it as I go. When it works and selects everything I want, I macroize it.
Macroized, the same expression can be used in a macro like this:
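If you also want to test the expression outside Nisus, here is a rough Python equivalent. It is only a sketch under some assumptions: Python's re module is taken to behave like the Nisus regex engine for this particular pattern, the sample text is made up, and find_all and inconsistent_headers are hypothetical helper names. The second helper is one way to do the consistency check: it lists every $$ header line that the full pattern fails to match.

```python
import re

# Python rendering of the Nisus expression '\$\$[A-Z][a-z]+ \d+\:\d+\n.+'
# ('\:' is a harmless escape in Python; matching is case-sensitive by default).
VERSE = re.compile(r"\$\$[A-Z][a-z]+ \d+\:\d+\n.+")

# Hypothetical sample data; the third header is malformed (no verse number).
sample = (
    "$$Gen 1:1\nIn the beginning...\n"
    "$$Gen 1:2\nAnd the earth was without form...\n"
    "$$Gen 1\nA header with no verse number.\n"
)

def find_all(text):
    """Return the text of every verse that matches the full pattern."""
    return [m.group(0) for m in VERSE.finditer(text)]

def inconsistent_headers(text):
    """Return '$$' header lines where the full verse pattern does not match."""
    bad = []
    for m in re.finditer(r"^\$\$.*$", text, re.MULTILINE):
        # VERSE.match(text, pos) only tries a match starting exactly at pos.
        if not VERSE.match(text, m.start()):
            bad.append(m.group(0))
    return bad

print(len(find_all(sample)))         # 2
print(inconsistent_headers(sample))  # ['$$Gen 1']
```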
$doc = Document.active
$doc.text.findAll '\$\$[A-Z][a-z]+ \d+\:\d+\n.+', 'Ea-i'
The advantage of using this format is that we can 'catch' the result in an array variable.
$doc = Document.active
$verses = $doc.text.findAll '\$\$[A-Z][a-z]+ \d+\:\d+\n.+', 'Ea-i'
This $verses variable is an array of text selections, so we can easily access its subtexts, ranges, etc. And since it's an array, we can loop through it and process the verses one by one using a foreach loop.
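In Python terms the same idea looks roughly like this: collect all matches into a list first, then walk through it one item at a time. This is an analogy, not Nisus code; a Python match object only loosely stands in for a Nisus text selection (span() for its range, group(0) for its subtext), and the document text is invented.

```python
import re

VERSE = re.compile(r"\$\$[A-Z][a-z]+ \d+\:\d+\n.+")

# Invented document text standing in for the active document.
doc = "$$Gen 1:1\nIn the beginning...\n$$Gen 1:2\nAnd the earth...\n"

# Like '$verses = $doc.text.findAll ...': gather every match into a list.
verses = list(VERSE.finditer(doc))

# Like a foreach loop over $verses: handle each verse in turn.
ranges = []
for verse in verses:
    start, end = verse.span()   # roughly the selection's range
    subtext = verse.group(0)    # roughly the selection's subtext
    ranges.append((start, end))
    print(start, end, repr(subtext))
```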
Now comes the tricky 'magic' part. We will need all those book/chapter/verse bits, and we'll need to check the book and chapter to see when we have a new book or chapter. Inside the loop, each $verse is a text selection, and $verse.subtext is a text object that contains (only) the text of that verse. And we know that it has the format specified by the expression we created earlier. So
$verse.subtext.find '\$\$[A-Z][a-z]+ \d+\:\d+\n.+', 'E-i'
will work and locate the info we need. (NB: the options here no longer contain 'a'. There is no "Find all", since there is only one instance.)
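The difference between a single find and Find All can be seen in Python too, assuming re.search as the rough analogue of a find without the 'a' option: it returns just the first match (or None) rather than a list. The verse text is a made-up example.

```python
import re

PATTERN = r"\$\$[A-Z][a-z]+ \d+\:\d+\n.+"

# A single verse's subtext (invented): exactly one instance of the pattern.
subtext = "$$Exo 3:14\nI AM THAT I AM."

# Like a find without 'a': one match object (or None), not a list.
match = re.search(PATTERN, subtext)
print(match is not None, match.group(0) == subtext)  # True True
```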
To get the info into variables so we can work with it, we can use the special '$' option, which packs any captured bits into variables. Captures are made with ( … ) and are numbered from left to right as $1, $2, … , but I like to work with 'named' captures. So I would use this:
$verse.subtext.find '\$\$(?<book>[A-Z][a-z]+) (?<chap>\d+)\:(?<ver>\d+)\n(?<verseText>.+)', 'E-i$'
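For comparison, here is a sketch of both capture styles in Python. Two differences to flag: Python writes named groups as (?P<name>...) rather than (?<name>...), and it has no equivalent of the '$' option, so the captured values come back in a dictionary via groupdict() instead of being set as variables. The sample verse is invented.

```python
import re

verse_subtext = "$$Gen 1:3\nAnd God said, Let there be light."

# Numbered captures: groups 1..4 counted from left to right (like $1, $2, ...).
numbered = re.search(r"\$\$([A-Z][a-z]+) (\d+)\:(\d+)\n(.+)", verse_subtext)
print(numbered.group(1), numbered.group(2), numbered.group(3))  # Gen 1 3

# Named captures, using Python's (?P<name>...) spelling.
named = re.search(
    r"\$\$(?P<book>[A-Z][a-z]+) (?P<chap>\d+)\:(?P<ver>\d+)\n(?P<verseText>.+)",
    verse_subtext,
)
fields = named.groupdict()  # dict of name -> captured text
print(fields["book"], fields["chap"], fields["ver"])  # Gen 1 3
print(fields["verseText"])  # And God said, Let there be light.
```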
With that find command inside the loop, Nisus will extract those four bits of info from each verse in turn and place them into the variables $book, $chap, $ver, and $verseText. You can then process them further and reassemble them any way you like. For example, to discover whether you have a new chapter, you carry a variable $lastChapter and check whether the extracted $chap is still the same. If it's different, you have a new chapter, so you print it out and update $lastChapter. Otherwise you just ignore it.
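The chapter check can be sketched in Python like this. It is not the macro from this post, only an illustration of the logic under some assumptions: the output layout (a book name line, a "Chapter N" line, then "verse text" lines) and the helper name reassemble are hypothetical. A new book also resets the chapter tracking; otherwise chapter 1 of the next book would look unchanged whenever the previous book ended on chapter 1.

```python
import re

VERSE = re.compile(
    r"\$\$(?P<book>[A-Z][a-z]+) (?P<chap>\d+)\:(?P<ver>\d+)\n(?P<verseText>.+)"
)

def reassemble(text):
    """Emit a book or chapter heading only when it changes, then each verse."""
    out = []
    last_book = last_chapter = None
    for m in VERSE.finditer(text):
        book, chap, ver, verse_text = m.group("book", "chap", "ver", "verseText")
        if book != last_book:
            out.append(book)
            last_book, last_chapter = book, None  # new book: restart chapter tracking
        if chap != last_chapter:
            out.append(f"Chapter {chap}")
            last_chapter = chap                   # remember the chapter just printed
        out.append(f"{ver} {verse_text}")
    return out

# Invented sample data.
sample = (
    "$$Gen 1:1\nIn the beginning...\n"
    "$$Gen 1:2\nAnd the earth...\n"
    "$$Gen 2:1\nThus the heavens...\n"
    "$$Exo 1:1\nNow these are the names...\n"
)
for line in reassemble(sample):
    print(line)
```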
I've added a macro below that puts all of this together.