
Macro: combine paragraphs based on pilcrows

Posted: 2013-10-14 20:22:42
by NisusUser
Attached is a sample file that has text like this:

Code: Select all

$$ Ro 1:1
¶ Words are here
$$ Ro 1:2
More words are here
$$ Ro 1:3
¶ Still more words here
$$ Ro 2:1
¶  Chapter 2 words start here
$$ Ro 2:2
More chapter 2 words here.
I would like to make a macro that does the following things:

(*) places "CHAPTER #" above each new chapter (i.e. above the $$ Ro 1:1 verse, above the $$ Ro 2:1 verse, etc.), where "#" is the chapter number (number before colon; can be up to three digits long). This paragraph style I'd name "EC-Chapter Titles".
(*) replaces each pilcrow with a hard return and indent, thus starting a new paragraph there with indent
(*) removes the other hard returns, but leaves a space after the end of each verse
(*) removes the $$ Ro #:# and preserves only the verse number (digit(s) after colon), making that number bold

Some of these things I can do, e.g., replace pilcrows with hard returns. But much of it is over my head.

Thanks for ideas to help me plan this. I'm not even sure I've listed the tasks in the right order so one task doesn't mess up another, but I tried to.

Re: Macro: combine paragraphs based on pilcrows

Posted: 2013-10-14 22:48:24
by phspaelti
Well, the overall task is more or less the same as the one we discussed once in this thread:

Here is how I approach it.

First I start with the Find/Replace dialog. I create a find expression which matches the parts of the data and checks for consistency. In your case the overall structure is:

Code: Select all

$$ [Book abbreviation] [Chapter number] : [Verse number] <new line> [paragraph]
So the find expression will look like this:

Code: Select all

Find all '\$\$ [A-Z][a-z]+ \d+\:\d+\n.+', 'Ea-i'
I use the Find box to create this kind of expression, while testing it. When it works and selects everything I want, then I macroize it.
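
If you want to sanity-check the pattern outside Nisus, here is a minimal Python sketch of the same "find all" step. This is not Nisus macro code; the names SAMPLE and VERSE_RE are made up for illustration, and the pattern assumes a single space after the `$$`, as in the sample text from the first post.

```python
import re

# Sample text copied from the first post: a marker line, then one line of verse text.
SAMPLE = """$$ Ro 1:1
¶ Words are here
$$ Ro 1:2
More words are here
$$ Ro 1:3
¶ Still more words here
$$ Ro 2:1
¶  Chapter 2 words start here
$$ Ro 2:2
More chapter 2 words here.
"""

# "$$ Book Chapter:Verse", a newline, then the rest of the following line.
# Without re.DOTALL, ".+" stops at the end of that line.
VERSE_RE = re.compile(r"\$\$ [A-Z][a-z]+ \d+:\d+\n.+")

# One string per verse (marker line plus text line).
verses = VERSE_RE.findall(SAMPLE)
print(len(verses))
print(verses[0])
```

If the count is lower than the number of `$$` lines in your file, some verse does not fit the expected structure, which is exactly the consistency check the Find box gives you.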

The same expression macroized can be used in a macro like this:

Code: Select all

$doc =
$doc.text.findAll '\$\$ [A-Z][a-z]+ \d+\:\d+\n.+', 'Ea-i'
The advantage of using this format is that we can 'catch' the result in an array variable.

Code: Select all

$doc =
$verses = $doc.text.findAll '\$\$ [A-Z][a-z]+ \d+\:\d+\n.+', 'Ea-i'
This $verses variable is an array of text selections. So we can easily access its subtexts, ranges, etc. And since it's an array we can loop through and process the verses one by one, using a foreach loop.

Code: Select all

foreach $verse in $verses
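
As a rough analogy only (Python, not Nisus macro code): looping over match objects gives you both the matched text and its range, much like looping over an array of text selections. The variable names here are hypothetical, and the pattern again assumes a space after `$$`.

```python
import re

# Two verses from the sample in the first post.
text = "$$ Ro 1:1\n¶ Words are here\n$$ Ro 1:2\nMore words are here\n"
verse_re = re.compile(r"\$\$ [A-Z][a-z]+ \d+:\d+\n.+")

spans = []
for match in verse_re.finditer(text):
    start, end = match.span()          # character range of this verse
    spans.append((start, end))
    print(start, end, repr(match.group(0)))
```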
Now comes the tricky 'magic' part. We will need all those book/chapter/verse bits, and we'll need to check the book and chapter to see when we have a new book or chapter. So $verse is a text selection, and $verse.subtext is a text object that contains (only) the text of the verse. And we know that it has the format specified by the expression we created earlier. So

Code: Select all

$verse.subtext.find '\$\$ [A-Z][a-z]+ \d+\:\d+\n.+', 'E-i'
will work and locate the info we need. (NB: the option string here no longer contains 'a'. There is no "Find all", since there is only one instance.)
To get the info into variables so we can work with them we can use the special '$' option. This will pack any captured bits into variables. Captures are done with ( … ), and will be numbered from left to right $1, $2, … , but I like to work with 'named' captures. So I would use this:

Code: Select all

$verse.subtext.find '\$\$ (?<book>[A-Z][a-z]+) (?<chap>\d+)\:(?<ver>\d+)\n(?<verseText>.+)', 'E-i$'
With that inside the loop, Nisus will extract those four bits of info from each verse in turn and place them into the variables $book, $chap, $ver, and $verseText. You can then process them further and reassemble them any way you like. For example, to discover whether you have a new chapter, carry a variable $lastChapter and check whether the extracted $chap is still the same. If it's different, you have a new chapter, so you print out the chapter heading and update $lastChapter. Otherwise you just ignore it.
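
To make the logic concrete, here is a hedged Python sketch of the same idea: named captures plus a "last chapter" variable. It is not the attached macro and not Nisus code. The function name reformat and the plain-text output are assumptions made for illustration (a tab stands in for the indent, and plain text cannot show the bold verse numbers or the "EC-Chapter Titles" style). It also assumes the first verse of a chapter always opens a new paragraph.

```python
import re

SAMPLE = """$$ Ro 1:1
¶ Words are here
$$ Ro 1:2
More words are here
$$ Ro 1:3
¶ Still more words here
$$ Ro 2:1
¶  Chapter 2 words start here
$$ Ro 2:2
More chapter 2 words here.
"""

# Same structure as before, with each part captured under a name.
VERSE_RE = re.compile(
    r"\$\$ (?P<book>[A-Z][a-z]+) (?P<chap>\d+):(?P<ver>\d+)\n(?P<verse_text>.+)"
)

def reformat(text):
    """Return plain text with chapter headings and paragraphs split at pilcrows."""
    lines = []             # finished output lines
    last_chapter = None    # chapter number of the previous verse
    for m in VERSE_RE.finditer(text):
        chap, ver, body = m["chap"], m["ver"], m["verse_text"]
        starts_paragraph = body.startswith("¶")
        body = body.lstrip("¶ ").rstrip()   # drop the pilcrow and stray spaces
        if chap != last_chapter:
            lines.append(f"CHAPTER {chap}")
            last_chapter = chap
            starts_paragraph = True         # assumption: a chapter opens a paragraph
        if starts_paragraph:
            lines.append(f"\t{ver} {body}")     # new indented paragraph
        else:
            lines[-1] += f" {ver} {body}"       # join to the previous verse with a space
    return "\n".join(lines)

print(reformat(SAMPLE))
```

Doing the chapter check before the paragraph check means the heading always lands above the first verse of the chapter, which sidesteps the worry about one task messing up another.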

I've added a macro which combines this all together below.
NisusUser Reformat Verses.nwm