
A macro converting numbered texts to endnotes

Posted: 2019-02-25 17:48:58
by Nobumi Iyanaga
Hello,

I need a macro that converts a plain text containing inserted reference numbers, followed by note paragraphs that begin with those numbers, into a text with real endnotes. Here is what the current text looks like:

Lorem ipsum dolor sit amet1, consectetuer adipiscing elit2. Etiam lobortis facilisis sem3. ...
Nullam nec mi4 et neque pharetra sollicitudin5. ...
1 my note 1
another paragraph of my note 1
2 my note 2
3 my note 3
4 my note 4
5 my note 5
another paragraph of my note 5
...

Fortunately, there is no digit in the original text other than note references.

Thank you very much in advance for any insight.

Re: A macro converting numbered texts to endnotes

Posted: 2019-02-26 09:28:48
by phspaelti
Hello Nobumi,
The basic steps will be:
  1. Find the numbers in your text
  2. Use the information to copy the note text
  3. Create endnotes using the copied text
  4. Remove the plain-text notes
The exact way to do this is determined by the commands available in the Nisus macro language. For example, to create an endnote you'll need the following command:

Code:

Note.insertEndnoteInTextAtIndex $text, $index, $noteText
As you can see, this command requires a text object, an index, and the text of the endnote. The text object will be the document text, the index will be the location of the note reference, and the text of the endnote will be the "subtext" of the document text lying between the corresponding in-note reference numbers. So all the necessary information will be known once you find the numbers. You can do that like this:

Code:

$doc = Document.active
$text = $doc.text
$numSels = $text.find '\d+', 'Ea'
The macro find command will return (text) selections. In this case, since it's a find all command it will return an array of such selections. A selection, as we've discussed before, is a pair consisting of a text object and a range. The text object in this case should always be the document text and the range tells you where in the document the number is located.
You say that the only numbers in your document are the reference numbers. So if you have 5 notes, that should return 10 selections: 5 for the note references in the text, and another 5 for the in-note reference numbers. You can use the former as insertion points for your endnotes and the latter to locate the endnote text. Split the array in two; the two halves should then match up one-to-one. Loop through one half and use the other for the matching info.
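Here is a rough Python sketch of the split-and-pair idea, outside Nisus. The matches from Python's `re.finditer` stand in for the macro's selections, and `pair_references` is a hypothetical helper name. Like the description above, it assumes the only digits are reference numbers, first the in-text ones and then the note numbers.

```python
import re


def pair_references(text):
    """Pair each in-text reference with its note text.

    Assumes the ONLY digits in `text` are note references: first the
    in-text ones, then the numbers that start each note.
    """
    matches = list(re.finditer(r"\d+", text))
    if len(matches) % 2 != 0:
        raise ValueError(f"expected an even number of matches, got {len(matches)}")
    half = len(matches) // 2
    refs, note_nums = matches[:half], matches[half:]
    pairs = []
    for i, (ref, num) in enumerate(zip(refs, note_nums)):
        if ref.group() != num.group():
            raise ValueError(f"reference {ref.group()} does not match note {num.group()}")
        # The note text runs from just after its number up to the next
        # note number (or to the end of the text for the last note).
        end = note_nums[i + 1].start() if i + 1 < half else len(text)
        pairs.append((ref.start(), text[num.end():end].strip()))
    return pairs


sample = (
    "Lorem ipsum dolor sit amet1, consectetuer adipiscing elit2.\n"
    "1 my note\nanother paragraph\n"
    "2 second note\n"
)
print(pair_references(sample))
```

Each pair holds the index of an in-text reference and the text of its note. Any extra digit in the document, such as a year, makes the count odd or the halves mismatched, so this approach only works when the assumption really holds.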

I'll attach a macro that implements this, and works for the simple case that I tested it on.

Re: A macro converting numbered texts to endnotes

Posted: 2019-02-26 09:43:19
by martin
Aha, Philip beat me to it! 🐌 But I was writing up a macro before I saw his post, so I'll post my macro as well.

It also looks like Philip's macro has some errors. I tried it on Nobumi's sample text and it failed. That's probably because Philip's macro tries to be more efficient by calculating all the notes exactly up front. That approach is faster, but more brittle.

Re: A macro converting numbered texts to endnotes

Posted: 2019-02-26 19:28:21
by phspaelti
martin wrote: 2019-02-26 09:43:19
> It also looks like Philip's macro has some errors. I tried it on Nobumi's sample text and it failed. That's probably because Philip's macro tries to be more efficient by calculating all the notes exactly up front. That approach is faster, but more brittle.
I did only one quick test on Nobumi's text and it worked for me, but there's no question that the approach is brittle.

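So here's another approach. It relies on the fact that Nisus' Find/Replace can delete what it finds and still return an array of selections/locations. You still need to work backwards when inserting the notes, though, since each inserted note reference lengthens the text and would shift the locations that come after it.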

Here I gather the note texts first and then delete them. Then I find the note refs (and delete them) and finally insert the saved note texts at the correct locations.

Code:

# Get the document
$doc = Document.active
$text = $doc.text

# Get the note texts and put them in an array
$noteSels = $text.find '^\d+ .+(?:\n[^\d].*)*', 'Ea'
$noteCount = $noteSels.count
$noteTexts = Array.newWithCount $noteCount + 1
foreach $noteSel in $noteSels
    $noteText = $noteSel.subtext
    $noteText.findAndReplace '(?<noteNum>\d+)', '', 'E¢'
    $noteTexts[$noteNum] = $noteText
end

# Delete the note texts from the document
$noteTextRange = Range.newWithLocationAndBound $noteSels.firstValue.location, $noteSels.lastValue.bound
$text.deleteInRange $noteTextRange

# Find the note references in the text
# Get their numbers and locations, and delete them
$noteRefNums = $text.find('\d+', 'Ea').arrayByMakingValuesDoCommand 'substring'
$noteRefLocs = $text.findAndReplace('\d+', '', 'Ea').arrayByMakingValuesDoCommand 'location'

# Add the appropriate endnote at each of the locations
foreach $i, $loc in reversed $noteRefLocs
    $noteNum = $noteRefNums[$i]
    Note.insertEndnoteInTextAtIndex $text, $loc, $noteTexts[$noteNum]
end
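The same four steps can be traced in plain Python. This is a sketch, not a drop-in replacement: the string `[^n]` is a hypothetical stand-in for a real endnote (Python has nothing like `Note.insertEndnoteInTextAtIndex`), and `convert_to_endnotes` is an illustrative name. The reference locations are recorded relative to the text after all the numbers have been deleted, and the insertions then run from last to first.

```python
import re

# Same note pattern as the macro: a paragraph starting with a number and a
# space, plus any following paragraphs that do not start with a digit.
NOTE_PATTERN = re.compile(r"^\d+ .+(?:\n[^\d].*)*", re.M)


def convert_to_endnotes(text):
    """Return (body, notes). In body, each reference number is replaced by a
    hypothetical "[^n]" marker; notes maps each number to its note text."""
    note_matches = list(NOTE_PATTERN.finditer(text))
    if not note_matches:
        return text, {}

    # 1. Gather the note texts, keyed by their leading number.
    notes = {}
    for m in note_matches:
        num, _, note_text = m.group().partition(" ")
        notes[num] = note_text

    # 2. Delete the notes: everything from the first note to the end of the last.
    body = text[: note_matches[0].start()] + text[note_matches[-1].end():]

    # 3. Record each reference number and its location in the text as it will
    #    look once all reference numbers are deleted, then delete them.
    ref_nums, ref_locs = [], []
    removed = 0
    for m in re.finditer(r"\d+", body):
        ref_nums.append(m.group())
        ref_locs.append(m.start() - removed)
        removed += len(m.group())
    body = re.sub(r"\d+", "", body)

    # 4. Insert the markers from last to first, so each insertion leaves the
    #    locations still waiting to be used unchanged.
    for num, loc in reversed(list(zip(ref_nums, ref_locs))):
        body = body[:loc] + f"[^{num}]" + body[loc:]
    return body, notes


sample = (
    "Lorem ipsum amet1, elit2. Etiam sem3.\n"
    "1 first note\nmore of note one\n"
    "2 second note\n"
    "3 third note\n"
)
body, notes = convert_to_endnotes(sample)
print(body)
print(notes)
```

Inserting from the end matters because each marker lengthens the text; going front to back, every insertion would push the remaining saved locations out of place.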

Re: A macro converting numbered texts to endnotes

Posted: 2019-02-28 22:14:41
by Nobumi Iyanaga
Thank you very much, Philip and Martin, for your macros.

I tested them on my problematic file. None of them worked the first time, but I got Philip's last macro to work with a slight modification.

I think the main problem was that the example I gave in my first post was not a good one; it was not sufficiently explained. I think Philip's first macro assumes that all digits in the text are note references, either in the main text or in the notes (the sixth line has "$numSels = $text.find '\d+', 'Ea'"). But my file has many other digits, both in the main text and in the notes... Anyway, it stops at the tenth line, "$noteRefInNoteSels = $numSels.subarrayAtIndex($noteCount,$noteCount)", with the error "The given index (864) is out of bounds for the array (count 863)".

Martin's macro runs but ends with this error message: "Converted 0 notes but had 539 errors! First Error: Could not find a matching note for number: 1" I don't understand the number "539" -- and I don't understand the meaning of the find command in line 27: "Find Next @String<(?<!\d|^)\d+>, 'E¢-W'" -- but anyway, it failed on my file.

Finally, I got Philip's last macro to work by changing the find command "$noteSels = $text.find '^\d+ .+(?:\n[^\d].*)*', 'Ea'" to "$noteSels = $text.find '^\d+\s*.+(?:\n[^\d].*)*', 'Ea'", because in my file the number at the start of each note was followed not by a space but by a tab...
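To see why the space-versus-tab difference matters, here is a small Python check of the original pattern, the modified one, and a stricter variant. The `[ \t]+` variant is a suggestion of this sketch, not something from the thread: `\s*` also accepts no whitespace at all, so a main-text paragraph that merely begins with a number such as "1990s" would be treated as a note, while `[ \t]+` requires at least one space or tab.

```python
import re

old_pattern = re.compile(r"^\d+ .+(?:\n[^\d].*)*", re.M)         # space only
new_pattern = re.compile(r"^\d+\s*.+(?:\n[^\d].*)*", re.M)       # the \s* fix
tight_pattern = re.compile(r"^\d+[ \t]+.+(?:\n[^\d].*)*", re.M)  # space or tab required

notes = "1\tmy note\nanother paragraph\n2\tmy second note"

print(old_pattern.findall(notes))    # the tab defeats the literal space
print(new_pattern.findall(notes))
print(tight_pattern.findall(notes))

# A main-text paragraph that merely starts with a number:
print(new_pattern.findall("1990s were busy"))
print(tight_pattern.findall("1990s were busy"))
```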

Here is the macro which works on my file:

Code:

# Get the document
$doc = Document.active
# operate on a copy of the document
$doc = $doc.copy
$doc.clearAndDisableUndoHistory
$text = $doc.text

# Get the note texts and put them in an array
# $noteSels = $text.find '^\d+ .+(?:\n[^\d].*)*', 'Ea'
$noteSels = $text.find '^\d+\s*.+(?:\n[^\d].*)*', 'Ea'
$noteCount = $noteSels.count
$noteTexts = Array.newWithCount $noteCount + 1
foreach $noteSel in $noteSels
    $noteText = $noteSel.subtext
    $noteText.findAndReplace '(?<noteNum>\d+)', '', 'E¢'
    $noteTexts[$noteNum] = $noteText
end

# Delete the note texts from the document
$noteTextRange = Range.newWithLocationAndBound $noteSels.firstValue.location, $noteSels.lastValue.bound
$text.deleteInRange $noteTextRange

# Find the note references in the text
# Get their numbers and locations, and delete them
$noteRefNums = $text.find('\d+', 'Ea').arrayByMakingValuesDoCommand 'substring'
$noteRefLocs = $text.findAndReplace('\d+', '', 'Ea').arrayByMakingValuesDoCommand 'location'

# Add the appropriate endnote at each of the locations
foreach $i, $loc in reversed $noteRefLocs
    $noteNum = $noteRefNums[$i]
    Note.insertEndnoteInTextAtIndex $text, $loc, $noteTexts[$noteNum]
end
Anyway, thank you so much for your kindness.

Re: A macro converting numbered texts to endnotes

Posted: 2019-03-01 01:43:51
by phspaelti
Hello Nobumi,
I'm glad you got it to work. So I guess it was worth providing more than one macro! Also it shows that with macros it's really best not just to hope someone can write one for you, but to at least have some idea how they work so you can get the result you want.
Nobumi Iyanaga wrote: 2019-02-28 22:14:41
> I don't understand the meaning of the find command in line 27:
>
> "Find Next @String<(?<!\d|^)\d+>, 'E¢-W'"
Yeah, I guess that's kind of cryptic. This is of course one of the crucial bits of Martin's macro, since it finds the note reference numbers in the text.

First, @String is the newfangled way to write string literals in the Nisus macro language. It means that whatever follows, between a pair of delimiters, is to be taken as a string. The character right after @String is the opening delimiter, and the closing delimiter is the next occurrence of the same character, except that for bracket characters like "<" the closing delimiter is the opposite-facing one. So here the closing delimiter is the following ">", which makes the find string '(?<!\d|^)\d+'. The easy part is '\d+', which finds the number. The complicated parenthesis is '(?<! … )', a negative lookbehind, meaning "not preceded by". Inside it is '\d|^', where the pipe '|' is the OR symbol, '\d' is a digit, and '^' is the start of a paragraph. So the whole thing reads: "Find a number that is neither preceded by a digit nor at the start of a paragraph." The paragraph-start condition skips the numbers that begin each note, and the digit condition stops the match from starting partway through such a number (for example at the "2" of a note numbered 12).

Finally the last special thing is the '¢' option. That allows the macro to later use the found bits from the expression as variables. You'll see in the code that a few lines later, Martin uses a variable '$0'. That refers to the found bit of this expression, i.e., the putative footnote/endnote reference number.
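The lookbehind is easy to try outside Nisus. In this Python sketch, '(?<!\d|^)' is written as two lookbehinds, '(?<!\d)(?<!^)', because Python's `re` module only accepts fixed-width lookbehind and rejects an alternation of '\d' (one character) with '^' (zero characters). With `re.M`, '^' matches at the start of every line, standing in for a paragraph start.

```python
import re

# Equivalent of '(?<!\d|^)\d+': a run of digits that is neither preceded by a
# digit nor at the start of a line (re.M makes '^' match after every newline).
ref_pattern = re.compile(r"(?<!\d)(?<!^)\d+", re.M)

sample = (
    "Lorem ipsum amet1, elit2.\n"
    "Nullam nec mi12 et neque.\n"
    "1 my note\n"
    "12 my note\n"
)

print(ref_pattern.findall(sample))               # in-text references only
print(re.findall(r"(?<!^)\d+", sample, re.M))    # without the digit check
print(re.findall(r"\d+", sample))                # every number
```

Without the digit check, the "2" of the note numbered 12 slips through, which is exactly what '(?<!\d' guards against.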

Thus ends the macro lesson for today. :wink:

Re: A macro converting numbered texts to endnotes

Posted: 2019-03-01 06:55:08
by Nobumi Iyanaga
Hello Philip,
Thank you for your reply and your lesson. Thanks to it, I could understand the cryptic find pattern. I tried the pattern '(?<!\d|^)\d+' on my document and found that it matches all the reference numbers in the notes text. But then I don't understand the find pattern on line 33: '^$0(.+$0\n)+'. If '$0' stands for the found reference number, the first one should be '^1(.+1\n)+', but such a pattern does not match any string... Martin's comment for this part of the macro is "# find the corresponding note text". If I understand correctly, it should match the first note text, which may or may not contain several lines... So anyway, I am lost.

On the other hand, I realized that in my first post I stated 'Fortunately, there is no digit in the original text other than note references'. This was not true at all! I am very sorry for leading you into error...

Anyway, I got your last macro to work, so I am satisfied, and I thank you very much again.

Re: A macro converting numbered texts to endnotes

Posted: 2019-03-01 08:21:38
by phspaelti
Nobumi Iyanaga wrote: 2019-03-01 06:55:08
> But then I don't understand the find pattern on line 33: '^$0(.+$0\n)+'. If '$0' stands for the found reference number, the first one should be '^1(.+1\n)+', but such a pattern does not match any string... Martin's comment for this part of the macro is "# find the corresponding note text". If I understand correctly, it should match the first note text, which may or may not contain several lines... So anyway, I am lost.
No, no. You understand it all too well. I haven't studied Martin's macro, but the expression you give will only match notes that have the same number at the beginning of the note and again at the end of the line; if a note has multiple lines, every one of them must end with that same number. Just consider your sample:
Nobumi Iyanaga wrote: 2019-02-25 17:48:58
> Lorem ipsum dolor sit amet1, consectetuer adipiscing elit2. Etiam lobortis facilisis sem3. ...
> Nullam nec mi4 et neque pharetra sollicitudin5. ...
> 1 my note 1
> another paragraph of my note 1
> 2 my note 2
> 3 my note 3
> 4 my note 4
> 5 my note 5
> another paragraph of my note 5
> ...
So it seems Martin was taking this pattern quite literally :o
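A quick Python check confirms this. The helper `literal_note_pattern` is hypothetical; it builds Martin's pattern '^$0(.+$0\n)+' with `$0` replaced by a given number. The result matches the sample as posted, where every line of a note ends in the note's number, but finds nothing in notes that don't repeat the number.

```python
import re


def literal_note_pattern(num):
    # Martin's '^$0(.+$0\n)+' with $0 replaced by the reference number.
    n = re.escape(num)
    return re.compile(rf"^{n}(.+{n}\n)+", re.M)


literal_sample = "1 my note 1\nanother paragraph of my note 1\n2 my note 2\n"
real_sample = "1\tmy note\nanother paragraph of my note\n2\tmy second note\n"

pattern = literal_note_pattern("1")
print(pattern.search(literal_sample))
print(pattern.search(real_sample))
```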

Re: A macro converting numbered texts to endnotes

Posted: 2019-03-01 20:57:32
by Nobumi Iyanaga
Hello Philip,

Thank you for your reply.

> So it seems Martin was taking this pattern quite literally.

Ah, now I understand...!

I must be more careful when I give an example text!

Re: A macro converting numbered texts to endnotes

Posted: 2019-03-06 15:51:41
by martin
I did indeed take the example text exactly literally :o My macro expected each note's text to be bookended by reference numbers. Thinking about it longer, I can see why this was an unrealistic assumption. But it's good advice that whenever someone requests a macro, they should provide a real sample document, even just a small one, so the macro author can see what is actually required.

Thanks to Philip for coming through, both with a working macro and for explaining the code in my own misinformed macro :D
phspaelti wrote: 2019-03-01 01:43:51
> Finally the last special thing is the '¢' option. That allows the macro to later use the found bits from the expression as variables. You'll see in the code that a few lines later, Martin uses a variable '$0'. That refers to the found bit of this expression, i.e., the putative footnote/endnote reference number.
This is one of my favorite macro features. It's great to be able to easily refer to matches using automatically created macro variables. It works very nicely with named captures:

Code:

Find Next @String[(?<digits>\d+)\s+(?<item>\w+)], 'E¢'
Prompt $digits, $item
I should say that the ¢ option is like a little brother to the similar $ option, e.g.:

Code:

Find Next @String[(?<digits>\d+)\s+(?<item>\w+)], 'E$'
The $ option captures the matches as formatted text, while the ¢ option captures them as strings (plain text), which is slightly more efficient.
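For readers who know Python, the closest everyday equivalent of the ¢ behaviour is a match's named groups. This is a loose analogy, not how Nisus implements the feature: Python spells named groups `(?P<name>...)` and returns the captures as a dictionary of plain strings instead of creating variables such as $digits and $item.

```python
import re

# Named groups, as in the Nisus example above; captures come back as plain strings.
match = re.search(r"(?P<digits>\d+)\s+(?P<item>\w+)", "Order: 12 apples, 3 pears")
captures = match.groupdict()
print(captures["digits"], captures["item"])
```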