Actually, what you will find is that the deviation is not proportional. The deviation where the macro selects 3 words instead of 2 is actually a "bug". Note that my original macro says $word[6500]. Since arrays are indexed from 0, this means the macro selects 6501 words, not 6500. So, as written, the macro will always select one extra "word". Any other deviations you find come down to how a "word" is defined, and also to other technical differences that I am not entirely sure about.
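To make the off-by-one concrete, here is a small sketch in Python rather than the Nisus macro language. The list of placeholder "words" is hypothetical; the only point is that "everything up to and including index 6500" is 6501 items, while "start at 0, length 6500" is 6500 items.

```python
# Hypothetical stand-in for the array of found words (15,000 placeholders).
words = [f"w{i}" for i in range(15000)]

# Selecting through index 6500 (inclusive) -- what $word[6500] amounts to.
through_index_6500 = words[: 6500 + 1]

# Selecting a range that starts at index 0 with length 6500.
first_6500 = words[0:6500]

print(len(through_index_6500))  # 6501
print(len(first_6500))          # 6500
```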
Consider the following example:
The above file has 30 completely regular paragraphs, each with five 100-word sentences of the form "Text text … text.", for a total word count of 15,000. If you use the built-in document statistics, you might find a word count of 14,910! If you try the "Select 6500 first words" macro, you should see that it selects 13 full paragraphs plus the first word of the 14th. The built-in stats report "6,462 words and 13 paragraphs", while my "Document Statistics" macro reports "6,501 words and 14 paragraphs".
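A quick arithmetic check of those numbers, sketched in Python. It rebuilds a stand-in for the test file in memory (it is not the actual attached file) and assumes, as described, that every paragraph holds exactly 500 whitespace-separated words. Selecting 6501 words then covers 13 full paragraphs plus 1 word of the 14th.

```python
def make_paragraph():
    # One 100-word sentence: "Text" + 98 x "text" + "text."
    sentence = " ".join(["Text"] + ["text"] * 98 + ["text."])
    # Five such sentences per paragraph.
    return " ".join([sentence] * 5)

paragraphs = [make_paragraph() for _ in range(30)]

words_per_paragraph = len(paragraphs[0].split())       # 500
total_words = sum(len(p.split()) for p in paragraphs)  # 15000

# 6501 words selected because of the off-by-one.
full_paragraphs, extra_words = divmod(6501, words_per_paragraph)
paragraphs_touched = full_paragraphs + (1 if extra_words else 0)

print(words_per_paragraph, total_words)
print(full_paragraphs, extra_words, paragraphs_touched)
```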
Finally, as another test, let's try the Nisus Macro Language Reference.
The "Select 6500 first words" macro will select all the way up to:
Set Fixed Line Height points v2.0.7
Changes the paragraph’s line height to
in the section "Commands > Formatting Text".
The built-in stats will report "5,675 words and 533 paragraphs", and the "Document Statistics" macro "6,430 words and 647 paragraphs". Obviously the built-in stats live in a different universe, but why the disagreement with the "Document Statistics" macro? Who knows? The file we're checking contains many tables and other kinds of 'broken-up' text, as well as many 'strange' words. If you want to know exactly what's going on, you can try the following modified version of the "Select 6500 first words" macro:
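$doc = Document.active
# Find every run of word characters (\w+) in the document text
$words = $doc.text.findAll @Text<\w+>, 'E'
if $words.count > 6500
	# Keep only the first 6500 matches: start at index 0, length 6500
	$words6500 = $words.subarrayInRange Range.new(0,6500)
	$doc.setSelection $words6500
else
	Prompt 'This document has only ' & $words.count & ' words'
end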
This macro selects the "words" individually, and you will see that the built-in stats now come much closer (for me they show 6,499 words). One thing I find baffling is that it also selects some of the content of the table in the section "Indexing", which lies beyond the main body text that has been selected. I don't understand this.
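One plausible contributor to the gap between the counts, offered as an illustration rather than a confirmed explanation, is how 'strange' words get split. The Python sketch below compares two hypothetical definitions of "word" on the text quoted above: runs of \w characters (like the macro's search) versus whitespace-separated tokens. Neither is claimed to be what Nisus Writer's built-in statistics actually use, and Python's \w may differ in detail from the regex engine Nisus uses.

```python
import re

def count_regex_words(text):
    # Every maximal run of word characters counts as a "word",
    # similar in spirit to the \w+ search in the macro.
    return len(re.findall(r"\w+", text))

def count_whitespace_words(text):
    # Every whitespace-separated token counts as a "word".
    return len(text.split())

sample = "Set Fixed Line Height points v2.0.7 Changes the paragraph’s line height to"

# "v2.0.7" becomes v2 / 0 / 7 and "paragraph’s" becomes paragraph / s under \w+.
print(count_regex_words(sample))       # 15
print(count_whitespace_words(sample))  # 12
```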
At this point I have to say that this goes beyond what I care about.