How to display Character Count without multiple clicks

phspaelti
Posts: 912
Joined: 2007-02-07 00:58:12
Location: Japan

Post by phspaelti » 2015-12-06 16:31:07

Actually, a bit more perusing of the results and of the Nisus Macro Language Reference clearly shows that the disagreement is indeed due to the treatment of tables. Each cell in each table is its own text object, and when Nisus searches through all these objects, the results come back in a highly jumbled order. This order is a type of "hash" ordering. That is, the text objects of the document are not in document order, but rather in an internal order that allows Nisus to work faster.

If it were truly important to get this "exactly right" you would have to first order all the words by document order, i.e., in such a way that all the words in a table were ordered between the last word before the table and the first word after the table. Depending on your needs, you might have to do the same for footnotes as well.
philip
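
The reordering described in this post might look roughly like the following Python sketch. It is not Nisus macro code: the `Match` record, its `anchor` and `offset` fields, and the sample data are all hypothetical, and it assumes each table cell can be tagged with the main-text position at which its table sits.

```python
from dataclasses import dataclass

@dataclass
class Match:
    word: str
    anchor: int  # hypothetical: main-text position of the object holding the word
    offset: int  # hypothetical: order of the word inside that object (0 for body text)

def in_document_order(matches):
    """Sort matches so that table words fall between the surrounding body words."""
    return sorted(matches, key=lambda m: (m.anchor, m.offset))

# Matches as they might come back in "hash" order.
found = [
    Match("after", 30, 0),
    Match("cellB", 20, 2),
    Match("before", 10, 0),
    Match("cellA", 20, 1),
]
ordered = [m.word for m in in_document_order(found)]
```

The same two-part sort key would presumably work for footnotes, as long as each note can be tied to the position of its reference in the main text.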

JapanRich
Posts: 17
Joined: 2007-09-30 21:25:42

Post by JapanRich » 2015-12-06 23:32:04

Would it be getting greedy to ask if this Document Statistics macro could be extended to do Word/Character Count in all open documents?
It's fantastic that the macro now counts text within tables!

Japan Rich

phspaelti
Posts: 912
Joined: 2007-02-07 00:58:12
Location: Japan

Post by phspaelti » 2015-12-07 00:37:23

I'm too lazy right now to work out any better interface. This piles all the info for all the files into a table in a new document.
Attachments
Document Statistics (All Open Docs).nwm
(4.61 KiB) Downloaded 49 times
philip

Þorvarður
Posts: 248
Joined: 2012-12-19 05:02:52

Post by Þorvarður » 2015-12-07 06:04:49

phspaelti wrote:The deviation where the macro selects 3 words instead of 2 is actually a "bug".
I'm very pleased to say that, with your help, I think I have now got exactly what I wanted.
I discovered that my "Lorem Ipsum" dummy text (the beginning of Conrad's Heart of Darkness), which I have always used for testing, contained several Acute Accents #180 [00B4], and they were, so it seems, responsible for the deviation I mentioned. The text I used contained, for example, "other´s" instead of "other's". The Statistics Palette counts "other´s" as 3 words!
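
One way to clean up such a text before counting is sketched below in Python rather than in Nisus macro language. The function name and the sample string are made up for illustration, and the pattern only touches an acute accent (U+00B4) that stands between two unaccented Latin letters, which is an assumption about how the stray character was used.

```python
import re

ACUTE = "\u00B4"  # ACUTE ACCENT, the character found in the test text

# An acute accent with a Latin letter directly before and after it.
MISUSED_ACUTE = re.compile("(?<=[A-Za-z])" + ACUTE + "(?=[A-Za-z])")

def fix_acute_apostrophes(text):
    """Replace an acute accent used as an apostrophe with a plain apostrophe."""
    return MISUSED_ACUTE.sub("'", text)

sample = "each other" + ACUTE + "s yarns " + ACUTE + " and"
fixed = fix_acute_apostrophes(sample)
```

A free-standing accent, like the second one in the sample, is left alone, so only the apostrophe-like uses are changed.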

I am glad that I asked, because otherwise I would never have noticed this.
I added a few lines to your macro, and now it works perfectly for me.

Thanks again, Philip.

Code:

# This macro asks how many words should be selected from the beginning of the frontmost document
$my_choice = Prompt Options "How many words should I select from the beginning of the document?", "", "OK", "12000", "6500", "1500", "800", "650", "300", "150"

# Another way to ask the user how many words (s)he wants
#$my_choice = Prompt Input 'How many words do you want from the beginning of the document?', 'Enter the number you want…', 'OK', '6500'

# Find every word in the document, then select the first $my_choice of them
$doc = Document.active
$words = $doc.text.findAll @Text<\w+>, 'E'
if $words.count >= $my_choice
	$selectedWords = $words.subarrayInRange Range.new(0,$my_choice)
	$doc.setSelection $selectedWords
else
	Prompt 'This document has only ' & $words.count & ' words'
End

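# Turn the word-by-word selection into one contiguous range that runs from the
# start of the document to the end of the last selected word, color it red,
# and then collapse the selection
$loc1 = 1
Select End
$loc2 = Selection Location
$lengd = $loc2-$loc1
Set selection $loc1,$lengd
Red
Select End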

Þorvarður
Posts: 248
Joined: 2012-12-19 05:02:52

Post by Þorvarður » 2015-12-07 07:17:21

Regarding the macro "Document Statistics (All Open Docs)":
phspaelti wrote:I'm too lazy right now to work out any better interface.
This is absolutely wonderful, Philip.

Adding these two lines at the end of the macro makes the table perhaps a bit easier on the eye.

Table:Fit to Contents
Table:Align Cells:Center

JapanRich
Posts: 17
Joined: 2007-09-30 21:25:42

Post by JapanRich » 2015-12-09 04:32:20

Many thanks! The table output works nicely with the two added lines.

I realize that what I meant was to SUM the total words in all open files.

It was easy enough to copy/paste into an Excel sheet to add up the output of words from all open files, but I wonder if a Nisus macro could also add up the words counted from multiple open files.
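
The summing step itself is simple. The Python sketch below shows the idea outside of Nisus: the file names and texts are invented, and `\w+` is used only because it mirrors the word pattern in the macro posted earlier in this thread, so the counts may not agree exactly with the Statistics Palette.

```python
import re

def count_words(text):
    """Count runs of word characters, mirroring the macro's \\w+ search."""
    return len(re.findall(r"\w+", text))

def summarize(documents):
    """Take a mapping of name -> text; return per-document counts and the grand total."""
    counts = {name: count_words(text) for name, text in documents.items()}
    return counts, sum(counts.values())

# Hypothetical stand-ins for the open documents.
open_docs = {"a.rtf": "one two three", "b.rtf": "four five"}
per_doc, total = summarize(open_docs)
```

In a macro, the equivalent would be to keep a running total while looping over the open documents and to write it into one extra row at the bottom of the table.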

Apologies in advance, but I also realized a different problem that would be wonderful to have solved:

Is it possible to display a count of only English words (single-byte characters) when a document contains a table or tables with separate columns for both English and Japanese (double-byte characters)?

At the moment I need to select by hand the English column to get a count.
This gets laborious when a document has multiple tables and the counts have to be added up by hand.

I realize I have already asked way too much, but perhaps there are other people who can also benefit!

Japan Rich

martin
Official Nisus Person
Posts: 4261
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Post by martin » 2015-12-09 12:54:44

JapanRich wrote:Is it possible to display a count of only English words (single-byte characters) when a document contains a table or tables with separate columns for both English and Japanese (double-byte characters)?

At the moment I need to select by hand the English column to get a count.
Perhaps Philip will oblige, since he is so very generous in sharing his time and macro talents, but here's another idea: use two macros to accomplish this task. The first macro would select just Japanese or English text, and then Philip's macro will give you stats for the selection.

I don't know much about Japanese, but here's an attempt to provide a macro that selects just Japanese or English text. The regular expressions (regex) might need some adjusting. I tried to include all the Japanese character ranges (e.g. Hiragana, Katakana, Han), but I don't doubt that I missed something. The macro also treats some neutral characters (like numbers) next to Japanese text as though they were also Japanese.

I should also say there's no easy technical solution to select just English, since Latin characters form the basis for so many languages. There's no sure way for a macro to know the language of a word composed of Latin characters, e.g. "hello" and "hallo" use nearly the same characters but come from different languages. This could be resolved if your documents consistently have correctly applied language attributes, but instead this macro just gives the option to select all Japanese or all Non-Japanese.
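
The general idea of splitting by character range can be sketched in Python. This is not the attached macro: the ranges below are an assumed, incomplete set (CJK punctuation, the Hiragana and Katakana blocks, the main CJK ideograph block, and halfwidth Katakana), and the "Latin word" pattern covers only unaccented A to Z, so it says nothing about which language a word belongs to.

```python
import re

# Assumed character ranges; real documents may need more (e.g. rare ideographs).
JAPANESE = "\u3000-\u303F\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF\uFF66-\uFF9F"
JAPANESE_RUN = re.compile(f"[{JAPANESE}]+")
LATIN_WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def japanese_runs(text):
    """Return every unbroken run of characters from the assumed Japanese ranges."""
    return JAPANESE_RUN.findall(text)

def count_latin_words(text):
    """Count Latin-letter words after blanking out the Japanese runs."""
    return len(LATIN_WORD.findall(JAPANESE_RUN.sub(" ", text)))

row = "Good morning\tおはようございます"
mixed = "I drink コーヒー every day"
```

Because the Katakana range here is the whole block, it includes U+30FC, which a script-based class such as \p{Katakana} may not.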

I hope that helps!
Attachments
Select Japanese or Inverse.nwm
(3.71 KiB) Downloaded 53 times

phspaelti
Posts: 912
Joined: 2007-02-07 00:58:12
Location: Japan

Post by phspaelti » 2015-12-10 07:20:38

Hello Martin,
I would have to say that this macro wouldn't make me very happy with what it selects. The main problem is that it is missing character U+30FC. (I also find the attempt to capture "nearby punctuation" not very successful.)

But this story with U+30FC is something I really don't get. Why is this character not included in the \p{Katakana} wildcard? Who decided this? Is this part of the Unicode specification? The character seems to be inside the Katakana block, which makes sense, since that is its only valid purpose.

NWP has the script blocks, and for "Katakana" it tries to make up for the situation by providing (?:\p{Katakana}|(?<=\p{Katakana})\u30FC), which works well enough. Meanwhile, for "Hiragana" it provides (?:\p{Hiragana}|(?<=\p{Hiragana})\u30FC), which doesn't really seem justified, since U+30FC isn't used for Hiragana. (Except maybe in unusual contexts like Manga. But such marginal contexts could be adequately covered by more technical solutions.) Of course I wouldn't be surprised if many Japanese use U+30FC for some idiotic purposes, such as in place of Western hyphens, or for decoration, etc.

As it stands the \p{Katakana} wildcard is completely inadequate to capture actual Katakana.
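
The effect can be reproduced outside of NWP. In the Unicode Character Database, U+30FC carries the Script value Common (with Hiragana and Katakana listed only as script extensions), which would explain why a script-based wildcard skips it. The Python sketch below imitates the NWP workaround quoted above; since Python's built-in `re` has no \p{Katakana}, the range U+30A1 to U+30FA is used as an assumed stand-in for it.

```python
import re

KATAKANA_LETTERS = "\u30A1-\u30FA"  # assumed approximation of \p{Katakana}

# Script-style match: letters only, so the long-vowel mark breaks each word.
SCRIPT_ONLY = re.compile(f"[{KATAKANA_LETTERS}]+")

# Workaround: also accept U+30FC when it directly follows a Katakana letter.
WITH_LONG_MARK = re.compile(
    f"(?:[{KATAKANA_LETTERS}]|(?<=[{KATAKANA_LETTERS}])\u30FC)+"
)

phrase = "コーヒーを飲む"
script_only = SCRIPT_ONLY.findall(phrase)
with_mark = WITH_LONG_MARK.findall(phrase)
```

With the letters-only pattern the word for "coffee" falls apart into two single characters, while the lookbehind version keeps it whole and still ignores a long-vowel mark that does not follow Katakana.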

Anyhow here is the macro that I use to select Japanese:
Find Japanese.nwm
(21.52 KiB) Downloaded 43 times
It's just one line, but has a lot of lines of explanation.
philip

martin
Official Nisus Person
Posts: 4261
Joined: 2002-07-11 17:14:10
Location: San Diego, CA

Post by martin » 2015-12-10 15:25:18

Thanks for sharing your Japanese text selection macro Philip. I'm sure your macro's Japanese character coverage is much better than my own macro's, as I have no knowledge of the language. I've incorporated your character ranges into a modified version of my original macro, in case anyone finds it useful.
Attachments
Select All Japanese or Inverse.nwm
(3.72 KiB) Downloaded 43 times
