Getting the number of characters of a found text string
If a macro finds a text string and you need to know how many characters that string contains, what is the easiest way to get the count?
- martin
- Official Nisus Person
- Posts: 5228
- Joined: 2002-07-11 17:14:10
- Location: San Diego, CA
Re: Getting the number of characters of a found text string
If you just want the number of characters in the active selection, this will do:
Code:
$selection = TextSelection.active
$characterCount = $selection.length
If you want the number of characters in a text/string variable, you can use:
Code:
$text = "whatever"
$characterCount = $text.length
One note: neither of these returns a strict character count. From the macro reference:
Note: the return value is actually the number of 16 bit (2 byte) pieces required to represent the text in the UTF-16 encoding. In other words, characters whose code point is greater than U+FFFF (those that require surrogate pairs) will count as more than a single character. For example, the musical double-flat symbol (U+1D12B) counts as two characters. This is uncommon and does not affect most letters used by most languages (eg: English, Hebrew, Arabic, Japanese, etc).
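The counting rule quoted above can be checked outside Nisus Writer. The following is only a Python analogue, not Nisus macro code: Python's built-in `len()` counts code points, so the hypothetical helper `utf16_length` encodes the text as UTF-16 and counts 2-byte units, which is the quantity the macro reference describes for `.length`.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the measure the macro reference describes for .length."""
    # utf-16-le writes no byte-order mark, so every code unit is exactly 2 bytes.
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # U+1D12B is the musical double-flat symbol mentioned in the reference.
    for sample in ["whatever", "漢字", "\U0001D12B"]:
        print(repr(sample), "code points:", len(sample), "UTF-16 units:", utf16_length(sample))
```

For this thread's purpose the distinction rarely matters: every character in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is one UTF-16 unit, so `.length` equals the character count there. Characters from CJK Extension B (U+20000 onward) lie above U+FFFF and would count as two.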
Re: Getting the number of characters of a found text string
I see how it works in principle, thank you. But I am still not sure what the most economical way is to do the following:
1) Search for the next string of Chinese characters.
2) Select as many (English) before the string as there are characters in it (to put them into italics afterwards, or the like).
I can see that an ordinary search with \p{Han} handles step 1. Should I then get the number of characters by measuring the selection made by the Find, or is there a better way? The second step could be achieved with one of the new "send selector" commands, I guess.
Re: Getting the number of characters of a found text string
"Select as many (English)" should read as: "Select as many (English) words". Sorry for this typo.
- martin
Re: Getting the number of characters of a found text string
As they say, there are many ways to skin this cat, but this is probably the most efficient:
Code:
# Find next group of Han characters
If Find '\p{Han}+', 'E-W'
$selection = TextSelection.active
$count = $selection.length
# process the following English text
If Find "(?:\\W+\\p{Latin}+){$count,$count}", 'E-W'
Italic
Else
Prompt "Could not match $count following English words."
End
End
If you'd like any part of that second Find expression explained, I'd be happy to.
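For anyone who wants to try the counted-repeat idea outside Nisus Writer, here is a rough Python analogue using the standard `re` module. Python's `re` does not support `\p{Han}` or `\p{Latin}`, so the sketch assumes stand-ins: the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) for Han and ASCII `[A-Za-z]` for Latin letters. The function name `following_words` is hypothetical, and unlike the macro's Find, the sketch requires the words to start right after the Han run.

```python
import re

# Stand-in (assumption): the basic CJK Unified Ideographs block for \p{Han}.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def following_words(text):
    """Find the next Han run and return exactly as many
    'separator + Latin word' groups as the run has characters,
    or None if the text contains no Han run."""
    han = HAN_RUN.search(text)
    if han is None:
        return None
    count = len(han.group())
    # (?:...) groups without capturing; {n,n} repeats the group exactly n times.
    # [A-Za-z] is an ASCII stand-in (assumption) for \p{Latin}.
    words = re.compile(r"(?:\W+[A-Za-z]+){%d,%d}" % (count, count))
    match = words.match(text, han.end())  # anchored at the end of the Han run
    if match is None:
        raise ValueError(f"Could not match {count} following English words.")
    return match.group()


if __name__ == "__main__":
    print(repr(following_words("方式 fang shi, and more")))  # ' fang shi'
```

Like the macro, the match includes the separators (spaces, commas) as well as the letters of the words.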
Re: Getting the number of characters of a found text string
As it stands, there are two problems with this macro. First, the English words to be selected come before the Chinese string, not after it, so you need to search backwards. The reason is practical: take an English text with interspersed Chinese terms; you want the English reader to know first how to pronounce what follows. Second, no commas, parentheses, etc. should be selected, only the letters of the words.
- martin
Re: Getting the number of characters of a found text string
Sorry, I missed the part about selecting the preceding text and not the following text. This should do what you need:
Code:
# Find next group of Han characters
If Find '\p{Han}+', 'E-W'
$selection = TextSelection.active
$count = $selection.length
# process the preceding English text
While $count > 0
$found = Find '\p{Latin}+', 'Eb-W'
If ! $found
Exit 'Missing preceding English text, aborting!'
End
$count -= 1
Italic
End
End
Re: Getting the number of characters of a found text string
Thanks a lot, Martin, this is very helpful and I see that it is a more elegant procedure than the one I had in mind.
To make it really perfect, two small problems remain. I think I could work out how to solve them, but in case you want to show me, here they are:
1) Just in case the text to be put into Italics has already been put into Italics, the macro puts it back to normal, which it should not.
2) The macro is meant to be used over and over again (but permitting visual control at each go). So the cursor on completion should be at the end of the first string it found.
PS: The revised macro no longer contains the line you offered to explain. Would you explain it anyway?
- martin
Re: Getting the number of characters of a found text string
js wrote:
1) Just in case the text to be put into Italics has already been put into Italics, the macro puts it back to normal, which it should not.
2) The macro is meant to be used over and over again (but permitting visual control at each go). So the cursor on completion should be at the end of the first string it found.
This revised macro should do the trick:
Code:
$doc = Document.active
# Find next group of Han characters
If Find '\p{Han}+', 'E-W'
$selection = $doc.textSelection
$count = $selection.length
# process the preceding English text
While $count > 0
$found = Find '\p{Latin}+', 'Eb-W'
If ! $found
Exit 'Missing preceding English text, aborting!'
End
$count -= 1
# if not already italic, then apply
$isItalic = Menu State ':Format:Italic'
If ! $isItalic
Italic
End
End
# place selection just after Han characters we first found
$doc.setSelection($selection)
Select End
End
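The two fixes in the revised macro (apply italic only when it is missing, and leave the cursor after the Han run so the macro can be run again) can be modeled in Python as a rough analogue. Assumptions: the basic CJK block (U+4E00 to U+9FFF) stands in for `\p{Han}`, ASCII `[A-Za-z]` for `\p{Latin}`, italic formatting is modeled as a set of character offsets, and `italicize_next` is a hypothetical name. Adding offsets to a set is idempotent, which mirrors the `Menu State` guard, and returning the end of the Han run mirrors `Select End`.

```python
import re

# Stand-ins (assumptions): basic CJK block for \p{Han}, ASCII letters for \p{Latin}.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")
LATIN_WORD = re.compile(r"[A-Za-z]+")


def italicize_next(text, italic, cursor):
    """One run of the macro: find the next Han run at or after `cursor`,
    italicize the same number of Latin words before it, and return the
    end of the Han run as the new cursor (None when no Han run is left)."""
    han = HAN_RUN.search(text, cursor)
    if han is None:
        return None
    count = len(han.group())
    # Letters only: spaces and punctuation are never part of a LATIN_WORD match.
    words = list(LATIN_WORD.finditer(text, 0, han.start()))
    if len(words) < count:
        raise ValueError("Missing preceding English text, aborting!")
    for word in words[-count:]:
        # Set union is idempotent: letters that are already italic stay italic,
        # whereas an unconditional toggle would switch them back to plain.
        italic.update(range(word.start(), word.end()))
    # Like "Select End" after restoring the selection: continue after this Han run.
    return han.end()


if __name__ == "__main__":
    text = "fang shi (方式) and shu (书) here"
    italic = set(range(0, 4))  # pretend "fang" is already italic
    cursor = 0
    while cursor is not None:  # each pass stands for one run of the macro
        cursor = italicize_next(text, italic, cursor)
    print("".join(text[i] for i in sorted(italic)))  # fangshishu
```

Collecting all earlier words and keeping the last `count` gives the same words as the macro's repeated backward Find, just in one pass.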
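js wrote:
PS: The revised macro no longer contains the line you offered to explain. Would you explain it anyway?
Sure. So we have this PowerFind Pro (regular expression) search:
Code:
Find "(?:\\W+\\p{Latin}+){$count,$count}", 'E-W'
Compare it with the same search using an ordinary capturing group; the only difference is the "?:":
Code:
Find "(\\W+\\p{Latin}+){$count,$count}", 'E-W'
Find "(?:\\W+\\p{Latin}+){$count,$count}", 'E-W'
The "?:" makes the parentheses a non-capturing group: they only group the pattern so it can be repeated, without saving what it matched. The group itself is:
Code:
(?:\\W+\\p{Latin}+)
It matches one or more non-word characters (\W+, such as spaces or punctuation) followed by one or more Latin-script letters (\p{Latin}+), i.e. a separator plus one English word. The {$count,$count} quantifier repeats that group at least $count and at most $count times, in other words exactly $count times.
The backslashes are doubled because we use a double-quoted string literal, which is first interpreted by the macro language (e.g. "$count" is replaced by the actual number). By the time the Find command sees the expression, each pair of backslashes has been reduced to a single one.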
Hopefully that's somewhat clear. Let me know if you have any questions.
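The escaping and grouping points above have close parallels in Python, so here is a rough analogue for experimenting (not PowerFind Pro). Python's `re` lacks `\p{Latin}`, so ASCII `[A-Za-z]` stands in for it as an assumption. In an ordinary Python string literal, as in the macro's double-quoted strings, `\\` produces a single backslash, so the doubled form equals a raw string. The sketch also shows what the `?:` changes: a repeated capturing group keeps only its last repetition, while the non-capturing version records nothing.

```python
import re

count = 2
# Ordinary string literal: each "\\" becomes one backslash, as in the macro's double-quoted strings.
doubled = "(?:\\W+[A-Za-z]+){%d,%d}" % (count, count)
# Raw string: backslashes are kept as written, so no doubling is needed.
raw = r"(?:\W+[A-Za-z]+){%d,%d}" % (count, count)
assert doubled == raw  # both hold single backslashes by the time re sees them

sample = ", fang shi (more)"
non_capturing = re.match(doubled, sample)
capturing = re.match(r"(\W+[A-Za-z]+){%d,%d}" % (count, count), sample)

print(repr(non_capturing.group()))  # ', fang shi'
print(non_capturing.groups())       # ()
print(repr(capturing.group(1)))     # ' shi' (only the last repetition is kept)
```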
Re: Getting the number of characters of a found text string
Thanks for the macro with the menu-state test, Martin, and for the explanations, which are easy to follow. I guess {$count,$count} could be abbreviated to {$count}, couldn't it?
- martin
Re: Getting the number of characters of a found text string
Yes, I forgot about that shorthand, that would definitely work!