Page 1 of 1
Context Extraction Marco?
Posted: 2010-02-12 06:55:05
by bjast
I'm new to Nisus Pro, and I have a marco idea that would be most helpful for writers. But I'm sure, at this point, it is beyond my ability - although it is somewhat similar to some others I've seen.
I would like to have NWP do a Context Extraction:
Search for a word and have all of the relevant passages copied to another document, with the sentences that they appeared in.
Is this even possible?
Re: Context Extraction Marco?
Posted: 2010-02-12 07:19:09
by Kino
Re: Context Extraction Marco?
Posted: 2010-02-12 10:25:51
by bjast
Bingo!
That's it exactly.
Much appreciated.
Re: Context Extraction Marco?
Posted: 2010-02-12 20:02:05
by bjast
Hello Kino
Thanks again for your help with the Context Extraction macro.
I know that I said this was "exactly what I was looking for". And that beggars shouldn't be choosers. But I'm wondering if it would be too much to ask how to modify this marco to list the complete sentence each occurrence of the word appeared in - rather than just the line the word appeared on? And - hopefully that wasn't you gasping I just heard - to place a carriage return after each entry?
This two additions would make this macro even more useful.
Thanks
Re: Context Extraction Marco?
Posted: 2010-02-13 03:52:16
by Kino
Because of its limitations, the Macro Prompt dialog box cannot show the results in such a way. So I modified the macro so that, when ‘Copy All Matches to New File’ is chosen, it produces a file containing all sentences in which the found string is situated.
Code: Select all
$outputLength = 70 # length of characters for a result shown in the Macro Prompt dialog box
$displayItemNumber = 30
$pageLineSep = Cast to String '.'
$contextSep = Cast to String ': '
$foundPrefix = Cast to String '«'
$foundSuffix = Cast to String '»'
$outputSep = Cast to String "\n\n"
$bookmarkPrefix = '## ' # add a prefix so that bookmarks created by this macro are distinctive and sorted first
$bookmarkSuffix = '' # put anything you like between the single quotes or leave it blank
$contentTypeName = Hash.new
$contentTypeName{'table'} = Cast to String ' (table)'
$contentTypeName{'header'} = Cast to String ' (header)'
$contentTypeName{'footer'} = Cast to String ' (footer)'
$contentTypeName{'comment'} = Cast to String ' (comment)'
$contentTypeName{'footnotes'} = Cast to String ' (note)'
$contentTypeName{'endnotes'} = Cast to String ' (note)'
$contentTypeName{'sectionnotes'} = Cast to String ' (note)'
Require Pro Version 1.3
Debug.setCodeProfilingEnabled false
$doc = Document.active
if $doc == undefined
exit # exit silently
end
$selsOrig = $doc.textSelections
$findOpt = Read Find Options
$findExp = Read Find Expression
$isPowerFindOrAttrSensitive = $findOpt.find 'e|u', 'E-i'
if $isPowerFindOrAttrSensitive == undefined
$message = 'What to find?'
$detail = 'Find options are those you have set in the Find & Replace window.'
$findExp = Prompt Input $message, $detail, '', $findExp
end
if $findExp == ''
Set Find Shown true
exit 'Find expression is empty, aborting.'
end
$numFound = Find All $findExp, '*'
if ! $numFound
exit 'No matches, exiting...'
end
$sels = $doc.textSelections
$doc.setSelections $selsOrig # restore the original selections
$map = Hash.new # newline, soft return, break will be shown as | and tab as space
$map{0x000A} = $map{0x000C} = $map{0x2028} = '/'
$map{0x0009} = 0x0020
$outputArray = Array.new
$outputHash = $pageLine = Hash.new
$outputLength -= $foundPrefix.length + $foundSuffix.length
foreach $sel in $sels
$text = $sel.text
$found = $sel.substring
$outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
$lineNum = $sel.text.lineNumberAtIndex $sel.location, true
if $lineNum == undefined
$outputTemp &= $contentTypeName{$text.documentContentType}
else
$outputTemp &= $pageLineSep & $lineNum
end
$pageLine{$sel} = $outputTemp
$outputTemp &= $contextSep
$preFoundLen = $outputLength - $outputTemp.length
$preFoundLen -= $found.length
$preFoundLen = Cast to Int $preFoundLen / 2, false
if $preFoundLen > $sel.location
$preFoundLen = $sel.location
end
$preFoundSel = TextSelection.newWithLocationAndBound $text, $sel.location - $preFoundLen, $sel.location
$postFoundLen = $outputLength - $preFoundLen
$postFoundLen -= $found.length
$contextBound = $sel.bound + $postFoundLen
if $contextBound > $sel.text.length
$postFoundLen = $sel.text.length - $sel.bound
end
$postFoundSel = TextSelection.newWithLocationAndBound $text, $sel.bound, $sel.bound + $postFoundLen
$outputTemp &= $preFoundSel.substring
$outputTemp &= $foundPrefix & $found
$outputTemp &= $foundSuffix & $postFoundSel.substring
$range = Range.new 0, $outputTemp.length
$outputTemp.transliterateInRange $range, $map
$outputArray.appendValue $outputTemp
$outputHash{$outputTemp} = $sel
end
$message = 'Show which?'
$detail = $sels.count & ' found (page'
$detail &= $pageLineSep & 'line'
$detail &= $contextSep & $foundPrefix
$detail &= 'found' & $foundSuffix
$detail &= ' with context)'
$clipboard = '➪ Copy All Matches to Clipboard'
$newFile = '➪ Copy All Matches to New File'
$bookmark = '➪ Bookmark All Matches'
$actions = Array.new $clipboard, $newFile, $bookmark
if $outputArray.count > $displayItemNumber
$continue = true
$previous = '[↑ SEE PREVIOUS MATCHES]'
$next = '[↓ SEE NEXT MATCHES]'
$index = 0
$count = $displayItemNumber
$lastItemIndex = $sels.count
while $continue == true
$displayArray = $outputArray.subarrayAtIndex $index, $count
foreach $action in $actions
$displayArray.prependValue $action
end
$indexBound = $index + $count
if $lastItemIndex > $indexBound
$displayArray.prependValue $next
end
if $index > 0
$displayArray.prependValue $previous
end
$userInput = Prompt Options $message, $detail, 'OK', $displayArray
if $userInput == $previous
$index -= $displayItemNumber
$count = $displayItemNumber
elsif $userInput == $next
$index += $count
$count = $displayItemNumber
$indexBound = $index + $count
if $indexBound > $lastItemIndex
$count = $sels.count - $index
end
else
$continue = false
end
end
else
$displayArray = $outputArray
foreach $action in $actions
$displayArray.prependValue $action
end
$userInput = Prompt Options $message, $detail, 'OK', $displayArray
end
$isAction = $actions.indexOfValue $userInput
if $isAction != -1
if $userInput == $bookmark
$ZWSP = Cast to String "\x{200B}"
foreach $sel in $sels
$bookmarkName = $bookmarkPrefix & $sel.substring
$bookmarkName &= $contextSep & $pageLine{$sel}
$bookmarkName &= $bookmarkSuffix
$doc.setSelection $sel
while Bookmark Exists $bookmarkName
$bookmarkName &= $ZWSP
end
Add Bookmark As $bookmarkName
end
Set Navigator Shown true
Set Navigator Mode 'Bookmarks by Alpha'
elsif $userInput == $clipboard
$output = $outputArray.join $outputSep
Write Clipboard $output
else # i.e. $userInput == $newFile
$output = Array.new
$SelectSentence = Hash.new
$SelectSentence{'Danish'} = ':Rediger:Vælg:Vælg sætning'
$SelectSentence{'English'} = ':Edit:Select:Select Sentence'
$SelectSentence{'French'} = ':Édition:Sélectionner:Sélectionner Phrase'
$SelectSentence{'German'} = ':Bearbeiten:Auswählen:Satz auswählen'
$SelectSentence{'Italian'} = ':Modifica:Seleziona:Seleziona periodo'
$SelectSentence{'Japanese'} = ':編集:選択:文章選択'
$SelectSentence{'Polish'} = ':Edytuj:Zaznacz:Zaznacz Zdanie'
$SelectSentence{'Spanish'} = ':Editar:Seleccionar:Seleccionar frase'
$lproj = Application Property 'localization'
foreach $sel in $sels
$outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
$lineNum = $sel.text.lineNumberAtIndex $sel.location, true
if $lineNum == undefined
$outputTemp &= $contentTypeName{$text.documentContentType}
else
$outputTemp &= $pageLineSep & $lineNum
end
$pageLine{$sel} = $outputTemp
$outputTemp &= $contextSep
$doc.setSelection $sel
Document.setActive $doc
Menu $SelectSentence{$lproj}
$range = TextSelection.activeRange
$substring = Cast to String $doc.selectedSubtext, false # false: fix cross-references
$substring.insertAtIndex $sel.bound - $range.location, $foundSuffix
$substring.insertAtIndex $sel.location - $range.location, $foundPrefix
$outputTemp &= $substring
$output.appendValue $outputTemp
end
$doc.setSelections $selsOrig # restore the original selections
$output = $output.join $outputSep
Document.newWithText $output
end
else
$doc.setSelection $outputHash{$userInput}
end
Re: Context Extraction Marco?
Posted: 2010-02-13 07:59:52
by bjast
Thanks Kino, this is quite a extensive macro. Reading it gives one an appreciation for how much you put into these projects.
Since I will be using it extensively to copy text from the resulting file, I was able to remove the << >> marks around each found word. This got me wondering if the original found text could be used to highlight all of the found items in the new file. Similar to when you do a Find All from the Find dialog.
I would be glad to add this on my own, but am afraid I would end up hammer what you have already accomplished.
Are you up for this, or have I already asked too much of you?
Either way I'm grateful for what you've offered.
Re: Context Extraction Marco?
Posted: 2010-02-13 08:41:54
by Kino
If I don’t misunderstand, you can achieve that by adding the following lines just after
Document.newWithText $output near the end of the macro.
Code: Select all
$findFound = '\Q' & $foundPrefix
$findFound &= '\E(\p{Any}+?)\Q' & $foundSuffix
$findFound &= '\E'
Replace All $findFound, '\1', 'E'
$highlight = Color.newWithHexTriplet 0xFFCC00
Set Highlight Color $highlight
Select Start
This code assumes that « and » are not used anywhere in your documents. If it is not the case, use other characters (or character sequences) as
$foundPrefix and
$foundSuffix defined near the beginning of the macro, e.g. { and }, ` and ^, etc.
To change the highlight colour, replace 0xFFCC00 with the hex RGB value of your favourite colour.
http://en.wikipedia.org/wiki/Web_colors
Re: Context Extraction Marco?
Posted: 2010-02-13 09:22:51
by bjast
Thanks for this addition to your macro. Unfortunately, when I added the additional code to the macro it ended up not just selecting all of the text in the file, but coloring the entire page - even when nothing was selected.
Perhaps I added the code in the wrong place. Here is a sample of where I placed it:
$doc.setSelections $selsOrig # restore the original selections
$output = $output.join $outputSep
Document.newWithText $output
#The following was added to select all of the searched for values that appear in the document.
$findFound = '\Q' & $foundPrefix
$findFound &= '\E(\p{Any}+?)\Q' & $foundSuffix
$findFound &= '\E'
Replace All $findFound, '\1', 'E'
$highlight = Color.newWithHexTriplet 0xFFCC00
Set Highlight Color $highlight
Select Start
end
else
$doc.setSelection $outputHash{$userInput}
end
Re: Context Extraction Marco?
Posted: 2010-02-13 09:37:48
by Kino
Strange. That is the right place. Anyway, please try the attached macro file.
I’m going to the bed, so I cannot answer your future questions until tomorrow afternoon in JST.
Re: Context Extraction Marco?
Posted: 2010-02-13 09:44:56
by bjast
Thanks for this addition. It worked just fine. I will compare this your macro to the one I have modified and figure out what I did differently, then use your adjustments to make my variation work.
I'll let you know how it goes.
Thanks for all of your assistance.
Hope you can get some sleep ...
Re: Context Extraction Marco?
Posted: 2010-02-13 19:17:32
by Kino
kino wrote:This code assumes that « and » are not used anywhere in your documents.
I don’t like that kind of limitations. So I modified the macro again to remove the restriction by making it use calculated location values to apply a highlight colour on found strings.
1. I moved
$highlight = Color.newWithHexTriplet 0xFFCC00 into the top group of macro commands for it is customizable.
2. The last portion of the macro has become as follows:
Code: Select all
else # i.e. $userInput == $newFile
$output = $founds = Array.new
$SelectSentence = Hash.new
$SelectSentence{'Danish'} = ':Rediger:Vælg:Vælg sætning'
$SelectSentence{'English'} = ':Edit:Select:Select Sentence'
$SelectSentence{'French'} = ':Édition:Sélectionner:Sélectionner Phrase'
$SelectSentence{'German'} = ':Bearbeiten:Auswählen:Satz auswählen'
$SelectSentence{'Italian'} = ':Modifica:Seleziona:Seleziona periodo'
$SelectSentence{'Japanese'} = ':編集:選択:文章選択'
$SelectSentence{'Polish'} = ':Edytuj:Zaznacz:Zaznacz Zdanie'
$SelectSentence{'Spanish'} = ':Editar:Seleccionar:Seleccionar frase'
$lproj = Application Property 'localization'
$offset = 0
foreach $sel in $sels
$outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
$lineNum = $sel.text.lineNumberAtIndex $sel.location, true
if $lineNum == undefined
$outputTemp &= $contentTypeName{$text.documentContentType}
else
$outputTemp &= $pageLineSep & $lineNum
end
$outputTemp &= $contextSep
$offset += $outputTemp.length
$doc.setSelection $sel
Document.setActive $doc
Menu $SelectSentence{$lproj}
$range = TextSelection.activeRange
$substring = Cast to String $doc.selectedSubtext, false # false: fix cross-references
$loc = $sel.location - $range.location
$found = Range.new $loc + $offset, $sel.length
$founds.appendValue $found
$outputTemp &= $substring
$output.appendValue $outputTemp
$offset += $substring.length + $outputSep.length
end
$doc.setSelections $selsOrig # restore the original selections
$output = $output.join $outputSep
$output = Document.newWithText $output
$output.clearAndDisableUndoHistory
$sels = Array.new
foreach $found in $founds
$sel = TextSelection.new $output.text, $found
$sels.appendValue $sel
end
Push Target Selection $sels
Set Highlight Color $highlight
Pop Target Selection
end
else
$doc.setSelection $outputHash{$userInput}
end
Re: Context Extraction Marco?
Posted: 2010-02-14 07:09:50
by bjast
This last release completely resolved the highlighting issues. I too, like you, don't like it when there are assumed limitations to a macro. This will now be a very useful tool for me in my research.
Thanks!