Reply to topic  [ 12 posts ] 
Context Extraction Marco? 
Author Message

Joined: 2010-01-06 10:58:07
Posts: 39
I'm new to Nisus Pro, and I have a marco idea that would be most helpful for writers. But I'm sure, at this point, it is beyond my ability - although it is somewhat similar to some others I've seen.

I would like to have NWP do a Context Extraction:

Search for a word and have all of the relevant passages copied to another document, with the sentences that they appeared in.

Is this even possible?


2010-02-12 06:55:05
Profile

Joined: 2008-05-17 04:02:32
Posts: 400
Please try this one.
http://nisus.com/files/GetMacroRedirect.php?file=FindAllAndShowContext


2010-02-12 07:19:09
Profile

Joined: 2010-01-06 10:58:07
Posts: 39
Bingo!

That's it exactly.

Much appreciated.


2010-02-12 10:25:51
Profile

Joined: 2010-01-06 10:58:07
Posts: 39
Hello Kino

Thanks again for your help with the Context Extraction macro.

I know that I said this was "exactly what I was looking for". And that beggars shouldn't be choosers. But I'm wondering if it would be too much to ask how to modify this marco to list the complete sentence each occurrence of the word appeared in - rather than just the line the word appeared on? And - hopefully that wasn't you gasping I just heard - to place a carriage return after each entry?

This two additions would make this macro even more useful.

Thanks


2010-02-12 20:02:05
Profile

Joined: 2008-05-17 04:02:32
Posts: 400
Because of its limitations, the Macro Prompt dialog box cannot show the results in such a way. So I modified the macro so that, when ‘Copy All Matches to New File’ is chosen, it produces a file containing all sentences in which the found string is situated.
Code:
$outputLength = 70  # length of characters for a result shown in the Macro Prompt dialog box
$displayItemNumber = 30
$pageLineSep = Cast to String '.'
$contextSep = Cast to String ': '
$foundPrefix = Cast to String '«'
$foundSuffix = Cast to String '»'
$outputSep = Cast to String "\n\n"
$bookmarkPrefix = '## '  # add a prefix so that bookmarks created by this macro are distinctive and sorted first
$bookmarkSuffix = ''  # put anything you like between the single quotes or leave it blank


$contentTypeName = Hash.new
$contentTypeName{'table'} = Cast to String ' (table)'
$contentTypeName{'header'} = Cast to String ' (header)'
$contentTypeName{'footer'} = Cast to String ' (footer)'
$contentTypeName{'comment'} = Cast to String ' (comment)'
$contentTypeName{'footnotes'} = Cast to String ' (note)'
$contentTypeName{'endnotes'} = Cast to String ' (note)'
$contentTypeName{'sectionnotes'} = Cast to String ' (note)'

Require Pro Version 1.3
Debug.setCodeProfilingEnabled false

$doc = Document.active
if $doc == undefined
   exit  # exit silently
end

$selsOrig = $doc.textSelections

$findOpt = Read Find Options
$findExp = Read Find Expression
$isPowerFindOrAttrSensitive = $findOpt.find 'e|u', 'E-i'

if $isPowerFindOrAttrSensitive == undefined
   $message = 'What to find?'
   $detail = 'Find options are those you have set in the Find & Replace window.'
   $findExp = Prompt Input $message, $detail, '', $findExp
end

if $findExp == ''
   Set Find Shown true
   exit 'Find expression is empty, aborting.'
end

$numFound = Find All $findExp, '*'

if ! $numFound
   exit 'No matches, exiting...'
end

$sels = $doc.textSelections
$doc.setSelections $selsOrig  # restore the original selections

$map = Hash.new  # newline, soft return, break will be shown as | and tab as space
$map{0x000A} = $map{0x000C} = $map{0x2028} = '/'
$map{0x0009} = 0x0020

$outputArray = Array.new
$outputHash = $pageLine = Hash.new
$outputLength -= $foundPrefix.length + $foundSuffix.length

foreach $sel in $sels
   $text = $sel.text
   $found = $sel.substring
   $outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
   $lineNum = $sel.text.lineNumberAtIndex $sel.location, true
   if $lineNum == undefined
      $outputTemp &= $contentTypeName{$text.documentContentType}
   else
      $outputTemp &= $pageLineSep & $lineNum
   end
   $pageLine{$sel} = $outputTemp
   $outputTemp &= $contextSep
   $preFoundLen = $outputLength - $outputTemp.length
   $preFoundLen -= $found.length
   $preFoundLen = Cast to Int $preFoundLen / 2, false
   if $preFoundLen > $sel.location
      $preFoundLen = $sel.location
   end
   $preFoundSel = TextSelection.newWithLocationAndBound $text, $sel.location - $preFoundLen, $sel.location
   $postFoundLen = $outputLength - $preFoundLen
   $postFoundLen -= $found.length
   $contextBound = $sel.bound + $postFoundLen
   if $contextBound  > $sel.text.length
      $postFoundLen = $sel.text.length - $sel.bound
   end
   $postFoundSel = TextSelection.newWithLocationAndBound $text, $sel.bound, $sel.bound + $postFoundLen
   $outputTemp &= $preFoundSel.substring
   $outputTemp &= $foundPrefix & $found
   $outputTemp &= $foundSuffix & $postFoundSel.substring
   $range = Range.new 0, $outputTemp.length
   $outputTemp.transliterateInRange $range, $map
   $outputArray.appendValue $outputTemp
   $outputHash{$outputTemp} = $sel
end

$message = 'Show which?'
$detail = $sels.count & ' found (page'
$detail &= $pageLineSep & 'line'
$detail &= $contextSep & $foundPrefix
$detail &= 'found' & $foundSuffix
$detail &= ' with context)'

$clipboard = '➪ Copy All Matches to Clipboard'
$newFile = '➪ Copy All Matches to New File'
$bookmark = '➪ Bookmark All Matches'
$actions = Array.new $clipboard, $newFile, $bookmark

if $outputArray.count > $displayItemNumber
   $continue = true
   $previous = '[↑ SEE PREVIOUS MATCHES]'
   $next = '[↓ SEE NEXT MATCHES]'
   $index = 0
   $count = $displayItemNumber
   $lastItemIndex = $sels.count
   while $continue == true
      $displayArray = $outputArray.subarrayAtIndex $index, $count
      foreach $action in $actions
         $displayArray.prependValue $action
      end
      $indexBound = $index + $count
      if $lastItemIndex > $indexBound
         $displayArray.prependValue $next
      end
      if $index > 0
         $displayArray.prependValue $previous
      end
      $userInput = Prompt Options $message, $detail, 'OK', $displayArray
      if $userInput == $previous
         $index -= $displayItemNumber
         $count = $displayItemNumber
      elsif $userInput == $next
         $index += $count
         $count = $displayItemNumber
         $indexBound = $index + $count
         if $indexBound > $lastItemIndex
            $count = $sels.count - $index
         end
      else
         $continue = false
      end
   end
else
   $displayArray = $outputArray
   foreach $action in $actions
      $displayArray.prependValue $action
   end
   $userInput = Prompt Options $message, $detail, 'OK', $displayArray
end

$isAction = $actions.indexOfValue $userInput
if $isAction != -1
   if $userInput == $bookmark
      $ZWSP = Cast to String "\x{200B}"
      foreach $sel in $sels
         $bookmarkName = $bookmarkPrefix & $sel.substring
         $bookmarkName &= $contextSep & $pageLine{$sel}
         $bookmarkName &= $bookmarkSuffix
         $doc.setSelection $sel
         while Bookmark Exists $bookmarkName
            $bookmarkName &= $ZWSP
         end
         Add Bookmark As $bookmarkName
      end
      Set Navigator Shown true
      Set Navigator Mode 'Bookmarks by Alpha'
   elsif $userInput == $clipboard
      $output = $outputArray.join $outputSep
      Write Clipboard $output
   else  # i.e. $userInput == $newFile
      $output = Array.new
      $SelectSentence = Hash.new
      $SelectSentence{'Danish'} = ':Rediger:Vælg:Vælg sætning'
      $SelectSentence{'English'} = ':Edit:Select:Select Sentence'
      $SelectSentence{'French'} = ':Édition:Sélectionner:Sélectionner Phrase'
      $SelectSentence{'German'} = ':Bearbeiten:Auswählen:Satz auswählen'
      $SelectSentence{'Italian'} = ':Modifica:Seleziona:Seleziona periodo'
      $SelectSentence{'Japanese'} = ':編集:選択:文章選択'
      $SelectSentence{'Polish'} = ':Edytuj:Zaznacz:Zaznacz Zdanie'
      $SelectSentence{'Spanish'} = ':Editar:Seleccionar:Seleccionar frase'
      $lproj = Application Property 'localization'
      foreach $sel in $sels
         $outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
         $lineNum = $sel.text.lineNumberAtIndex $sel.location, true
         if $lineNum == undefined
            $outputTemp &= $contentTypeName{$text.documentContentType}
         else
            $outputTemp &= $pageLineSep & $lineNum
         end
         $pageLine{$sel} = $outputTemp
         $outputTemp &= $contextSep
         $doc.setSelection $sel
         Document.setActive $doc
         Menu $SelectSentence{$lproj}
         $range = TextSelection.activeRange
         $substring = Cast to String $doc.selectedSubtext, false  # false: fix cross-references
         $substring.insertAtIndex $sel.bound - $range.location, $foundSuffix
         $substring.insertAtIndex $sel.location - $range.location, $foundPrefix
         $outputTemp &= $substring
         $output.appendValue $outputTemp
      end
      $doc.setSelections $selsOrig  # restore the original selections
      $output = $output.join $outputSep
      Document.newWithText $output
   end
else
   $doc.setSelection $outputHash{$userInput}
end


2010-02-13 03:52:16
Profile

Joined: 2010-01-06 10:58:07
Posts: 39
Thanks Kino, this is quite a extensive macro. Reading it gives one an appreciation for how much you put into these projects.

Since I will be using it extensively to copy text from the resulting file, I was able to remove the << >> marks around each found word. This got me wondering if the original found text could be used to highlight all of the found items in the new file. Similar to when you do a Find All from the Find dialog.

I would be glad to add this on my own, but am afraid I would end up hammer what you have already accomplished.

Are you up for this, or have I already asked too much of you?

Either way I'm grateful for what you've offered.


2010-02-13 07:59:52
Profile

Joined: 2008-05-17 04:02:32
Posts: 400
If I don’t misunderstand, you can achieve that by adding the following lines just after Document.newWithText $output near the end of the macro.
Code:
      $findFound = '\Q' & $foundPrefix
      $findFound &= '\E(\p{Any}+?)\Q' & $foundSuffix
      $findFound &= '\E'
      Replace All $findFound, '\1', 'E'
      $highlight = Color.newWithHexTriplet 0xFFCC00
      Set Highlight Color $highlight
      Select Start

This code assumes that « and » are not used anywhere in your documents. If it is not the case, use other characters (or character sequences) as $foundPrefix and $foundSuffix defined near the beginning of the macro, e.g. { and }, ` and ^, etc.

To change the highlight colour, replace 0xFFCC00 with the hex RGB value of your favourite colour.
http://en.wikipedia.org/wiki/Web_colors


2010-02-13 08:41:54
Profile

Joined: 2010-01-06 10:58:07
Posts: 39
Thanks for this addition to your macro. Unfortunately, when I added the additional code to the macro it ended up not just selecting all of the text in the file, but coloring the entire page - even when nothing was selected.

Perhaps I added the code in the wrong place. Here is a sample of where I placed it:

$doc.setSelections $selsOrig # restore the original selections
$output = $output.join $outputSep
Document.newWithText $output

#The following was added to select all of the searched for values that appear in the document.
$findFound = '\Q' & $foundPrefix
$findFound &= '\E(\p{Any}+?)\Q' & $foundSuffix
$findFound &= '\E'
Replace All $findFound, '\1', 'E'
$highlight = Color.newWithHexTriplet 0xFFCC00
Set Highlight Color $highlight
Select Start

end
else
$doc.setSelection $outputHash{$userInput}
end


2010-02-13 09:22:51
Profile

Joined: 2008-05-17 04:02:32
Posts: 400
Strange. That is the right place. Anyway, please try the attached macro file.
Attachment:
FindAllandShowContext_nwm.zip [7.01 KiB]
Downloaded 268 times

I’m going to the bed, so I cannot answer your future questions until tomorrow afternoon in JST.


2010-02-13 09:37:48
Profile

Joined: 2010-01-06 10:58:07
Posts: 39
Thanks for this addition. It worked just fine. I will compare this your macro to the one I have modified and figure out what I did differently, then use your adjustments to make my variation work.

I'll let you know how it goes.

Thanks for all of your assistance.

Hope you can get some sleep ...


2010-02-13 09:44:56
Profile

Joined: 2008-05-17 04:02:32
Posts: 400
kino wrote:
This code assumes that « and » are not used anywhere in your documents.
I don’t like that kind of limitations. So I modified the macro again to remove the restriction by making it use calculated location values to apply a highlight colour on found strings.
Attachment:
FindAllandShowContext_nwm.zip [7.19 KiB]
Downloaded 270 times

1. I moved $highlight = Color.newWithHexTriplet 0xFFCC00 into the top group of macro commands for it is customizable.

2. The last portion of the macro has become as follows:
Code:
   else  # i.e. $userInput == $newFile
      $output = $founds = Array.new
      $SelectSentence = Hash.new
      $SelectSentence{'Danish'} = ':Rediger:Vælg:Vælg sætning'
      $SelectSentence{'English'} = ':Edit:Select:Select Sentence'
      $SelectSentence{'French'} = ':Édition:Sélectionner:Sélectionner Phrase'
      $SelectSentence{'German'} = ':Bearbeiten:Auswählen:Satz auswählen'
      $SelectSentence{'Italian'} = ':Modifica:Seleziona:Seleziona periodo'
      $SelectSentence{'Japanese'} = ':編集:選択:文章選択'
      $SelectSentence{'Polish'} = ':Edytuj:Zaznacz:Zaznacz Zdanie'
      $SelectSentence{'Spanish'} = ':Editar:Seleccionar:Seleccionar frase'
      $lproj = Application Property 'localization'
      $offset = 0
      foreach $sel in $sels
         $outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
         $lineNum = $sel.text.lineNumberAtIndex $sel.location, true
         if $lineNum == undefined
            $outputTemp &= $contentTypeName{$text.documentContentType}
         else
            $outputTemp &= $pageLineSep & $lineNum
         end
         $outputTemp &= $contextSep
         $offset += $outputTemp.length
         $doc.setSelection $sel
         Document.setActive $doc
         Menu $SelectSentence{$lproj}
         $range = TextSelection.activeRange
         $substring = Cast to String $doc.selectedSubtext, false  # false: fix cross-references
         $loc = $sel.location - $range.location
         $found = Range.new $loc + $offset, $sel.length
         $founds.appendValue $found
         $outputTemp &= $substring
         $output.appendValue $outputTemp
         $offset += $substring.length + $outputSep.length
      end
      $doc.setSelections $selsOrig  # restore the original selections
      $output = $output.join $outputSep
      $output = Document.newWithText $output
      $output.clearAndDisableUndoHistory
      $sels = Array.new
      foreach $found in $founds
         $sel = TextSelection.new $output.text, $found
         $sels.appendValue $sel
      end
      Push Target Selection $sels
         Set Highlight Color $highlight
      Pop Target Selection
   end
else
   $doc.setSelection $outputHash{$userInput}
end


2010-02-13 19:17:32
Profile

Joined: 2010-01-06 10:58:07
Posts: 39
This last release completely resolved the highlighting issues. I too, like you, don't like it when there are assumed limitations to a macro. This will now be a very useful tool for me in my research.

Thanks!


2010-02-14 07:09:50
Profile
Display posts from previous:  Sort by  
Reply to topic   [ 12 posts ] 

Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group
Designed by ST Software