Context Extraction Marco?

Get help using and writing Nisus Writer Pro macros.
Post Reply
bjast
Posts: 39
Joined: 2010-01-06 10:58:07

Context Extraction Marco?

Post by bjast » 2010-02-12 06:55:05

I'm new to Nisus Pro, and I have a marco idea that would be most helpful for writers. But I'm sure, at this point, it is beyond my ability - although it is somewhat similar to some others I've seen.

I would like to have NWP do a Context Extraction:

Search for a word and have all of the relevant passages copied to another document, with the sentences that they appeared in.

Is this even possible?

Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Context Extraction Marco?

Post by Kino » 2010-02-12 07:19:09


bjast
Posts: 39
Joined: 2010-01-06 10:58:07

Re: Context Extraction Marco?

Post by bjast » 2010-02-12 10:25:51

Bingo!

That's it exactly.

Much appreciated.

bjast
Posts: 39
Joined: 2010-01-06 10:58:07

Re: Context Extraction Marco?

Post by bjast » 2010-02-12 20:02:05

Hello Kino

Thanks again for your help with the Context Extraction macro.

I know that I said this was "exactly what I was looking for". And that beggars shouldn't be choosers. But I'm wondering if it would be too much to ask how to modify this marco to list the complete sentence each occurrence of the word appeared in - rather than just the line the word appeared on? And - hopefully that wasn't you gasping I just heard - to place a carriage return after each entry?

This two additions would make this macro even more useful.

Thanks

Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Context Extraction Marco?

Post by Kino » 2010-02-13 03:52:16

Because of its limitations, the Macro Prompt dialog box cannot show the results in such a way. So I modified the macro so that, when ‘Copy All Matches to New File’ is chosen, it produces a file containing all sentences in which the found string is situated.

Code: Select all

$outputLength = 70  # length of characters for a result shown in the Macro Prompt dialog box
$displayItemNumber = 30
$pageLineSep = Cast to String '.'
$contextSep = Cast to String ': '
$foundPrefix = Cast to String '«'
$foundSuffix = Cast to String '»'
$outputSep = Cast to String "\n\n"
$bookmarkPrefix = '## '  # add a prefix so that bookmarks created by this macro are distinctive and sorted first
$bookmarkSuffix = ''  # put anything you like between the single quotes or leave it blank


$contentTypeName = Hash.new
$contentTypeName{'table'} = Cast to String ' (table)'
$contentTypeName{'header'} = Cast to String ' (header)'
$contentTypeName{'footer'} = Cast to String ' (footer)'
$contentTypeName{'comment'} = Cast to String ' (comment)'
$contentTypeName{'footnotes'} = Cast to String ' (note)'
$contentTypeName{'endnotes'} = Cast to String ' (note)'
$contentTypeName{'sectionnotes'} = Cast to String ' (note)'

Require Pro Version 1.3
Debug.setCodeProfilingEnabled false

$doc = Document.active
if $doc == undefined
	exit  # exit silently
end

$selsOrig = $doc.textSelections

$findOpt = Read Find Options
$findExp = Read Find Expression
$isPowerFindOrAttrSensitive = $findOpt.find 'e|u', 'E-i'

if $isPowerFindOrAttrSensitive == undefined
	$message = 'What to find?'
	$detail = 'Find options are those you have set in the Find & Replace window.'
	$findExp = Prompt Input $message, $detail, '', $findExp
end

if $findExp == ''
	Set Find Shown true
	exit 'Find expression is empty, aborting.'
end

$numFound = Find All $findExp, '*'

if ! $numFound
	exit 'No matches, exiting...'
end

$sels = $doc.textSelections
$doc.setSelections $selsOrig  # restore the original selections

$map = Hash.new  # newline, soft return, break will be shown as | and tab as space
$map{0x000A} = $map{0x000C} = $map{0x2028} = '/'
$map{0x0009} = 0x0020

$outputArray = Array.new
$outputHash = $pageLine = Hash.new
$outputLength -= $foundPrefix.length + $foundSuffix.length

foreach $sel in $sels
	$text = $sel.text
	$found = $sel.substring
	$outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
	$lineNum = $sel.text.lineNumberAtIndex $sel.location, true
	if $lineNum == undefined
		$outputTemp &= $contentTypeName{$text.documentContentType}
	else
		$outputTemp &= $pageLineSep & $lineNum
	end
	$pageLine{$sel} = $outputTemp
	$outputTemp &= $contextSep
	$preFoundLen = $outputLength - $outputTemp.length
	$preFoundLen -= $found.length
	$preFoundLen = Cast to Int $preFoundLen / 2, false
	if $preFoundLen > $sel.location
		$preFoundLen = $sel.location
	end
	$preFoundSel = TextSelection.newWithLocationAndBound $text, $sel.location - $preFoundLen, $sel.location
	$postFoundLen = $outputLength - $preFoundLen
	$postFoundLen -= $found.length
	$contextBound = $sel.bound + $postFoundLen
	if $contextBound  > $sel.text.length
		$postFoundLen = $sel.text.length - $sel.bound
	end
	$postFoundSel = TextSelection.newWithLocationAndBound $text, $sel.bound, $sel.bound + $postFoundLen
	$outputTemp &= $preFoundSel.substring
	$outputTemp &= $foundPrefix & $found
	$outputTemp &= $foundSuffix & $postFoundSel.substring
	$range = Range.new 0, $outputTemp.length
	$outputTemp.transliterateInRange $range, $map
	$outputArray.appendValue $outputTemp
	$outputHash{$outputTemp} = $sel
end

$message = 'Show which?'
$detail = $sels.count & ' found (page'
$detail &= $pageLineSep & 'line'
$detail &= $contextSep & $foundPrefix
$detail &= 'found' & $foundSuffix
$detail &= ' with context)'

$clipboard = '➪ Copy All Matches to Clipboard'
$newFile = '➪ Copy All Matches to New File'
$bookmark = '➪ Bookmark All Matches'
$actions = Array.new $clipboard, $newFile, $bookmark

if $outputArray.count > $displayItemNumber
	$continue = true
	$previous = '[↑ SEE PREVIOUS MATCHES]'
	$next = '[↓ SEE NEXT MATCHES]'
	$index = 0
	$count = $displayItemNumber
	$lastItemIndex = $sels.count
	while $continue == true
		$displayArray = $outputArray.subarrayAtIndex $index, $count
		foreach $action in $actions
			$displayArray.prependValue $action
		end
		$indexBound = $index + $count
		if $lastItemIndex > $indexBound
			$displayArray.prependValue $next
		end
		if $index > 0
			$displayArray.prependValue $previous
		end
		$userInput = Prompt Options $message, $detail, 'OK', $displayArray
		if $userInput == $previous
			$index -= $displayItemNumber
			$count = $displayItemNumber
		elsif $userInput == $next
			$index += $count
			$count = $displayItemNumber
			$indexBound = $index + $count
			if $indexBound > $lastItemIndex
				$count = $sels.count - $index
			end
		else
			$continue = false
		end
	end
else
	$displayArray = $outputArray
	foreach $action in $actions
		$displayArray.prependValue $action
	end
	$userInput = Prompt Options $message, $detail, 'OK', $displayArray
end

$isAction = $actions.indexOfValue $userInput
if $isAction != -1
	if $userInput == $bookmark
		$ZWSP = Cast to String "\x{200B}"
		foreach $sel in $sels
			$bookmarkName = $bookmarkPrefix & $sel.substring
			$bookmarkName &= $contextSep & $pageLine{$sel}
			$bookmarkName &= $bookmarkSuffix
			$doc.setSelection $sel
			while Bookmark Exists $bookmarkName
				$bookmarkName &= $ZWSP
			end
			Add Bookmark As $bookmarkName
		end
		Set Navigator Shown true
		Set Navigator Mode 'Bookmarks by Alpha'
	elsif $userInput == $clipboard
		$output = $outputArray.join $outputSep
		Write Clipboard $output
	else  # i.e. $userInput == $newFile
		$output = Array.new
		$SelectSentence = Hash.new
		$SelectSentence{'Danish'} = ':Rediger:Vælg:Vælg sætning'
		$SelectSentence{'English'} = ':Edit:Select:Select Sentence'
		$SelectSentence{'French'} = ':Édition:Sélectionner:Sélectionner Phrase'
		$SelectSentence{'German'} = ':Bearbeiten:Auswählen:Satz auswählen'
		$SelectSentence{'Italian'} = ':Modifica:Seleziona:Seleziona periodo'
		$SelectSentence{'Japanese'} = ':編集:選択:文章選択'
		$SelectSentence{'Polish'} = ':Edytuj:Zaznacz:Zaznacz Zdanie'
		$SelectSentence{'Spanish'} = ':Editar:Seleccionar:Seleccionar frase'
		$lproj = Application Property 'localization'
		foreach $sel in $sels
			$outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
			$lineNum = $sel.text.lineNumberAtIndex $sel.location, true
			if $lineNum == undefined
				$outputTemp &= $contentTypeName{$text.documentContentType}
			else
				$outputTemp &= $pageLineSep & $lineNum
			end
			$pageLine{$sel} = $outputTemp
			$outputTemp &= $contextSep
			$doc.setSelection $sel
			Document.setActive $doc
			Menu $SelectSentence{$lproj}
			$range = TextSelection.activeRange
			$substring = Cast to String $doc.selectedSubtext, false  # false: fix cross-references
			$substring.insertAtIndex $sel.bound - $range.location, $foundSuffix
			$substring.insertAtIndex $sel.location - $range.location, $foundPrefix
			$outputTemp &= $substring
			$output.appendValue $outputTemp
		end
		$doc.setSelections $selsOrig  # restore the original selections
		$output = $output.join $outputSep
		Document.newWithText $output
	end
else
	$doc.setSelection $outputHash{$userInput}
end

bjast
Posts: 39
Joined: 2010-01-06 10:58:07

Re: Context Extraction Marco?

Post by bjast » 2010-02-13 07:59:52

Thanks Kino, this is quite a extensive macro. Reading it gives one an appreciation for how much you put into these projects.

Since I will be using it extensively to copy text from the resulting file, I was able to remove the << >> marks around each found word. This got me wondering if the original found text could be used to highlight all of the found items in the new file. Similar to when you do a Find All from the Find dialog.

I would be glad to add this on my own, but am afraid I would end up hammer what you have already accomplished.

Are you up for this, or have I already asked too much of you?

Either way I'm grateful for what you've offered.

Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Context Extraction Marco?

Post by Kino » 2010-02-13 08:41:54

If I don’t misunderstand, you can achieve that by adding the following lines just after Document.newWithText $output near the end of the macro.

Code: Select all

		$findFound = '\Q' & $foundPrefix
		$findFound &= '\E(\p{Any}+?)\Q' & $foundSuffix
		$findFound &= '\E'
		Replace All $findFound, '\1', 'E'
		$highlight = Color.newWithHexTriplet 0xFFCC00
		Set Highlight Color $highlight
		Select Start
This code assumes that « and » are not used anywhere in your documents. If it is not the case, use other characters (or character sequences) as $foundPrefix and $foundSuffix defined near the beginning of the macro, e.g. { and }, ` and ^, etc.

To change the highlight colour, replace 0xFFCC00 with the hex RGB value of your favourite colour.
http://en.wikipedia.org/wiki/Web_colors

bjast
Posts: 39
Joined: 2010-01-06 10:58:07

Re: Context Extraction Marco?

Post by bjast » 2010-02-13 09:22:51

Thanks for this addition to your macro. Unfortunately, when I added the additional code to the macro it ended up not just selecting all of the text in the file, but coloring the entire page - even when nothing was selected.

Perhaps I added the code in the wrong place. Here is a sample of where I placed it:

$doc.setSelections $selsOrig # restore the original selections
$output = $output.join $outputSep
Document.newWithText $output

#The following was added to select all of the searched for values that appear in the document.
$findFound = '\Q' & $foundPrefix
$findFound &= '\E(\p{Any}+?)\Q' & $foundSuffix
$findFound &= '\E'
Replace All $findFound, '\1', 'E'
$highlight = Color.newWithHexTriplet 0xFFCC00
Set Highlight Color $highlight
Select Start

end
else
$doc.setSelection $outputHash{$userInput}
end

Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Context Extraction Marco?

Post by Kino » 2010-02-13 09:37:48

Strange. That is the right place. Anyway, please try the attached macro file.
FindAllandShowContext_nwm.zip
(7.01 KiB) Downloaded 274 times
I’m going to the bed, so I cannot answer your future questions until tomorrow afternoon in JST.

bjast
Posts: 39
Joined: 2010-01-06 10:58:07

Re: Context Extraction Marco?

Post by bjast » 2010-02-13 09:44:56

Thanks for this addition. It worked just fine. I will compare this your macro to the one I have modified and figure out what I did differently, then use your adjustments to make my variation work.

I'll let you know how it goes.

Thanks for all of your assistance.

Hope you can get some sleep ...

Kino
Posts: 400
Joined: 2008-05-17 04:02:32

Re: Context Extraction Marco?

Post by Kino » 2010-02-13 19:17:32

kino wrote:This code assumes that « and » are not used anywhere in your documents.
I don’t like that kind of limitations. So I modified the macro again to remove the restriction by making it use calculated location values to apply a highlight colour on found strings.
FindAllandShowContext_nwm.zip
(7.19 KiB) Downloaded 277 times
1. I moved $highlight = Color.newWithHexTriplet 0xFFCC00 into the top group of macro commands for it is customizable.

2. The last portion of the macro has become as follows:

Code: Select all

	else  # i.e. $userInput == $newFile
		$output = $founds = Array.new
		$SelectSentence = Hash.new
		$SelectSentence{'Danish'} = ':Rediger:Vælg:Vælg sætning'
		$SelectSentence{'English'} = ':Edit:Select:Select Sentence'
		$SelectSentence{'French'} = ':Édition:Sélectionner:Sélectionner Phrase'
		$SelectSentence{'German'} = ':Bearbeiten:Auswählen:Satz auswählen'
		$SelectSentence{'Italian'} = ':Modifica:Seleziona:Seleziona periodo'
		$SelectSentence{'Japanese'} = ':編集:選択:文章選択'
		$SelectSentence{'Polish'} = ':Edytuj:Zaznacz:Zaznacz Zdanie'
		$SelectSentence{'Spanish'} = ':Editar:Seleccionar:Seleccionar frase'
		$lproj = Application Property 'localization'
		$offset = 0
		foreach $sel in $sels
			$outputTemp = $sel.text.pageNumberAtIndex $sel.location, true
			$lineNum = $sel.text.lineNumberAtIndex $sel.location, true
			if $lineNum == undefined
				$outputTemp &= $contentTypeName{$text.documentContentType}
			else
				$outputTemp &= $pageLineSep & $lineNum
			end
			$outputTemp &= $contextSep
			$offset += $outputTemp.length
			$doc.setSelection $sel
			Document.setActive $doc
			Menu $SelectSentence{$lproj}
			$range = TextSelection.activeRange
			$substring = Cast to String $doc.selectedSubtext, false  # false: fix cross-references
			$loc = $sel.location - $range.location
			$found = Range.new $loc + $offset, $sel.length
			$founds.appendValue $found
			$outputTemp &= $substring
			$output.appendValue $outputTemp
			$offset += $substring.length + $outputSep.length
		end
		$doc.setSelections $selsOrig  # restore the original selections
		$output = $output.join $outputSep
		$output = Document.newWithText $output
		$output.clearAndDisableUndoHistory
		$sels = Array.new
		foreach $found in $founds
			$sel = TextSelection.new $output.text, $found
			$sels.appendValue $sel
		end
		Push Target Selection $sels
			Set Highlight Color $highlight
		Pop Target Selection
	end
else
	$doc.setSelection $outputHash{$userInput}
end

bjast
Posts: 39
Joined: 2010-01-06 10:58:07

Re: Context Extraction Marco?

Post by bjast » 2010-02-14 07:09:50

This last release completely resolved the highlighting issues. I too, like you, don't like it when there are assumed limitations to a macro. This will now be a very useful tool for me in my research.

Thanks!

Post Reply