Adrian's method is fine, and I use it all the time, but there are a number of caveats.
Strictly speaking, Adrian's method creates a combined list and then removes items that can be paired. If either list contains duplicates, this could change the result. It will also leave the unique items from both lists.
A good way to find or count the occurrences of items in a list is to use a Hash. A Hash is a structure that stores key/value pairs. Since keys have to be unique, every item is guaranteed to occur only once as a key. Typical code looks like this:
Code:
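$doc = Document.active
# Find every word in the document (adjust the expression as necessary)
$sels = $doc.text.find '\w+', 'Ea'
$list = Hash.new
foreach $sel in $sels
    # Use the word as the key and count its occurrences
    $list{$sel.substring} += 1
end
# The keys are the unique words
$uniqueList = $list.keys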
Note that at the end of this procedure $uniqueList will be an array containing all the unique items found with the search expression (adjust as necessary). However, the order of $uniqueList will be seemingly random, so you will have to sort it if you want a particular order.
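As a minimal sketch of the sorting step, the two lines below could be appended to the macro above. This assumes that arrays offer a ".sort" command that sorts in place with default options; that command is not used anywhere else in this post, so check it against the macro reference before relying on it.
Code:
# Assumption: ".sort" sorts the array in place
$uniqueList.sort
# Show the sorted unique words in a new document, one per line
Document.newWithText $uniqueList.join("\n")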
Meanwhile $list{$word} will give you the number of occurrences of $word in your document. If you don't really need the count you could use '=1' instead of '+=1' or keep some other useful information about the relevant words. For example with a little extra work you could keep the location of the first/last occurrence or a list of all occurrences, etc.
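To illustrate the count lookup, here is a rough sketch that could be appended to the first macro to produce a simple word/count report. The names $report, $word and $line are only for illustration, and I am assuming that "&" joins strings.
Code:
$report = Array.new
foreach $word in $list.keys
    # Assumption: "&" concatenates strings; one line per word: word, tab, count
    $line = $word & "\t" & $list{$word}
    $report.push $line
end
Document.newWithText $report.join("\n")
As with $uniqueList, the lines of the report will come out in seemingly random order unless the keys are sorted first.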
We can now adapt this procedure to jb's problem.
Code:
# Collect every word of list A as a Hash key
$docA = Document.withDisplayName 'List_A.rtf' # Adjust this as appropriate
$sels = $docA.text.find '\w+', 'Ea'
$list = Hash.new
foreach $sel in $sels
    $list{$sel.substring} = 1
end
# Check every word of list B against the keys from A
$docB = Document.withDisplayName 'List_B.rtf' # Adjust this as appropriate
$sels = $docB.text.find '\w+', 'Ea'
$duplicateList = Array.new
$notInAList = Array.new
foreach $sel in $sels
    $item = $sel.substring
    if $list.definesKey($item)
        # The word occurs in both A and B
        $duplicateList.push $item
    else
        # The word occurs in B only
        $notInAList.push $item
    end
end
# Show the words that are in B but not in A, one per line
Document.newWithText $notInAList.join("\n")
Notice that in this case I used arrays for the output. This means that:
- Multiple occurrences of words not in A will be listed multiple times
- The "not in A list" will have the words in the order they are found in B
If you prefer a unique list you could use a Hash for the "notInAList" instead. In that case write '$notInAList{$item} = 1' (or '+=1') as desired.
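A sketch of that Hash variant, as a complete macro, might look like the following. It uses only the commands already shown above; the document names are the same placeholders, and $words is just an illustrative name.
Code:
$docA = Document.withDisplayName 'List_A.rtf' # Adjust this as appropriate
$sels = $docA.text.find '\w+', 'Ea'
$list = Hash.new
foreach $sel in $sels
    $list{$sel.substring} = 1
end
$docB = Document.withDisplayName 'List_B.rtf' # Adjust this as appropriate
$sels = $docB.text.find '\w+', 'Ea'
$duplicateList = Hash.new
$notInAList = Hash.new
foreach $sel in $sels
    $item = $sel.substring
    if $list.definesKey($item)
        $duplicateList{$item} += 1
    else
        # Each word becomes a key only once; the value counts its occurrences in B
        $notInAList{$item} += 1
    end
end
$words = $notInAList.keys
Document.newWithText $words.join("\n")
The trade-off is that the keys no longer come out in the order in which the words were found in B.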
Obviously all of these lists can be sorted or rearranged as desired. Also make sure to use find expressions that work for the case you are looking for.
I have used ".substring" because that works fastest. If necessary, use ".subtext" to keep formatting, but in that case you would need to be a bit more careful: the Hash keys will not allow formatted strings (I believe).
Finally, it should be said that you could do all of this with arrays instead. Arrays have a command ".containsValue" which could be used to check if an item occurs in the list. This would be much slower with long lists, but with 500 to 1,500 words this would hardly be noticeable, I think.
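For comparison, an array-only sketch of the same macro could look like this. I am assuming that ".containsValue" takes the item as its argument and can be used directly in an "if" test; $listA is just an illustrative name.
Code:
$docA = Document.withDisplayName 'List_A.rtf' # Adjust this as appropriate
$sels = $docA.text.find '\w+', 'Ea'
$listA = Array.new
foreach $sel in $sels
    $listA.push $sel.substring
end
$docB = Document.withDisplayName 'List_B.rtf' # Adjust this as appropriate
$sels = $docB.text.find '\w+', 'Ea'
$duplicateList = Array.new
$notInAList = Array.new
foreach $sel in $sels
    $item = $sel.substring
    # Assumption: ".containsValue" searches the whole array for the item
    if $listA.containsValue($item)
        $duplicateList.push $item
    else
        $notInAList.push $item
    end
end
Document.newWithText $notInAList.join("\n")
Here every word in B triggers a search through all of $listA, which is why this gets slow with long lists.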