I recently updated the package TriesWithFrequencies.m with two new functions TrieRemoval
and TrieNodeFrequencies. The function TrieNodeFrequencies
reverses the outcomes of TrieNodeProbabilities
; it was made to facilitate the implementation and usage of TrieRemoval
.
The removal of a sub-tree from a trie is useful when we want to alternate the learning stored in the trie.
The new functions are illustrated with examples below. (These examples reuse and extend the ones in the blog post “Tries with frequencies for data mining”.)
Consider the trie made with the list of words
{"war", "ward", "warden", "work", "car", "cars", "carbs"}
The function TrieNodeFrequencies
can be used to reverse the effect of TrieNodeProbabilities
. Its first argument is a trie, the second one is a scaling parameter (which is 1 by default). Here is an example:
TrieForm[TrieNodeFrequencies[TrieNodeProbabilities[mytrie], 14]]
The function TrieRemove
can be used to remove sub-parts of a trie that correspond to a specified “word”. Here is an example using a trie with frequencies:
TrieForm[TrieRemove[mytrie, "car"]]
Here is an example using a trie with probabilities:
TrieForm[TrieRemove[TrieNodeProbabilities[mytrie], "car"]]
TrieRemove
has two options “HowManyTimes” and “FrequencyValues” with default values Automatic
. The option “HowManyTimes” takes a number or Automatic
and it is still under development (its functionality is partially correct). The option “FrequencyValues” takes True | False | Automatic
and it is used for the interpretation of the sub-tree removal. If the values are frequencies the removal process is more straightforward. If the values are probabilities the removal process has to take into account that the values on the same level of the sub-tree are conditional probabilities that sum to 1. TrieRemove
has two different implementations one for tries with frequencies and one for tries with probabilities. The option setting "FrequencyValues"->True|False
is used to redirect to the correct implementation. When "FrequencyValues"->Automatic
is given a simple heuristic is used to determine is the trie argument a trie with frequencies or a trie with probabilities.