Removal of sub-trees in tries

I recently updated the package TriesWithFrequencies.m with two new functions TrieRemoval and TrieNodeFrequencies. The function TrieNodeFrequencies reverses the outcomes of TrieNodeProbabilities; it was made to facilitate the implementation and usage of TrieRemoval.

The removal of a sub-tree from a trie is useful when we want to alternate the learning stored in the trie.

The new functions are illustrated with examples below. (These examples reuse and extend the ones in the blog post “Tries with frequencies for data mining”.)
Consider the trie made with the list of words

{"war", "ward", "warden", "work", "car", "cars", "carbs"}

Trie

 

The function TrieNodeFrequencies can be used to reverse the effect of TrieNodeProbabilities. Its first argument is a trie, the second one is a scaling parameter (which is 1 by default). Here is an example:

TrieForm[TrieNodeFrequencies[TrieNodeProbabilities[mytrie], 14]]
TrieNodeFrequenciesTrieNodeProbabilities14

The function TrieRemove can be used to remove sub-parts of a trie that correspond to a specified “word”. Here is an example using a trie with frequencies:

TrieForm[TrieRemove[mytrie, "car"]]

TrieRemoveOfcar

Here is an example using a trie with probabilities:

TrieForm[TrieRemove[TrieNodeProbabilities[mytrie], "car"]]

TrieRemoveOfTrieNodeProbabilitiesOfcar

TrieRemove has two options “HowManyTimes” and “FrequencyValues” with default values Automatic. The option “HowManyTimes” takes a number or Automatic and it is still under development (its functionality is partially correct). The option “FrequencyValues” takes True | False | Automatic and it is used for the interpretation of the sub-tree removal. If the values are frequencies the removal process is more straightforward. If the values are probabilities the removal process has to take into account that the values on the same level of the sub-tree are conditional probabilities that sum to 1. TrieRemove has two different implementations one for tries with frequencies and one for tries with probabilities. The option setting "FrequencyValues"->True|False is used to redirect to the correct implementation. When "FrequencyValues"->Automatic is given a simple heuristic is used to determine is the trie argument a trie with frequencies or a trie with probabilities.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s