Text VP Tree

Text VP Tree

A vantage-point tree is a data structure that takes advantage of the spatial distribution of data and lets allows for faster searching through the data by lowering the amount of comparisons that need to be made. Consider the traditional example of phonological neighborhood density calculation. The code would be written to compare each item to all the other items. For $n$ items, there would be $n-1$ comparisons. So, to calculate the phonological neighborhood density for each item in a given corpus, there would need to be $n \times (n-1)\, = \, n^2-n$ comparisons. This is a lot of comparisons!

With a vantage-point tree, however, we might get an average of only needing $log_2(n)$ comparisons per query because of the way the data are organized. This means we would only need $n \times log_2(n)$ comparisons in total, which can be substantially lower than $n^2-n$ for larger corpora.

This impelentation is based on the description by Samet (2006).

Function documentation

TextVPTree(items::Array, d::Function)

Outer constructor for a TextVPTree. Takes in an array of items items and a distance function d and proceeds to build a vantage-point tree from them.

source
radiusSearch(tree::TextVPTree, query, epsilon)

Performs a search for all items in a VP tree tree that are within a radius epsilon from a query query.

Returns

A Vector of items that are within the given radius epsilon

source
nneighbors(tree::TextVPTree, query, n)

Find the n nearest neighbors in a VP tree tree to a given query query.

Returns

  • A PriorityQueue of items where the keys are the items themselves and the

values are the distances from the items to query; the PriorityQueue is defined such that small values have higher priorities than large ones

source

References

Samet, H. (2006). Foundations of multidimensional and metric data structures. San Francisco, California: Morgan Kaufmann.