# Text VP Tree

A vantage-point tree is a data structure that takes advantage of the spatial distribution of data and lets allows for faster searching through the data by lowering the amount of comparisons that need to be made. Consider the traditional example of phonological neighborhood density calculation. The code would be written to compare each item to all the other items. For $n$ items, there would be $n-1$ comparisons. So, to calculate the phonological neighborhood density for each item in a given corpus, there would need to be $n \times (n-1)\, = \, n^2-n$ comparisons. This is a lot of comparisons!

With a vantage-point tree, however, we might get an average of only needing $log_2(n)$ comparisons per query because of the way the data are organized. This means we would only need $n \times log_2(n)$ comparisons in total, which can be substantially lower than $n^2-n$ for larger corpora.

This impelentation is based on the description by Samet (2006).

## Function documentation

`LexicalCharacteristics.TextVPTree`

— Method.`TextVPTree(items::Array, d::Function)`

Outer constructor for a `TextVPTree`

. Takes in an array of items `items`

and a distance function `d`

and proceeds to build a vantage-point tree from them.

`LexicalCharacteristics.radiusSearch`

— Method.`radiusSearch(tree::TextVPTree, query, epsilon)`

Performs a search for all items in a VP tree `tree`

that are within a radius `epsilon`

from a query `query`

.

**Returns**

A `Vector`

of items that are within the given radius `epsilon`

`LexicalCharacteristics.nneighbors`

— Method.`nneighbors(tree::TextVPTree, query, n)`

Find the `n`

nearest neighbors in a VP tree `tree`

to a given query `query`

.

**Returns**

A

`PriorityQueue`

of items where the keys are the items themselves and the

values are the distances from the items to `query`

; the `PriorityQueue`

is defined such that small values have higher priorities than large ones

## References

Samet, H. (2006). *Foundations of multidimensional and metric data structures*. San Francisco, California: Morgan Kaufmann.