Uniqueness point
The uniqueness point of a word is defined as the segment in a sequence after which that sequence can be uniquely identified. In cohort models of speech perception, it is after this point that a listener will recognize a word while it's being spoken.
Examples
using LexicalCharacteristics
sample_corpus = [
["K", "AE1", "T"], # cat
["K", "AA1", "B"], # cob
["B", "AE1", "T"], # bat
["T", "AE1", "T", "S"], # tats
["M", "AA1", "R", "K"], # mark
["K", "AE1", "B"], # cab
]
upt(sample_corpus, [["K", "AA1", "T"]]; inCorpus=true)
Query | UPT | |
---|---|---|
Array… | Any | |
1 | ["K", "AA1", "T"] | 2 |
Here, [K AA1 B] cob has a uniqueness point of 2. Looking at the corpus, we can be sure we're looking at cob after observing the [AA1] because nothing else begins with the sequence [K AA1]. Thus, its uniqueness point is 2.
using LexicalCharacteristics
sample_corpus = [
["K", "AE1", "T"], # cat
["K", "AA1", "B"], # cob
["B", "AE1", "T"], # bat
["T", "AE1", "T", "S"], # tats
["M", "AA1", "R", "K"], # mark
["K", "AE1", "B"], # cab
]
upt(sample_corpus, [["K", "AE1", "D"]]; inCorpus=false)
Query | UPT | |
---|---|---|
Array… | Any | |
1 | ["K", "AE1", "D"] | 3 |
As is evident, given this sample corpus, [K AE1 D] cad is unique after the 3rd segment. That is, it can be uniquely identified after hearing the [D].
using LexicalCharacteristics
sample_corpus = [
["K", "AE1", "T"], # cat
["K", "AA1", "B"], # cob
["B", "AE1", "T"], # bat
["T", "AE1", "T", "S"], # tats
["M", "AA1", "R", "K"], # mark
["K", "AE1", "B"], # cab
]
upt(sample_corpus, [["T", "AE1", "T"]]; inCorpus=false)
Query | UPT | |
---|---|---|
Array… | Any | |
1 | ["T", "AE1", "T"] | 4 |
Here, [T AE1 T] tat cannot be uniquely identified until after the sequence is complete, so its uniqueness point is one longer than its length.
Function documentation
LexicalCharacteristics.upt
— Methodupt(corpus, queries; inCorpus=true)
Calculates the phonological uniqueness point (upt) the items in queries
based on the items in corpus
. If the items are expected to be in the corpus, this function will calculate the uniqueness point to be when a branch can be considered to only represent 1 word. If the items are not expected to be in the corpus, the uniqueness point will be taken to be the depth at which the tree can no longer be traversed.
Parameters
- corpus The items comprising the corpus to compare against when calculating the uniqueness point of each query
- queries The items for which to calculate the uniqueness point
- inLexicon Whether the query items are expected to be in the corpus or not
Returns
- A
DataFrame
with the queries in the first column and the uniqueness points in the second