Uniqueness point

The uniqueness point of a word is the segment in its sequence after which the word can be uniquely identified. In cohort models of speech perception, this is the point at which a listener is predicted to recognize the word while it is still being spoken.
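The definition can be illustrated with a brute-force check over prefixes. This is only a minimal sketch: `naive_uniqueness_point` is a hypothetical helper written for illustration, not part of LexicalCharacteristics, and it assumes that a word whose every prefix is shared gets a uniqueness point one greater than its length.

```julia
# Hypothetical helper for illustration; not the package's implementation.
# Returns the length of the shortest prefix of `word` that no other corpus
# item shares, or length(word) + 1 if every prefix is shared.
function naive_uniqueness_point(corpus, word)
    # Compare only against the other items in the corpus.
    others = filter(w -> w != word, corpus)
    for n in 1:length(word)
        prefix = word[1:n]
        shared = any(w -> length(w) >= n && w[1:n] == prefix, others)
        shared || return n
    end
    return length(word) + 1
end

small_corpus = [
    ["K", "AE1", "T"],  # cat
    ["K", "AA1", "B"],  # cob
    ["K", "AE1", "B"],  # cab
]

naive_uniqueness_point(small_corpus, ["K", "AA1", "B"])  # 2
```

In this small corpus, cat and cab share [K] with cob, but nothing else begins with [K AA1], so the loop stops at the second segment.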

Examples

using LexicalCharacteristics
sample_corpus = [
["K", "AE1", "T"], # cat
["K", "AA1", "B"], # cob
["B", "AE1", "T"], # bat
["T", "AE1", "T", "S"], # tats
["M", "AA1", "R", "K"], # mark
["K", "AE1", "B"], # cab
]
upt(sample_corpus, [["K", "AA1", "B"]]; inCorpus=true)
1×2 DataFrames.DataFrame
│ Row │ Query                   │ UPT │
├─────┼─────────────────────────┼─────┤
│ 1   │ String["K", "AA1", "B"] │ 2   │

Here, [K AA1 B] cob has a uniqueness point of 2. After observing the [AA1], we can be sure the word is cob, because no other item in the corpus begins with the sequence [K AA1].

using LexicalCharacteristics
sample_corpus = [
["K", "AE1", "T"], # cat
["K", "AA1", "B"], # cob
["B", "AE1", "T"], # bat
["T", "AE1", "T", "S"], # tats
["M", "AA1", "R", "K"], # mark
["K", "AE1", "B"], # cab
]
upt(sample_corpus, [["K", "AE1", "D"]]; inCorpus=false)
1×2 DataFrames.DataFrame
│ Row │ Query                   │ UPT │
├─────┼─────────────────────────┼─────┤
│ 1   │ String["K", "AE1", "D"] │ 3   │

Given this sample corpus, [K AE1 D] cad is unique after its 3rd segment. Both cat and cab also begin with [K AE1], so cad can only be uniquely identified after hearing the [D].

using LexicalCharacteristics
sample_corpus = [
["K", "AE1", "T"], # cat
["K", "AA1", "B"], # cob
["B", "AE1", "T"], # bat
["T", "AE1", "T", "S"], # tats
["M", "AA1", "R", "K"], # mark
["K", "AE1", "B"], # cab
]
upt(sample_corpus, [["T", "AE1", "T"]]; inCorpus=false)
1×2 DataFrames.DataFrame
│ Row │ Query                   │ UPT │
├─────┼─────────────────────────┼─────┤
│ 1   │ String["T", "AE1", "T"] │ 4   │

Here, [T AE1 T] tat cannot be uniquely identified until after the sequence is complete, because tats begins with the same three segments. Its uniqueness point is therefore one greater than its length.

Function documentation

upt(corpus, queries; inCorpus=true)

Calculates the phonological uniqueness point (upt) of the items in queries based on the items in corpus. If the items are expected to be in the corpus, the uniqueness point is taken to be the point at which a branch of the tree represents only 1 word. If the items are not expected to be in the corpus, the uniqueness point is taken to be the depth at which the tree can no longer be traversed.

Parameters

  • corpus The items comprising the corpus to compare against when calculating the uniqueness point of each query

  • queries The items for which to calculate the uniqueness point

  • inCorpus Whether the query items are expected to be in the corpus or not (defaults to true)

Returns

  • A DataFrame with the queries in the first column and the uniqueness points in the second
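
The two rules described for upt can be sketched with a small prefix tree. This is an illustrative sketch under stated assumptions, not the package's source: `TrieNode`, `build_trie`, and `sketch_upt` are hypothetical names, the sketch returns a plain integer rather than a DataFrame, and it assumes that a fully traversed query gets a uniqueness point one greater than its length.

```julia
# Hypothetical prefix-tree node; `count` is the number of corpus items
# whose path passes through this node.
mutable struct TrieNode
    children::Dict{String,TrieNode}
    count::Int
end
TrieNode() = TrieNode(Dict{String,TrieNode}(), 0)

# Insert every corpus item into the tree, segment by segment.
function build_trie(corpus)
    root = TrieNode()
    for word in corpus
        node = root
        for seg in word
            node = get!(node.children, seg) do
                TrieNode()
            end
            node.count += 1
        end
    end
    return root
end

# Assumed reading of the two rules; the keyword is named `in_corpus` here
# to keep it distinct from the package's `inCorpus`.
function sketch_upt(root, query; in_corpus=true)
    node = root
    for (depth, seg) in enumerate(query)
        # The tree can no longer be traversed at this depth.
        haskey(node.children, seg) || return depth
        node = node.children[seg]
        # For in-corpus items, stop once the branch represents one word.
        in_corpus && node.count == 1 && return depth
    end
    # Every segment was shared with some other item.
    return length(query) + 1
end

sample_corpus = [
    ["K", "AE1", "T"], ["K", "AA1", "B"], ["B", "AE1", "T"],
    ["T", "AE1", "T", "S"], ["M", "AA1", "R", "K"], ["K", "AE1", "B"],
]
root = build_trie(sample_corpus)

sketch_upt(root, ["K", "AA1", "B"]; in_corpus=true)   # 2
sketch_upt(root, ["K", "AE1", "D"]; in_corpus=false)  # 3
sketch_upt(root, ["T", "AE1", "T"]; in_corpus=false)  # 4
```

Storing a count on each node lets the in-corpus rule be checked during a single walk down the tree, while the out-of-corpus rule only needs to detect the first missing child.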
