Abstract:
A novel system for automatic reading tutoring provides effective error detection and reduced false alarms combined with low processing time burdens and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or "on-the-fly" based on the currently displayed text (e.g., the story to be read by the user), while the general-domain garbage language model is shared among all text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.
Abstract:
An answering machine detection module is used to determine whether a call recipient is an actual person or an answering machine. The answering machine detection module includes a speech recognizer and a call analysis module. The speech recognizer receives an audible response of the call recipient to a call. The speech recognizer processes the audible response and provides an output indicative of recognized speech. The call analysis module processes the output of the speech recognizer to generate an output indicative of whether the call recipient is a person or an answering machine.
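The two-stage structure described above (a recognizer producing text, then a call analysis step that classifies it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how the call analysis module decides, so the phrase list, the word-count threshold, and the function name `analyze_call` are hypothetical, and the speech recognizer is replaced by already-recognized text.

```python
# Hypothetical cue phrases that tend to appear in recorded greetings.
MACHINE_PHRASES = (
    "leave a message",
    "after the tone",
    "after the beep",
    "not available",
)


def analyze_call(recognized_text, max_human_words=6):
    """Classify recognizer output as "person" or "machine".

    Assumption: a live person answers with a short greeting, while a
    recorded greeting is longer or contains a typical cue phrase.
    """
    text = recognized_text.lower()
    if any(phrase in text for phrase in MACHINE_PHRASES):
        return "machine"
    if len(text.split()) > max_human_words:
        return "machine"
    return "person"


live = analyze_call("Hello?")
recorded = analyze_call(
    "Hi, you have reached Pat. Please leave a message after the tone."
)
```

A rule-based check like this is only a stand-in; a statistical classifier over the recognizer output would fit the same interface.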
Abstract:
Both speech and alternate modality inputs are used in inputting information spoken into a mobile device. The alternate modality inputs can be used to perform sequential commitment of words in a speech recognition result.
Abstract:
In a method for tracking pitch in a speech signal (200), first and second window vectors, x_t and x_{t-p}, are created from samples (414, 416, 418, 408, 410, 412) taken across first and second windows (402, 400) of the speech signal. The first window (402) is separated from the second window (400) by a test pitch period (406). The energy of the speech signal in the first window is combined with the correlation between the first window vector and the second window vector to produce a predictable energy factor. The predictable energy factor is then used to determine a pitch score for the test pitch period. Based in part on the pitch score, a portion of the pitch track is identified.
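As a rough illustration of the scoring step, the sketch below builds the two window vectors for a test pitch period, computes the energy of the first window and the normalized correlation between the windows, and combines them. The specific combination used here (energy multiplied by the squared, non-negative correlation) is an assumption, as are the function names and the synthetic test signal; the abstract only states that energy and correlation are combined.

```python
import math


def window(signal, center, half_width):
    """Samples in [center - half_width, center + half_width].

    Assumes the whole window lies inside the signal.
    """
    return signal[center - half_width:center + half_width + 1]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predictable_energy(signal, t, period, half_width):
    """Score a test pitch period at time index t."""
    x_t = window(signal, t, half_width)              # first window vector
    x_prev = window(signal, t - period, half_width)  # second window vector
    energy = dot(x_t, x_t)
    denom = math.sqrt(energy * dot(x_prev, x_prev))
    if denom == 0.0:
        return 0.0
    # Normalized cross-correlation; negative values are clipped (assumption).
    corr = max(dot(x_t, x_prev) / denom, 0.0)
    return energy * corr * corr


def best_pitch_period(signal, t, candidates, half_width):
    """Pick the candidate period with the highest score."""
    return max(
        candidates,
        key=lambda p: predictable_energy(signal, t, p, half_width),
    )


# Synthetic signal with a period of exactly 50 samples.
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
best = best_pitch_period(signal, t=300, candidates=range(20, 91), half_width=25)
```

On this synthetic sine wave the 50-sample candidate scores highest, because the two windows are then identical up to rounding. A full tracker would also link scores across frames to form the pitch track, which this sketch omits.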
Abstract:
An index for searching spoken documents having speech data and text meta-data is created by obtaining probabilities of occurrence of words and positional information of the words of the speech data and combining them with at least positional information of the words in the text meta-data. A single index can be created because the speech data and the text meta-data are treated the same and considered only different categories.
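A minimal sketch of such a single index follows. The data layout is assumed for illustration: each document carries a `metadata` string and a `speech` list in which every position holds alternative (word, probability) hypotheses. Assigning text meta-data words a probability of 1.0 is also an assumption, chosen so that both categories share one posting format.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to postings of (doc_id, category, position, probability)."""
    index = defaultdict(list)
    for doc_id, doc in documents.items():
        # Text meta-data: every word is certain, so probability is 1.0.
        for position, word in enumerate(doc["metadata"].lower().split()):
            index[word].append((doc_id, "metadata", position, 1.0))
        # Speech data: each position may hold several competing hypotheses.
        for position, hypotheses in enumerate(doc["speech"]):
            for word, probability in hypotheses:
                index[word.lower()].append(
                    (doc_id, "speech", position, probability)
                )
    return index


documents = {
    "talk1": {
        "metadata": "Pitch Tracking",
        "speech": [
            [("pitch", 0.7), ("peach", 0.3)],
            [("tracking", 0.9)],
        ],
    },
}
index = build_index(documents)
pitch_postings = index["pitch"]
```

Because both categories use the same posting tuple, a query can score matches from the meta-data and from the speech data with one lookup and weight them by category afterwards.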
Abstract:
A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal (316) and an air conduction microphone signal (318). The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate (322) a clean speech value for a clean speech signal (324).
Abstract:
A method and apparatus determine a channel response for an alternative sensor using an alternative sensor signal and an air conduction microphone signal (500). The channel response and a prior probability distribution for clean speech values are then used to estimate a clean speech value (502, 504, 506 and 508).