System and method for recognizing speech with dialect grammars
    Type: granted invention patent (in force)

    Publication number: US09082405B2

    Publication date: 2015-07-14

    Application number: US14554164

    Filing date: 2014-11-26

    CPC classification number: G10L15/19 G10L15/005 G10L15/1822 G10L15/183

    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable media for recognizing speech. The method includes receiving speech from a user, perceiving at least one speech dialect in the received speech, selecting at least one grammar from a plurality of optimized dialect grammars based on at least one score associated with the perceived speech dialect and the perceived at least one speech dialect, and recognizing the received speech with the selected at least one grammar. Selecting at least one grammar can be further based on a user profile. Multiple grammars can be blended. Predefined parameters can include pronunciation differences, vocabulary, and sentence structure. Optimized dialect grammars can be domain specific. The method can further include recognizing initial received speech with a generic grammar until an optimized dialect grammar is selected. Selecting at least one grammar from a plurality of optimized dialect grammars can be based on a certainty threshold.
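The threshold-based selection with a generic fallback described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function name `select_grammar`, the constant `GENERIC_GRAMMAR`, the score dictionary, and the default threshold of 0.7 are all assumptions made for the example.

```python
from typing import Dict

# Hypothetical name for the fallback grammar used before a dialect is chosen.
GENERIC_GRAMMAR = "generic"


def select_grammar(dialect_scores: Dict[str, float],
                   certainty_threshold: float = 0.7) -> str:
    """Return the best-scoring dialect grammar, or the generic grammar
    when no dialect score reaches the certainty threshold."""
    if not dialect_scores:
        # Nothing perceived yet: keep recognizing with the generic grammar.
        return GENERIC_GRAMMAR
    best_dialect, best_score = max(dialect_scores.items(),
                                   key=lambda item: item[1])
    if best_score >= certainty_threshold:
        return best_dialect
    return GENERIC_GRAMMAR
```

Under these assumptions, a caller would keep using the generic grammar on early audio and switch once one dialect score crosses the threshold.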


    System and method for standardized speech recognition infrastructure
    Type: granted invention patent (in force)

    Publication number: US09053704B2

    Publication date: 2015-06-09

    Application number: US14330739

    Filing date: 2014-07-14

    CPC classification number: G10L15/075 G10L15/063 G10L15/065 G10L15/07 G10L15/08

    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for selecting a speech recognition model in a standardized speech recognition infrastructure. The system receives speech from a user, and if a user-specific supervised speech model associated with the user is available, retrieves the supervised speech model. If the user-specific supervised speech model is unavailable and if an unsupervised speech model is available, the system retrieves the unsupervised speech model. If the user-specific supervised speech model and the unsupervised speech model are unavailable, the system retrieves a generic speech model associated with the user. Next the system recognizes the received speech from the user with the retrieved model. In one embodiment, the system trains a speech recognition model in a standardized speech recognition infrastructure. In another embodiment, the system handshakes with a remote application in a standardized speech recognition infrastructure.
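The fallback order in this abstract (user-specific supervised model, then unsupervised model, then generic model) could look something like the sketch below. The function `retrieve_model` and the dictionary-based model stores are hypothetical stand-ins; the abstract does not say how models are stored or keyed.

```python
from typing import Dict


def retrieve_model(user_id: str,
                   supervised_models: Dict[str, str],
                   unsupervised_models: Dict[str, str],
                   generic_model: str) -> str:
    """Pick a speech model for a user using a three-level fallback."""
    # 1. Prefer the user-specific supervised model when one is available.
    if user_id in supervised_models:
        return supervised_models[user_id]
    # 2. Otherwise fall back to an unsupervised model, if available.
    if user_id in unsupervised_models:
        return unsupervised_models[user_id]
    # 3. Otherwise use the generic model.
    return generic_model
```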


    Utterance endpointing in task-oriented conversational systems

    Publication number: US12243517B1

    Publication date: 2025-03-04

    Application number: US17500834

    Filing date: 2021-10-13

    Abstract: A task-oriented dialog system determines an endpoint in a user utterance by receiving incremental portions of a user utterance that is provided in real time during a task-oriented communication session between a user and a virtual agent (VA). The task-oriented dialog system recognizes words in the incremental portions using an automated speech recognition (ASR) model and generates semantic information for the incremental portions of the utterance by applying a natural language processing (NLP) model to the recognized words. An acoustic-prosodic signature of the incremental portions of the utterance is generated using an acoustic-prosodic model. The task-oriented dialog system can generate a feature vector that represents the incrementally recognized words, the semantic information, the acoustic-prosodic signature, and corresponding confidence scores of the model outputs. A model is applied to the feature vector to identify a likely endpoint in the user utterance.

    Annotating and modeling natural language semantics through annotation conversion

    Publication number: US12154552B1

    Publication date: 2024-11-26

    Application number: US17462889

    Filing date: 2021-08-31

    Abstract: A natural language understanding (NLU) system generates in-place annotations for natural language utterances or other types of time-based media based on stand-off annotations. The in-place annotations are associated with particular sub-sequences of an annotation, which provides richer information than stand-off annotations, which are associated only with an utterance as a whole. To generate the in-place annotations for an utterance, the NLU system applies an encoder network and a decoder network to obtain attention weights for the various tokens within the utterance. The NLU system disqualifies tokens of the utterance based on their corresponding attention weights, and selects highest-scoring contiguous sequences of tokens between the disqualified tokens. In-place annotations are associated with the selected sequences.
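The span-selection step in this abstract (disqualify low-attention tokens, then keep the highest-scoring contiguous run between them) might be sketched as below. The attention weights are taken as given inputs rather than computed by an encoder and decoder network, and both the cutoff `min_weight` and the choice to score a run by the sum of its weights are assumptions for illustration only.

```python
from typing import List, Tuple


def best_span(tokens: List[str],
              weights: List[float],
              min_weight: float = 0.1) -> Tuple[List[str], float]:
    """Return the highest-scoring contiguous run of qualified tokens.

    Tokens whose attention weight is below min_weight are disqualified
    and act as boundaries between candidate runs. A run's score is the
    sum of its weights (an assumed scoring rule).
    """
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    best_tokens: List[str] = []
    best_score = 0.0
    current_tokens: List[str] = []
    current_score = 0.0
    for token, weight in zip(tokens, weights):
        if weight < min_weight:
            # Disqualified token: close the current run and start over.
            current_tokens, current_score = [], 0.0
            continue
        current_tokens.append(token)
        current_score += weight
        if current_score > best_score:
            best_tokens, best_score = list(current_tokens), current_score
    return best_tokens, best_score
```

For instance, with tokens `fly to new york please` and weights `[0.15, 0.02, 0.4, 0.35, 0.03]`, the words "to" and "please" are disqualified and the run `new york` outscores `fly`.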

    Extracting natural language semantics from speech without the use of speech recognition

    Publication number: US11508355B1

    Publication date: 2022-11-22

    Application number: US16172115

    Filing date: 2018-10-26

    Abstract: Systems and methods are disclosed herein for discerning aspects of user speech to determine user intent and/or other acoustic features of a sound input without the use of an ASR engine. To this end, a processor may receive a sound signal comprising raw acoustic data from a client device and divide the data into acoustic units. The processor feeds the acoustic units through a first machine learning model to obtain a first output and determines a first mapping, using the first output, of each respective acoustic unit to a plurality of candidate representations of the respective acoustic unit. The processor feeds each candidate representation of the plurality through a second machine learning model to obtain a second output, determines a second mapping, using the second output, of each candidate representation to a known condition, and determines a label for the sound signal based on the second mapping.

    Underspecification of intents in a natural language processing system

    Publication number: US10216832B2

    Publication date: 2019-02-26

    Application number: US15384275

    Filing date: 2016-12-19

    Abstract: A natural language processing system has a hierarchy of user intents related to a domain of interest, the hierarchy having specific intents corresponding to leaf nodes of the hierarchy, and more general intents corresponding to ancestor nodes of the leaf nodes. The system also has a trained understanding model that can classify natural language utterances according to user intent. When the understanding model cannot determine with sufficient confidence that a natural language utterance corresponds to one of the specific intents, the natural language processing system traverses the hierarchy of intents to find a more general user intent that is related to the most applicable specific intent of the utterance and for which there is sufficient confidence. The general intent can then be used to prompt the user with questions applicable to the general intent to obtain the missing information needed for a specific intent.
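The hierarchy traversal in this abstract can be illustrated with a small sketch. Assumptions are made throughout: the hierarchy is a child-to-parent dictionary, the confidence of a general intent is taken to be the sum of the confidences of the specific intents beneath it, and the names `resolve_intent` and `threshold` are hypothetical. The abstract does not say how confidence in a general intent is actually computed.

```python
from typing import Dict, Optional


def resolve_intent(leaf_confidences: Dict[str, float],
                   parent: Dict[str, str],
                   threshold: float = 0.6) -> Optional[str]:
    """Return the most specific intent with sufficient confidence.

    leaf_confidences maps specific (leaf) intents to classifier scores.
    parent maps each intent to its more general parent intent.
    """
    if not leaf_confidences:
        return None
    # Assumed rule: a general intent's confidence is the sum over its leaves.
    node_confidence = dict(leaf_confidences)
    for leaf, confidence in leaf_confidences.items():
        node = parent.get(leaf)
        while node is not None:
            node_confidence[node] = node_confidence.get(node, 0.0) + confidence
            node = parent.get(node)
    # Start at the most applicable specific intent and walk toward the root.
    node = max(leaf_confidences, key=leaf_confidences.get)
    while node is not None:
        if node_confidence[node] >= threshold:
            return node
        node = parent.get(node)
    return None
```

If neither `book_flight` (0.40) nor `book_hotel` (0.35) is confident enough on its own, their shared parent `travel` can still clear the threshold, and the system could then ask a travel-level clarifying question.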

    Hierarchical speech recognition decoder

    Type: invention patent application

    Publication number: US20190035389A1

    Publication date: 2019-01-31

    Application number: US16148884

    Filing date: 2018-10-01

    CPC classification number: G10L15/197 G10L15/02 G10L15/063 G10L2015/0631

    Abstract: A speech interpretation module interprets the audio of user utterances as sequences of words. To do so, the speech interpretation module parameterizes a literal corpus of expressions by identifying portions of the expressions that correspond to known concepts, and generates a parameterized statistical model from the resulting parameterized corpus. When speech is received the speech interpretation module uses a hierarchical speech recognition decoder that uses both the parameterized statistical model and language sub-models that specify how to recognize a sequence of words. The separation of the language sub-models from the statistical model beneficially reduces the size of the literal corpus needed for training, reduces the size of the resulting model, provides more fine-grained interpretation of concepts, and improves computational efficiency by allowing run-time incorporation of the language sub-models.
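The corpus-parameterization step in this abstract (replacing portions of expressions that correspond to known concepts) can be approximated with plain string matching, as in the sketch below. The function `parameterize`, the `<CONCEPT>` placeholder syntax, and the phrase lists are assumptions for illustration; the abstract does not specify how concepts are identified or marked.

```python
import re
from typing import Dict, List


def parameterize(expression: str, concepts: Dict[str, List[str]]) -> str:
    """Replace known concept phrases in an expression with placeholders."""
    # Map each known phrase (lowercased) to its concept name.
    phrase_to_concept = {
        phrase.lower(): concept
        for concept, phrases in concepts.items()
        for phrase in phrases
    }
    if not phrase_to_concept:
        return expression
    # Try longer phrases first so the longest known phrase wins.
    alternatives = sorted(phrase_to_concept, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in alternatives) + r")\b",
        re.IGNORECASE,
    )
    # Single pass, so inserted placeholders are never re-matched.
    return pattern.sub(
        lambda match: "<" + phrase_to_concept[match.group(0).lower()] + ">",
        expression,
    )
```

A literal expression such as "fly to new york on friday" would become "fly to <CITY> on <DAY>" given suitable phrase lists, which is the kind of parameterized corpus a statistical model could then be trained on while separate sub-models handle each concept.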

    Hierarchical speech recognition decoder

    Publication number: US10096317B2

    Publication date: 2018-10-09

    Application number: US15131833

    Filing date: 2016-04-18

    Abstract: A speech interpretation module interprets the audio of user utterances as sequences of words. To do so, the speech interpretation module parameterizes a literal corpus of expressions by identifying portions of the expressions that correspond to known concepts, and generates a parameterized statistical model from the resulting parameterized corpus. When speech is received the speech interpretation module uses a hierarchical speech recognition decoder that uses both the parameterized statistical model and language sub-models that specify how to recognize a sequence of words. The separation of the language sub-models from the statistical model beneficially reduces the size of the literal corpus needed for training, reduces the size of the resulting model, provides more fine-grained interpretation of concepts, and improves computational efficiency by allowing run-time incorporation of the language sub-models.
