COMPUTER-IMPLEMENTED DEEP TENSOR NEURAL NETWORK
    1.
    Invention application
    COMPUTER-IMPLEMENTED DEEP TENSOR NEURAL NETWORK (under examination, published)

    Publication number: WO2014035738A1

    Publication date: 2014-03-06

    Application number: PCT/US2013/055898

    Filing date: 2013-08-21

    CPC classification number: G06N3/02 G06N3/04 G06N3/0454 G06N3/084

    Abstract: A deep tensor neural network (DTNN) is described herein, wherein the DTNN is suitable for employment in a computer-implemented recognition/classification system. Hidden layers in the DTNN comprise at least one projection layer, which includes a first subspace of hidden units and a second subspace of hidden units. The first subspace of hidden units receives a first nonlinear projection of input data to the projection layer and generates a first set of output data based at least in part thereon, and the second subspace of hidden units receives a second nonlinear projection of the input data to the projection layer and generates a second set of output data based at least in part thereon. A tensor layer, which can be converted into a conventional layer of a DNN, generates a third set of output data based upon the first set of output data and the second set of output data.
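The layer structure described above can be sketched as follows. This is a minimal illustration, assuming sigmoid hidden units and arbitrary dimensions; the weights are random placeholders, not trained parameters. It also demonstrates the conversion mentioned in the abstract: the tensor layer is equivalent to a conventional layer acting on the vectorized outer product of the two subspaces.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
d_in, d_h1, d_h2, d_out = 8, 4, 3, 5  # illustrative dimensions

W1 = rng.standard_normal((d_h1, d_in))        # first nonlinear projection
W2 = rng.standard_normal((d_h2, d_in))        # second nonlinear projection
T = rng.standard_normal((d_out, d_h1, d_h2))  # tensor-layer weights

def dtnn_layer(x):
    h1 = sigmoid(W1 @ x)  # first subspace of hidden units
    h2 = sigmoid(W2 @ x)  # second subspace of hidden units
    # Tensor layer: bilinear combination h1' T_k h2 per output unit k.
    return np.einsum('i,kij,j->k', h1, T, h2)

x = rng.standard_normal(d_in)
z = dtnn_layer(x)

# Equivalent conventional layer: flatten T and feed it the Kronecker
# (vectorized outer) product of the two subspace activations.
h1, h2 = sigmoid(W1 @ x), sigmoid(W2 @ x)
z_conventional = T.reshape(d_out, -1) @ np.kron(h1, h2)
```

The equivalence `z == z_conventional` is why a tensor layer can be folded into an ordinary DNN layer whose input is the combined subspace representation.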


    AUTOMATIC READING TUTORING
    2.
    Invention application
    AUTOMATIC READING TUTORING (under examination, published)

    Publication number: WO2009035825A2

    Publication date: 2009-03-19

    Application number: PCT/US2008/073570

    Filing date: 2008-08-19

    CPC classification number: G10L15/18 G09B17/003 G10L15/183

    Abstract: A method of providing automatic reading tutoring is disclosed. The method includes retrieving a textual indication of a story from a data store and creating a language model, including constructing a target context-free grammar indicative of a first portion of the story. A first acoustic input is received, and a speech recognition engine is employed to recognize the first acoustic input. An output of the speech recognition engine is compared to the language model, and a signal is provided indicating whether the output of the speech recognition engine matches at least a portion of the target context-free grammar.
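The comparison step can be illustrated with a toy stand-in, assuming the target "grammar" reduces to the expected word sequence for the current story portion; a real implementation would match the recognizer output against a context-free grammar.

```python
def match_against_target(recognized, target_words):
    """Return (matched, n), where n counts how many leading words of
    the target portion the recognizer output agrees with."""
    n = 0
    for r, t in zip(recognized, target_words):
        if r.lower() != t.lower():
            break
        n += 1
    return n > 0, n

# The reader got the first three words right, then misread "time".
matched, n = match_against_target(
    ["Once", "upon", "a", "dime"],
    ["once", "upon", "a", "time"],
)
```

Here `matched` is True and `n` is 3, which is the kind of signal a tutoring front end could use to highlight where the reader stumbled.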


    METHOD OF DETERMINING UNCERTAINTY ASSOCIATED WITH NOISE REDUCTION
    3.
    Invention application
    METHOD OF DETERMINING UNCERTAINTY ASSOCIATED WITH NOISE REDUCTION (under examination, published)

    Publication number: WO2003100769A1

    Publication date: 2003-12-04

    Application number: PCT/US2003/016032

    Filing date: 2003-05-20

    CPC classification number: G10L15/20 G10L21/0208

    Abstract: A method and apparatus are provided for determining the uncertainty associated with noise reduction, based on a parametric model of speech distortion. Noise is first reduced (304) from a representation of a portion of a noisy signal, using an acoustic environment model (413), to produce a representation of a cleaned signal. The uncertainty associated with the noise-reduction process is then computed. In one embodiment, this uncertainty is used, in conjunction with the noise-reduced signal, to decode (306) a pattern state.
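A one-dimensional sketch of how such an uncertainty can enter decoding, assuming Gaussian state models: the variance of the noise-reduction estimate is added to the model variance before scoring (a standard uncertainty-decoding recipe; all numeric values are illustrative).

```python
import math

def gaussian_loglik(x, mu, var):
    # Log-density of a univariate Gaussian.
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def uncertainty_loglik(x_hat, uncertainty_var, mu, var):
    # Score the cleaned feature against a variance broadened by the
    # noise-reduction uncertainty.
    return gaussian_loglik(x_hat, mu, var + uncertainty_var)

# Near the state mean, a confident estimate scores higher than an
# uncertain one; far from the mean, high uncertainty softens the penalty.
near_confident = uncertainty_loglik(1.0, 0.01, mu=0.0, var=1.0)
near_uncertain = uncertainty_loglik(1.0, 4.0, mu=0.0, var=1.0)
far_confident = uncertainty_loglik(5.0, 0.01, mu=0.0, var=1.0)
far_uncertain = uncertainty_loglik(5.0, 4.0, mu=0.0, var=1.0)
```

The net effect is that unreliable cleaned features contribute less sharply to the state decision, which is the point of propagating the uncertainty into the decoder.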


    MULTILINGUAL DEEP NEURAL NETWORK
    4.
    Invention application
    MULTILINGUAL DEEP NEURAL NETWORK (under examination, published)

    Publication number: WO2014164080A1

    Publication date: 2014-10-09

    Application number: PCT/US2014/020448

    Filing date: 2014-03-05

    CPC classification number: G10L15/063 G06N3/0454 G06N3/084 G10L15/16

    Abstract: Described herein are various technologies pertaining to a multilingual deep neural network (MDNN). The MDNN includes a plurality of hidden layers, wherein values for the weight parameters of the hidden layers are learned during a training phase from training data comprising raw acoustic features for multiple languages. The MDNN further includes softmax layers that are trained separately for each target language, making use of the hidden-layer values trained jointly over multiple source languages. The MDNN is adaptable, such that a new softmax layer may be added on top of the existing hidden layers, where the new softmax layer corresponds to a new target language.
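The shared-hidden-layers-plus-per-language-softmax layout can be sketched as below. This assumes ReLU hidden units and uses random placeholder weights (real weights would come from joint training over the source languages); all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MultilingualDNN:
    def __init__(self, dims):
        # Hidden layers shared across languages, trained jointly.
        self.hidden = [rng.standard_normal((o, i)) * 0.1
                       for i, o in zip(dims[:-1], dims[1:])]
        self.d_top = dims[-1]
        self.heads = {}  # one softmax layer per target language

    def add_language(self, lang, n_targets):
        # Adapting to a new target language: stack a fresh softmax
        # layer on top of the existing hidden layers.
        self.heads[lang] = rng.standard_normal((n_targets, self.d_top)) * 0.1

    def posteriors(self, lang, features):
        h = features
        for W in self.hidden:
            h = relu(W @ h)
        return softmax(self.heads[lang] @ h)

mdnn = MultilingualDNN([39, 64, 64])
mdnn.add_language("fr", 100)  # hypothetical new target language
p = mdnn.posteriors("fr", rng.standard_normal(39))
```

Only the new softmax layer needs language-specific training, which is what makes the architecture cheap to extend to additional target languages.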


    SPEECH-CENTRIC MULTIMODAL USER INTERFACE DESIGN IN MOBILE TECHNOLOGY
    5.
    Invention application
    SPEECH-CENTRIC MULTIMODAL USER INTERFACE DESIGN IN MOBILE TECHNOLOGY (under examination, published)

    Publication number: WO2008113063A1

    Publication date: 2008-09-18

    Application number: PCT/US2008/057175

    Filing date: 2008-03-15

    Inventors: YU, Dong; DENG, Li

    CPC classification number: G06F3/038 G06F2203/0381 G10L15/24

    Abstract: A multi-modal human computer interface (HCI) receives a plurality of available information inputs concurrently, or serially, and employs a subset of the inputs to determine or infer user intent with respect to a communication or information goal. Received inputs are parsed, and the parsed inputs are analyzed and optionally synthesized with respect to one another. If sufficient information is not available to determine the user's intent or goal, feedback can be provided to the user in order to facilitate clarifying, confirming, or augmenting the information inputs.


    INTEGRATED SPEECH RECOGNITION AND SEMANTIC CLASSIFICATION
    6.
    Invention application
    INTEGRATED SPEECH RECOGNITION AND SEMANTIC CLASSIFICATION (under examination, published)

    Publication number: WO2008089470A1

    Publication date: 2008-07-24

    Application number: PCT/US2008/051584

    Filing date: 2008-01-21

    CPC classification number: G10L15/1815

    Abstract: A novel system integrates speech recognition and semantic classification, so that acoustic scores in a speech recognizer that accepts spoken utterances may be taken into account when training both language models and semantic classification models. For example, a joint association score may be defined that is indicative of a correspondence of a semantic class and a word sequence for an acoustic signal. The joint association score may incorporate parameters such as weighting parameters for signal-to-class modeling of the acoustic signal, language model parameters and scores, and acoustic model parameters and scores. The parameters may be revised to raise the joint association score of a target word sequence with a target semantic class relative to the joint association score of a competitor word sequence with the target semantic class. The parameters may be designed so that the semantic classification errors in the training data are minimized.
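The joint association score can be illustrated as follows, assuming it takes the common log-linear form of a weighted combination of acoustic, language, and signal-to-class model scores; the weight names and numeric scores below are illustrative, not taken from the patent.

```python
def joint_association_score(acoustic, language, semantic_class, weights):
    # Weighted combination of log-domain model scores for a
    # (word sequence, semantic class) pair given an acoustic signal.
    return (weights["acoustic"] * acoustic
            + weights["language"] * language
            + weights["class"] * semantic_class)

weights = {"acoustic": 1.0, "language": 0.8, "class": 1.2}

# Discriminative goal: the target word sequence paired with the target
# class should outscore a competitor word sequence for the same class.
target = joint_association_score(-4.0, -2.0, -1.0, weights)
competitor = joint_association_score(-5.0, -2.5, -1.5, weights)
```

Training then adjusts the weights and model parameters to widen the gap between the target and competitor scores, which is what drives semantic classification errors on the training data down.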


    AUTOMATIC READING TUTORING WITH PARALLEL POLARIZED LANGUAGE MODELING
    7.
    Invention application
    AUTOMATIC READING TUTORING WITH PARALLEL POLARIZED LANGUAGE MODELING (under examination, published)

    Publication number: WO2008089469A1

    Publication date: 2008-07-24

    Application number: PCT/US2008/051582

    Filing date: 2008-01-21

    CPC classification number: G06F17/271 G09B17/003 G10L15/197 G10L2015/221

    Abstract: A novel system for automatic reading tutoring provides effective error detection and reduced false alarms, combined with low processing-time burdens and response times short enough to maintain a natural, engaging flow of interaction. According to one illustrative embodiment, an automatic reading tutoring method includes displaying a text output and receiving an acoustic input. The acoustic input is modeled with a domain-specific target language model specific to the text output, and with a general-domain garbage language model, both of which may be efficiently constructed as context-free grammars. The domain-specific target language model may be built dynamically or "on-the-fly" based on the currently displayed text (e.g., the story to be read by the user), while the general-domain garbage language model is shared among all different text outputs. User-perceptible tutoring feedback is provided based on the target language model and the garbage language model.
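The two-model arrangement can be sketched with a toy example, assuming simple word-set "models" in place of the patent's context-free grammars: the target model is built on the fly from the displayed text, while the garbage model is shared across all stories.

```python
# Shared general-domain garbage model: filler words common to any story.
GARBAGE_MODEL = {"um", "uh", "er", "hmm"}

def build_target_model(displayed_text):
    # Domain-specific target model constructed on the fly from the
    # currently displayed story text.
    return set(displayed_text.lower().split())

def tutoring_feedback(recognized_words, target_model):
    words = [w.lower() for w in recognized_words]
    on_target = sum(w in target_model for w in words)
    garbage = sum(w in GARBAGE_MODEL for w in words)
    # Feedback polarizes on which model explains the input better.
    return "on-track" if on_target > garbage else "try-again"

target = build_target_model("The cat sat on the mat")
good = tutoring_feedback(["the", "cat", "sat"], target)
poor = tutoring_feedback(["um", "uh", "dog"], target)
```

Because only the small target model changes per story, the per-utterance work stays light, which is what keeps response times short enough for a natural tutoring interaction.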


    FULL-SEQUENCE TRAINING OF DEEP STRUCTURES FOR SPEECH RECOGNITION
    8.

    Publication number: WO2012039938A3

    Publication date: 2012-03-29

    Application number: PCT/US2011/050738

    Filing date: 2011-09-07

    Abstract: A method is disclosed herein that includes an act of causing a processor to access a deep-structured model retained in a computer-readable medium, wherein the deep-structured model comprises a plurality of layers with weights assigned thereto, transition probabilities between states, and language model scores. The method can further include the act of jointly and substantially optimizing the weights, the transition probabilities, and the language model scores of the deep-structured model using an optimization criterion based on a sequence rather than a set of unrelated frames.

    DEEP BELIEF NETWORK FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
    9.
    Invention application
    DEEP BELIEF NETWORK FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION (under examination, published)

    Publication number: WO2012036934A1

    Publication date: 2012-03-22

    Application number: PCT/US2011/050472

    Filing date: 2011-09-06

    CPC classification number: G10L15/14 G06N3/0454 G06N3/084

    Abstract: A method is disclosed herein that includes an act of causing a processor to receive a sample, wherein the sample is one of a spoken utterance, an online handwriting sample, or a moving image sample. The method also comprises the act of causing the processor to decode the sample based at least in part upon an output of a combination of a deep structure and a context-dependent Hidden Markov Model (HMM), wherein the deep structure is configured to output a posterior probability of a context-dependent unit. The deep structure is a Deep Belief Network consisting of many layers of nonlinear units, with connection weights between layers trained by a pretraining step followed by a fine-tuning step.
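The combination step can be sketched with the standard hybrid decoding recipe, assuming the network emits posteriors over context-dependent HMM states: dividing each posterior by the state prior yields a scaled likelihood the HMM decoder can consume (the probability values below are illustrative).

```python
import numpy as np

# Network output: posterior probability per context-dependent state.
posteriors = np.array([0.7, 0.2, 0.1])

# State priors, typically estimated from the training alignment.
priors = np.array([0.5, 0.3, 0.2])

# Scaled log-likelihoods for the HMM decoder: log p(state | x) - log p(state).
scaled_loglik = np.log(posteriors) - np.log(priors)
```

The prior division keeps frequently occurring states from dominating the search purely because the network saw them more often during training.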


    INCREMENTALLY REGULATED DISCRIMINATIVE MARGINS IN MCE TRAINING FOR SPEECH RECOGNITION
    10.
    Invention application
    INCREMENTALLY REGULATED DISCRIMINATIVE MARGINS IN MCE TRAINING FOR SPEECH RECOGNITION (under examination, published)

    Publication number: WO2008024148A1

    Publication date: 2008-02-28

    Application number: PCT/US2007/014409

    Filing date: 2007-06-20

    CPC classification number: G10L15/063 G10L15/144

    Abstract: A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for the correct class and for competing classes for each token, given the acoustic model. From these scores a misclassification measure is calculated, and a loss function is then calculated from the misclassification measure. The loss function also includes a margin value that varies over the training iterations. Based on the calculated loss function, the acoustic model is updated such that the loss function with the margin value is minimized. This process repeats until empirical convergence is reached.
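The loss computation can be sketched as follows, assuming the common MCE form of a log-sum-exp misclassification measure passed through a sigmoid, with the margin added to the measure; the linear margin schedule here is illustrative, not the patent's specific schedule.

```python
import math

def misclassification_measure(correct_score, competitor_scores):
    # d = -(score of correct class) + log-sum-exp of competitor scores;
    # d > 0 indicates a misclassified token.
    lse = math.log(sum(math.exp(s) for s in competitor_scores))
    return -correct_score + lse

def mce_loss(d, margin):
    # Sigmoid of the measure shifted by the current margin: a larger
    # margin penalizes tokens that are correct but only narrowly so.
    return 1.0 / (1.0 + math.exp(-(d + margin)))

d = misclassification_measure(2.0, [1.0, 0.5])

# The margin is incrementally regulated: it grows over iterations,
# demanding progressively larger separation from the competitors.
losses = [mce_loss(d, 0.2 * it) for it in range(3)]
```

For a fixed measure `d`, raising the margin raises the loss, so later iterations keep pushing the correct-class score further above the competitors rather than stopping at a bare win.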

