-
Publication No.: DE69518723T2
Publication Date: 2001-05-23
Application No.: DE69518723
Application Date: 1995-06-21
Applicant: IBM
Inventor: NAHAMOO DAVID , PADMANABHAN MUKUND
Abstract: A method for estimating the probability of phone boundaries, as well as the accuracy of the acoustic modelling, in order to cut down the search space in a speech recognition system. The accuracy of the acoustic modelling is quantified by the rank of the correct phone. The invention includes a microphone for converting an utterance into an electrical signal. The signal from the microphone is processed by an acoustic processor and label match, which finds the best-matched acoustic label prototype in the acoustic label prototype store. A probability distribution on phone boundaries is then produced for every time frame using the first decision tree described in the invention. These probabilities are compared to a threshold, and some time frames are identified as boundaries between phones. An acoustic score is computed for all phones between every given pair of hypothesized boundaries, and the phones are ranked on the basis of this score. The second decision tree is traversed for every time frame to obtain the worst-case rank of the correct phone at that time, and, using the computed phone score and phone rank, a shortlist of allowed phones is made up for every time frame. This information is used to select a subset of the acoustic word models in store, and a fast acoustic word match processor matches the label string from the acoustic processor against this subset of abridged acoustic word models to produce an utterance signal. The utterance signal output by the fast acoustic word match processor comprises at least one word. In general, however, the fast acoustic word match processor will output a number of candidate words. Each word signal produced by the fast acoustic word match processor is input into a word context match, which compares the word context to language models in store and outputs at least one candidate word. From the recognition candidates produced by the fast acoustic match and the language model, the detailed acoustic match matches the label string from the acoustic processor against detailed acoustic word models in store and outputs a word string corresponding to an utterance.
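The thresholding and shortlisting steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the decision trees are not modelled, and the function names, the threshold value, the phone labels, and the scores are all hypothetical inputs chosen for illustration.

```python
from typing import Dict, List


def find_boundaries(boundary_probs: List[float], threshold: float) -> List[int]:
    """Return the time frames whose boundary probability exceeds the threshold."""
    return [t for t, p in enumerate(boundary_probs) if p > threshold]


def shortlist_phones(scores: Dict[str, float], worst_rank: int) -> List[str]:
    """Rank phones by descending acoustic score and keep the top `worst_rank`.

    `worst_rank` stands in for the worst-case rank that the abstract obtains
    from the second decision tree; here it is simply passed in.
    """
    ranked = sorted(scores, key=lambda phone: scores[phone], reverse=True)
    return ranked[:worst_rank]


# Hypothetical per-frame boundary probabilities and per-phone acoustic scores.
boundaries = find_boundaries([0.1, 0.8, 0.2, 0.05, 0.9], threshold=0.5)
allowed = shortlist_phones({"AA": -12.0, "IY": -3.5, "S": -7.1, "T": -20.4}, worst_rank=2)
```

Only word models consistent with the shortlisted phones would then be passed to the fast acoustic match, which is where the search-space reduction comes from.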
-
Publication No.: DE69231309T2
Publication Date: 2001-02-15
Application No.: DE69231309
Application Date: 1992-09-30
Applicant: IBM
Inventor: BELLEGARDA EVELINE JEANNINE , BELLEGARDA JEROME RENE , NAHAMOO DAVID , NATHAN KRISHNA SUNDARAM
Abstract: Method and apparatus for automatic recognition of handwritten text based on a suitable representation of handwriting in one or several feature vector space(s), Gaussian modeling in each space, and mixture decoding to take into account the contribution of all relevant prototypes in all spaces. The feature vector space(s) are selected to encompass both a local and a global description of each appropriate point on a pen trajectory. Windowing is performed to capture broad trends in the handwriting, after which a linear transformation is applied to eliminate redundancy. The resulting feature vector space(s) are called chirographic space(s). Gaussian modeling is performed to isolate adequate chirographic prototype distributions in each space, and the mixture coefficients weighting these distributions are trained using a maximum likelihood framework. Decoding can be performed simply and effectively by accumulating the contribution of all relevant prototype distributions. Post-processing using a language model may be included.
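As a rough illustration of mixture decoding, the Python sketch below scores a single feature vector against shared Gaussian prototypes and picks the character whose mixture weights give the highest likelihood. It assumes diagonal covariances, a single feature space, and a single frame; the prototype parameters, weights, and character labels are hypothetical, and the abstract's accumulation over frames and spaces is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[float]]  # (mean, variance) per dimension


def gaussian_pdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Density of a diagonal-covariance Gaussian evaluated at x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def mixture_score(x: Sequence[float], prototypes: List[Prototype], weights: Sequence[float]) -> float:
    """Accumulate the weighted contribution of every prototype distribution."""
    return sum(w * gaussian_pdf(x, m, v) for w, (m, v) in zip(weights, prototypes))


def decode(x: Sequence[float], prototypes: List[Prototype], mixture_weights: Dict[str, Sequence[float]]) -> str:
    """Return the character whose mixture gives x the highest likelihood."""
    return max(mixture_weights, key=lambda c: mixture_score(x, prototypes, mixture_weights[c]))


# Hypothetical two-prototype, two-character setup.
prototypes = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [1.0, 1.0])]
mixture_weights = {"a": [0.9, 0.1], "b": [0.1, 0.9]}
decoded_near_second = decode([2.8, 3.1], prototypes, mixture_weights)
decoded_near_first = decode([0.1, -0.2], prototypes, mixture_weights)
```

Because every prototype contributes to every character's score, no hard assignment of a frame to a single prototype is needed.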
-
Publication No.: CA2345661A1
Publication Date: 2000-04-13
Application No.: CA2345661
Application Date: 1999-10-01
Applicant: IBM
Inventor: NAHAMOO DAVID , SEDIVY JAN , GOPALAKRISHNAN PONANI , LUCAS BRUCE D , MAES STEPHANE H
IPC: G06F3/16 , G06F9/44 , G06F9/46 , G06F9/54 , G06F12/00 , G06F15/00 , G06F17/28 , G06F17/30 , G06F40/00 , G10L13/00 , G10L15/22 , G10L15/26 , H04M1/253 , H04M1/27 , H04M1/725 , H04M3/42 , H04M3/44 , H04M3/493 , H04M3/50 , H04M7/00 , H04M11/00 , G06F15/16
Abstract: A conversational browsing system (10) comprising a conversational browser (11) having a command and control interface (12) for converting speech commands or multi-modal input from I/O resources (27) into a navigation request. The system (10) comprises conversational engines (23) for decoding input commands for interpretation by the command and control interface and decoding meta-information provided by the CML processor for generating synthesized audio output. The system includes a communication stack (19) for transmitting the navigation request to a content server and receiving a CML file from the content server based on the navigation request. A conversational transcoder (13) transforms presentation material from one modality to a conversational modality. The transcoder (13) includes a functional transcoder (13a) to transform a page of GUI to a page of CUI (conversational user interface) and a logical transcoder (13b) to transform business logic of an application, transaction or site into an acceptable dialog.
-
Publication No.: CA1332195C
Publication Date: 1994-09-27
Application No.: CA570927
Application Date: 1988-06-30
Applicant: IBM
Inventor: BAHL LALIT R , MERCER ROBERT L , NAHAMOO DAVID
Abstract: RAPIDLY TRAINING A SPEECH RECOGNIZER TO A SUBSEQUENT SPEAKER GIVEN TRAINING DATA OF A REFERENCE SPEAKER. Apparatus and method for training the statistics of a Markov Model speech recognizer to a subsequent speaker who utters part of a training text after the recognizer has been trained for the statistics of a reference speaker who utters a full training text. Where labels generated by an acoustic processor in response to uttered speech serve as outputs for Markov models, the present apparatus and method determine label output probabilities at transitions in the Markov models corresponding to the subsequent speaker where there is sparse training data. Specifically, label output probabilities for the subsequent speaker are re-parameterized based on confusion matrix entries having values indicative of the similarity between an lth label output of the subsequent speaker and a kth label output for the reference speaker. The label output probabilities based on re-parameterized data are combined with initialized label output probabilities to form "smoothed" label output probabilities which feature smoothed probability distributions. Based on label outputs generated when the subsequent speaker utters the shortened training text, "basic" label output probabilities computed by conventional methodology are linearly averaged against the smoothed label output probabilities to produce improved label output probabilities.
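The final step, linearly averaging "basic" against "smoothed" label output probabilities, can be sketched in Python as follows. The interpolation weight `lam` and the example distributions are assumptions for illustration; the abstract does not state how the weight is chosen, and the confusion-matrix re-parameterization is not shown.

```python
from typing import List, Sequence


def interpolate(basic: Sequence[float], smoothed: Sequence[float], lam: float) -> List[float]:
    """Linearly average two label output distributions.

    `lam` is a hypothetical weight on the basic estimate; (1 - lam) goes to
    the smoothed estimate. Both inputs must cover the same labels in order.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(basic) != len(smoothed):
        raise ValueError("distributions must have the same length")
    return [lam * b + (1.0 - lam) * s for b, s in zip(basic, smoothed)]


# With sparse data the basic estimate can be degenerate; averaging restores
# nonzero mass on the label that was never observed.
improved = interpolate([1.0, 0.0], [0.5, 0.5], lam=0.5)
```

A convex combination of two probability distributions is itself a probability distribution, so no renormalization is needed afterwards.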
-
Publication No.: CA2024382C
Publication Date: 1994-08-02
Application No.: CA2024382
Application Date: 1990-08-31
Applicant: IBM
Inventor: NADAS ARTHUR , NAHAMOO DAVID
Abstract: A method and apparatus for finding the best or near-best binary classification of a set of observed events according to a predictor feature X, so as to minimize the uncertainty in the value of a category feature Y. Each feature has three or more possible values. First, the predictor feature value and the category feature value of each event are measured. From the measured feature values, the joint probabilities of each category feature value and each predictor feature value are estimated. The events are then split, arbitrarily, into two sets of predictor feature values. From the estimated joint probabilities, the conditional probability of an event falling into one set of predictor feature values is calculated for each category feature value. A number of pairs of sets of category feature values are then defined, where each set SYj contains only those category feature values having the j lowest values of the conditional probability. From among these pairs of sets, an optimum pair is found having the lowest uncertainty in the value of the predictor feature. From the optimum sets of category feature values, the conditional probability that an event falls within one set of category feature values is calculated for each predictor feature value. A number of pairs of sets of predictor feature values are defined, where each set SXi(t + 1) contains only those predictor feature values having the i lowest values of the conditional probability. From among the sets SXi, a pair of sets is found having the lowest uncertainty in the value of the category feature. An event is then classified according to whether its predictor feature value is a member of the set of optimal predictor feature values.
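The quantity being minimized can be made concrete with a short Python sketch. It assumes "uncertainty" means conditional entropy in bits, which the abstract does not state explicitly, and it finds the best split by exhaustive search over subsets rather than by the alternating procedure the abstract describes; the joint probabilities are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Set

Joint = Dict[str, Dict[str, float]]  # joint[x][y] = P(X = x, Y = y)


def split_uncertainty(joint: Joint, left: Set[str]) -> float:
    """Conditional entropy (bits) of Y given which side of the split X falls on."""
    sides = ({}, {})  # accumulated P(side, y) for the left and right sets
    for x, row in joint.items():
        side = sides[0] if x in left else sides[1]
        for y, p in row.items():
            side[y] = side.get(y, 0.0) + p
    h = 0.0
    for side in sides:
        mass = sum(side.values())
        for p in side.values():
            if p > 0.0:
                h -= p * math.log2(p / mass)
    return h


def best_split_exhaustive(joint: Joint) -> Set[str]:
    """Try every non-trivial subset of predictor values (exponential cost)."""
    values = sorted(joint)
    candidates = [set(c) for r in range(1, len(values)) for c in combinations(values, r)]
    return min(candidates, key=lambda left: split_uncertainty(joint, left))


# Hypothetical joint distribution over three predictor and three category values.
joint = {"a": {"u": 0.3}, "b": {"v": 0.3}, "c": {"v": 0.1, "w": 0.3}}
best = best_split_exhaustive(joint)
```

Exhaustive search is only feasible for a handful of predictor values, which is the motivation for an iterative method such as the one in the abstract.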
-
Publication No.: DE112013006027T5
Publication Date: 2015-09-24
Application No.: DE112013006027
Application Date: 2013-09-25
Applicant: IBM
Inventor: MOHAMMED SIDDIQUE A , NAHAMOO DAVID , SHANMUGAM DHANDAPANI
Abstract: A method for providing tactile feedback comprises displaying a visual representation of a physical object having at least one haptic property, generating time-varying data associated with the at least one haptic property of the visual representation, sending the time-varying data to a data processing unit that includes a feedback device electrically connected to the data processing unit, and generating the tactile feedback via the feedback device in response to pressure exerted by a user on the feedback device.
-
Publication No.: DE102012220130A1
Publication Date: 2013-05-23
Application No.: DE102012220130
Application Date: 2012-11-06
Applicant: IBM
Inventor: BEN-DAVID SHAY , CONNELL JONATHAN HUDSON , HOORY RON , NAHAMOO DAVID , SICCONI ROBERTO
IPC: G06F21/32
Abstract: A method, a system, and a computer program product for access to secure facilities are provided. The method may include: receiving an access request to a secure facility from a mobile device; authenticating a user by multi-factor biometric authentication using data from the mobile device; obtaining data from one or more stationary sensor devices at a location in physical proximity to the secure facility; cross-checking the data from the mobile device against data from the one or more stationary sensor devices; and granting access to the secure facility if the user authentication and the cross-check succeed. During the cross-check, data from the one or more stationary sensor devices may be used to determine whether the access request from the mobile device originates near the secure facility. The method may include: obtaining data from one or more stationary sensor devices and using the data to provide authentication data; and cross-checking some of the authentication data from the mobile device against some of the authentication data from the one or more stationary sensor devices.
-
Publication No.: CA2345665C
Publication Date: 2011-02-08
Application No.: CA2345665
Application Date: 1999-10-01
Applicant: IBM
Inventor: COFFMAN DANIEL , COMERFORD LIAM D , DEGENNARO STEVEN V , EPSTEIN EDWARD A , GOPALAKRISHNAN PONANI , MAES STEPHANE H , NAHAMOO DAVID
IPC: G06F3/16 , G06F9/00 , G06F9/44 , G06F9/46 , G06F9/54 , G06F12/00 , G06F15/00 , G06F17/28 , G06F17/30 , G06F40/00 , G10L13/00 , G10L15/22 , G10L15/26 , H04M1/253 , H04M1/27 , H04M1/725 , H04M3/42 , H04M3/44 , H04M3/493 , H04M3/50 , H04M7/00 , H04M11/00
Abstract: A conversational computing system that provides a universal coordinated multi-modal conversational user interface (CUI) (10) across a plurality of conversationally aware applications (11) (i.e., applications that "speak" conversational protocols) and conventional applications (12). The conversationally aware applications (11) communicate with a conversational kernel (14) via conversational application APIs (13). The conversational kernel (14) controls the dialog across applications and devices (local and networked) on the basis of their registered conversational capabilities and requirements, and provides a unified conversational user interface and conversational services and behaviors. The conversational computing system may be built on top of a conventional operating system and APIs (15) and conventional device hardware (16). The conversational kernel (14) handles all I/O processing and controls conversational engines (18). The conversational kernel (14) converts voice requests into queries and converts outputs and results into spoken messages using conversational engines (18) and conversational arguments (17). The conversational application API (13) conveys all the information the conversational kernel (14) needs to transform queries into application calls and, conversely, to convert output into speech, appropriately sorted before being provided to the user.
-
Publication No.: DE69423692T2
Publication Date: 2000-09-28
Application No.: DE69423692
Application Date: 1994-09-08
Applicant: IBM
Inventor: EPSTEIN MARK EDWARD , GOPALAKRISHNAN PONANI S , NAHAMOO DAVID , PICHENY MICHAEL ALAN , SEDIVY JAN
Abstract: A speech coding apparatus and method uses classification rules to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. Classification rules map each feature vector signal from a set of all possible feature vector signals to exactly one of at least two different classes of prototype vector signals. Each class contains a plurality of prototype vector signals. According to the classification rules, a first feature vector signal is mapped to a first class of prototype vector signals. The closeness of the feature value of the first feature vector signal is compared to the parameter values of only the prototype vector signals in the first class of prototype vector signals to obtain prototype match scores for the first feature vector signal and each prototype vector signal in the first class. At least the identification value of at least the prototype vector signal having the best prototype match score is output as a coded utterance representation signal of the first feature vector signal.
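The saving described here comes from scoring a feature vector against only the prototypes in its class. A minimal Python sketch follows, with a hypothetical one-threshold classification rule, squared Euclidean distance as a stand-in for the prototype match score, and made-up prototype identifiers and values.

```python
from typing import Callable, Dict, Sequence

Classes = Dict[str, Dict[str, Sequence[float]]]  # class name -> {prototype id: parameters}


def sign_rule(vec: Sequence[float]) -> str:
    """Hypothetical classification rule: split on the sign of the first feature."""
    return "low" if vec[0] < 0.0 else "high"


def code_frame(vec: Sequence[float], classes: Classes, rule: Callable[[Sequence[float]], str]) -> str:
    """Return the id of the closest prototype within the class chosen by the rule."""
    candidates = classes[rule(vec)]  # prototypes in other classes are never scored
    best_id, best_dist = None, float("inf")
    for proto_id, params in candidates.items():
        dist = sum((a - b) ** 2 for a, b in zip(vec, params))
        if dist < best_dist:
            best_id, best_dist = proto_id, dist
    return best_id


# Hypothetical prototype store with two classes of two prototypes each.
classes = {
    "low": {"p1": [-1.0, 0.0], "p2": [-3.0, 1.0]},
    "high": {"p3": [1.0, 0.0], "p4": [4.0, 2.0]},
}
label_high = code_frame([0.8, 0.1], classes, sign_rule)
label_low = code_frame([-2.5, 0.9], classes, sign_rule)
```

With two equally sized classes, each frame is compared against half of the prototypes; finer classification rules would reduce the comparisons further, at the risk of excluding the true best match.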
-
Publication No.: DE69231309D1
Publication Date: 2000-09-07
Application No.: DE69231309
Application Date: 1992-09-30
Applicant: IBM
Inventor: BELLEGARDA EVELINE JEANNINE , BELLEGARDA JEROME RENE , NAHAMOO DAVID , NATHAN KRISHNA SUNDARAM
Abstract: Method and apparatus for automatic recognition of handwritten text based on a suitable representation of handwriting in one or several feature vector space(s), Gaussian modeling in each space, and mixture decoding to take into account the contribution of all relevant prototypes in all spaces. The feature vector space(s) are selected to encompass both a local and a global description of each appropriate point on a pen trajectory. Windowing is performed to capture broad trends in the handwriting, after which a linear transformation is applied to eliminate redundancy. The resulting feature vector space(s) are called chirographic space(s). Gaussian modeling is performed to isolate adequate chirographic prototype distributions in each space, and the mixture coefficients weighting these distributions are trained using a maximum likelihood framework. Decoding can be performed simply and effectively by accumulating the contribution of all relevant prototype distributions. Post-processing using a language model may be included.
-