Abstract:
A method and apparatus for removing the effect of background music or noise from speech input to a speech recognizer, so as to improve recognition accuracy, is devised. Samples of pure music or noise related to the background music or noise that corrupts the speech input are used to reduce the effect of the background on speech recognition. The pure music and noise samples can be obtained in a variety of ways. The music- or noise-corrupted speech input is segmented into overlapping segments and then processed in two phases: first, the best-matching pure music or noise segment is aligned with each speech segment; then a linear filter is built for each segment to remove the effect of background music or noise from the speech input, and the overlapping segments are averaged to improve the signal-to-noise ratio. The resulting acoustic output can then be fed to a speech recognizer.
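For illustration, here is a minimal NumPy sketch of the two-phase processing described above. The squared-spectral-distance matching and the Wiener-style per-segment filter are assumptions standing in for whatever alignment metric and linear filter the method actually specifies.

```python
import numpy as np

def denoise(speech, noise_ref, frame=512, hop=256):
    """Hypothetical sketch: align each overlapping speech frame with its
    best-matching pure noise/music frame, apply a per-frame linear
    (Wiener-style) filter, then average the overlapping outputs."""
    window = np.hanning(frame)
    out = np.zeros_like(speech, dtype=float)
    norm = np.zeros_like(speech, dtype=float)
    # Pre-compute magnitude spectra of candidate pure-noise frames.
    noise_specs = [np.abs(np.fft.rfft(noise_ref[i:i + frame] * window))
                   for i in range(0, len(noise_ref) - frame, hop)]
    for start in range(0, len(speech) - frame, hop):
        seg = speech[start:start + frame] * window
        spec = np.fft.rfft(seg)
        mag = np.abs(spec)
        # Phase 1: pick the noise frame whose spectrum best matches
        # (squared spectral distance is an assumed criterion).
        best = min(noise_specs, key=lambda n: np.sum((mag - n) ** 2))
        # Phase 2: linear (spectral-subtraction-style) filter built from
        # the matched pure-noise frame.
        gain = np.maximum(mag**2 - best**2, 0.0) / np.maximum(mag**2, 1e-12)
        clean = np.fft.irfft(gain * spec, n=frame)
        # Overlap-add: average overlapping segments to raise the SNR.
        out[start:start + frame] += clean * window
        norm[start:start + frame] += window ** 2
    return out / np.maximum(norm, 1e-12)
```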
Abstract:
Embodiments of the invention provide a method, system, and computer program product for voiceprint tagging for interactive voice response (IVR) session management. In an embodiment of the invention, a method of voiceprint tagging for IVR session management is provided. The method includes establishing an IVR session for a caller over a network and presenting a portion of the IVR session to the caller over the network. The method also includes storing a voiceprint tag in memory associating a voiceprint of the caller with a portion of the IVR session. Finally, the method includes responding to a premature termination of the IVR session by re-establishing the prematurely terminated IVR session with the caller at the portion of the IVR session indicated by the voiceprint tag of the caller.
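A minimal sketch of the tagging-and-resume behavior, assuming an external speaker-verification routine `match(a, b) -> bool`; all names and fields are illustrative, not the patented implementation:

```python
from dataclasses import dataclass

@dataclass
class VoiceprintTag:
    """Associates a caller's voiceprint with a position in the IVR flow."""
    voiceprint: bytes     # opaque speaker-model blob (assumed format)
    session_portion: str  # identifier of the IVR menu/step reached

class IVRSessionManager:
    def __init__(self, match):
        self.match = match  # assumed speaker-verification routine
        self.tags = []

    def checkpoint(self, voiceprint, portion):
        # Store (or update) the voiceprint tag as the caller progresses.
        self.tags = [t for t in self.tags
                     if not self.match(t.voiceprint, voiceprint)]
        self.tags.append(VoiceprintTag(voiceprint, portion))

    def resume(self, voiceprint):
        # After a premature hang-up, re-establish the session at the
        # portion indicated by the caller's voiceprint tag, if any.
        for t in self.tags:
            if self.match(t.voiceprint, voiceprint):
                return t.session_portion
        return None  # no tag found: start a fresh session
```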
Abstract:
A method for conversational computing includes executing code embodying a conversational virtual machine, registering a plurality of input/output resources with a conversational kernel, providing an interface between a plurality of active applications and the conversational kernel processing input/output data, receiving input queries and input events of a multi-modal dialog across a plurality of user interface modalities of the plurality of active applications, generating output messages and output events of the multi-modal dialog in connection with the plurality of active applications, managing, by the conversational kernel, a context stack associated with the plurality of active applications and the multi-modal dialog to transform the input queries into application calls for the plurality of active applications and convert the output messages into speech, wherein the context stack accumulates a context of each of the plurality of active applications.
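As a rough illustration of the context stack described above, the following sketch keeps one context per registered active application and accumulates query-to-application-call transformations; the structure and field names are assumptions:

```python
class ContextStack:
    """Minimal sketch: one context per registered active application,
    with the most recently active application on top of the stack."""
    def __init__(self):
        self.stack = []  # list of (app_name, context_dict); top is last

    def register(self, app_name):
        self.stack.append((app_name, {"history": []}))

    def accumulate(self, app_name, query, app_call):
        # Record how an input query was transformed into an application
        # call, and promote that application to the top of the stack.
        for i, (name, ctx) in enumerate(self.stack):
            if name == app_name:
                ctx["history"].append((query, app_call))
                self.stack.append(self.stack.pop(i))
                return
        raise KeyError(f"{app_name} is not a registered application")

    def active(self):
        return self.stack[-1][0] if self.stack else None
```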
Abstract:
PROBLEM TO BE SOLVED: To provide a combination of a log-linear model with a multitude of speech features to recognize unknown speech utterances in a speech recognition system. SOLUTION: The speech recognition system models the posterior probability of linguistic units relevant to speech recognition using the log-linear model. The posterior model captures the probability of the linguistic unit given the observed speech features and the parameters of the posterior model. The posterior model may be determined using the probability of the word sequence hypotheses given a multitude of speech features. Log-linear models are used with features derived from sparse or incomplete data. The speech features that are utilized may include asynchronous, overlapping and statistically non-independent speech features. Not all features used in training need to appear in testing/recognition.
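For reference, a log-linear posterior over linguistic units w given observed speech features x and weight parameters λ has the standard textbook form below; the abstract does not spell out the exact parameterization used:

```latex
\[
P(w \mid x; \lambda) =
  \frac{\exp\bigl(\sum_{i} \lambda_i f_i(w, x)\bigr)}
       {\sum_{w'} \exp\bigl(\sum_{i} \lambda_i f_i(w', x)\bigr)}
\]
```

Here the f_i are feature functions over a hypothesis and the observed speech features; because the normalization runs only over competing hypotheses, features may be asynchronous, overlapping, and statistically non-independent, and a feature absent at recognition time simply contributes nothing to the sums.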
Abstract:
A conversational browsing system (10) comprising a conversational browser (11) having a command and control interface (12) for converting speech commands or multi-modal input from I/O resources (27) into navigation requests. The system (10) comprises conversational engines (23) for decoding input commands for interpretation by the command and control interface and for decoding meta-information provided by the CML processor to generate synthesized audio output. The system includes a communication stack (19) for transmitting a navigation request to a content server and receiving a CML file from the content server based on the navigation request. A conversational transcoder (13) transforms presentation material from one modality to a conversational modality. The transcoder (13) includes a functional transcoder (13a) to transform a page of GUI content into a page of CUI (conversational user interface) content and a logical transcoder (13b) to transform the business logic of an application, transaction, or site into an acceptable dialog.
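A hypothetical sketch of how the two transcoders (13a, 13b) might compose; the method names and types are illustrative assumptions:

```python
from abc import ABC, abstractmethod

class FunctionalTranscoder(ABC):
    """Transforms a page of GUI content into a page of CUI content,
    as the functional transcoder (13a) does."""
    @abstractmethod
    def transcode_page(self, gui_page: str) -> str: ...

class LogicalTranscoder(ABC):
    """Transforms the business logic of an application, transaction,
    or site into a dialog, as the logical transcoder (13b) does."""
    @abstractmethod
    def transcode_logic(self, business_logic: dict) -> list: ...

class ConversationalTranscoder:
    """Assumed composition of the functional and logical transcoders."""
    def __init__(self, functional: FunctionalTranscoder,
                 logical: LogicalTranscoder):
        self.functional = functional
        self.logical = logical

    def transform(self, gui_page: str, business_logic: dict):
        return (self.functional.transcode_page(gui_page),
                self.logical.transcode_logic(business_logic))
```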
Abstract:
A conversational computing system that provides a universal coordinated multi-modal conversational user interface (CUI) (10) across a plurality of conversationally aware applications (11) (i.e., applications that "speak" conversational protocols) and conventional applications (12). The conversationally aware applications (11) communicate with a conversational kernel (14) via conversational application APIs (13). The conversational kernel (14) controls the dialog across applications and devices (local and networked) on the basis of their registered conversational capabilities and requirements and provides a unified conversational user interface and conversational services and behaviors. The conversational computing system may be built on top of a conventional operating system and APIs (15) and conventional device hardware (16). The conversational kernel (14) handles all I/O processing and controls conversational engines (18). The conversational kernel (14) converts voice requests into queries and converts outputs and results into spoken messages using conversational engines (18) and conversational arguments (17). The conversational application API (13) conveys all the information for the conversational kernel (14) to transform queries into application calls and conversely convert output into speech, appropriately sorted before being provided to the user.
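The kernel's registration and query-to-call flow might be sketched as follows, with the recognition and synthesis engines modeled as assumed callables rather than the actual conversational engine API:

```python
class ConversationalKernel:
    """Minimal sketch: applications register their conversational
    capabilities; voice requests become application calls, and results
    become spoken messages. The engine hooks are assumptions."""
    def __init__(self, recognize, synthesize):
        self.recognize = recognize    # audio -> text (assumed engine)
        self.synthesize = synthesize  # text -> audio (assumed engine)
        self.apps = {}                # name -> (capabilities, handler)

    def register(self, name, capabilities, handler):
        # Applications register their conversational capabilities
        # and requirements with the kernel.
        self.apps[name] = (capabilities, handler)

    def handle_voice_request(self, audio):
        query = self.recognize(audio)
        # Naive keyword dispatch for illustration; the real kernel
        # arbitrates dialog across applications and devices.
        for name, (caps, handler) in self.apps.items():
            if any(c in query for c in caps):
                result = handler(query)          # query -> application call
                return self.synthesize(str(result))  # result -> speech
        return self.synthesize("No application can handle that request.")
```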
Abstract:
A communications apparatus is configured to bridge modalities and different communications formats. The apparatus may include a bridge that receives an input through a modality gateway and delivers an output through an output channel, and at least one communication engine that translates the input into the output.
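A minimal sketch of the bridge and communication engine, under assumed names and interfaces:

```python
class CommunicationEngine:
    """Assumed translator between two communication formats."""
    def translate(self, payload: bytes,
                  src_format: str, dst_format: str) -> bytes:
        raise NotImplementedError

class Bridge:
    """Receives input through a modality gateway, translates it with a
    communication engine, and delivers the output through an output
    channel. All names here are illustrative assumptions."""
    def __init__(self, engine: CommunicationEngine):
        self.engine = engine

    def relay(self, payload, src_format, dst_format, output_channel):
        translated = self.engine.translate(payload, src_format, dst_format)
        output_channel.send(translated)  # assumed channel interface
```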
Abstract:
A facilities access system comprises a mobile device used to request access; the gathering of multifactor biometric data (such as voiceprint, fingerprint, face, iris, multi-biometrics, etc.) to authenticate the user; one or more fixed sensors located in or near the facility; the validation of the mobile and fixed-sensor data; and the granting of access if validation succeeds. The fixed-sensor data may also validate that the request was made in the vicinity of the facility, either from the content of the access request or by use of an outgoing challenge followed by receipt of a challenge confirmation. The position of the mobile device may be determined in order to select the closest fixed sensor. The authentication process may be conducted on the mobile device or at a remote server, while the fixed sensors may be used to determine whether a user is present at or absent from the facility. A fixed sensor may also confirm the same biometric factor as the mobile device.
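The access-granting flow might be sketched as follows; the callables and the request fields are assumptions for illustration:

```python
def grant_access(request, mobile_auth, nearest_fixed_sensor):
    """Hypothetical sketch of the access decision described above:
    `mobile_auth` verifies the multifactor biometric data gathered on
    the mobile device, and the closest fixed sensor then confirms, via
    a challenge, that the request was made in the vicinity of the
    facility. All names and fields are illustrative assumptions."""
    # 1. Authenticate the user from the mobile device's biometric data
    #    (on the device or at a remote server).
    if not mobile_auth(request.biometrics):
        return False
    # 2. Use the mobile device's position to select the closest fixed
    #    sensor and issue an outgoing challenge.
    sensor = nearest_fixed_sensor(request.position)
    challenge = sensor.issue_challenge()
    # 3. Grant access only if the challenge confirmation is received,
    #    establishing that the user is present at the facility.
    return sensor.await_confirmation(challenge, request.user_id)
```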