Abstract:
A method for conversational computing includes executing code embodying a conversational virtual machine, registering a plurality of input/output resources with a conversational kernel, providing an interface between a plurality of active applications and the conversational kernel processing input/output data, receiving input queries and input events of a multi-modal dialog across a plurality of user interface modalities of the plurality of active applications, generating output messages and output events of the multi-modal dialog in connection with the plurality of active applications, managing, by the conversational kernel, a context stack associated with the plurality of active applications and the multi-modal dialog to transform the input queries into application calls for the plurality of active applications and convert the output messages into speech, wherein the context stack accumulates a context of each of the plurality of active applications.
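The context-stack idea in this abstract can be sketched as follows. This is an illustrative toy, not the patent's implementation: the names `ContextStack`, `AppContext`, `activate`, and `resolve` are assumptions, and the "application call" is a stub string. The sketch shows the two claimed behaviors: the stack accumulates a context per active application, and the kernel uses the most recently active context to turn an input query into an application call.

```python
from dataclasses import dataclass, field

@dataclass
class AppContext:
    app_name: str
    history: list = field(default_factory=list)  # accumulated dialog turns for this app

class ContextStack:
    """Hypothetical sketch of the kernel's context stack (names are illustrative)."""

    def __init__(self):
        self._stack: list[AppContext] = []

    def activate(self, app_name: str) -> AppContext:
        """Push a new context, or raise an existing one to the top of the stack."""
        for i, ctx in enumerate(self._stack):
            if ctx.app_name == app_name:
                self._stack.append(self._stack.pop(i))
                return self._stack[-1]
        ctx = AppContext(app_name)
        self._stack.append(ctx)
        return ctx

    def resolve(self, query: str) -> tuple[str, str]:
        """Map an input query to an application call using the topmost context."""
        ctx = self._stack[-1]
        ctx.history.append(query)  # context accumulates per application
        return ctx.app_name, f"call({query!r})"

stack = ContextStack()
stack.activate("calendar")
stack.activate("email")          # email is now the most relevant context
app, call = stack.resolve("read my new messages")
```

Here the dispatch rule is deliberately simple (topmost context wins); the abstract's kernel would also consult registered I/O resources and modalities when resolving a query.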
Abstract:
A system (100) for intelligent caching and network management. The system (100) includes event and time information (102) representing a user's schedule, a location database (106) including information about destination devices (110) and capabilities of the destination devices (110), a predictor (104) which receives the event and time information (102) and the information and capabilities of the destination devices to predict a location of the user and/or resources needed at the location, such that the resources are transferred to the user at a location when and where the resources are needed. User preference profiles (108), which include user preference information, are further employed by the predictor (104) to predict a location of the user and/or resources needed at the location.
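The predictor's inputs and output can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the event/location/preference structures, the `predict` function, and the rule that only resources matching both the destination devices' capabilities and the user's preferences are pre-cached.

```python
def predict(events, location_db, preferences):
    """Illustrative predictor: return (location, resources) for the next event.

    events:       schedule entries with a time, a location, and needed resources
    location_db:  location -> formats the destination devices there can handle
    preferences:  resource types the user actually wants delivered
    """
    next_event = min(events, key=lambda e: e["time"])  # earliest upcoming event
    location = next_event["location"]
    capabilities = set(location_db.get(location, []))  # destination device capabilities
    # Cache only resources the destination devices can render and the user wants.
    resources = [r for r in next_event["resources"]
                 if r["type"] in capabilities and r["type"] in preferences]
    return location, resources

events = [{"time": 9, "location": "conf_room_a",
           "resources": [{"name": "slides.pdf", "type": "pdf"},
                         {"name": "demo.mp4", "type": "video"}]}]
location_db = {"conf_room_a": ["pdf", "audio"]}   # no video device in the room
preferences = {"pdf", "video"}
loc, res = predict(events, location_db, preferences)
```

The video file is filtered out because the destination has no device capable of presenting it, matching the abstract's point that device capabilities constrain what is worth transferring.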
Abstract:
A method and apparatus for removing the effect of background music or noise from speech input to a speech recognizer, so as to improve recognition accuracy, has been devised. Samples of pure music or noise related to the background music or noise that corrupts the speech input are utilized to reduce the effect of the background in speech recognition. The pure music and noise samples can be obtained in a variety of ways. The music- or noise-corrupted speech input is segmented into overlapping segments and is then processed in two phases: first, the best-matching pure music or noise segment is aligned with each speech segment; then a linear filter is built for each segment to remove the effect of background music or noise from the speech input, and the overlapping segments are averaged to improve the signal-to-noise ratio. The resulting acoustic output can then be fed to a speech recognizer.
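The two-phase scheme above can be sketched in a few lines. This is a simplification, not the patented method: plain subtraction stands in for the per-segment linear filter, the "best match" is a least-squares comparison against a small bank of pure noise segments, and the sample values are made up. What the sketch preserves is the structure: overlapping segmentation, per-segment matching and filtering, then averaging of the overlaps.

```python
def best_match(segment, noise_bank):
    """Phase 1: pick the pure music/noise segment closest to this speech segment."""
    return min(noise_bank,
               key=lambda n: sum((s - v) ** 2 for s, v in zip(segment, n)))

def clean(signal, noise_bank, seg_len=4, hop=2):
    """Phase 2: filter each overlapping segment, then average the overlaps."""
    acc = [0.0] * len(signal)   # sum of filtered values at each sample
    cnt = [0] * len(signal)     # how many segments covered each sample
    for start in range(0, len(signal) - seg_len + 1, hop):
        seg = signal[start:start + seg_len]
        noise = best_match(seg, noise_bank)
        filtered = [s - n for s, n in zip(seg, noise)]  # stand-in for the linear filter
        for i, v in enumerate(filtered):
            acc[start + i] += v
            cnt[start + i] += 1
    return [a / c if c else 0.0 for a, c in zip(acc, cnt)]

speech = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]          # toy "corrupted" input
noise_bank = [[1.0, 1.0, 1.0, 1.0],
              [0.5, 0.5, 0.5, 0.5]]              # toy pure-noise segments
out = clean(speech, noise_bank)
```

In the abstract's version the filter is estimated per segment rather than being fixed subtraction, which is what makes the approach robust to non-stationary music.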
Abstract:
A technique to improve recognition accuracy when transcribing speech data that contains data from a wide range of environments. Input data in many situations contains data from a variety of sources in different environments. Such classes include: clean speech, speech corrupted by noise (e.g., music), non-speech (e.g., pure music with no speech), telephone speech, and the identity of a speaker. A technique is described whereby the different classes of data are first automatically identified, and then each class is transcribed by a system that is made specifically for it. The invention also describes a segmentation algorithm that is based on building an acoustic model that characterizes the data in each class, and then using a dynamic programming algorithm (the Viterbi algorithm) to automatically identify segments that belong to each class. The acoustic models are built in a certain feature space, and the invention also describes different feature spaces for use with different classes.
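The class-labeling step above can be sketched as a small Viterbi-style segmenter. This is an illustration under stated assumptions, not the patented algorithm: each class's "acoustic model" is a single one-dimensional Gaussian, and the fixed `switch_penalty` stands in for real transition modeling. The dynamic program keeps, for each class, the best-scoring labeling of the frames so far, then traces back the optimal path.

```python
import math

def segment(frames, models, switch_penalty=2.0):
    """Label each frame with a class via Viterbi-style dynamic programming.

    models: {class_name: (mean, std)} -- toy per-class acoustic models.
    """
    classes = list(models)

    def log_lik(x, c):
        mu, sd = models[c]
        return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd)

    # score[c] = best log-score of any labeling ending in class c
    score = {c: log_lik(frames[0], c) for c in classes}
    back = []  # backpointers, one dict per frame after the first
    for x in frames[1:]:
        new, ptr = {}, {}
        for c in classes:
            prev = max(classes,
                       key=lambda p: score[p] - (0 if p == c else switch_penalty))
            ptr[c] = prev
            new[c] = score[prev] - (0 if prev == c else switch_penalty) + log_lik(x, c)
        back.append(ptr)
        score = new

    # Trace back the best path from the best final class.
    best = max(classes, key=score.get)
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

models = {"speech": (0.0, 1.0), "music": (5.0, 1.0)}   # toy 1-D feature models
frames = [0.1, -0.2, 0.0, 5.1, 4.9, 5.2]
labels = segment(frames, models)
```

The switching penalty discourages spurious single-frame class changes, which is the practical reason for using dynamic programming here rather than labeling each frame independently.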
Abstract:
A conversational computing system that provides a universal coordinated multi-modal conversational user interface (CUI) (10) across a plurality of conversationally aware applications (11) (i.e., applications that "speak" conversational protocols) and conventional applications (12). The conversationally aware applications (11) communicate with a conversational kernel (14) via conversational application APIs (13). The conversational kernel (14) controls the dialog across applications and devices (local and networked) on the basis of their registered conversational capabilities and requirements and provides a unified conversational user interface and conversational services and behaviors. The conversational computing system may be built on top of a conventional operating system and APIs (15) and conventional device hardware (16). The conversational kernel (14) handles all I/O processing and controls conversational engines (18). The conversational kernel (14) converts voice requests into queries and converts outputs and results into spoken messages using conversational engines (18) and conversational arguments (17). The conversational application API (13) conveys all the information for the conversational kernel (14) to transform queries into application calls and conversely convert output into speech, appropriately sorted before being provided to the user.
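The query-to-call and result-to-speech flow described above can be sketched as a toy registration API. All names here (`ConversationalKernel`, `register`, `handle_query`) are illustrative assumptions, and "speech" is a stub string: the point is only the shape of the flow, in which conversationally aware applications register capabilities with the kernel, which converts a query into an application call and the result back into a spoken message.

```python
class ConversationalKernel:
    """Toy stand-in for the conversational kernel (14); names are illustrative."""

    def __init__(self):
        self._handlers = {}  # registered conversational capabilities -> handlers

    def register(self, capability, handler):
        """A conversationally aware application registers a capability."""
        self._handlers[capability] = handler

    def handle_query(self, capability, query):
        """Transform a query into an application call, then the result into 'speech'."""
        if capability not in self._handlers:
            raise KeyError(f"no application registered for {capability!r}")
        result = self._handlers[capability](query)  # query -> application call
        return f"Spoken output: {result}"           # result -> spoken message (stub)

kernel = ConversationalKernel()
kernel.register("weather", lambda q: f"weather lookup for {q}")
message = kernel.handle_query("weather", "tomorrow")
```

In the abstract's system the conversion in both directions is driven by conversational engines (18) and arguments (17) rather than a format string, but the kernel-mediated indirection is the same.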
Abstract:
A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources comprises: a network comprising a first (100) and a second (106) network device; the first (100) and second (106) network devices each comprising a set of conversational resources (102, 107), a dialog manager (103, 108) for managing a conversation and executing calls requesting a conversational service, and a communication stack (111, 115) for communicating messages over a network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second devices to automatically share the set of conversational resources of the first and second network devices, when necessary, to perform their respective requested conversational services.
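The sharing behavior can be illustrated with a minimal sketch: a dialog manager serves a conversational service locally when it owns the resource, and otherwise forwards the call to its networked peer. The classes, the `connect` helper, and the in-process "network" are all assumptions for illustration; the patent's devices would negotiate over real conversational protocols.

```python
class DialogManager:
    """Toy dialog manager that shares conversational resources with a peer."""

    def __init__(self, name, resources):
        self.name = name
        self.resources = resources  # e.g. {"asr"} on a thin device
        self.peer = None            # set by connect()

    def request(self, service, payload):
        if service in self.resources:
            return f"{self.name} ran {service}({payload})"    # serve locally
        if self.peer and service in self.peer.resources:
            return self.peer.request(service, payload)        # share peer's resource
        raise RuntimeError(f"no device offers {service}")

def connect(a, b):
    """Stand-in for the conversational protocols linking the two stacks."""
    a.peer, b.peer = b, a

device = DialogManager("handheld", {"asr"})           # resource-limited client
server = DialogManager("server", {"tts", "nlu"})      # resource-rich server
connect(device, server)
local = device.request("asr", "audio")
remote = device.request("tts", "hello")               # transparently served remotely
```

The key property this preserves from the abstract is transparency: the caller issues the same `request` whether the resource is local or shared from the peer.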
Abstract:
A virtual workspace is provided for a user with a number of electronic devices, in which information can be exchanged among the electronic devices through a number of connections between the electronic devices. The virtual workspace is provided by determining where services are located and the type of the services, and determining one or more data formats associated with data accessible by one or more of the electronic devices. A portion of the data has a given one of the one or more data formats. An electronic device is selected based at least on predetermined criteria and the given data format. A route through the connections to the selected electronic device is determined, where the route may comprise a given one or more of the connections. At least the portion of the data associated with the given data format is routed to the selected electronic device. The portion of the data is utilizable for presentation by the selected electronic device when received by the selected electronic device.
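The selection-and-routing steps above can be sketched in two small functions: pick a device whose services support the data's format, then find a route through the inter-device connections, here via breadth-first search. The device names, format sets, connection graph, and first-match selection rule are assumptions for illustration; the abstract's "predetermined criteria" could be much richer.

```python
from collections import deque

def select_device(devices, data_format):
    """Pick the first device whose supported formats include the data format."""
    for name, formats in devices.items():
        if data_format in formats:
            return name
    return None

def find_route(connections, src, dst):
    """Find a route through the connection graph with breadth-first search."""
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path  # list of hops, src first
        for nxt in connections.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

devices = {"phone": {"audio"}, "tv": {"video", "audio"}, "laptop": {"pdf"}}
connections = {"phone": ["laptop"], "laptop": ["tv"], "tv": []}  # directed links
target = select_device(devices, "video")          # only the tv can present video
route = find_route(connections, "phone", target)  # route may span several connections
```

BFS yields a minimum-hop route, which is one reasonable reading of choosing a route "through the connections"; a real system might instead weight links by bandwidth or cost.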