Abstract:
PROBLEM TO BE SOLVED: To provide a new application programming language based on user interaction with any device that a user employs to access any type of information. SOLUTION: In a preferred embodiment, the conversational markup language (CML) is a high-level, XML-based language for expressing a 'dialog' or 'conversation' that a user carries out with a given computing device. An application author can program an application using interaction-based elements called 'conversational gestures'. Various embodiments of a multi-modal browser can also be implemented to support the features of CML through modality-specific representations, for example an HTML-based graphical user interface (GUI) browser and a VoiceXML-based speech browser.
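The idea of authoring one interaction and rendering it per modality can be sketched as follows. This is only an illustration: the `SelectGesture` class, the two renderer functions, and the emitted fragments are hypothetical, since the abstract does not give CML's actual element names or syntax. The VoiceXML output is a simplified fragment, not a complete document.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class SelectGesture:
    """Hypothetical modality-independent 'choose one option' gesture."""
    name: str
    prompt: str
    choices: list[str]


def render_html(g: SelectGesture) -> str:
    """Render the gesture for a GUI browser as an HTML <select> control."""
    options = "".join(f"<option>{escape(c)}</option>" for c in g.choices)
    return (f"<label>{escape(g.prompt)}"
            f'<select name="{escape(g.name)}">{options}</select></label>')


def render_voicexml(g: SelectGesture) -> str:
    """Render the same gesture for a speech browser as a VoiceXML field."""
    options = "".join(f"<option>{escape(c)}</option>" for c in g.choices)
    return (f'<field name="{escape(g.name)}">'
            f"<prompt>{escape(g.prompt)}</prompt>{options}</field>")


if __name__ == "__main__":
    gesture = SelectGesture("drink", "Which drink?", ["coffee", "tea"])
    print(render_html(gesture))
    print(render_voicexml(gesture))
```

The application is written once against the gesture; each browser supplies its own renderer.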
Abstract:
PROBLEM TO BE SOLVED: To perform automatic call and data transfer processing based on at least one of the identity of a caller or author and the time of the call or message, by providing a switching means that processes a call according to at least one of the identity of the caller and the subject of the call, together with a means for programming the system. SOLUTION: A server 20 is programmed to automatically answer incoming telephone calls, e-mail, facsimile/modem transmissions, etc. When the incoming call is a telephone call, a recording means 40 records the audio data. Next, the identity of the caller is determined. More specifically, the caller's spoken statements and responses are sent to a speaker recognition module 22 and compared with previously stored speaker models. Where identification is possible, it is performed using both the module 22 and an ASR/NLU module 24. The caller's ID may also be used for identification.
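Processing a call according to caller identity and time of day can be sketched as a first-match rule table. The `Rule` fields, the action strings, and the `route_call` function are hypothetical names chosen for illustration; the abstract does not specify how the server is programmed. The sketch assumes the caller has already been identified (or left as `None` when identification failed).

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Rule:
    caller: Optional[str]  # None means "any caller"
    start: time            # window start, inclusive
    end: time              # window end, exclusive (no midnight wrap-around)
    action: str


def route_call(caller_id: Optional[str], at: time, rules: list[Rule],
               default: str = "voicemail") -> str:
    """Return the action of the first rule matching caller and time."""
    for rule in rules:
        caller_ok = rule.caller is None or rule.caller == caller_id
        if caller_ok and rule.start <= at < rule.end:
            return rule.action
    return default


if __name__ == "__main__":
    rules = [
        Rule("alice", time(9), time(17), "forward_to_office"),
        Rule(None, time(0), time(7), "take_message"),
    ]
    print(route_call("alice", time(10, 30), rules))
    print(route_call(None, time(3), rules))
```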
Abstract:
PROBLEM TO BE SOLVED: To provide a device and a method that deny users of an exceptional class access to a specific service or feature of a system by means of biometric and non-biometric identification. SOLUTION: The device includes acoustic models, each representing the speech of a system user and corresponding to that user's telephone number, and a speaker identification module that is operatively coupled to a database 12 and obtains and decodes a voice sample from a potential system user while that user attempts to make a phone call. The speaker identification module compares the decoded voice sample with the pre-stored acoustic model associated with the telephone number dialed by the potential user, and terminates the attempted call when the decoded voice sample substantially matches the stored acoustic model.
Abstract:
A technique to improve recognition accuracy when transcribing speech data drawn from a wide range of environments. In many situations the input contains data from a variety of sources in different environments. The classes of data include: clean speech, speech corrupted by noise (e.g., music), non-speech (e.g., pure music with no speech), telephone speech, and the identity of a speaker. A technique is described whereby the different classes of data are first automatically identified, and then each class is transcribed by a system built specifically for it. The invention also describes a segmentation algorithm that is based on building an acoustic model that characterizes the data in each class, and then using a dynamic programming algorithm (the Viterbi algorithm) to automatically identify segments that belong to each class. The acoustic models are built in a particular feature space, and the invention also describes different feature spaces for use with different classes.
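A minimal sketch of Viterbi segmentation over class models follows. It assumes the per-frame log-likelihood of each class has already been computed by some acoustic model, and it uses a single fixed `switch_penalty` in place of trained transition probabilities. Both are simplifications introduced here, not details taken from the abstract.

```python
def viterbi_segment(frame_loglik, switch_penalty=5.0):
    """Label frames with classes and merge them into segments.

    frame_loglik: list of frames, each a list of per-class log-likelihoods.
    Returns a list of (start_frame, end_frame_exclusive, class_index).
    """
    n_classes = len(frame_loglik[0])
    score = list(frame_loglik[0])  # best path score ending in each class
    back = []                      # back-pointers, one list per later frame

    for obs in frame_loglik[1:]:
        new_score, ptr = [], []
        for c in range(n_classes):
            # Staying in the same class is free; switching costs a penalty.
            cand = [score[p] - (0.0 if p == c else switch_penalty)
                    for p in range(n_classes)]
            best_prev = max(range(n_classes), key=cand.__getitem__)
            new_score.append(cand[best_prev] + obs[c])
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)

    # Trace the best path backwards from the best final class.
    state = max(range(n_classes), key=score.__getitem__)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()

    # Merge runs of identical labels into segments.
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((start, t, path[start]))
            start = t
    return segments


if __name__ == "__main__":
    # Class 0 = speech, class 1 = music (labels are illustrative).
    frames = [[-1, -4]] * 3 + [[-3, -2]] + [[-1, -4]] * 2 + [[-5, -1]] * 4
    print(viterbi_segment(frames))
```

The penalty keeps a single ambiguous frame from splitting a segment, while a sustained change in the likelihoods still produces a boundary. Each resulting segment could then be passed to the transcription system built for its class.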
Abstract:
Apparatus for preventing unauthorized use of a voice dialing system and, particularly, a call forwarding feature associated with the system whereby system users may forward a telephone number respectively associated therewith to a remote location in order to receive phone calls at the remote location, comprises: a database for pre-storing telephone numbers of system users and for pre-storing acoustic models respectively representative of speech associated with each system user, the acoustic models respectively corresponding to the telephone numbers; and a speaker identification module operatively coupled to the database for obtaining and decoding a speech sample from a potential system user during the potential user's attempt to make a telephone call, the speaker identification module comparing the decoded speech sample obtained with the pre-stored acoustic model associated with the telephone number dialed by the potential user; whereby if the decoded speech sample substantially matches the pre-stored acoustic model, then the phone call attempted by the potential user is terminated.
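The decision logic described above can be sketched as a lookup keyed by the dialed number followed by a thresholded comparison. Everything concrete here is an assumption made for illustration: the fixed-length feature vectors, cosine similarity as the score, the `threshold` value, and the names `screen_call` and `models`. A real system would score decoded speech against a trained acoustic model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def screen_call(dialed_number, sample_features, models, threshold=0.9):
    """Return 'terminate' when the caller's sample substantially matches
    the model stored for the dialed number, otherwise 'connect'."""
    model = models.get(dialed_number)
    if model is None:
        # Assumption: a number with no stored model is not screened.
        return "connect"
    if cosine(sample_features, model) >= threshold:
        return "terminate"
    return "connect"


if __name__ == "__main__":
    models = {"555-0100": [1.0, 0.0, 1.0]}  # dialed number -> stored model
    print(screen_call("555-0100", [0.9, 0.1, 1.1], models))
    print(screen_call("555-0100", [0.0, 1.0, 0.0], models))
```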
Abstract:
Fast and detailed match techniques for speaker recognition are combined into a hybrid system in which speakers are associated in groups when potential confusion is detected between a speaker being enrolled and a previously enrolled speaker. Thus the detailed match techniques are invoked only at the potential onset of saturation of the fast match technique while the detailed match is facilitated by limitation of comparisons to the group and the development of speaker-dependent models which principally function to distinguish between members of a group rather than to more fully characterize each speaker. Thus storage and computational requirements are limited and fast and accurate speaker recognition can be extended over populations of speakers which would degrade or saturate fast match systems and degrade performance of detailed match systems.
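A rough sketch of the hybrid scheme: speakers are grouped at enrollment when their fast-match models are close enough to be confused, and the detailed match runs only inside such a group. The class name, the centroid-based fast match, the `confusion_threshold`, and the spread-weighted distance used as the "detailed" match are all stand-ins chosen for illustration; the abstract does not specify the underlying models.

```python
import math


class HybridRecognizer:
    def __init__(self, confusion_threshold=1.0):
        self.centroids = {}  # speaker -> feature vector (fast-match model)
        self.group_of = {}   # speaker -> group id
        self.groups = {}     # group id -> set of speakers
        self.confusion_threshold = confusion_threshold

    def enroll(self, speaker, centroid):
        """Add a speaker, grouping them with a confusable earlier speaker."""
        confusable = [s for s, c in self.centroids.items()
                      if math.dist(c, centroid) < self.confusion_threshold]
        self.centroids[speaker] = centroid
        if confusable:
            # Simplification: join the first confusable speaker's group
            # rather than merging every group that is close.
            gid = self.group_of[confusable[0]]
        else:
            gid = len(self.groups)
            self.groups[gid] = set()
        self.groups[gid].add(speaker)
        self.group_of[speaker] = gid

    def _detailed_match(self, sample, members):
        """Weight each dimension by how much it separates group members."""
        dims = range(len(sample))
        spread = [max(self.centroids[m][d] for m in members)
                  - min(self.centroids[m][d] for m in members) for d in dims]

        def score(m):
            return sum(w * (sample[d] - self.centroids[m][d]) ** 2
                       for d, w in zip(dims, spread))

        return min(members, key=score)

    def identify(self, sample):
        """Return (speaker, stage), where stage is 'fast' or 'detailed'."""
        fast = min(self.centroids,
                   key=lambda s: math.dist(self.centroids[s], sample))
        members = self.groups[self.group_of[fast]]
        if len(members) == 1:
            return fast, "fast"
        return self._detailed_match(sample, sorted(members)), "detailed"


if __name__ == "__main__":
    r = HybridRecognizer()
    r.enroll("ann", [0.0, 0.0])
    r.enroll("bob", [5.0, 5.0])
    r.enroll("cat", [0.0, 0.6])  # close to ann, so grouped with her
    print(r.identify([5.1, 4.9]))
    print(r.identify([0.4, 0.5]))
```

Because the detailed comparison only has to separate members of one group, it is invoked rarely and over few candidates, which is the storage and computation saving the abstract describes.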