Abstract:
A system and method are presented for the synthesis of speech from provided text. Particularly, the generation of parameters within the system is performed as a continuous approximation in order to mimic the natural flow of speech as opposed to a step-wise approximation of the feature stream. Provided text may be partitioned and parameters generated using a speech model. The generated parameters from the speech model may then be used in a post-processing step to obtain a new set of parameters for application in speech synthesis.
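The difference between a step-wise and a continuous parameter trajectory can be illustrated with a small sketch. The abstract does not say which post-processing is applied, so the centered moving average below is only a stand-in; the function names, the window size, and the example values are assumptions made for illustration.

```python
def stepwise(segments):
    """Expand (value, n_frames) pairs into a piecewise-constant track."""
    track = []
    for value, n_frames in segments:
        track.extend([value] * n_frames)
    return track


def smooth(track, window=5):
    """Centered moving average; the window shrinks at the track edges.

    Assumption: a stand-in for the unspecified post-processing step.
    """
    half = window // 2
    out = []
    for i in range(len(track)):
        lo = max(0, i - half)
        hi = min(len(track), i + half + 1)
        out.append(sum(track[lo:hi]) / (hi - lo))
    return out


def max_jump(track):
    """Largest frame-to-frame change in a track."""
    return max(abs(b - a) for a, b in zip(track, track[1:]))


# Hypothetical per-segment parameter values (for example, a pitch target).
segments = [(100.0, 4), (140.0, 4), (120.0, 4)]
step_track = stepwise(segments)
smooth_track = smooth(step_track)
```

Here `max_jump(step_track)` is the full difference between adjacent segment values, while the smoothed track spreads each transition over several frames.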
Abstract:
A system and method for learning alternate pronunciations for speech recognition are disclosed. Through pronunciation learning, alternative name pronunciations that are not covered by a general pronunciation dictionary may be covered. In an embodiment, the detection of phone-level and syllable-level mispronunciations in words and sentences may be based on acoustic models based on Hidden Markov Models. Mispronunciations may be detected by comparing the likelihood of the potential state of the target pronunciation unit with a pre-determined threshold through a series of tests. It is also within the scope of an embodiment to detect accents.
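The threshold comparison described above might look roughly like the following. This is a minimal sketch, not the disclosed method: the per-phone log-likelihood scores, the threshold value, and the function name are all hypothetical, and the scores are assumed to come from an acoustic model that is not shown here.

```python
def flag_mispronunciations(phone_scores, threshold=-5.0):
    """Return the phones whose log-likelihood falls below the threshold.

    phone_scores: list of (phone, log_likelihood) pairs, assumed to be
    produced by aligning the audio against the target pronunciation.
    """
    return [phone for phone, loglik in phone_scores if loglik < threshold]


# Hypothetical scores for the three phones of one word.
scores = [("k", -2.1), ("ae", -7.4), ("t", -1.3)]
flagged = flag_mispronunciations(scores)
```

A series of tests, as mentioned in the abstract, could be modeled by applying several such checks with different thresholds or units (phone, syllable).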
Abstract:
A system and method are presented for on-premises and offline survivability of an interactive voice response system in a cloud telephony system. Voice interaction control may be separated from the media resources. Survivability is invoked when communication between the Cloud and the voice interaction's resource provider is degraded or disrupted. The system is capable of recovering after a disruption event so that the transition between failure and non-failure states is seamless and the impact on the user's experience is limited. When communication paths or Cloud control are reestablished, the user resumes normal processing and full functionality as if the failure had not occurred.
Abstract:
A system and method are presented for the encoding of audio for participants in a conference setting. In an embodiment, audio from conference participants in a voice-over-IP setting may be received and processed by the system. In an embodiment, audio may be received in a compressed form and decompressed for processing. For each participant, return audio is generated, compressed (if applicable), and transmitted to the participant. The system may recognize when participants are using the same audio encoding format and are thus receiving audio that may be similar or identical. In that case, the audio may be encoded only once rather than once per participant. Redundant encodings are thus recognized and eliminated, reducing CPU usage.
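One way to picture the elimination of redundant encodings is a cache keyed on the encoding format and the audio to be returned. The sketch below assumes hypothetical participant names, stand-in encoder functions, and a byte string in place of real mixed audio; none of these come from the abstract.

```python
def encode_for_participants(participants, encoders):
    """Encode return audio once per (format, audio) pair.

    participants: dict of name -> (codec_name, pcm_bytes)
    encoders: dict of codec_name -> callable(pcm_bytes) -> bytes
    Returns the per-participant encoded audio and the number of
    encoder invocations that were actually needed.
    """
    cache = {}
    encoded = {}
    for name, (codec, pcm) in participants.items():
        key = (codec, pcm)
        if key not in cache:
            # First participant with this format and audio: encode it.
            cache[key] = encoders[codec](pcm)
        # Everyone else with the same key reuses the cached result.
        encoded[name] = cache[key]
    return encoded, len(cache)


# Stand-in encoders; real codecs would go here.
encoders = {
    "g711": lambda pcm: b"g711:" + pcm,
    "opus": lambda pcm: b"opus:" + pcm,
}
mix = b"\x01\x02\x03"
participants = {
    "alice": ("g711", mix),
    "bob": ("g711", mix),
    "carol": ("opus", mix),
}
encoded, encode_calls = encode_for_participants(participants, encoders)
```

With three participants and two distinct (format, audio) pairs, the encoders run twice instead of three times.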
Abstract:
A system and method are presented for multi-factor authentication using voice biometric verification. When a user requests access to a system or application, voice identification may be triggered. An audio connection is initiated with the user, and the user may be prompted to speak the current value of their multi-factor authentication token. The captured voice of the user is concurrently fed into an automatic speech recognition engine and a voice biometric verification engine. The automatic speech recognition engine recognizes the digit sequence to verify that the user is in possession of the token, and the voice biometric engine verifies that the speaker is the user they claim to be. The user is granted access to the system or application once both checks have succeeded.
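The concurrent use of the two engines can be sketched with two tasks that run on the same captured audio. Both engine functions below are hypothetical stand-ins that read fields from a dictionary; a real system would call actual speech recognition and voice biometric services.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_digits(audio):
    """Stand-in for the speech recognition engine (assumption)."""
    return audio["spoken_digits"]


def verify_speaker(audio, claimed_user):
    """Stand-in for the voice biometric engine (assumption)."""
    return audio["speaker"] == claimed_user


def authenticate(audio, claimed_user, expected_token):
    """Grant access only if the token value and the voice both check out."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Feed the same captured audio to both engines at once.
        digits_future = pool.submit(recognize_digits, audio)
        voice_future = pool.submit(verify_speaker, audio, claimed_user)
        has_token = digits_future.result() == expected_token
        is_speaker = voice_future.result()
    return has_token and is_speaker


# Hypothetical captured audio, reduced to the fields the stand-ins need.
audio = {"spoken_digits": "493027", "speaker": "dana"}
granted = authenticate(audio, "dana", "493027")
```

Access is denied if either the spoken digits do not match the expected token value or the voice does not match the claimed user.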
Abstract:
A communication system including a media server through which communication packets are exchanged for recording and monitoring purposes is disclosed. A tap is associated with each communication endpoint, allowing for cradle-to-grave recording of communications regardless of their subsequent routing or branching. An incoming communication is routed to a first tap and, upon selection of a receiving party, the first tap is routed to a second tap, which forwards communication packets on to the receiving party. The taps may be used to forward communication packets to any number of other taps or destinations, such as a recording device, a monitoring user, or another user in the form of a conference.
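The tap arrangement can be modeled as objects that forward every packet they receive to each of their destinations, where a destination may be another tap. This is a minimal sketch under that assumption; the class names, the `route_to` and `receive` methods, and the string packets are hypothetical.

```python
class Sink:
    """Terminal destination, such as a recorder or a receiving party."""

    def __init__(self, name):
        self.name = name
        self.packets = []

    def receive(self, packet):
        self.packets.append(packet)


class Tap:
    """Forwards every packet to all attached destinations."""

    def __init__(self, name):
        self.name = name
        self.destinations = []

    def route_to(self, destination):
        # A destination is anything with a receive() method:
        # another Tap or a Sink.
        self.destinations.append(destination)

    def receive(self, packet):
        for destination in self.destinations:
            destination.receive(packet)


# The incoming communication arrives at the first tap.
first_tap = Tap("caller")
recorder = Sink("recorder")
first_tap.route_to(recorder)

# Once a receiving party is selected, the first tap is routed to a second tap.
second_tap = Tap("agent")
agent = Sink("agent-phone")
second_tap.route_to(agent)
first_tap.route_to(second_tap)

for packet in ["p1", "p2"]:
    first_tap.receive(packet)
```

Because the recorder hangs off the first tap, it keeps receiving packets even if the call is later routed to a different second tap.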
Abstract:
A provisioning mechanism is presented that may be used when a device is distributed to a third party over an untrusted distribution channel. The provisioning mechanism allows a server to recognize and trust the remote device. For example, a device may need to be paired to web services hosted in the cloud. The device, which has been delivered to a customer, may be on-premises, such as at a customer site or a data center. In order to avoid use by an unauthorized party, the device may have been shipped in an unprovisioned state. As a result, the customer will have to sync (or pair) the device to the products hosted in the cloud in order to have full functionality. In an embodiment, the process for device pairing may only need to be completed once, upon initial start-up of the device.