Abstract:
Speech and/or non-speech in an audio input are convolved to localize sounds to different locations for a user. An audio diarization system segments the audio input into speech and non-speech segments. These segments are convolved with one or more head related transfer functions (HRTFs) so the sounds localize to different sound localization points (SLPs) for the user.
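The convolution step can be illustrated with a minimal sketch. It assumes each HRTF is available in its time-domain form, a head-related impulse response (HRIR), one per ear; the names `convolve`, `localize`, `hrir_left`, and `hrir_right` and the toy sample values are hypothetical and do not come from the abstract.

```python
def convolve(signal, impulse_response):
    """Direct-form discrete convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def localize(segment, hrir_left, hrir_right):
    """Convolve one mono segment with a left/right HRIR pair.

    Returns a (left, right) tuple of channels; the HRIR pair chosen
    determines the sound localization point the listener perceives.
    """
    return convolve(segment, hrir_left), convolve(segment, hrir_right)


# Toy data (assumed): measured HRIRs are typically hundreds of taps long.
speech_segment = [1.0, 0.5, 0.25]
hrir_left = [1.0, 0.0]   # sound reaches the left ear immediately
hrir_right = [0.0, 0.6]  # one sample later and quieter at the right ear

left, right = localize(speech_segment, hrir_left, hrir_right)
print(left, right)
```

Under these assumptions, a diarization stage would hand each speech or non-speech segment to `localize` with a different HRIR pair, so that different segments appear to originate from different points.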
Abstract:
A computer-implemented system and method for performing distributed speech recognition is provided. Audio data is collected. A main grammar and secondary grammars are simultaneously provided for the audio data. Each secondary grammar includes an independent grammar. Speech recognition is simultaneously performed on the audio data using each secondary grammar. A new grammar is constructed for the audio data based on the main grammar template using results of the speech recognition. Further speech recognition is performed on the audio data using the new grammar.
Abstract:
A computer-implemented system and method for efficient voice transcription is provided. A verbal message is processed by splitting the verbal message into segments and generating text for each of the segments via automated speech recognition. A confidence score is assigned to each text segment. The text segments are provided to workbenches, in order, starting with the text segment having the lowest confidence score. For at least one text segment provided to a workbench, either edits to the text segment or manually transcribed text to replace the text segment is received. A threshold is applied to the time for performing the message processing and, upon satisfaction of the threshold, the message processing is terminated. A text message is generated for the verbal message based on one of the generated text segment, manual transcription, or edited text segment for each of the text segments in that verbal message.
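The ordering and time-threshold logic can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Segment` class, the `review` callback standing in for a workbench, and the `time_budget_s` parameter are all hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    position: int                        # order within the verbal message
    asr_text: str                        # text from automated speech recognition
    confidence: float                    # confidence score for asr_text
    reviewed_text: Optional[str] = None  # edited or manually transcribed text


def process_message(segments, review, time_budget_s, clock=time.monotonic):
    """Review segments from lowest confidence upward until time runs out.

    `review` is a stand-in for a workbench: it takes a Segment and
    returns edited or manually transcribed text.
    """
    start = clock()
    for seg in sorted(segments, key=lambda s: s.confidence):
        if clock() - start >= time_budget_s:
            break  # threshold satisfied: stop processing
        seg.reviewed_text = review(seg)
    # Reassemble in message order, preferring reviewed text when present.
    ordered = sorted(segments, key=lambda s: s.position)
    return " ".join(
        s.reviewed_text if s.reviewed_text is not None else s.asr_text
        for s in ordered
    )


# Hypothetical example: corrections keyed by segment position.
corrections = {1: "me back", 2: "tomorrow"}
message = [
    Segment(0, "please call", 0.9),
    Segment(1, "me bak", 0.4),
    Segment(2, "tomorow", 0.2),
]
print(process_message(message, lambda s: corrections.get(s.position, s.asr_text), 60))
```

Sorting by confidence means the limited review time is spent on the segments most likely to contain recognition errors; any segment not reached before the threshold keeps its automatically generated text.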
Abstract:
There is disclosed a method in a data processing system for automatically and accurately determining an outcome of a phone call by using signifiers or audibles as a way to increase accuracy without altering the flow of the conversation. The disclosed method can be used in any call center, such as a high-volume call center in the financial (banking), insurance, rental (hotel or car), or ticket sales industries, and the like. The method comprises receiving voice data of a phone call; transmitting a response communication based on the voice data, wherein the response communication includes at least one audible or signifier; identifying, by at least one processor, the at least one audible or signifier in the response communication; and automatically determining the outcome of the phone call based on the audible or signifier in the response communication. A data processing system and a non-transitory computer-readable medium for storing instructions consistent with the described method are also disclosed.
Abstract:
Unwanted calls are detected by identifying all unverified calls and prompting the caller to provide data, such as “press 5 to be connected” or “say ‘proceed’”, before the call is allowed to connect. Once connected, the called party may indicate that the call was or is unwanted and should be disconnected. The call is then disconnected from the called party while being maintained with the switch. In embodiments of the disclosed technology, the call is also recorded, and the audio, or an audio signature derived from it, is used to detect future unwanted calls. The detection of future unwanted calls may further be modified or determined based on the association of called parties with each other, which may additionally be used to change the threshold of closeness of audio signatures between calls.
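The signature comparison can be illustrated with a small sketch. The abstract does not say how audio signatures are represented or compared, so this assumes fixed-length numeric vectors compared by cosine similarity; it also assumes that an association between called parties lowers the required closeness. The function names and threshold values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_unwanted(signature, flagged_signatures, threshold=0.9,
                parties_associated=False, relaxed_threshold=0.75):
    """Return True if `signature` is close enough to any flagged signature.

    When the called parties are associated (an assumed policy), the
    relaxed threshold is used instead of the default one.
    """
    limit = relaxed_threshold if parties_associated else threshold
    return any(
        cosine_similarity(signature, flagged) >= limit
        for flagged in flagged_signatures
    )


# Hypothetical signatures from calls previously marked as unwanted.
flagged = [[1.0, 0.0, 0.0]]
print(is_unwanted([4.0, 3.0, 0.0], flagged))
print(is_unwanted([4.0, 3.0, 0.0], flagged, parties_associated=True))
```

The incoming signature here has a similarity of 0.8 to the flagged one, so it passes only under the relaxed threshold, which shows how the association between called parties can change the outcome.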
Abstract:
A system and method for the automated monitoring of inmate telephone calls as well as multi-modal search, retrieval and playback capabilities for said calls. A general term for such capabilities is multi-modal audio mining. The invention is designed to provide an efficient means for organizations such as correctional facilities to identify and monitor the contents of telephone conversations and to provide evidence of possible inappropriate conduct and/or criminal activity of inmates by analyzing monitored telephone conversations for events, including, but not limited to, the addition of third parties, the discussion of particular topics, and the mention of certain entities.
Abstract:
A method and system for transcription of spoken language into continuous text for a user comprising the steps of inputting spoken language of at least one user or of a communication partner of the at least one user into a mobile device of the respective user, wherein the input spoken language of the user is transported within a corresponding stream of voice over IP data packets to a transcription server; transforming the spoken language transported within the respective stream of voice over IP data packets into continuous text by means of a speech recognition algorithm run by said transcription server, wherein said speech recognition algorithm is selected depending on a natural language or dialect spoken in the area of the current position of said mobile device; and outputting said transformed continuous text forwarded by said transcription server to said mobile device of the respective user or to a user terminal of the respective user in real time.
Abstract:
A method and apparatus of processing a voice call are disclosed. One example method of operation may include recording at least a portion of a voice call, and storing the portion of the voice call in memory. The method may also include processing the portion of the voice call to identify at least one segment of interest, and forwarding the at least one segment of interest to a new call party responsive to a call transfer action.
Abstract:
Contact centers often record customer-agent communications for training, quality control, and other purposes. Supervisors often contribute to the communications in the form of voice messages only the agent can hear (e.g., whisper mode), text messages displayed on a screen, and, optionally, documents or other content that the supervisor provides to or shares with the agent. The supervisor's contribution to a communication is captured so that later playback or review of the communication includes all inputs, enabling a more complete understanding of the actions that were or were not taken.
Abstract:
A system and method provide searchable customer call indexes. Consistent with disclosed embodiments, a system may receive call information associated with telephone conversations between callers and a vendor, the call information including an audio recording or transcript for each telephone conversation. The system may also identify one or more keywords from the audio recordings or transcripts and index the call information into one or more indexes based on the identified keywords. Finally, the system may determine search results responsive to a search query based on the indexing. In some embodiments, changes to customer service may be identified based on the search results.
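Keyword-based indexing of this kind is commonly realized as an inverted index. The sketch below assumes transcripts are already available as text and that the keyword list is supplied up front; `build_index`, `search`, and the sample call data are hypothetical and are not taken from the abstract.

```python
from collections import defaultdict


def build_index(calls, keywords):
    """Map each keyword to the set of call IDs whose transcript contains it.

    `calls` maps a call ID to its transcript text.
    """
    wanted = {k.lower() for k in keywords}
    index = defaultdict(set)
    for call_id, transcript in calls.items():
        for word in transcript.lower().split():
            word = word.strip(".,!?;:\"'")  # drop surrounding punctuation
            if word in wanted:
                index[word].add(call_id)
    return index


def search(index, query):
    """Return the call IDs that contain every term in the query."""
    terms = [t.lower() for t in query.split()]
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


# Hypothetical call transcripts.
calls = {
    "c1": "I want a refund for my order.",
    "c2": "My order arrived late",
    "c3": "Refund please!",
}
index = build_index(calls, ["refund", "order", "late"])
print(sorted(search(index, "refund order")))
```

Because lookups touch only the sets for the query terms, search time depends on the number of matching calls rather than on the total volume of recorded audio or transcript text.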