Abstract:
One or more media items can be bound to a voice call using a binding protocol. The binding protocol allows call participants to more easily transfer media items to other call participants using one or more user interfaces. A call participant can initiate a media transfer by selecting the media and a communication modality for transferring the media. The binding protocol can be active or lazy. In lazy binding, the call participant can select the desired media for transfer before the voice call is established, and subsequently mark the media for binding with the voice call. In active binding, the call participant can select and transfer the desired media item during the voice call, and the media item is automatically bound to the voice call. The media item can be transferred using a user-selected communication modality over an independent data communication channel.
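The difference between lazy and active binding can be illustrated with a minimal sketch. Every name here (`MediaItem`, `VoiceCall`, `BindingSession`, the modality strings) is hypothetical and not taken from the abstract. The sketch also assumes that media marked for lazy binding becomes bound once the call is established, which the abstract does not spell out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Binding(Enum):
    LAZY = "lazy"      # media selected before the call is established
    ACTIVE = "active"  # media selected during the call, bound automatically


@dataclass
class MediaItem:
    name: str
    modality: str  # user-selected modality, e.g. "mms" or "email" (illustrative)


@dataclass
class VoiceCall:
    call_id: str
    established: bool = False
    bound: list = field(default_factory=list)  # media bound to this call


class BindingSession:
    """Tracks which media items are bound to one voice call."""

    def __init__(self, call):
        self.call = call
        self.marked = []  # lazily selected items marked for binding

    def select(self, item):
        """Select media; during a call it is bound at once (active binding)."""
        if self.call.established:
            self.call.bound.append(item)
            return Binding.ACTIVE
        return Binding.LAZY

    def mark(self, item):
        """Mark previously selected media for binding with the call."""
        self.marked.append(item)
        self._bind_marked()

    def establish(self):
        """Establish the call and bind anything already marked (assumption)."""
        self.call.established = True
        self._bind_marked()

    def _bind_marked(self):
        if self.call.established:
            self.call.bound.extend(self.marked)
            self.marked.clear()
```

In this sketch the actual transfer over the independent data channel is left out; only the bookkeeping that associates media with a call is shown.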
Abstract:
Systems and methods are provided for associating a phonetic pronunciation with a name by receiving the name, mapping the name to a plurality of monosyllabic components that are combinable to construct the phonetic pronunciation of the name, receiving a user input to select one or more of the plurality of monosyllabic components, and combining the selected components to construct the phonetic pronunciation of the name.
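The map, select, and combine steps can be sketched as follows. The lookup table, the pseudo-phonetic spellings, and the function names are all hypothetical, and the sketch assumes the name has already been split into parts; the abstract does not say how the mapping is performed.

```python
# Hypothetical table: each name part maps to candidate monosyllabic components.
COMPONENT_TABLE = {
    "an": ["AHN", "AN"],
    "na": ["NAH", "NUH"],
}


def map_to_components(name_parts, table=COMPONENT_TABLE):
    """Return, for each part of the name, its candidate monosyllabic components."""
    # Unknown parts fall back to a single upper-cased candidate (assumption).
    return [table.get(part.lower(), [part.upper()]) for part in name_parts]


def combine(components, selections, sep="-"):
    """Join the user-selected candidate for each part into one pronunciation."""
    return sep.join(options[i] for options, i in zip(components, selections))


components = map_to_components(["An", "na"])
# The user picks the first candidate for "An" and the second for "na".
pronunciation = combine(components, [0, 1])
```

Here the user input is modeled as one index per name part; a real interface could equally let the user pick components by tapping or by listening to each one.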
Abstract:
Systems and processes for robust end-pointing of speech signals using speaker recognition are provided. In one example process, a stream of audio having a spoken user request can be received. A first likelihood that the stream of audio includes user speech can be determined. A second likelihood that the stream of audio includes user speech spoken by an authorized user can be determined. A start-point or an end-point of the spoken user request can be determined based at least in part on the first likelihood and the second likelihood.
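One way to combine the two likelihoods is sketched below. The abstract only says the start-point or end-point is based "at least in part" on both likelihoods, so the per-frame product, the fixed threshold, and the function name `find_endpoints` are assumptions made for illustration.

```python
def find_endpoints(speech_likelihood, speaker_likelihood, threshold=0.5):
    """Return (start_frame, end_frame) of the authorized user's request, or None.

    speech_likelihood[i]  : likelihood that frame i contains user speech
    speaker_likelihood[i] : likelihood that frame i is spoken by the authorized user
    """
    # Assumed combination rule: multiply the two per-frame likelihoods.
    combined = [s * a for s, a in zip(speech_likelihood, speaker_likelihood)]
    active = [i for i, score in enumerate(combined) if score >= threshold]
    if not active:
        return None
    return active[0], active[-1]


# Frame 4 contains speech, but probably not from the authorized user.
speech = [0.1, 0.9, 0.9, 0.9, 0.8, 0.1]
speaker = [0.9, 0.9, 0.8, 0.9, 0.2, 0.1]
endpoints = find_endpoints(speech, speaker)
```

With these toy values the request spans frames 1 through 3: frame 4 is excluded even though it contains speech, which is the kind of robustness to background talkers that using a speaker likelihood is meant to provide.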
Abstract:
A method comprising: providing a plurality of pronunciation guessers, each of the plurality of pronunciation guessers being associated with a respective phonetic alphabet of a language or a locale; determining a user language or a user locale; associating a first phonetic alphabet with the user language or the user locale; receiving at each pronunciation guesser a representation of a name; guessing, at each pronunciation guesser, a phonetic pronunciation of one or more components of the name; mapping the phonetic pronunciation of the one or more components of the name guessed by each of the plurality of pronunciation guessers to the first phonetic alphabet to generate a list of guessed pronunciations; receiving an audio pronunciation of the name; and selecting a combination of components from the list of guessed pronunciations that, when pronounced, substantially matches the audio pronunciation of the name.
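The claimed steps can be approximated with a small sketch. Everything in it is a toy assumption: the two guessers, their alphabets, the symbol mapping, and the use of `difflib.SequenceMatcher` as a stand-in for matching against audio. In particular, the sketch assumes the audio pronunciation has already been transcribed into the first phonetic alphabet, which the abstract does not state.

```python
from difflib import SequenceMatcher
from itertools import product


# Hypothetical guessers, each emitting guesses in its own toy alphabet.
def guess_a(part):
    return {"jo": "JOH", "se": "SEE"}.get(part, part.upper())


def guess_b(part):
    return {"jo": "xo", "se": "se"}.get(part, part)


# Hypothetical mapping from alphabet B symbols to the first alphabet (A).
B_TO_A = {"x": "H", "o": "OH", "e": "EH"}


def b_to_a(guess):
    return "".join(B_TO_A.get(symbol, symbol.upper()) for symbol in guess)


# Each guesser is paired with the function that maps it to the first alphabet.
GUESSERS = [(guess_a, lambda guess: guess), (guess_b, b_to_a)]


def guessed_pronunciations(parts):
    """For each name component, list every guess mapped to the first alphabet."""
    return [
        list(dict.fromkeys(to_first(guess(part)) for guess, to_first in GUESSERS))
        for part in parts
    ]


def select_combination(parts, heard):
    """Pick the combination of guesses that best matches the heard pronunciation."""
    options = guessed_pronunciations(parts)
    return max(
        product(*options),
        key=lambda combo: SequenceMatcher(None, "".join(combo), heard).ratio(),
    )


best = select_combination(["jo", "se"], heard="HOHSEH")
```

Because the combination is chosen per component, the selected pronunciation can mix guesses from different guessers, for example one component from the alphabet A guesser and another from the alphabet B guesser.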