Abstract:
Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a plurality of speech inputs, each of the speech inputs associated with a same user of the electronic device; providing each of the plurality of speech inputs to a user-independent acoustic model, the user-independent acoustic model providing a plurality of speech results; initiating a user-specific acoustic model on the electronic device; and adjusting the user-specific acoustic model based on the plurality of speech inputs and the plurality of speech results.
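The adaptation loop described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the abstract does not specify a training algorithm, so `UserSpecificModel`, its `update` method, and the use of the user-independent model's results as supervision targets are all assumptions.

```python
class UserSpecificModel:
    """Toy stand-in for a user-specific acoustic model (illustrative only)."""

    def __init__(self):
        self.examples = []

    def update(self, utterance, result):
        # Record the supervised pair produced by the user-independent model.
        self.examples.append((utterance, result))


def adapt_user_model(user_model, speech_inputs, independent_model):
    # The user-independent model supplies a recognition result for each
    # of the user's utterances; those (input, result) pairs are then used
    # to adjust the user-specific model.
    results = [independent_model(utterance) for utterance in speech_inputs]
    for utterance, result in zip(speech_inputs, results):
        user_model.update(utterance, result)
    return results
```

In this sketch the user-independent model is any callable mapping speech input to a recognition result; a real system would extract acoustic features and run a trained recognizer.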
Abstract:
Speech recognition is performed on an utterance received from a user to determine a plurality of candidate text representations of the utterance, including a primary text representation and one or more alternative text representations. Natural language processing is performed on the primary text representation to determine a plurality of candidate actionable intents, including a primary actionable intent and one or more alternative actionable intents. A result is determined based on the primary actionable intent. The result is provided to the user. A recognition correction trigger is detected. In response to detecting the recognition correction trigger, a set of alternative intent affordances and a set of alternative text affordances are concurrently displayed.
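The correction flow above can be sketched as a small function that, on a correction trigger, builds affordances for both alternative sets so they can be displayed concurrently. The affordance representation (plain dicts) and the convention that index 0 holds the primary result are assumptions for illustration; the abstract does not specify either.

```python
def on_correction_trigger(candidate_texts, candidate_intents):
    # candidate_texts[0] and candidate_intents[0] are the primary results;
    # the remaining entries are alternatives. Both alternative sets are
    # turned into affordances and returned together for concurrent display.
    alt_intent_affordances = [{"kind": "intent", "label": i}
                              for i in candidate_intents[1:]]
    alt_text_affordances = [{"kind": "text", "label": t}
                            for t in candidate_texts[1:]]
    return alt_intent_affordances + alt_text_affordances
```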
Abstract:
Systems and processes for automatic accent detection are provided. In accordance with one example, a method includes, at an electronic device with one or more processors and memory, receiving a user input, determining a first similarity between a representation of the user input and a first acoustic model of a plurality of acoustic models, and determining a second similarity between the representation of the user input and a second acoustic model of the plurality of acoustic models. The method further includes determining whether the first similarity is greater than the second similarity. In accordance with a determination that the first similarity is greater than the second similarity, the first acoustic model may be selected; and in accordance with a determination that the first similarity is not greater than the second similarity, the second acoustic model may be selected.
Abstract:
Systems and processes for performing a task with a digital assistant are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, receiving a natural-language input; determining, based on the natural-language input, a first task and a first usefulness score associated with the first task; receiving, from another electronic device, a second task and a second usefulness score associated with the second task; determining whether the first usefulness score is higher than the second usefulness score; in accordance with a determination that the first usefulness score is higher than the second usefulness score: performing the first task determined by the electronic device; and providing an output indicating whether the first task has been performed; and in accordance with a determination that the second usefulness score is higher than the first usefulness score: performing the second task received from the other electronic device; and providing an output indicating whether the second task has been performed.
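The arbitration between the locally determined task and the remotely received task can be sketched as below. Tasks are modeled as callables purely for illustration, and the tie-breaking behavior (deferring to the remote task when scores are equal) is an assumption the abstract leaves open.

```python
def arbitrate(first_task, first_score, second_task, second_score):
    # first_task/first_score come from the local device; second_task/
    # second_score were received from another electronic device. Perform
    # whichever task has the higher usefulness score and return an output
    # indicating which task was performed.
    if first_score > second_score:
        first_task()
        return "performed first task"
    second_task()
    return "performed second task"
```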
Abstract:
The present disclosure generally relates to context-based endpoint detection in user speech input. A method for identifying an endpoint of a spoken request by a user may include receiving user input of natural language speech including one or more words; identifying at least one context associated with the user input; generating a probability, based on the at least one context associated with the user input, that a location in the user input is an endpoint; determining whether the probability is greater than a threshold; and in accordance with a determination that the probability is greater than the threshold, identifying the location in the user input as the endpoint.
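The threshold test at the heart of this method can be sketched as follows. The candidate locations, the context-conditioned probability function, and the threshold value are all illustrative stand-ins; the abstract specifies only the comparison itself.

```python
def identify_endpoint(locations, context_prob, threshold=0.5):
    # locations: candidate positions in the user input (e.g. word
    # boundaries). context_prob(loc) returns the probability, based on
    # the context associated with the user input, that loc is the
    # endpoint. The first location whose probability exceeds the
    # threshold is identified as the endpoint.
    for loc in locations:
        if context_prob(loc) > threshold:
            return loc
    return None  # no location cleared the threshold
```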
Abstract:
The present disclosure generally relates to methods and user interfaces for managing visual content at a computer system. In some embodiments, methods and user interfaces for managing visual content in media are described. In some embodiments, methods and user interfaces for managing visual indicators for visual content in media are described. In some embodiments, methods and user interfaces for inserting visual content in media are described. In some embodiments, methods and user interfaces for identifying visual content in media are described. In some embodiments, methods and user interfaces for translating visual content in media are described. In some embodiments, methods and user interfaces for managing user interface objects for visual content in media are described.
Abstract:
Systems and processes for processing speech in a digital assistant are provided. In one example process, a first speech input can be received from a user. The first speech input can be processed using a first automatic speech recognition system to produce a first recognition result. An input indicative of a potential error in the first recognition result can be received. The input can be used to improve the first recognition result. For example, the input can include a second speech input that is a repetition of the first speech input. The second speech input can be processed using a second automatic speech recognition system to produce a second recognition result.
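The two-pass flow in this example can be sketched as below. The recognizers are modeled as plain callables, and the rule of simply preferring the second recognizer's result for the repeated utterance is an assumption; the abstract says only that the input is used to improve the first result.

```python
def correct_recognition(first_input, first_asr, second_asr, repeated_input=None):
    # First pass: run the speech input through the first automatic speech
    # recognition system.
    first_result = first_asr(first_input)
    if repeated_input is None:
        # No input indicative of a potential error was received.
        return first_result
    # Second pass: the repetition is processed by a second (e.g. more
    # accurate or differently tuned) recognition system.
    return second_asr(repeated_input)
```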
Abstract:
Systems and processes for providing user-specific acoustic models are provided. In accordance with one example, a method includes, at an electronic device having one or more processors, initiating a user-specific acoustic model on the electronic device; receiving a plurality of speech inputs, each of the speech inputs associated with a user of the electronic device; adjusting the user-specific acoustic model based on the plurality of speech inputs; and providing the adjusted user-specific acoustic model to another electronic device. The method further includes, at the other electronic device: receiving the adjusted user-specific acoustic model; receiving a second speech input from a speaker; and identifying, with the adjusted user-specific acoustic model, the speaker of the second speech input as the user.
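The identification step on the receiving device can be sketched as a score-and-threshold check. The abstract does not describe how the adjusted model scores an input, so here the model is a callable returning a match score, and the threshold is an illustrative value.

```python
def identify_speaker(adjusted_model, speech_input, threshold=0.7):
    # adjusted_model(speech_input) returns a match score between the
    # incoming speech and the enrolled user's adjusted acoustic model
    # (illustrative API). A score above the threshold identifies the
    # speaker as the user.
    score = adjusted_model(speech_input)
    return "user" if score > threshold else "unknown speaker"
```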