Abstract:
Visual effects schedules are applied to audiovisual performances, with differing visual effects rendered in correspondence with differing elements of musical structure. Segmentation techniques applied to one or more audio tracks (e.g., vocal or backing tracks) are used to compute components of that musical structure. In some cases, the applied visual effects schedules are mood-denominated and may be selected by a performer as a component of his or her visual expression or determined from an audiovisual performance using machine learning techniques.
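For illustration only, the following Python sketch pairs a crude energy-based segmentation with a mood-denominated effects schedule. The abstract names no particular segmentation algorithm or effect vocabulary; the RMS-novelty heuristic, the mood keys, and the effect labels below are all assumed stand-ins.

    import numpy as np

    def segment_by_energy(samples, sr, hop=2048, threshold=1.5):
        """Mark candidate section boundaries where frame energy jumps sharply."""
        samples = np.asarray(samples, dtype=float)
        frames = np.lib.stride_tricks.sliding_window_view(samples, hop)[::hop]
        rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-9
        novelty = np.abs(np.diff(np.log(rms)))
        return [(i + 1) * hop / sr for i in np.where(novelty > threshold)[0]]

    # Hypothetical mood-denominated schedules: each mood cycles through
    # a list of named visual effects, one per detected section.
    SCHEDULES = {
        "dreamy":  ["soft_blur", "slow_color_wash", "particle_drift"],
        "intense": ["strobe", "hard_cut", "saturation_pump"],
    }

    def effects_for(mood, boundaries):
        """Pair each section boundary with the next effect in the mood's cycle."""
        schedule = SCHEDULES[mood]
        return [(t, schedule[i % len(schedule)]) for i, t in enumerate(boundaries)]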
Abstract:
Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences. For example, in some cases or embodiments, duets with a host performer may be supported in a sing-with-the-artist style audiovisual livestream in which aspiring vocalists request or queue particular songs for a live radio show entertainment format. The developed techniques provide a communications latency-tolerant mechanism for synchronizing vocal performances captured at geographically separated devices (e.g., at globally distributed but network-connected mobile phones or tablets, or at audiovisual capture devices geographically separated from a live studio).
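One way to make the latency tolerance concrete: if every device timestamps its captured vocal against the shared backing-track timeline, a mixer can align contributions by media time rather than by network arrival order. The Python sketch below assumes exactly that; the chunk structure and field names are hypothetical, not the patented protocol.

    from dataclasses import dataclass

    @dataclass
    class VocalChunk:
        media_time: float    # seconds into the backing track at capture start
        samples: list        # captured PCM samples for this chunk

    def mix_aligned(host_chunks, guest_chunks, sr=44100):
        """Mix two vocal streams by media timestamp, ignoring arrival order."""
        chunks = host_chunks + guest_chunks
        end = max(int(c.media_time * sr) + len(c.samples) for c in chunks)
        out = [0.0] * end
        for chunk in chunks:
            start = int(chunk.media_time * sr)
            for i, s in enumerate(chunk.samples):
                out[start + i] += s    # overlap-add at the media position
        return out

Because placement depends only on media_time, a guest chunk that arrives seconds late over the network still lands at the correct point in the composite performance.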
Abstract:
Advanced, but user-friendly, composition and editing environments for musical scores may be provided using the types, and in some cases the instances, of computing devices that will in turn consume musical score content so generated. Indeed, by integrating musical composition facilities within synthetic musical instruments that can be widely deployed on hand-held or portable computing devices, a social music network that includes such synthetic musical instruments gains access to a large, and potentially prolific, population of authors, editors and reviewers, as well as to the community-sourced musical scores that they can generate. By curating such content and/or by applying crowd-sourcing or other computational techniques to maintain quality, a social music network may rapidly deploy the new and ever-evolving content that its user community desires.
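As one illustration of the quality-maintenance idea, the sketch below gates network-wide deployment of a community-sourced score on crowd ratings. The rating scale, thresholds, and record structure are assumptions; the abstract names no particular curation scheme.

    from collections import defaultdict

    ratings = defaultdict(list)    # score_id -> list of 1..5 star ratings

    def rate(score_id, stars):
        ratings[score_id].append(stars)

    def deployable(score_id, min_raters=20, min_mean=4.0):
        """Gate deployment on both rater count and mean rating."""
        r = ratings[score_id]
        return len(r) >= min_raters and sum(r) / len(r) >= min_mean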
Abstract:
Captured vocals may be automatically transformed using advanced digital signal processing techniques that enable captivating applications, and even purpose-built devices, with which mere novice user-musicians may generate, audibly render and share musical performances. In some cases, the automated transformations allow spoken vocals to be segmented, arranged, temporally aligned with a target rhythm, meter or accompanying backing track, and pitch-corrected in accord with a score or note sequence. Speech-to-song music applications are one such example. In some cases, spoken vocals may be transformed in accord with musical genres such as rap using automated segmentation and temporal alignment techniques, often without pitch correction. Such applications, which may employ different signal processing and different automated transformations, may nonetheless be understood as speech-to-rap variations on the theme.
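The segment/align/pitch-correct pipeline can be sketched as two steps, shown below in Python. Segment detection is assumed done elsewhere; the grid representation, field names, and target note list are hypothetical, and the pitch-snapping step would simply be omitted in the speech-to-rap variant.

    def align_to_grid(segments, grid):
        """Stretch each spoken segment onto its rhythmic grid slot."""
        aligned = []
        for seg, (slot_start, slot_len) in zip(segments, grid):
            stretch = slot_len / seg["duration"]    # time-stretch factor
            aligned.append({**seg, "start": slot_start, "stretch": stretch})
        return aligned

    def snap_pitch(f0_track, note_hz):
        """Pitch-correct each voiced frame to the nearest target note."""
        return [min(note_hz, key=lambda n: abs(n - f0)) if f0 > 0 else f0
                for f0 in f0_track]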
Abstract:
Techniques have been developed for transmitting and receiving information conveyed through the air from one portable device to another as a generally unperceivable coding within an otherwise recognizable acoustic signal. For example, in some embodiments in accordance with the present invention(s), information is acoustically communicated from a first handheld device toward a second by encoding the information in a signal that, when converted into acoustic energy at an acoustic transducer of the first handheld device, is characterized in that the acoustic energy is discernible to a human ear yet the encoding of the information therein is generally not perceivable by a human listener. The acoustic energy is transmitted from the acoustic transducer of the first handheld device toward the second handheld device across an air gap that constitutes substantially the entirety of the distance between the devices. Acoustic energy received at the second handheld device may then be processed using signal processing techniques tailored to detection of the particular information encodings employed.
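The abstract leaves the encoding open. As one concrete possibility, the sketch below mixes a quiet near-ultrasonic FSK carrier under audible program audio, so the composite remains recognizable while the data tones go largely unnoticed; the frequencies, bit rate, and mixing level are assumptions, and the decoder ignores channel noise for brevity.

    import numpy as np

    SR, BIT_DUR = 44100, 0.05            # 20 bits per second
    F0, F1 = 18000.0, 18500.0            # tones most listeners barely notice

    def encode_bits(bits):
        """One FSK tone burst per bit."""
        t = np.arange(int(SR * BIT_DUR)) / SR
        return np.concatenate(
            [np.sin(2 * np.pi * (F1 if b else F0) * t) for b in bits])

    def embed(program_audio, bits, level=0.02):
        """Mix the carrier quietly under the audible program material."""
        carrier = encode_bits(bits)
        out = np.array(program_audio, dtype=float)   # assumed >= carrier length
        out[:len(carrier)] += level * carrier
        return out

    def decode_bits(audio, n_bits):
        """Per-bit energy comparison at the two tone frequencies."""
        n = int(SR * BIT_DUR)
        t = np.arange(n) / SR
        ref0 = np.exp(-2j * np.pi * F0 * t)
        ref1 = np.exp(-2j * np.pi * F1 * t)
        return [int(abs(audio[k*n:(k+1)*n] @ ref1) >
                    abs(audio[k*n:(k+1)*n] @ ref0)) for k in range(n_bits)]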
Abstract:
Vocal audio of a user together with performance synchronized video is captured and coordinated with audiovisual contributions of other users to form composite duet-style or glee club-style or window-paned music video-style audiovisual performances. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects for presentation, at any given time along a given performance timeline, performance synchronized video of one or more of the contributors. Selections are in accord with a visual progression that codes a sequence of visual layouts in correspondence with other coded aspects of a performance score such as pitch tracks, backing audio, lyrics, sections and/or vocal parts.
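A visual progression of this kind can be read as a timed lookup from coded score sections to pane layouts. In the Python sketch below, the section table, vocal-part names, and layout names are hypothetical illustrations of that coding.

    SCORE_SECTIONS = [      # (start_sec, section, active_vocal_parts)
        ( 0.0, "verse1", ["host"]),
        (20.0, "verse2", ["guest"]),
        (40.0, "chorus", ["host", "guest"]),
    ]

    LAYOUT_FOR = {1: "fullscreen_solo", 2: "split_duet"}

    def layout_at(t):
        """Return the pane layout in force at time t along the performance."""
        current = SCORE_SECTIONS[0]
        for entry in SCORE_SECTIONS:
            if entry[0] <= t:
                current = entry
        return LAYOUT_FOR[len(current[2])], current[2]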
Abstract:
Audiovisual performances, including vocal music, are captured and coordinated with those of other users in ways that create compelling user experiences. In some cases, the vocal performances of individual users are captured (together with performance synchronized video) on mobile devices, television-type display and/or set-top box equipment in the context of karaoke-style presentations of lyrics in correspondence with audible renderings of a backing track. Contributions of multiple vocalists are coordinated and mixed in a manner that selects, for visually prominent presentation, performance synchronized video of one or more of the contributors. Prominence of particular performance synchronized video may be based, at least in part, on computationally-defined audio features extracted from (or computed over) captured vocal audio. Over the course of a coordinated audiovisual performance timeline, these computationally-defined audio features are selective for performance synchronized video of one or more of the contributing vocalists.
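For instance, short-term RMS energy can serve as one such computationally defined audio feature. The sketch below smooths per-window RMS for each contributor and, per window, features whichever vocalist's feature value dominates; the feature choice and the smoothing constant are assumptions, since the abstract leaves both open.

    import numpy as np

    def prominence_track(vocal, sr, win=0.5, alpha=0.8):
        """Smoothed per-window RMS; higher values argue for prominence."""
        n = int(sr * win)
        rms = [float(np.sqrt(np.mean(vocal[i:i + n] ** 2)))
               for i in range(0, len(vocal) - n + 1, n)]
        smoothed, prev = [], 0.0
        for v in rms:
            prev = alpha * prev + (1 - alpha) * v   # damp jittery layout cuts
            smoothed.append(prev)
        return smoothed

    def featured_vocalist(tracks, sr):
        """Per window, pick which contributor's video to feature."""
        feats = {name: prominence_track(v, sr) for name, v in tracks.items()}
        length = min(len(f) for f in feats.values())
        return [max(feats, key=lambda name: feats[name][k]) for k in range(length)]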