-
Publication No.: US20180139377A1
Publication Date: 2018-05-17
Application No.: US15573325
Filing Date: 2016-01-19
Applicant: SRI International
Inventor: David Chao ZHANG , John Benjamin SOUTHALL , Michael Anthony ISNARDI , Michael Raymond PIACENTINO , David Christopher BERENDS , Girish ACHARYA , Douglas A. BERCOW , Aaron SPAULDING , Sek CHAI
CPC classification number: H04N5/23212 , A61B5/0077 , A61B5/442 , G06K9/00228 , G06K9/036 , G06K9/6212 , G06T5/20 , G06T7/0012 , G06T7/11 , G06T2207/20016 , G06T2207/30088 , H04N5/23229 , H04N5/23293 , H04N5/2356
Abstract: Device logic in a mobile device configures a processor to capture a series of images, such as a video, using a consumer-grade camera, and to analyze the images to determine the best-focused image, of the series of images, that captures a region of interest. The images may be of a textured surface, such as facial skin of a mobile device user. The processor sets a focal length of the camera to a fixed position for collecting the images. The processor may guide the user to position the mobile device for capturing the images, using audible cues. For each image, the processor crops the image to the region of interest, extracts luminance information, and determines one or more energy levels of the luminance via a Laplacian pyramid. The energy levels may be filtered, and then are compared to energy levels of the other images to determine the best-focused image.
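The focus-selection step described in the abstract (crop to the region of interest, measure band-pass energy via a Laplacian pyramid, pick the image with the highest energy) can be illustrated with a minimal sketch. This is not the patented implementation: the block-average downsampling stand-in for a Gaussian reduce, the function names, and the sum-of-levels score are all illustrative assumptions.

```python
import numpy as np

def laplacian_pyramid_energy(gray, levels=3):
    """Per-level energy of a simple Laplacian pyramid of a grayscale image.

    Uses 2x2 block averaging as a stand-in for a Gaussian reduce filter.
    """
    energies = []
    current = gray.astype(np.float64)
    for _ in range(levels):
        h, w = current.shape
        h2, w2 = (h // 2) * 2, (w // 2) * 2          # crop to even size
        reduced = current[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2).mean(axis=(1, 3))
        expanded = np.repeat(np.repeat(reduced, 2, axis=0), 2, axis=1)
        laplacian = current[:h2, :w2] - expanded     # band-pass residual
        energies.append(float(np.mean(laplacian ** 2)))
        current = reduced
    return energies

def best_focused(images, roi, levels=3):
    """Index of the image whose ROI has the highest total pyramid energy.

    roi is (y0, y1, x0, x1); images are 2-D luminance arrays.
    """
    y0, y1, x0, x1 = roi
    scores = [sum(laplacian_pyramid_energy(img[y0:y1, x0:x1], levels))
              for img in images]
    return int(np.argmax(scores))
```

A sharply focused image of a textured surface has more high-frequency content, hence higher Laplacian energy, than a defocused one, so the argmax over scores selects the best-focused frame.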
-
Publication No.: US20240257801A1
Publication Date: 2024-08-01
Application No.: US18393575
Filing Date: 2023-12-21
Applicant: SRI International
Inventor: Jeffrey LUBIN , Alexander ERDMANN , James BERGEN , Harry BRATT , Jihua HUANG , Sarah BAKST , Michael LOMNITZ , Zachary DANIELS , John CADIGAN , Ali CHAUDHRY , Zhiwei ZHU , Joshua CHATTIN , Girish ACHARYA
CPC classification number: G10L15/1807 , G10L15/02 , G10L15/063 , G10L15/183 , G10L15/25 , G10L25/18
Abstract: A method, apparatus, and system for creating a script for rendering audio and/or video streams include identifying at least one prosodic speech feature in a received audio stream and/or a received language model, creating a respective prosodic speech symbol for each of the at least one identified prosodic speech features, converting the received audio stream and/or the received language model into a text stream, temporally inserting the created at least one prosodic speech symbol into the text stream, identifying in a received video stream at least one prosodic gesture of at least a portion of a body of a speaker of the received audio stream, creating at least one respective gesture symbol for each of the at least one identified prosodic gestures, and temporally inserting the created at least one gesture symbol into the text stream along with the at least one prosodic speech symbol to create a prosodic script.
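The temporal-insertion step in the abstract, placing prosodic speech and gesture symbols into the transcribed text stream by timestamp, can be sketched as a merge of two time-ordered sequences. The data shapes, symbol spellings, and function name below are illustrative assumptions, not the claimed method.

```python
def build_prosodic_script(words, events):
    """Merge timestamped prosody/gesture symbols into a word stream.

    words  : list of (start_time, word) tuples, sorted by time
    events : list of (time, symbol) tuples, sorted by time,
             e.g. (0.4, "<PAUSE>") for prosody or (0.9, "<NOD>") for gesture
    Each symbol is inserted before the first word starting at or after
    its timestamp; the result is the prosodic script as one string.
    """
    out, i = [], 0
    for t, word in words:
        while i < len(events) and events[i][0] <= t:
            out.append(events[i][1])
            i += 1
        out.append(word)
    out.extend(sym for _, sym in events[i:])  # symbols after the last word
    return " ".join(out)
```

Because both inputs are sorted, a single linear pass suffices, and speech and gesture symbols can share one event list while keeping their relative timing.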
-