Abstract:
A video summarization method based on mining the story structure and semantic relations among concept entities comprises the steps of: processing a video to generate multiple important shots that are annotated with respective keywords; performing a concept expansion process using the keywords to create expansion trees for the annotated shots; rearranging and classifying the keywords of the expansion trees to calculate the relations among them; and applying a graph entropy algorithm to determine the significant shots and the edges interconnecting those shots. Based on the result of the graph entropy algorithm, a structured relational graph is built to display the significant shots and their edges. Consequently, users can browse the content of a video more rapidly and comprehend whether different shots are related.
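The abstract does not define its graph entropy algorithm, so the following is only a minimal sketch of one plausible reading. It assumes that shots are nodes, that relation strengths are edge weights, that entropy is the Shannon entropy of the weighted-degree distribution, and that a shot is significant when removing it changes that entropy a lot. The function names and the sample relations are hypothetical.

```python
import math


def graph_entropy(edges):
    """Shannon entropy (in bits) of the weighted-degree distribution.

    `edges` maps an undirected pair (shot_a, shot_b) to a relation weight.
    This definition is an assumption, not the one used in the abstract.
    """
    degree = {}
    for (u, v), weight in edges.items():
        degree[u] = degree.get(u, 0.0) + weight
        degree[v] = degree.get(v, 0.0) + weight
    total = sum(degree.values())
    if total == 0:
        return 0.0
    return -sum((d / total) * math.log2(d / total)
                for d in degree.values() if d > 0)


def shot_significance(edges):
    """Score each shot by the entropy change caused by removing it."""
    base = graph_entropy(edges)
    shots = {shot for pair in edges for shot in pair}
    scores = {}
    for shot in shots:
        remaining = {pair: w for pair, w in edges.items() if shot not in pair}
        scores[shot] = abs(base - graph_entropy(remaining))
    return scores


# Hypothetical relations: shot s1 is related to every other shot.
relations = {("s1", "s2"): 1.0, ("s1", "s3"): 1.0, ("s1", "s4"): 1.0}
scores = shot_significance(relations)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0])
```

In this toy graph the hub shot ranks first, because removing it eliminates every relation, while removing any other shot leaves most of the structure intact.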
Abstract:
The present invention includes a method for speech encoding and decoding and a design for a speech coder and decoder. The speech encoding method is characterized by the high compression ratio achieved once the whole speech data is compressed. The present invention is able to lower the bit rate of the original speech from 64 Kbps to 1.6 Kbps, a bit rate lower than that of traditional compression methods. It provides good speech quality and stores the maximum amount of speech data in the minimum amount of memory. In the speech decoding method, some random noise is appropriately added to the excitation source, so that more speech characteristics can be simulated to produce various speech sounds. In addition, the present invention discloses a coder and a decoder implemented as an application-specific integrated circuit (ASIC), whose structural design is optimized according to the software. Its operating speed is much faster than that of a digital signal processor, which suits systems requiring fast computation such as multiple-line encoding; its cost is also lower than that of a digital signal processor.
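The two bit rates quoted above imply a compression ratio and a storage saving that are easy to work out. The short sketch below does the arithmetic, assuming 1 Kbps means 1000 bits per second; the function names and the 60-second duration are illustrative only.

```python
ORIGINAL_BPS = 64_000    # 64 Kbps, the original speech bit rate
COMPRESSED_BPS = 1_600   # 1.6 Kbps, the bit rate after compression


def compression_ratio(original_bps, compressed_bps):
    """How many times smaller the compressed stream is."""
    return original_bps / compressed_bps


def storage_bytes(bits_per_second, seconds):
    """Bytes needed to store `seconds` of speech at the given bit rate."""
    return bits_per_second * seconds // 8


ratio = compression_ratio(ORIGINAL_BPS, COMPRESSED_BPS)
one_minute_original = storage_bytes(ORIGINAL_BPS, 60)
one_minute_compressed = storage_bytes(COMPRESSED_BPS, 60)
print(ratio, one_minute_original, one_minute_compressed)
```

Under that assumption, one minute of speech shrinks from 480,000 bytes to 12,000 bytes, a factor of 40.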
Abstract:
The present invention discloses a video summarization system and a method thereof. A similarity computing apparatus computes the similarity between each pair of frames to obtain multiple similarity values. A key frame extracting apparatus chooses key frames from the frames such that the sum of the similarity values between the key frames is a minimum. A feature space mapping apparatus converts sentences into multiple corresponding sentence vectors and computes the distance between each pair of sentence vectors to obtain multiple distance values. A clustering apparatus divides the sentences into multiple clusters according to the distance values and the importance of the sentences, and also applies a splitting step to split the cluster with the highest importance into multiple new clusters. A key sentence extracting apparatus chooses multiple key sentences from the clusters such that the sum of the importance of the key sentences is a maximum.
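The key frame criterion, choosing frames whose pairwise similarities sum to a minimum, can be illustrated with a brute-force search. This is a sketch only: the abstract does not say how the minimum is found, and the function name, the fixed number of key frames `k`, and the similarity matrix are assumptions made for the example.

```python
from itertools import combinations


def select_key_frames(similarity, k):
    """Return the k frame indices whose pairwise similarities sum to a minimum.

    `similarity` is a symmetric matrix of frame-to-frame similarity values.
    Exhaustive search is used for clarity; it is only practical for small inputs.
    """
    def total_similarity(frames):
        return sum(similarity[i][j] for i, j in combinations(frames, 2))

    return min(combinations(range(len(similarity)), k), key=total_similarity)


# Hypothetical similarity values for four frames.
similarity = [
    [1.0, 0.9, 0.2, 0.3],
    [0.9, 1.0, 0.4, 0.1],
    [0.2, 0.4, 1.0, 0.8],
    [0.3, 0.1, 0.8, 1.0],
]
key_frames = select_key_frames(similarity, 2)
print(key_frames)
```

With these values the least similar pair is frames 1 and 3, so they are selected as the two key frames.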
Abstract:
A method and system for determining the similarity between input speech data and sample speech data are provided. First, the input speech data is segmented into a plurality of input speech frames and the sample speech data is segmented into a plurality of sample speech frames. Then, the input speech frames and the sample speech frames are used to build a matching matrix, wherein the matching matrix comprises the distance values between each of the input speech frames and each of the sample speech frames. Next, the distance values are used to calculate a matching score. Finally, the similarity between the input speech data and the sample speech data is determined according to this matching score.
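The steps above can be sketched as follows. The abstract does not say which distance measure or scoring rule is used, so this example assumes Euclidean distance between fixed-length frames and a dynamic time warping (DTW) path cost as the matching score; all function names and sample values are hypothetical.

```python
import math


def split_frames(samples, frame_len):
    """Segment a sample sequence into consecutive, non-overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def matching_matrix(input_frames, sample_frames):
    """Distance between every input frame and every sample frame."""
    return [[math.dist(a, b) for b in sample_frames] for a in input_frames]


def matching_score(matrix):
    """Length-normalized DTW cost of the matching matrix (lower is more similar)."""
    n, m = len(matrix), len(matrix[0])
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = matrix[i - 1][j - 1] + best_previous
    return cost[n][m] / (n + m)


sample = split_frames([0, 1, 2, 3, 4, 5, 6, 7], 2)
stretched = split_frames([0, 1, 0, 1, 2, 3, 4, 5, 6, 7], 2)   # same content, slower start
reversed_input = split_frames([7, 6, 5, 4, 3, 2, 1, 0], 2)

score_same = matching_score(matching_matrix(stretched, sample))
score_different = matching_score(matching_matrix(reversed_input, sample))
print(score_same, score_different)
```

DTW is chosen here because it tolerates differences in speaking rate: the time-stretched input still aligns with the sample at zero cost, while the reversed input does not.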
Abstract:
A block intra prediction direction detection algorithm comprises the acts of dividing a block, finding directions from edge assent rules, determining a main edge of the block, selecting prediction modes from the main edge, choosing base prediction modes, and using all unique selected and base prediction modes in intra prediction. Variants of the algorithm comprise a 4×4 block intra prediction direction detection algorithm, a 16×16 luminance block intra prediction direction detection algorithm, and an 8×8 chrominance block intra prediction direction detection algorithm.
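The overall flow for the 4×4 case might look like the sketch below. The abstract does not spell out the edge assent rules, so this example substitutes a much simpler rule: split the block into four 2×2 sub-blocks, compare their mean intensities, and treat the stronger change as the main edge. Mode numbers follow the H.264 4×4 intra convention (0 vertical, 1 horizontal, 2 DC); using DC as the only base mode is an assumption, as are all function names.

```python
VERTICAL, HORIZONTAL, DC = 0, 1, 2   # H.264 4x4 intra prediction mode numbers


def sub_block_means(block):
    """Divide a 4x4 block into four 2x2 sub-blocks and return their means."""
    def mean(row0, col0):
        return sum(block[r][c]
                   for r in (row0, row0 + 1)
                   for c in (col0, col0 + 1)) / 4
    return mean(0, 0), mean(0, 2), mean(2, 0), mean(2, 2)


def main_edge(block):
    """Return 'vertical', 'horizontal', or None when no direction dominates."""
    top_left, top_right, bottom_left, bottom_right = sub_block_means(block)
    left_right_change = abs((top_left + bottom_left) - (top_right + bottom_right))
    top_bottom_change = abs((top_left + top_right) - (bottom_left + bottom_right))
    if left_right_change == top_bottom_change:
        return None
    # Intensity changing from left to right means the edge itself runs vertically.
    return "vertical" if left_right_change > top_bottom_change else "horizontal"


def candidate_modes(block):
    """Union of the modes selected from the main edge and the base modes."""
    selected = {"vertical": [VERTICAL], "horizontal": [HORIZONTAL]}.get(main_edge(block), [])
    base = [DC]   # assumed base mode
    return sorted(set(selected + base))


vertical_edge_block = [[10, 10, 200, 200] for _ in range(4)]
flat_block = [[50] * 4 for _ in range(4)]
print(candidate_modes(vertical_edge_block), candidate_modes(flat_block))
```

The point of the technique is that only the candidate modes need to be evaluated during intra prediction, rather than every available mode.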
Abstract:
An image-capturing device and method for removing strangers from an image are described. First, a first image is input. Then, a control module determines whether an unwanted object processing step is needed, and obtains a result. If the result is no, the first image is sent directly to an output module. If the result is yes, an image-identifying module identifies the target image and the unwanted object in the first image, and an unwanted object processing module then carries out the unwanted object processing step. This step removes the unwanted object from the image and fills the lacuna region left behind. Afterwards, a second image is produced and sent to the output module.
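The control flow described above can be sketched with a toy grayscale image. The abstract does not say how the unwanted object is identified or how the lacuna is filled, so this example assumes the identification result arrives as a boolean mask and fills each masked pixel with the mean of the unmasked pixels in the same row, a crude stand-in for real inpainting. All names are hypothetical.

```python
def fill_lacuna(image, mask):
    """Replace masked pixels with the mean of the unmasked pixels in their row."""
    result = [row[:] for row in image]
    for r, row in enumerate(image):
        known = [value for value, masked in zip(row, mask[r]) if not masked]
        if not known:
            continue   # nothing to borrow from in this row; leave it unchanged
        fill_value = sum(known) / len(known)
        for c, masked in enumerate(mask[r]):
            if masked:
                result[r][c] = fill_value
    return result


def process_image(first_image, unwanted_mask, needs_processing):
    """Return the image to send to the output module."""
    if not needs_processing:
        return first_image                            # result is "no": pass through
    return fill_lacuna(first_image, unwanted_mask)    # result is "yes": second image


first_image = [[10, 10, 99, 10],
               [20, 20, 99, 20]]
unwanted_mask = [[False, False, True, False],
                 [False, False, True, False]]
second_image = process_image(first_image, unwanted_mask, needs_processing=True)
print(second_image)
```

When `needs_processing` is false the first image is returned untouched, matching the direct path to the output module.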
Abstract:
A method for diagnosing a breakdown cause of a vehicle is disclosed. In the method, several sound signals from the vehicle are sensed by several sound sensing devices, which are respectively installed in several zones of the vehicle. A current driving status of the vehicle is obtained through an electrical control unit (ECU) of the vehicle. A sound source of the vehicle is determined according to the sound signals. A breakdown cause of the vehicle is diagnosed according to the sound signals, the current driving status and the sound source.
Abstract:
The present invention discloses an audio signal segmentation algorithm comprising the following steps. First, an audio signal is provided. Next, an audio activity detection (AAD) step is applied to divide the audio signal into at least one noise segment and at least one noisy audio segment. An audio feature extraction step is then applied to the noisy audio segment to obtain multiple audio features, followed by a smoothing step. After that, multiple speech frames and multiple music frames are discriminated. The speech frames and the music frames compose at least one speech segment and at least one music segment. Finally, the speech segment and the music segment are segmented from the noisy audio segment.
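The later steps (smoothing, frame discrimination, and composing segments) can be sketched as below. The abstract names neither the audio features nor the classifier, so this example assumes a single hypothetical per-frame feature that is higher for speech, a moving-average smoother, and a fixed threshold. The AAD and feature extraction steps are not shown.

```python
from itertools import groupby


def smooth(values, window=3):
    """Moving average with a window that shrinks at the sequence boundaries."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def discriminate(features, threshold):
    """Label each frame as speech or music from its (smoothed) feature value."""
    return ["speech" if value > threshold else "music" for value in features]


def compose_segments(labels):
    """Group runs of identical frame labels into (label, start, end) segments."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical feature values for eight frames of a noisy audio segment.
features = [0.9, 0.8, 0.1, 0.9, 0.8, 0.1, 0.2, 0.1]
segments = compose_segments(discriminate(smooth(features), threshold=0.5))
print(segments)
```

Smoothing matters here because frame 2 on its own would be labeled music, which would split the speech segment in two; after smoothing it is absorbed into the surrounding speech.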