Abstract:
PURPOSE: A system and method for constructing a named entity dictionary are provided to build the dictionary easily by extracting named entities from information of a specific type, such as a table or list, included in a web document. CONSTITUTION: A web document collector (110) collects web documents. An information extractor (130) extracts table- or list-type information from the web documents. A named entity extractor (140) extracts named entities from the table- or list-type information. A named entity dictionary (160) stores the extracted named entities. An address extractor (120) extracts the address of a web document by analyzing the web document. The web document analyzer transmits the extracted address to the web document collector.
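The extraction path from table or list markup to dictionary entries can be illustrated with a minimal sketch. It is not the patented implementation: the class `TableListExtractor`, the function `extract_named_entities`, and the filtering heuristic (short, non-numeric cells) are all assumptions made for illustration, and the markup is assumed to be well-formed with explicit closing tags.

```python
from html.parser import HTMLParser


class TableListExtractor(HTMLParser):
    """Collects the text of <td> and <li> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0     # greater than 0 while inside a <td> or <li>
        self._buffer = []   # text fragments of the current cell
        self.cells = []     # finished cell texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "li"):
            if self._depth == 0:
                self._buffer = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in ("td", "li") and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace into single spaces.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.cells.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_named_entities(html):
    """Return candidate named entities found in table cells and list items."""
    parser = TableListExtractor()
    parser.feed(html)
    parser.close()
    # Assumed heuristic: keep short cells that are not purely numeric.
    return {
        cell for cell in parser.cells
        if len(cell.split()) <= 4 and not cell.replace(".", "").isdigit()
    }


page = (
    "<ul><li>Seoul</li><li>Busan</li></ul>"
    "<table><tr><td>Samsung Electronics</td><td>1969</td></tr></table>"
)
named_entity_dictionary = extract_named_entities(page)
```

A real system would need a stronger test for what counts as a named entity; the length and digit checks here only stand in for that step.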
Abstract:
PURPOSE: A method and apparatus for automatically finding synonyms are provided to establish synonyms of keywords based on statistical information between keywords and on morphological similarity, using large-scale web keywords and a click log. CONSTITUTION: An allomorph candidate generator (101) generates allomorph candidates for search keywords by using a keyword log for the search keywords or user session information. A verification synonym extracting unit (102) extracts synonyms for verification from web documents by using synonym patterns. An allomorph generating unit (103) removes over-generated or erroneous candidates from the allomorph candidates and generates allomorphs for the search keywords using the synonyms for verification.
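The candidate generation step, which combines session statistics with morphological similarity, might look roughly like the following sketch. The function name `allomorph_candidates`, the thresholds, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions; the abstract does not say which statistics or similarity measure are used, and the pattern-based verification step is not modelled here.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def allomorph_candidates(sessions, min_cooccurrence=2, min_similarity=0.6):
    """Pair keywords that share sessions often and look alike as strings.

    sessions: list of keyword lists, one list per user session (assumed input).
    """
    pair_counts = Counter()
    for session in sessions:
        # Count each unordered keyword pair once per session.
        for a, b in combinations(sorted(set(session)), 2):
            pair_counts[(a, b)] += 1

    candidates = []
    for (a, b), count in pair_counts.items():
        similarity = SequenceMatcher(None, a, b).ratio()
        if count >= min_cooccurrence and similarity >= min_similarity:
            candidates.append((a, b))
    return sorted(candidates)


sessions = [
    ["color", "colour", "paint"],
    ["colour", "color"],
    ["paint", "color"],
]
print(allomorph_candidates(sessions))
```

Here "color" and "paint" co-occur just as often as "color" and "colour", but only the latter pair passes the string similarity threshold, which shows why the two signals are combined.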
Abstract:
PURPOSE: A device and method for classifying documents of a single class category are provided to classify documents accurately by using association rules, extracted by an association rule detection method, as features for document classification. CONSTITUTION: An association rule training unit (100) generates feature matrices from a training document set and generates association rule candidates with a depth-first or breadth-first search method. The association rule training unit generates an association rule training model from the association rule candidates. A document class category classifier (150) uses the association rule training model to classify the documents of a document set.
Abstract:
PURPOSE: A device and a method for processing web information by extracting local information are provided to integrate various web information around related regional information and provide the processed document data. CONSTITUTION: A major information extracting unit (150) extracts major information, including regional information, from document data according to the result of language analysis and a selected topic. A related information mapping unit (170) groups and maps the document data. An information integrating unit (180) compares the mapped document data and integrates the document data according to the comparison result.
Abstract:
PURPOSE: A device and a method for keyword extraction and associative word network configuration of document data are provided to automatically extract issue keywords from a blog document group and build an associative network among the extracted keywords, thereby presenting accurate keywords for each document. CONSTITUTION: An issue keyword extractor (104) parses the structure information of each document in an input web document group and extracts issue keywords based on the analyzed morphemes. An associative word network configurator (106) extracts relations between the extracted issue keywords. An indexing unit (108) indexes the extracted issue keywords and the configured associative word network. According to a control command, a presentation unit (114) presents the issue keywords and the associative word network information.
Abstract:
PURPOSE: A topic map based indexing device, a topic map based searching device, a topic map based searching system, and a method thereof are provided to obtain question analysis information about a user's question, search for similar questions in a community Q/A topic map according to the question analysis information, and output an answer effectively, thereby finding the most suitable answer. CONSTITUTION: A Q/A pre-processing block (102) normalizes the community Q/A list into a single form. A Q/A analysis block (104) obtains Q/A analysis information by analyzing the community Q/A list. A Q/A storage block stores, as a community Q/A topic map, the indexing information produced by duplicate answer removal, meaningless answer removal, answer list sorting, top-ranked answer extraction, and topic decision according to the Q/A analysis information.
Abstract:
PURPOSE: A method for storing and searching web-based information, and a system for managing the same, are provided to store tuple and triple information, extracted through high-quality language analysis such as object name (named entity) recognition and relation extraction, in an inverted index structure, thereby shortening search time. CONSTITUTION: A language analysis block (100) performs language analysis of structured and unstructured documents. An object name recognition block (110) recognizes object names in the document. A triple storage block (130) stores the extracted tuple-type and triple-type information by extending the inverted index structure. A query analysis block (140) analyzes a user query and then extracts a search pattern of tuple or triple type. A triple search block (150) performs the search on the inverted index structure.
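Storing triples in an inverted index and answering a triple-type search pattern can be sketched as follows. The class `TripleIndex`, its posting-list layout keyed by slot and value, and the use of `None` as a wildcard are illustrative assumptions; the abstract does not describe how the index structure is actually extended.

```python
from collections import defaultdict


class TripleIndex:
    """Inverted index over (subject, predicate, object) triples (hypothetical)."""

    def __init__(self):
        self.triples = []                 # triple id -> (subject, predicate, object)
        self.postings = defaultdict(set)  # (slot, value) -> set of triple ids

    def add(self, subject, predicate, obj):
        triple_id = len(self.triples)
        self.triples.append((subject, predicate, obj))
        # One posting per slot: 0 = subject, 1 = predicate, 2 = object.
        for slot, value in enumerate((subject, predicate, obj)):
            self.postings[(slot, value)].add(triple_id)
        return triple_id

    def search(self, subject=None, predicate=None, obj=None):
        """Match a pattern; None is a wildcard. Results keep insertion order."""
        ids = None
        for slot, value in enumerate((subject, predicate, obj)):
            if value is None:
                continue
            matches = self.postings.get((slot, value), set())
            ids = matches if ids is None else ids & matches
        if ids is None:
            # No constraint given: every stored triple matches.
            ids = range(len(self.triples))
        return [self.triples[i] for i in sorted(ids)]


index = TripleIndex()
index.add("Seoul", "capital_of", "South Korea")
index.add("Busan", "located_in", "South Korea")
index.add("Seoul", "located_in", "South Korea")
print(index.search(predicate="located_in", obj="South Korea"))
```

Each constrained slot is resolved with one posting-list lookup and the lists are intersected, so a query never scans the full set of triples unless no slot is constrained.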
Abstract:
A method for searching for media information via natural language analysis is provided to let a user find the desired media by efficiently analyzing the user's natural language query. The method comprises the following steps. When media information (101) is stored in a database, metadata is extracted from the input media information (103). The metadata matched with the media information is stored in a metadata index database (105). When natural language media search query information (111) is input, the input media search query is analyzed and a metadata analysis rule (113) is extracted. The metadata analysis rule is then stored in a metadata recognition rule database (115). When a user starts a media search operation (121), the natural language search query is recognized as metadata by using the metadata recognition rule database, and the media information matching the recognized metadata is retrieved by using the metadata index database.
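The search step (121) can be sketched as a lookup that first maps the query to metadata through recognition rules and then consults a metadata index. Everything below is an assumption for illustration: the rules are hand-written regular expressions rather than rules learned from queries, and the field names, sample media, and function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical recognition rules: each maps a pattern in the query to a field.
RECOGNITION_RULES = [
    ("artist", re.compile(r"\bby ([a-z ]+?)(?: from\b| in\b|$)")),
    ("year", re.compile(r"\b(?:from|in) (\d{4})\b")),
]


def build_metadata_index(media_items):
    """Map each (field, value) pair to the titles carrying that metadata."""
    index = defaultdict(set)
    for title, metadata in media_items.items():
        for field, value in metadata.items():
            index[(field, str(value).lower())].add(title)
    return index


def recognize_metadata(query):
    """Apply the recognition rules to a natural language query."""
    recognized = {}
    for field, pattern in RECOGNITION_RULES:
        match = pattern.search(query.lower())
        if match:
            recognized[field] = match.group(1).strip()
    return recognized


def search_media(query, index):
    """Return titles whose metadata matches every recognized field."""
    recognized = recognize_metadata(query)
    if not recognized:
        return []
    result_sets = [index.get(item, set()) for item in recognized.items()]
    return sorted(set.intersection(*result_sets))


media = {
    "Come Together": {"artist": "The Beatles", "year": 1969},
    "Let It Be": {"artist": "The Beatles", "year": 1970},
    "Space Oddity": {"artist": "David Bowie", "year": 1969},
}
metadata_index = build_metadata_index(media)
print(search_media("songs by the beatles from 1969", metadata_index))
```

Keeping the rules and the index separate mirrors the two databases in the abstract: rules can be added without reindexing the media, and new media can be indexed without touching the rules.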
Abstract:
An apparatus and a method for generating a response sentence are provided to analyze the meaning of a voice-recognized sentence accurately by performing a second extraction of points of sentence/substitutes and a second meaning analysis on the sentence. The response sentence generating method comprises the following steps: performing morpheme analysis of a voice-recognized sentence (200, 210); extracting a first point of sentence from the sentence (220); performing a first meaning analysis of the sentence based on the extracted first point of sentence (230); extracting, based on the first meaning analysis result, a second point of sentence that includes the first point of sentence, in order to capture points of sentence that were not extracted in the first extraction step (240); generating a meaning analysis result for the voice-recognized sentence by performing a second meaning analysis based on the extracted second point of sentence (250); and generating a response sentence to the voice-recognized sentence based on the generated meaning analysis result (260).
Abstract:
A method and an apparatus for displaying information based on hierarchical classification are provided to present a user visually with various classification information, organized hierarchically as its basic structure, together with statistical information about mass media indexed in the recently emerging ontology-based semantic web environment, and to let the user browse the media information. A first classification reference corresponding to at least two predetermined ontology semantic structure classes is selected (201). Information corresponding to each selected first classification reference is searched. The retrieved information is classified according to a second classification reference, which is an ontology semantic structure class lower than the first classification reference. A matrix is generated (207) in which each of the at least two first classification references is an axis and the second classification reference is a component of that axis.
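The matrix generation step (207) can be illustrated with a small sketch in which two first classification references serve as the axes and their lower-level classes as the axis components. The function `build_classification_matrix`, the sample fields `genre` and `region`, and the choice of item counts as cell values are assumptions; the abstract does not state what each cell holds.

```python
def build_classification_matrix(items, row_ref, col_ref):
    """Count items per pair of second-level classes.

    items: list of dicts holding a second-level class for each first-level
    classification reference (assumed input shape).
    """
    row_labels = sorted({item[row_ref] for item in items})
    col_labels = sorted({item[col_ref] for item in items})
    counts = [[0] * len(col_labels) for _ in row_labels]
    for item in items:
        row = row_labels.index(item[row_ref])
        col = col_labels.index(item[col_ref])
        counts[row][col] += 1
    return row_labels, col_labels, counts


media_items = [
    {"genre": "drama", "region": "asia"},
    {"genre": "news", "region": "asia"},
    {"genre": "drama", "region": "europe"},
    {"genre": "drama", "region": "asia"},
]
rows, cols, matrix = build_classification_matrix(media_items, "genre", "region")
for label, row in zip(rows, matrix):
    print(label, dict(zip(cols, row)))
```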