Abstract:
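The present invention relates to a method for normalizing protein names and, more particularly, to a method and an apparatus for normalizing protein names using ontology mapping. The invention discloses a protein name normalization method using ontology mapping that comprises the steps of: receiving biological literature as input and extracting protein entity names; analyzing a protein code by calculating the similarity between each extracted protein entity name and a synonym dictionary constructed from an ontology; classifying the species of the proteins contained in the biological literature by using a predetermined species classification learning model; and allocating an ontology ID by integrating the analyzed protein code and the classified species information. Keywords: protein name normalization, ontology, biological literature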
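The final two steps (similarity against a synonym dictionary, then integration with species information) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy dictionaries, the `ONT:` identifiers, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure. The abstract does not specify any of these, and the species is assumed to have been classified already by a separate model.

```python
from difflib import SequenceMatcher

# Hypothetical synonym dictionary: synonym -> protein code (toy data).
SYNONYMS = {
    "tumor protein p53": "TP53",
    "p53": "TP53",
    "epidermal growth factor receptor": "EGFR",
}

# Hypothetical ontology table keyed by (protein code, species).
ONTOLOGY_IDS = {
    ("TP53", "human"): "ONT:0001",
    ("TP53", "mouse"): "ONT:0002",
    ("EGFR", "human"): "ONT:0003",
}


def analyze_protein_code(name, threshold=0.8):
    """Return the code of the most similar synonym, or None below threshold."""
    best_code, best_score = None, 0.0
    for synonym, code in SYNONYMS.items():
        # Character-level similarity; a stand-in for the unspecified measure.
        score = SequenceMatcher(None, name.lower(), synonym).ratio()
        if score > best_score:
            best_code, best_score = code, score
    return best_code if best_score >= threshold else None


def assign_ontology_id(name, species):
    """Combine the analyzed protein code with the already-classified species."""
    code = analyze_protein_code(name)
    if code is None:
        return None
    return ONTOLOGY_IDS.get((code, species))
```

In this sketch the same protein name resolves to different ontology IDs depending on the species, which is the reason the method integrates both pieces of information before allocating an ID.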
Abstract:
A method for reporting new documents relevant to a user profile based on a protein-protein interaction network is provided. The method prevents information overload and delivers new information quickly by letting a user build the profile from the protein-protein interaction network, monitoring an Internet database for new documents, and reporting new documents relevant to the user profile by email. The user creates the profile by using a protein-protein interaction network and entering a keyword, a publication year, an author, an institute, and a journal name to filter documents. A profile processor generates a query by converting each protein ID expressed in the protein-protein interaction network into a protein name and adding the additional filtering information, submits the search to a document database, and transfers the query to a document processor when a new document is found in the document database. The document processor extracts protein-protein interactions from the document, checks whether the proteins involved correspond to the protein IDs in the protein-protein interaction network, produces a summary that highlights the document number and the passage describing the protein-protein interaction, and reports the summary to the user by email.
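The query generation step performed by the profile processor could look roughly like the following sketch. The function name `build_query`, the `field:"value"` query syntax, and the sample IDs are hypothetical; the abstract does not describe the query language of the document database.

```python
def build_query(protein_ids, id_to_name, filters):
    """Turn network protein IDs plus filter fields into one query string."""
    # Convert each protein ID in the network to a protein name.
    names = [id_to_name[pid] for pid in protein_ids if pid in id_to_name]
    protein_clause = " OR ".join(f'"{n}"' for n in names)
    clauses = [f"({protein_clause})"] if names else []
    # Append the additional filtering information, skipping empty fields.
    for field, value in filters.items():
        if value:
            clauses.append(f'{field}:"{value}"')
    return " AND ".join(clauses)


query = build_query(
    ["P1", "P2"],
    {"P1": "BRCA1", "P2": "RAD51"},
    {"year": "2008", "author": ""},
)
print(query)  # ("BRCA1" OR "RAD51") AND year:"2008"
```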
Abstract:
A method and an apparatus for protein name normalization using ontology mapping are provided to accurately identify the proteins mentioned in the literature by mapping protein names recognized in biological literature to a normalized protein ontology. A literature recognition part (110) receives biological literature as input and extracts protein names and species data. An abbreviation dictionary DB (130) consists of pairs of abbreviated and original protein names. An abbreviated protein name restoration part (120) restores each abbreviated protein name to its original protein name. A synonym dictionary DB (150) is constructed from the ontology. An inverted index structure DB (160) stores the synonym dictionary in an inverted index structure. A protein code analysis part (140) analyzes the protein code by calculating the similarity between each extracted protein name and the entries in the inverted index structure DB of the synonym dictionary. An ontology ID allocating part (190) allocates a final ontology ID to each protein name based on the protein code and the species.
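As a rough illustration of how the abbreviation restoration part and the inverted index of the synonym dictionary might work together, consider the sketch below. The toy dictionaries, the codes `C001` and `C002`, token-level indexing, and Jaccard overlap as the similarity score are all assumptions; the abstract names the components but not their internals.

```python
from collections import defaultdict

# Hypothetical abbreviation dictionary: abbreviated name -> original name.
ABBREVIATIONS = {"EGFR": "epidermal growth factor receptor"}

# Hypothetical synonym dictionary: protein code -> synonyms.
SYNONYMS = {
    "C001": ["epidermal growth factor receptor", "erbb1 receptor"],
    "C002": ["fibroblast growth factor receptor 1"],
}


def build_inverted_index(synonyms):
    """Map each token to the set of (code, synonym) entries containing it."""
    index = defaultdict(set)
    for code, names in synonyms.items():
        for name in names:
            for token in name.lower().split():
                index[token].add((code, name))
    return index


def analyze_protein_code(name, index):
    """Return (code, score) of the best match by Jaccard token overlap."""
    name = ABBREVIATIONS.get(name, name)  # restore the abbreviation first
    query = set(name.lower().split())
    # Only synonyms sharing at least one token are scored.
    candidates = set()
    for token in query:
        candidates |= index.get(token, set())
    best = (None, 0.0)
    for code, synonym in candidates:
        tokens = set(synonym.lower().split())
        score = len(query & tokens) / len(query | tokens)
        if score > best[1]:
            best = (code, score)
    return best


index = build_inverted_index(SYNONYMS)
print(analyze_protein_code("EGFR", index))  # ('C001', 1.0)
```

The point of the inverted index in this sketch is that the similarity calculation touches only synonyms that share a token with the query, instead of scanning the whole dictionary.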
Abstract:
A method and a system for verifying protein-protein interactions with text mining are provided to avoid duplicated experiments by using knowledge already established in the literature before a predicted protein-protein interaction is verified experimentally, and to produce a measure for evaluating the performance of a prediction system by verifying its results. An ontology database (160) stores protein-protein interactions and hierarchical structure information among proteins. A text mining part (120) extracts protein-protein interactions from protein-related documents by using a text mining method. An ontology mapper (130) maps the extracted protein-protein interactions to ontology IDs by using the ontology database. An information filter (140) selects, from the mapped protein-protein interaction information, the information with a high weight, where the weight is based on the frequency of the information and the impact factor of the protein-related documents. An information indexer indexes the protein-related documents and sentences, ontology IDs, protein-protein interactions, and precision information.
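The weighting used by the information filter is not given in the abstract. The sketch below assumes one simple possibility: each supporting mention contributes its document's factor (read here as a journal-style impact score), so an interaction's weight grows with both how often it is reported and where. The function name, the tuple layout, and the threshold are hypothetical.

```python
from collections import defaultdict


def filter_interactions(mentions, min_weight):
    """Keep interactions whose accumulated weight reaches min_weight.

    mentions: iterable of (ontology_id_a, ontology_id_b, doc_factor),
    one tuple per supporting sentence or document.
    """
    weights = defaultdict(float)
    for a, b, doc_factor in mentions:
        pair = tuple(sorted((a, b)))  # treat A-B and B-A as the same pair
        weights[pair] += doc_factor   # more mentions and higher factors add up
    return {pair: w for pair, w in weights.items() if w >= min_weight}


mentions = [("O1", "O2", 2.0), ("O2", "O1", 3.5), ("O1", "O3", 1.0)]
print(filter_interactions(mentions, min_weight=5.0))  # {('O1', 'O2'): 5.5}
```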