Invention Grant
- Patent Title: Searching multilingual documents based on document structure extraction
- Application No.: US15818860
- Application Date: 2017-11-21
- Publication No.: US10691734B2
- Publication Date: 2020-06-23
- Inventor: Xin Tang, Kun Yan Yin, He Li, XueLiang Zhao, Xin Xu
- Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
- Applicant Address: US NY Armonk
- Assignee: International Business Machines Corporation
- Current Assignee: International Business Machines Corporation
- Current Assignee Address: US NY Armonk
- Agency: Schmeiser, Olsen & Watts
- Agent: Mark C. Vallone
- Main IPC: G06F16/33
- IPC: G06F16/33 ; G06F16/93 ; G06F16/338 ; G06F16/35 ; G06F16/9535 ; G06F40/40 ; G06F40/58

Abstract:
An approach is provided for searching multilingual documents. Structure components are extracted from multilingual documents. Based on the extracted components, the documents are grouped into classifications including respective sets of documents expressed in different respective natural languages. A natural language in a query is detected. One of the documents is selected based on the document having content indicated by the query and the natural language of the document matching the detected natural language. Structure components of the selected document are extracted. Based on the extracted structure components of the selected document, one of the classifications is identified as including the selected document. Other document(s) in the classification are identified and presented as having content that matches the content of the selected document. The natural language(s) of the other document(s) are each different from the natural language of the selected document.
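
The flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the `Document` type, the tuple-based structure signature, the exact-match grouping, and the toy `detect_language` heuristic (any CJK character means Chinese, otherwise English) are all assumptions made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical document record; field names are illustrative only."""
    doc_id: str
    language: str      # natural language code, e.g. "en" or "zh"
    structure: tuple   # assumed structure signature, e.g. ("h1", "p", "table")
    text: str


def group_by_structure(documents):
    """Group documents whose extracted structure components are identical.

    Assumption: exact equality of the signature stands in for the
    classification step; the abstract does not specify the method.
    """
    groups = defaultdict(list)
    for doc in documents:
        groups[doc.structure].append(doc)
    return groups


def detect_language(query):
    """Toy language detector (assumption): CJK characters imply "zh"."""
    return "zh" if any("\u4e00" <= ch <= "\u9fff" for ch in query) else "en"


def search(query, documents):
    """Return (selected, others).

    `selected` is the first document in the query's language whose text
    contains every query term. `others` are documents in the same
    structure group that are written in a different language.
    """
    groups = group_by_structure(documents)
    lang = detect_language(query)
    terms = query.lower().split()
    selected = next(
        (d for d in documents
         if d.language == lang and all(t in d.text.lower() for t in terms)),
        None,
    )
    if selected is None:
        return None, []
    others = [d for d in groups[selected.structure]
              if d.language != selected.language]
    return selected, others
```

In this sketch, an English query selects an English document, and any document sharing its structure signature but written in another language is returned as a presumed counterpart. A real system would likely need a tolerant structure comparison and a proper language detector.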
Public/Granted literature
- US20190155942A1 SEARCHING MULTILINGUAL DOCUMENTS BASED ON DOCUMENT STRUCTURE EXTRACTION (Public/Granted day: 2019-05-23)