Searching multilingual documents based on document structure extraction

    Publication No.: GB2583679A

    Publication Date: 2020-11-04

    Application No.: GB202011326

    Filing Date: 2018-11-20

    Applicant: IBM

    Abstract: An approach is provided for searching multilingual documents. Structure components are extracted from multilingual documents. Based on the extracted components, the documents are grouped into classifications including respective sets of documents expressed in different respective natural languages. A natural language in a query is detected. One of the documents is selected based on the document having content indicated by the query and the natural language of the document matching the detected natural language. Structure components of the selected document are extracted. Based on the extracted structure components of the selected document, one of the classifications is identified as including the selected document. Other document(s) in the classification are identified and presented as having content that matches the content of the selected document. The natural language(s) of the other document(s) are each different from the natural language of the selected document.
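
The flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the method claimed in the patent: the `Document` type, the toy "tag: text" markup, the function names, the stopword-based language detector, and the choice to group documents by an exact match of their structure-component sequence. The abstract does not say how components are extracted or how classifications are formed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    language: str  # hypothetical language code, e.g. "en" or "de"
    markup: str    # toy markup (assumption): one "tag: text" component per line


def extract_structure(doc):
    """Return the sequence of component tags, ignoring the language-specific text."""
    return tuple(
        line.split(":", 1)[0].strip()
        for line in doc.markup.splitlines()
        if ":" in line
    )


def classify(docs):
    """Group documents whose structure sequences are identical (an assumption)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[extract_structure(doc)].append(doc)
    return groups


# Stand-in language detector: counts stopword hits per language.
STOPWORDS = {
    "en": {"the", "how", "to", "of"},
    "de": {"der", "die", "wie", "und"},
}


def detect_language(query):
    words = set(query.lower().split())
    return max(STOPWORDS, key=lambda lang: len(words & STOPWORDS[lang]))


def select_document(docs, query):
    """Pick the best term-overlap match among documents in the query's language."""
    lang = detect_language(query)
    terms = set(query.lower().split())

    def score(doc):
        return len(terms & set(doc.markup.lower().split()))

    candidates = [d for d in docs if d.language == lang]
    best = max(candidates, key=score, default=None)
    return best if best is not None and score(best) > 0 else None


def search(docs, query):
    """Return the selected document and same-class documents in other languages."""
    groups = classify(docs)
    selected = select_document(docs, query)
    if selected is None:
        return None, []
    peers = groups[extract_structure(selected)]
    others = [d for d in peers if d.language != selected.language]
    return selected, others


# Invented sample data, for illustration only.
DOCS = [
    Document("en-install", "en",
             "h1: Install the printer\nstep: Plug in the cable\nstep: Press the power button"),
    Document("de-install", "de",
             "h1: Drucker installieren\nstep: Kabel anschließen\nstep: Einschalttaste drücken"),
    Document("en-faq", "en",
             "h1: Printer FAQ\nq: Why is the page blank\na: Replace the toner"),
    Document("de-faq", "de",
             "h1: Drucker FAQ\nq: Warum ist die Seite leer\na: Toner ersetzen"),
]

selected, others = search(DOCS, "how to install the printer")
```

With this sample data, the English query selects `en-install`, and `others` holds `de-install`, the German document that shares its heading-plus-two-steps structure. A real system would likely need a tolerant structure comparison rather than exact equality, since translations rarely preserve layout perfectly.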
