Method and system for searching words in documents written in a source language as transcript of words in an origin language
Abstract:
The invention relates to a method used by computers for searching words in documents written in a source language, which are not in the vocabulary of said source language, but are transcript of meaningful words in an origin language. The method is comprised of a preparation process and a search process. During the preparation process a database of unrecognized words in the source language is maintained, which contains, among other data, normalized phonetic conversion of the unrecognized word, as well as a corpus of all words of the documents in the search domain and indexes for efficient search. During search, a phonetic conversion and normalization is done for the search word, and the distance to similar phonetics words in the corpus is calculated. The found words in the corpus are arranged in ascending order, and the relevant documents are displayed.
Information query
Patent Agency Ranking
0/0