Invention Grant
- Patent Title: Dynamic corpus generation
- Patent Title (Chinese): 动态语料库生成 (Dynamic corpus generation)
- Application No.: US11270014
- Application Date: 2005-11-09
- Publication No.: US07941418B2
- Publication Date: 2011-05-10
- Inventor: Carlos Alejandro Arguelles
- Applicant: Carlos Alejandro Arguelles
- Applicant Address: US WA Redmond
- Assignee: Microsoft Corporation
- Current Assignee: Microsoft Corporation
- Current Assignee Address: US WA Redmond
- Agency: Westerman, Champlin & Kelly, P.A.
- Main IPC: G06F7/00
- IPC: G06F7/00 ; G06F17/30

Abstract:
A computer-implemented method of generating a dynamic corpus includes generating web threads, based upon corresponding sets of words dequeued from a word queue, to obtain web thread resulting URLs. The web thread resulting URLs are enqueued in a URL queue. Multiple text extraction threads are generated, based upon documents downloaded using URLs dequeued from the URL queue, to obtain text files. New words are randomly obtained from the text files, and the randomly obtained words from the text files are enqueued in the word queue. This process is iteratively performed, resulting in a dynamic corpus.
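The abstract's loop (word queue → web threads → URL queue → text-extraction threads → random words back into the word queue) can be sketched in Python roughly as follows. This is a minimal, illustrative sketch, not the patented implementation. It assumes an in-memory fake search index and page store so it runs offline: `FAKE_INDEX`, `FAKE_PAGES`, `search`, `download_text`, and `generate_corpus` are all hypothetical. URL de-duplication, the batch sizes, and the fixed iteration count are further assumptions, added so the example terminates.

```python
import queue
import random
import threading

# Hypothetical stand-ins for a search engine (word -> URLs) and the web (URL -> text).
FAKE_INDEX = {
    "corpus": ["http://example.com/a", "http://example.com/b"],
    "language": ["http://example.com/b", "http://example.com/c"],
    "model": ["http://example.com/c"],
}
FAKE_PAGES = {
    "http://example.com/a": "a corpus is a large collection of language samples",
    "http://example.com/b": "statistical language model training needs a corpus",
    "http://example.com/c": "each model is trained on text",
}


def search(words):
    """Hypothetical web search: return URLs matching any of the query words."""
    urls = []
    for w in words:
        urls.extend(FAKE_INDEX.get(w, []))
    return urls


def download_text(url):
    """Hypothetical download plus text extraction; empty string if not found."""
    return FAKE_PAGES.get(url, "")


def web_thread(words, url_queue):
    # Web thread: query with a set of dequeued words, enqueue resulting URLs.
    for url in search(words):
        url_queue.put(url)


def text_thread(url, corpus, lock, word_queue, rng, words_per_doc):
    # Text-extraction thread: fetch the document, store its text, and
    # enqueue a few randomly chosen words from it as new queries.
    text = download_text(url)
    if not text:
        return
    with lock:
        corpus.append(text)
        tokens = text.split()
        for w in rng.sample(tokens, min(words_per_doc, len(tokens))):
            word_queue.put(w)


def generate_corpus(seed_words, iterations=3, words_per_query=2,
                    words_per_doc=2, seed=0):
    rng = random.Random(seed)
    word_queue, url_queue = queue.Queue(), queue.Queue()
    for w in seed_words:
        word_queue.put(w)
    corpus, lock, seen = [], threading.Lock(), set()
    for _ in range(iterations):
        # Stage 1: one web thread per batch of words dequeued from the word queue.
        threads = []
        while not word_queue.empty():
            batch = []
            while len(batch) < words_per_query and not word_queue.empty():
                batch.append(word_queue.get())
            t = threading.Thread(target=web_thread, args=(batch, url_queue))
            t.start()
            threads.append(t)
        for t in threads:
            t.join()
        # Stage 2: one text-extraction thread per not-yet-seen URL.
        threads = []
        while not url_queue.empty():
            url = url_queue.get()
            if url in seen:
                continue
            seen.add(url)
            t = threading.Thread(
                target=text_thread,
                args=(url, corpus, lock, word_queue, rng, words_per_doc))
            t.start()
            threads.append(t)
        for t in threads:
            t.join()
    return corpus
```

For example, `generate_corpus(["corpus"], iterations=1)` collects the two fake pages returned for the seed word. Later iterations expand the corpus only through whichever words happen to be sampled, which is the "dynamic" behavior the abstract describes. The joins between stages keep the sketch simple. A production crawler would more likely run both thread pools continuously against the shared queues.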
Published literature:
- US20070106977A1 Dynamic corpus generation, published 2007-05-10