Invention Grant
- Patent Title: System for estimating a distribution of message content categories in source data
- Application No.: US12077534
- Application Date: 2008-03-19
- Publication No.: US08180717B2
- Publication Date: 2012-05-15
- Inventor: Gary King, Daniel Hopkins, Ying Lu
- Applicant: Gary King, Daniel Hopkins, Ying Lu
- Applicant Address: US MA Cambridge
- Assignee: President and Fellows of Harvard College
- Current Assignee: President and Fellows of Harvard College
- Current Assignee Address: US MA Cambridge
- Agency: Bingham McCutchen LLP
- Main IPC: G06F17/00
- IPC: G06F17/00; G06F17/21; G06N5/00

Abstract:
A method of computerized content analysis that gives “approximately unbiased and statistically consistent estimates” of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing the distribution of a small set of individually classified elements across a plurality of categories and then using the information determined from that analysis to extrapolate the distribution in a larger population set. The extrapolation is performed without constraining the distribution of the unlabeled elements to equal the distribution of the labeled elements, and without constraining the content distribution of elements in the labeled set (e.g., the distribution of words used by elements in the labeled set) to equal the content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
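The abstract does not spell out an estimator, so the following is only a minimal sketch of one way a category distribution could be extrapolated from a small labeled set. It assumes (this is an assumption of the sketch, not a statement taken from the abstract) that the distribution of word-presence profiles within each category is shared by the labeled and unlabeled sets, and it then solves for the category proportions that best reproduce the profile distribution observed in the unlabeled set. The names `profile_dist`, `project_to_simplex`, and `estimate_proportions`, and the toy documents, are all hypothetical.

```python
from collections import Counter

def profile_dist(docs, vocab):
    """Empirical distribution of word-presence profiles over a list of docs.
    Each doc is a set of words; a profile is a tuple of booleans, one per vocab word."""
    counts = Counter(tuple(w in doc for w in vocab) for doc in docs)
    return {profile: c / len(docs) for profile, c in counts.items()}

def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) == 1}."""
    cumsum, theta = 0.0, 0.0
    for i, ui in enumerate(sorted(v, reverse=True), start=1):
        cumsum += ui
        t = (cumsum - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def estimate_proportions(labeled, unlabeled, vocab, steps=2000):
    """Estimate category proportions in `unlabeled` (hypothetical helper).
    Assumes profile distributions within a category match across both sets."""
    categories = sorted({cat for _, cat in labeled})
    # Per-category profile distributions, estimated from the labeled set only.
    cond = [profile_dist([d for d, c in labeled if c == cat], vocab)
            for cat in categories]
    # Overall profile distribution, estimated from the unlabeled set.
    marg = profile_dist(unlabeled, vocab)
    profiles = sorted(set(marg).union(*cond))
    A = [[cd.get(s, 0.0) for cd in cond] for s in profiles]
    b = [marg.get(s, 0.0) for s in profiles]
    k = len(categories)
    p = [1.0 / k] * k
    lr = 1.0 / k  # conservative step size for this least-squares problem
    for _ in range(steps):
        # Projected gradient descent on 0.5 * ||A p - b||^2 over the simplex.
        resid = [sum(a * x for a, x in zip(row, p)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * resid[i] for i in range(len(A))) for j in range(k)]
        p = project_to_simplex([x - lr * g for x, g in zip(p, grad)])
    return dict(zip(categories, p))

# Toy data: the labeled set is split 50/50 between the two categories.
vocab = ["good", "bad"]
labeled = ([({"good"}, "pos")] * 3 + [({"good", "bad"}, "pos")]
           + [({"bad"}, "neg")] * 3 + [({"good", "bad"}, "neg")])
# The unlabeled set is constructed as an 80/20 mix of the same per-category profiles.
unlabeled = [{"good"}] * 12 + [{"good", "bad"}] * 5 + [{"bad"}] * 3

est = estimate_proportions(labeled, unlabeled, vocab)
print({cat: round(share, 3) for cat, share in est.items()})
```

In this toy setup the estimate is not pulled toward the 50/50 split of the labeled set, because the labeled set is used only to learn how each category tends to be worded, not how common each category is.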
Public/Granted literature
- US20090030862A1: System for estimating a distribution of message content categories in source data (Public/Granted day: 2009-01-29)