Invention Grant
US08255397B2 Method and apparatus for document clustering and document sketching (Active)

  • Patent Title: Method and apparatus for document clustering and document sketching
  • Application No.: US12198841
    Application Date: 2008-08-26
  • Publication No.: US08255397B2
    Publication Date: 2012-08-28
  • Inventor: Sreenivas Gollapudi
  • Applicant: Sreenivas Gollapudi
  • Applicant Address: US CA Palo Alto
  • Assignee: Ebrary
  • Current Assignee: Ebrary
  • Current Assignee Address: US CA Palo Alto
  • Agency: Glenn Patent Group
  • Agent: Michael A. Glenn
  • Main IPC: G06F17/30
  • IPC: G06F17/30
Method and apparatus for document clustering and document sketching
Abstract:
A first embodiment of the invention provides a system that automatically classifies documents in a collection into clusters based on the similarities between documents, that automatically classifies new documents into the right clusters, and that may change the number or parameters of clusters under various circumstances. A second embodiment of the invention provides a technique for comparing two documents, in which a fingerprint or sketch of each document is computed. In particular, this embodiment of the invention uses a specific algorithm to compute the document's fingerprint. One embodiment uses a sentence in the document as a logical delimiter or window from which significant words are extracted and, thereafter, a hash is computed of all pair-wise permutations. Words are extracted based on their weight in the document, which can be computed using measures such as term frequency and the inverse document frequency.
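The sentence-window fingerprint described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the `top_k` parameter, the smoothed TF-IDF weighting, the truncated SHA-1 hash, and the Jaccard comparison are all assumptions chosen for illustration; the abstract specifies only that significant words are extracted per sentence by weight and that a hash is computed over all pair-wise permutations.

```python
import hashlib
import math
import re
from collections import Counter
from itertools import permutations


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter (assumption): break after ., ! or ?."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def idf_table(corpus):
    """Smoothed inverse document frequency for every word in the corpus."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def sketch(doc, idf, top_k=3):
    """Return a set of 64-bit hashes, one per ordered pair of significant words.

    Each sentence acts as the window. The top_k words by TF-IDF weight are
    kept, and every ordered pair (pair-wise permutation) of them is hashed.
    """
    tf = Counter(tokenize(doc))
    default_idf = max(idf.values(), default=1.0)  # unseen words count as rare
    weight = {w: tf[w] * idf.get(w, default_idf) for w in tf}
    hashes = set()
    for sentence in split_sentences(doc):
        words = set(tokenize(sentence))
        # Sort by descending weight, breaking ties alphabetically.
        top = sorted(words, key=lambda w: (-weight[w], w))[:top_k]
        for a, b in permutations(top, 2):
            digest = hashlib.sha1(f"{a} {b}".encode("utf-8")).digest()
            hashes.add(int.from_bytes(digest[:8], "big"))
    return hashes


def similarity(s1, s2):
    """Jaccard similarity of two sketches (an assumed comparison measure)."""
    if not s1 and not s2:
        return 1.0
    return len(s1 & s2) / len(s1 | s2)
```

Two documents would then be compared by calling `sketch` on each with a shared `idf_table` and passing the resulting sets to `similarity`; a score near 1.0 suggests near-duplicate content under these assumptions.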