Content pattern based automatic document classification
Abstract:
Computer systems, devices, and associated methods of content pattern based automatic document classification are disclosed herein. In one embodiment, a method includes receiving, from a network storage, a document and a sequence of words corresponding to a document class that has a class label. The method also includes determining a longest common subsequence of words between the words in the document and the sequence of words, and calculating a similarity percentage between the document and the sequence of words based on the determined longest common subsequence. When the calculated similarity percentage is above a threshold, the class label corresponding to the document class is automatically applied to the received document in the network storage.
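The steps described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The abstract does not state how the similarity percentage is derived from the longest common subsequence, so the sketch assumes it is the subsequence length divided by the number of words in the class sequence. The function names, the whitespace tokenization, the lowercasing, and the default threshold of 80 are likewise assumptions made for illustration, and retrieval from and labeling in the network storage are omitted.

```python
from typing import List, Optional


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two word lists."""
    # Standard dynamic programming, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for word_a in a:
        curr = [0]
        for j, word_b in enumerate(b, start=1):
            if word_a == word_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity_percentage(document: str, class_words: List[str]) -> float:
    """Assumed formula: LCS length as a percentage of the class sequence length."""
    doc_words = document.lower().split()  # assumed tokenization
    pattern = [w.lower() for w in class_words]
    if not pattern:
        return 0.0
    return 100.0 * lcs_length(doc_words, pattern) / len(pattern)


def classify(document: str, class_words: List[str], class_label: str,
             threshold: float = 80.0) -> Optional[str]:
    """Return the class label when the similarity is above the threshold."""
    if similarity_percentage(document, class_words) > threshold:
        return class_label
    return None
```

For example, with the hypothetical class sequence `["non", "disclosure", "agreement", "between", "parties"]` and label `"NDA"`, the document "This non disclosure agreement is made between the parties" contains all five words in order, so `classify` returns `"NDA"`. A document that shares only the word "agreement" scores 20 percent under the assumed formula and receives no label.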