Invention Grant
- Patent Title: Regularizing word segmentation
- Application No.: US17656225
- Application Date: 2022-03-23
- Publication No.: US12087279B2
- Publication Date: 2024-09-10
- Inventor: Bhuvana Ramabhadran, Hainan Xu, Kartik Audhkhasi, Yinghui Huang
- Applicant: Google LLC
- Applicant Address: US CA Mountain View
- Assignee: Google LLC
- Current Assignee: Google LLC
- Current Assignee Address: US CA Mountain View
- Agency: Honigman LLP
- Agent: Brett A. Krueger; Grant Griffith
- Main IPC: G10L15/02
- IPC: G10L15/02 ; G06F40/284 ; G06N3/04 ; G10L15/04 ; G10L15/06 ; G10L15/16 ; G10L25/30

Abstract:
A method for subword segmentation includes receiving an input word to be segmented into a plurality of subword units. The method also includes executing a subword segmentation routine to segment the input word into a plurality of subword units by accessing a trained vocabulary set of subword units and selecting the plurality of subword units from the input word by greedily finding a longest subword unit from the input word that is present in the trained vocabulary set until an end of the input word is reached.
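The greedy longest-match selection described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the patented implementation: the function name `segment_word`, the representation of the trained vocabulary as a Python `set` of strings, the left-to-right scan, and the single-character fallback for positions where no vocabulary entry matches are all assumptions made for the example.

```python
def segment_word(word, vocab):
    """Split `word` into subword units by greedy longest match.

    `vocab` stands in for the trained vocabulary set of subword units
    (assumed here to be a plain set of strings).
    """
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining candidate first, then shrink it.
        for end in range(len(word), start, -1):
            candidate = word[start:end]
            if candidate in vocab:
                break
        else:
            # Assumed fallback (not stated in the abstract): emit a
            # single character when nothing in the vocabulary matches.
            candidate = word[start]
        pieces.append(candidate)
        start += len(candidate)
    return pieces


# Hypothetical toy vocabulary, for illustration only.
toy_vocab = {"un", "believ", "able", "a", "b"}
print(segment_word("unbelievable", toy_vocab))  # ['un', 'believ', 'able']
```

The loop repeats until the end of the input word is reached, so the returned units always concatenate back to the original word.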
Public/Granted literature
- US20220310061A1 Regularizing Word Segmentation, Public/Granted day: 2022-09-29