Invention Grant
- Patent Title: Computer-implemented method of creating a translation model for low resource language pairs and a machine translation system using this translation model
- Application No.: US16237414
- Application Date: 2018-12-31
- Publication No.: US11037028B2
- Publication Date: 2021-06-15
- Inventor: Ondrej Bojar, Roman Sudarikov
- Applicant: Charles University Faculty of Mathematics and Physics
- Applicant Address: CZ Prague
- Assignee: Charles University Faculty of Mathematics and Physics
- Current Assignee: Charles University Faculty of Mathematics and Physics
- Current Assignee Address: CZ Prague
- Agency: Hitaffer & Hitaffer, PLLC
- Agent: Thedford I. Hitaffer
- Main IPC: G06F40/58
- IPC: G06F40/58; G06K9/62; G06N3/02; G06F40/51

Abstract:
A computer-implemented method for creating a translation model for low-resource language pairs that is applicable to noisy inputs. The method combines several approaches: choosing particular input corpora that cover in-domain noisy and clean texts as well as unrelated but larger general parallel texts; applying several chosen methods of creating synthetic parallel corpora; and filtering, pre-processing, deduplicating and concatenating the training corpora.
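
The final stage named in the abstract (filtering, pre-processing, deduplicating and concatenating training corpora) can be illustrated with a minimal sketch. This is not the patented procedure: the function names, the NFKC normalization, and the length and ratio thresholds below are all assumptions chosen for illustration, and the record does not specify how each step is actually carried out.

```python
import unicodedata


def preprocess(text):
    """Hypothetical pre-processing: Unicode-normalize and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def keep_pair(src, tgt, max_ratio=3.0, max_len=200):
    """Hypothetical filter: drop empty, overlong, or badly length-mismatched pairs.

    The thresholds are illustrative assumptions, not values from the patent.
    """
    n_src, n_tgt = len(src.split()), len(tgt.split())
    if n_src == 0 or n_tgt == 0:
        return False
    if n_src > max_len or n_tgt > max_len:
        return False
    return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio


def build_training_corpus(corpora):
    """Pre-process, filter, deduplicate and concatenate parallel corpora.

    `corpora` is an iterable of corpora; each corpus is an iterable of
    (source_sentence, target_sentence) tuples.
    """
    seen = set()
    merged = []
    for corpus in corpora:
        for src, tgt in corpus:
            pair = (preprocess(src), preprocess(tgt))
            if not keep_pair(*pair):
                continue
            if pair in seen:  # exact duplicate after pre-processing
                continue
            seen.add(pair)
            merged.append(pair)
    return merged
```

In this sketch, in-domain, general and synthetic corpora would all be passed in as separate lists of sentence pairs, and the order in which they are passed decides which copy of a duplicated pair is kept.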