Invention Grant
- Patent Title: Techniques for unsupervised learning embeddings on source code tokens from non-local contexts
- Application No.: US16198969
- Application Date: 2018-11-23
- Publication No.: US10901708B1
- Publication Date: 2021-01-26
- Inventor: Russell Reas, Neela Sawant, Srinivasan Sengamedu Hanumantha Rao, Yinglong Wang, Anton Emelyanov, Shishir Sethiya
- Applicant: Amazon Technologies, Inc.
- Applicant Address: US WA Seattle
- Assignee: Amazon Technologies, Inc.
- Current Assignee: Amazon Technologies, Inc.
- Current Assignee Address: US WA Seattle
- Agency: Nicholson De Vos Webster & Elliott LLP
- Main IPC: G06F8/41
- IPC: G06F8/41; G06N20/00; G06F8/30

Abstract:
Techniques for unsupervised learning of embeddings on source code from non-local contexts are described. Code can be processed to generate an abstract syntax tree (AST) which represents syntactic paths between tokens in the code. Once the AST(s) have been generated, the paths in the AST(s) can be crawled to identify terminals (e.g., leaf nodes in the AST) and paths between terminals can be identified. The pairs of tokens identified at the ends of each path can then be used to generate a cooccurrence matrix. For example, if X number of unique terminals are identified, a matrix of size X by X can be generated to indicate a frequency at which pairs of terminals cooccur. This cooccurrence matrix can then be used as input to existing techniques for learning vector-space embeddings, such as word2vec, GloVe, Swivel, etc.
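
The pipeline in the abstract (parse code into an AST, find terminals, pair up terminals joined by a path, count the pairs in an X by X matrix) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes Python's standard `ast` module as the parser, treats identifier names, argument names, and constants as the terminals, and measures a path as the number of edges between two leaves through their lowest common ancestor. The helper names and the `max_path_len` cutoff are hypothetical, introduced only for illustration.

```python
import ast
from itertools import combinations


def collect_terminals(tree):
    """Return (token, root_path) pairs for leaf-like AST nodes.

    root_path is the tuple of child indices from the root to the node,
    which lets us compute the tree distance between two terminals.
    """
    terminals = []

    def visit(node, path):
        # Assumption: names, argument names, and constants act as terminals.
        if isinstance(node, ast.Name):
            terminals.append((node.id, path))
        elif isinstance(node, ast.arg):
            terminals.append((node.arg, path))
        elif isinstance(node, ast.Constant):
            terminals.append((repr(node.value), path))
        for i, child in enumerate(ast.iter_child_nodes(node)):
            visit(child, path + (i,))

    visit(tree, ())
    return terminals


def path_length(p, q):
    """Number of edges between two nodes via their lowest common ancestor."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return len(p) + len(q) - 2 * common


def cooccurrence_matrix(source, max_path_len=8):
    """Build a symmetric X-by-X count matrix over the X unique terminals.

    max_path_len is a hypothetical cutoff: pairs whose connecting AST path
    is longer than this are not counted.
    """
    terminals = collect_terminals(ast.parse(source))
    vocab = sorted({tok for tok, _ in terminals})
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (t1, p1), (t2, p2) in combinations(terminals, 2):
        if path_length(p1, p2) <= max_path_len:
            i, j = index[t1], index[t2]
            matrix[i][j] += 1
            if i != j:
                matrix[j][i] += 1  # keep the matrix symmetric
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = cooccurrence_matrix("def add(a, b):\n    return a + b\n")
    print(vocab)
    for row in matrix:
        print(row)
```

A matrix built this way could then be passed to a count-based embedding learner such as GloVe or Swivel. The argument `a` in the signature and the use of `a` in the return expression are paired here because they are close in the tree, even though they sit on different lines of the text.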