Near-memory processing of embeddings method and system for reducing memory size and energy in deep learning-based recommendation systems
Abstract:
Provided is a hybrid near-memory processing system including a GPU, a PIM-HBM, a CPU, and a main memory. Embedding vectors are loaded through the GPU and the PIM-HBM. In the training process of a recommendation system, the embedding table is divided and stored across the main memory and the HBM. In the inference process, an embedding lookup operation is performed in the main memory or the HBM according to the location of the required embedding vector, and an additional embedding manipulation operation is performed in the CPU and the PIM on the embedding vectors for which the lookup has completed. The manipulated embedding vectors are finally concatenated in the PIM to generate an embedding result, and the embedding result is transmitted to the GPU to derive a final inference result through a top multilayer perceptron (MLP) process.
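The sketch below is only an illustration of the data flow described in the abstract, not the patented implementation: the embedding table is modeled as two NumPy arrays standing in for main memory and HBM, lookups are dispatched by row location, a pooling step stands in for the CPU/PIM embedding manipulation, and the pooled vectors are concatenated before a top MLP. All names (HybridEmbedding, hot_rows, lookup) are assumptions introduced for this example.

```python
import numpy as np

# Illustrative sketch only: the real system performs lookups near PIM-HBM hardware;
# here two in-process containers stand in for the two memory regions.

class HybridEmbedding:
    def __init__(self, table: np.ndarray, hot_rows: np.ndarray):
        # Frequently accessed ("hot") rows are assumed to be placed in HBM;
        # the remaining rows stay in main memory.
        self.hot_rows = set(hot_rows.tolist())
        self.hbm_part = {r: table[r] for r in self.hot_rows}  # stands in for HBM/PIM
        self.mem_part = table                                  # stands in for main memory

    def lookup(self, indices: np.ndarray) -> np.ndarray:
        # Embedding lookup runs against whichever memory region holds the row.
        rows = [self.hbm_part[i] if i in self.hot_rows else self.mem_part[i]
                for i in indices]
        # Sum pooling stands in for the additional embedding manipulation step.
        return np.sum(rows, axis=0)

# Usage: pool vectors per sparse feature, concatenate (the PIM step),
# then hand the result to the top MLP on the GPU.
table = np.random.rand(1000, 16).astype(np.float32)
emb = HybridEmbedding(table, hot_rows=np.array([1, 7, 42]))
pooled = [emb.lookup(np.array([1, 7, 500])), emb.lookup(np.array([42, 900]))]
concatenated = np.concatenate(pooled)  # embedding result sent to the GPU
print(concatenated.shape)              # (32,) -> input to the top MLP
```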