Apparatus for hardware accelerated machine learning
Abstract:
An architecture and associated techniques of an apparatus for hardware accelerated machine learning are disclosed. The architecture features multiple memory banks storing tensor data. The tensor data may be concurrently fetched by a number of execution units working in parallel. Each operational unit supports an instruction set specific to certain primitive operations for machine learning. An instruction decoder is employed to decode a machine learning instruction and reveal one or more of the primitive operations to be performed by the execution units, as well as the memory addresses of the operands of the primitive operations as stored in the memory banks. The primitive operations, upon performed or executed by the execution units, may generate some output that can be saved into the memory banks. The fetching of the operands and the saving of the output may involve permutation and duplication of the data elements involved.
Public/Granted literature
Information query
Patent Agency Ranking
0/0