Abstract:
An apparatus to facilitate optimization of a neural network (NN) is disclosed. The apparatus includes optimization logic to define an NN topology having one or more macro layers, adjust the one or more macro layers to adapt to input and output components of the NN, and train the NN based on the one or more macro layers.
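As a rough illustration of the macro-layer idea (the abstract specifies no API; the class and function names below are invented), a topology could be defined as a chain of macro layers whose boundary dimensions are adjusted to match the network's input and output components before training:

```python
import numpy as np

class MacroLayer:
    """Hypothetical macro layer: a dense block whose width can be adapted."""
    def __init__(self, in_dim, out_dim):
        self.w = np.random.randn(in_dim, out_dim) * 0.01

    def forward(self, x):
        return np.maximum(x @ self.w, 0.0)  # ReLU activation

def build_topology(input_dim, output_dim, hidden_dims):
    """Adjust macro layers so the chain matches the NN's input/output shapes."""
    dims = [input_dim] + hidden_dims + [output_dim]
    return [MacroLayer(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]

# Define the topology, adapt it to I/O shapes, and run a forward pass
# (the training loop the abstract mentions is omitted).
net = build_topology(input_dim=16, output_dim=4, hidden_dims=[32, 32])
x = np.random.randn(8, 16)
for layer in net:
    x = layer.forward(x)
print(x.shape)  # (8, 4)
```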
Abstract:
An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type.
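A minimal sketch of what heterogeneous EU dispatch might look like, assuming (hypothetically) that the two EU types are specialized for different workload kinds; the patent does not name concrete types or a dispatch policy:

```python
from dataclasses import dataclass

@dataclass
class WorkItem:
    kind: str       # e.g. "matrix" or "vector"; categories are assumed
    payload: object

class ProcessingUnit:
    """A processing unit holding a pool of each EU type."""
    def __init__(self, n_type_a, n_type_b):
        self.type_a_free = n_type_a  # first EU type
        self.type_b_free = n_type_b  # second EU type

    def dispatch(self, item: WorkItem) -> str:
        """Route work to the EU type best suited to it, if one is free."""
        if item.kind == "matrix" and self.type_a_free > 0:
            self.type_a_free -= 1
            return "EU type A"
        if self.type_b_free > 0:
            self.type_b_free -= 1
            return "EU type B"
        return "queued"

pu = ProcessingUnit(n_type_a=2, n_type_b=4)
print(pu.dispatch(WorkItem("matrix", None)))  # EU type A
print(pu.dispatch(WorkItem("vector", None)))  # EU type B
```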
Abstract:
In an example, an apparatus comprises a plurality of processing unit cores, a plurality of cache memory modules associated with the plurality of processing unit cores, and a machine learning model communicatively coupled to the plurality of processing unit cores, wherein the plurality of cache memory modules share cache coherency data with the machine learning model. Other embodiments are also disclosed and claimed.
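One way to picture cache modules sharing coherency data with a machine learning model is a toy consumer that counts lines bouncing between cores. Everything below (event format, model, class names) is an assumption for illustration only:

```python
from collections import Counter

class CacheModule:
    """Hypothetical per-core cache that records coherency state transitions."""
    def __init__(self, core_id):
        self.core_id = core_id
        self.events = []

    def record(self, line_addr, transition):
        self.events.append((line_addr, transition))

class SharingPredictor:
    """Stand-in 'ML model': tallies lines frequently migrating between cores."""
    def __init__(self):
        self.bounce_count = Counter()

    def ingest(self, events):
        for line_addr, transition in events:
            if transition == "M->S":  # modified line read by another core
                self.bounce_count[line_addr] += 1

    def hot_lines(self, k=3):
        return [addr for addr, _ in self.bounce_count.most_common(k)]

caches = [CacheModule(i) for i in range(4)]
caches[0].record(0x1000, "M->S")
caches[1].record(0x1000, "M->S")
model = SharingPredictor()
for c in caches:
    model.ingest(c.events)
print(model.hot_lines())  # [4096]
```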
Abstract:
An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.
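The one-processing-unit-per-memory-channel coupling suggests a near-memory layout where each unit operates only on data resident in its own channel. A minimal sketch, with the striping scheme and class names assumed rather than taken from the patent:

```python
NUM_CHANNELS = 4

class MemoryChannel:
    def __init__(self, data):
        self.data = data

class NearMemoryUnit:
    def __init__(self, channel):
        self.channel = channel  # fixed 1:1 coupling to a channel

    def local_sum(self):
        # Compute over locally resident data, avoiding cross-channel traffic.
        return sum(self.channel.data)

# Stripe a dataset across channels, then reduce locally in each unit.
dataset = list(range(100))
channels = [MemoryChannel(dataset[i::NUM_CHANNELS]) for i in range(NUM_CHANNELS)]
units = [NearMemoryUnit(ch) for ch in channels]
partials = [u.local_sum() for u in units]
print(sum(partials))  # 4950, same as sum(dataset)
```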
Abstract:
One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex machine learning compute operation.
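To make "a single instruction decoded into a complex machine learning compute operation" concrete, here is a hedged sketch in which one macro-instruction triggers a fused dot-product-accumulate over register tiles. The mnemonic is borrowed from real matrix instructions, but the text format, operand layout, and register file are invented:

```python
import numpy as np

def decode(instruction):
    """Split 'DPAS dst, a, b' style text into an opcode and operands."""
    opcode, rest = instruction.split(None, 1)
    operands = [s.strip() for s in rest.split(",")]
    return opcode, operands

def execute(opcode, operands, regs):
    if opcode == "DPAS":  # one instruction -> fused multiply-accumulate
        dst, a, b = operands
        regs[dst] = regs[dst] + regs[a] @ regs[b]
    else:
        raise ValueError(f"unknown opcode {opcode}")

regs = {"r0": np.zeros((2, 2)), "r1": np.eye(2), "r2": np.ones((2, 2))}
op, args = decode("DPAS r0, r1, r2")
execute(op, args, regs)
print(regs["r0"])  # r1 @ r2 accumulated into r0
```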
Abstract:
A mechanism is described for facilitating smart distribution of resources for deep learning autonomous machines. A method of embodiments, as described herein, includes detecting one or more sets of data from one or more sources over one or more networks, and introducing a library to a neural network application to determine an optimal point at which to apply frequency scaling without degrading the performance of the neural network application at a computing device.
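One plausible reading of "optimal point at which to apply frequency scaling" is the lowest clock whose measured throughput stays within a tolerance of the full-speed baseline. A minimal sketch under that assumption; measure_throughput is a hypothetical stand-in for real profiling of the neural network application:

```python
def find_scaling_point(freqs_mhz, measure_throughput, tolerance=0.02):
    """Lowest frequency whose throughput is within `tolerance` of baseline."""
    baseline = measure_throughput(max(freqs_mhz))
    best = max(freqs_mhz)
    for f in sorted(freqs_mhz, reverse=True):
        if measure_throughput(f) >= (1.0 - tolerance) * baseline:
            best = f   # still fast enough; keep scaling down
        else:
            break      # performance degraded; stop here
    return best

# Toy workload model: memory-bound below 900 MHz, compute-bound above,
# so scaling the clock below 900 MHz starts to cost performance.
model = lambda f: min(f, 900.0)
print(find_scaling_point([600, 800, 900, 1000, 1200], model))  # 900
```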
Abstract:
Methods and apparatus relating to an adaptive partition mechanism with arbitrary tile shape for a tile-based rendering GPU (Graphics Processing Unit) architecture are described. In an embodiment, a primitive intersection cost value for each atomic tile of an image is determined at least partially based on a vertex element size, a vertex shader length, and a number of vertices of a primitive of the image. Other embodiments are also disclosed and claimed.
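The abstract names the inputs to the per-tile cost value but not how they combine. A simple weighted sum is assumed below purely to make the dependence concrete; the weights and functional form are hypothetical:

```python
def tile_cost(vertex_element_size, vertex_shader_len, num_vertices,
              w_fetch=1.0, w_shade=2.0):
    """Assumed cost model: vertex fetch cost plus vertex shading cost."""
    fetch_cost = w_fetch * vertex_element_size * num_vertices
    shade_cost = w_shade * vertex_shader_len * num_vertices
    return fetch_cost + shade_cost

# Accumulate the cost of primitives intersecting one atomic tile; adaptive
# partitions could then be grown from neighboring low-cost tiles.
primitives = [(32, 120, 3), (16, 80, 3)]  # (elem size, shader len, verts)
print(sum(tile_cost(*p) for p in primitives))  # 1344.0
```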
Abstract:
In accordance with some embodiments, classification of input/output requests from a database to a storage system may be performed. Each input/output request may be associated with a database class, and each database class may be mapped to a quality of service policy. Thus, quality of service may be enforced such that different data blocks within the storage system of the database may be afforded appropriate quality of service.
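The class-to-policy mapping lends itself to a small lookup-table sketch. The database class names and policy fields below are invented for illustration; the abstract only states that each request carries a class and each class maps to a QoS policy:

```python
from enum import Enum

class DbClass(Enum):
    LOG = "log"
    INDEX = "index"
    DATA = "data"

# Hypothetical policy table: each database class maps to one QoS policy.
QOS_POLICY = {
    DbClass.LOG:   {"priority": "high",   "max_latency_ms": 1},
    DbClass.INDEX: {"priority": "medium", "max_latency_ms": 5},
    DbClass.DATA:  {"priority": "low",    "max_latency_ms": 20},
}

def classify_and_enforce(request):
    """Tag the I/O request with the QoS policy of its database class."""
    request["qos"] = QOS_POLICY[request["db_class"]]
    return request

req = classify_and_enforce({"block": 42, "db_class": DbClass.LOG})
print(req["qos"])  # {'priority': 'high', 'max_latency_ms': 1}
```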
Abstract:
One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units; a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity; and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
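A hedged sketch of how a scheduler might pick between the two sparse compute units: measure the input's sparsity and route it to the unit whose supported level it meets. The thresholds and unit placements are assumptions; the abstract only says the second level is greater than the first:

```python
import numpy as np

LEVEL_1 = 0.50  # first sparse compute unit (among the processing resources)
LEVEL_2 = 0.90  # second sparse compute unit (in the hybrid memory circuitry)

def route(tensor):
    """Send the input to the unit matching its measured sparsity."""
    sparsity = np.mean(tensor == 0.0)
    if sparsity >= LEVEL_2:
        return "second sparse unit (hybrid memory)"
    if sparsity >= LEVEL_1:
        return "first sparse unit"
    return "dense compute unit"

x = np.zeros(100)
x[:5] = 1.0        # 95% zeros
print(route(x))    # second sparse unit (hybrid memory)
```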