Abstract:
An apparatus to facilitate compute optimization is disclosed. The apparatus includes a plurality of processing units each comprising a plurality of execution units (EUs), wherein the plurality of EUs comprise a first EU type and a second EU type
Abstract:
In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.
Abstract:
An integrated circuit (IC) package apparatus is disclosed. The IC package includes one or more processing units and a bridge, mounted below the one or more processing unit, including one or more arithmetic logic units (ALUs) to perform atomic operations.
Abstract:
Techniques related to graphics rendering are discussed. Such techniques may include predicting primitive intersection information for tiles of a frame, rendering the frame on a tile-by-tile basis based on the predicted primitive intersection information, and re-rendering any tiles with predicted primitive errors.
Abstract:
An energy aware framework for computation and communication devices (CCDs) is disclosed. CCDs may support applications, which may participate in energy aware optimization. Such applications may be designed to support execution modes, which may be associated with different computation and communication demands or requirements. An optimization block may collect computation requirement values (CRVM), communication demand values (CDVM), and such other values of each execution mode to perform a specific task(s). The optimization block may collect computation energy cost information (CECIM) and multi-radio communication energy cost information (MCECIM) for each execution mode. Also, the optimization block may collect the workload values of a cloud-side processing device. The optimization block may determine power estimation values (PEV), based on the energy cost values (CECIM), (MCECIM), CRVM, and CDVM. The optimization block may then determine the execution mode or the apparatus best suited to perform the tasks.
Abstract:
One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units, a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
Abstract:
Embodiments are generally directed to an adaptive deformable kernel prediction network for image de-noising. An embodiment of a method for de-noising an image by a convolutional neural network implemented on a compute engine, the image including a plurality of pixels, the method comprising: for each of the plurality of pixels of the image, generating a convolutional kernel having a plurality of kernel values for the pixel; generating a plurality of offsets for the pixel respectively corresponding to the plurality of kernel values, each of the plurality of offsets to indicate a deviation from a pixel position of the pixel; determining a plurality of deviated pixel positions based on the pixel position of the pixel and the plurality of offsets; and filtering the pixel with the convolutional kernel and pixel values of the plurality of deviated pixel positions to obtain a de-noised pixel.
Abstract:
One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to cause the compute apparatus to perform a complex compute operation.
Abstract:
Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
Abstract:
An integrated circuit (IC) package apparatus is disclosed. The IC package includes one or more processing units and a bridge, mounted below the one or more processing unit, including one or more arithmetic logic units (ALUs) to perform atomic operations.