Abstract:
A mechanism is described for facilitating sharing of data and compression/expansion of models at autonomous machines. A method of embodiments, as described herein, includes detecting a first processor processing information relating to a neural network at a first computing device, where the first processor comprises a first graphics processor and the first computing device comprises a first autonomous machine. The method further includes facilitating the first processor to store one or more portions of the information in a library at a database, where the one or more portions are accessible to a second processor of a second computing device.
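As a rough illustration of the sharing mechanism (the abstract specifies no storage API; the `ModelLibrary` class and its dictionary-backed store below are hypothetical stand-ins for the database-backed library):

```python
# Minimal sketch: one autonomous machine publishes portions of a neural
# network to a shared library; another machine's processor retrieves them.
# All names here are illustrative, not taken from the patent.

class ModelLibrary:
    """In-memory stand-in for the database-backed library."""
    def __init__(self):
        self._entries = {}

    def publish(self, key, portion):
        # A "portion" could be layer weights, a compressed sub-model, etc.
        self._entries[key] = portion

    def fetch(self, key):
        return self._entries.get(key)

# First autonomous machine stores a portion of its network.
library = ModelLibrary()
library.publish(("machine-1", "conv1.weights"), [0.12, -0.7, 0.03])

# A second machine's processor later accesses the same portion.
weights = library.fetch(("machine-1", "conv1.weights"))
print(weights)  # [0.12, -0.7, 0.03]
```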
Abstract:
An embodiment of a graphics apparatus may include a processor, memory communicatively coupled to the processor, and a collaboration engine communicatively coupled to the processor to identify a shared graphics component between two or more users in an environment, and share the shared graphics component with the two or more users in the environment. Embodiments of the collaboration engine may include one or more of a centralized sharer, a depth sharer, a shared preprocessor, a multi-port graphics subsystem, and a decode sharer. Other embodiments are disclosed and claimed.
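A minimal sketch of the centralized-sharer idea, assuming each user's scene is described as a set of asset names (the `find_shared` function and the asset sets are illustrative assumptions, not the patent's interface):

```python
# Sketch of a centralized sharer: find graphics components referenced by
# two or more users so they can be processed once and shared.

from collections import Counter

def find_shared(user_assets):
    """Return assets referenced by two or more users."""
    counts = Counter()
    for assets in user_assets.values():
        counts.update(set(assets))
    return {asset for asset, n in counts.items() if n >= 2}

user_assets = {
    "alice": ["skybox", "terrain", "avatar_a"],
    "bob":   ["skybox", "terrain", "avatar_b"],
    "carol": ["terrain", "avatar_c"],
}

shared = find_shared(user_assets)
print(shared)  # {'skybox', 'terrain'} -- candidates for one shared render/decode
```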
Abstract:
An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The system may include one or more of a draw call re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more draw calls, a workload re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more work items in an order independent mode, a queue primitive included in at least one of the two or more draw calls to define a producer stage and a consumer stage, and an order-independent executor communicatively coupled to the application processor and the graphics subsystem to provide tile-based order independent execution of a compute stage. Other embodiments are disclosed and claimed.
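To make the draw-call re-orderer concrete, here is a minimal sketch assuming the calls are order independent (the dictionary fields and grouping heuristic are hypothetical; a real driver would also honor dependencies between calls):

```python
# Sketch of a draw-call re-orderer: group calls that share GPU state so
# expensive state changes happen once per group instead of per call.

from itertools import groupby

draw_calls = [
    {"id": 0, "state": "opaque", "mesh": "rock"},
    {"id": 1, "state": "alpha",  "mesh": "glass"},
    {"id": 2, "state": "opaque", "mesh": "tree"},
    {"id": 3, "state": "alpha",  "mesh": "window"},
]

# Re-order so calls with identical state run back to back.
reordered = sorted(draw_calls, key=lambda c: c["state"])

for state, calls in groupby(reordered, key=lambda c: c["state"]):
    ids = [c["id"] for c in calls]
    print(f"bind state {state!r} once, then issue draws {ids}")
```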
Abstract:
Various embodiments are generally directed to techniques for causing the storage of a color data value of a clear color to be deferred as rendered color data values are stored for samples. A device comprises a processor circuit and a storage to store instructions that cause the processor circuit to render a pixel from multiple samples taken of a three-dimensional model of an object, the pixel corresponding to pixel sample data that comprises multiple color storage locations, each identified by a numeric identifier, and multiple sample color indices that each correspond to a sample and point to at least one color storage location; and allocate color storage locations in an order selected to define a subset of possible combinations of binary index values among all of the sample color indices as invalid combinations. Other embodiments are described and claimed.
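A minimal sketch of the deferred-clear idea for one multi-sampled pixel, under an assumed convention (not spelled out in the abstract) that index 0 means "still the clear color" and rendered colors are allocated slots 1, 2, ... in order:

```python
# Sketch: the clear color's data value is never written to a storage
# slot; samples that still show it simply keep index 0.

CLEAR_COLOR = (0, 0, 0, 0)

class PixelSamples:
    def __init__(self, num_samples):
        self.indices = [0] * num_samples   # all samples start at "clear"
        self.slots = {}                    # allocated color storage
        self._next = 1

    def write(self, sample, color):
        # Reuse a slot if this color is already stored for the pixel.
        for slot, stored in self.slots.items():
            if stored == color:
                self.indices[sample] = slot
                return
        self.slots[self._next] = color     # clear color itself is never stored
        self.indices[sample] = self._next
        self._next += 1

    def resolve(self, sample):
        idx = self.indices[sample]
        return CLEAR_COLOR if idx == 0 else self.slots[idx]

px = PixelSamples(4)
px.write(2, (255, 0, 0, 255))
print([px.resolve(s) for s in range(4)])
# clear color resolves for samples 0, 1, 3 without ever being stored
```

Because slots are allocated in a fixed order, index combinations that reference a never-allocated slot can be treated as invalid, which is the subset the abstract refers to.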
Abstract:
An apparatus is disclosed with one or more processors including a graphics processing unit, the graphics processing unit including a graphics processing pipeline, and a memory to store data, including graphics data processed by the graphics processing pipeline, wherein the graphics processing unit is to conduct a training session with an application, the training session including a plurality of executions of the application utilizing the graphics processing pipeline, wherein the plurality of executions of the application includes executing the application under a plurality of different operating parameters, a plurality of different hardware configurations, or both, collect performance data for the application during the plurality of executions of the application, generate a performance profile for the application as processed in the graphics processing pipeline based on the collected performance data, train a neural network to configure the graphics processing pipeline based on performance profile data from the performance profile for the application, and utilize the trained neural network to configure the graphics processing pipeline to execute an instance of the application. Furthermore, a method and one or more computer-readable media are disclosed.
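As a rough sketch of the profiling loop (the workload, the `tile_size` parameter, and the best-of-profile lookup standing in for the trained neural network are all illustrative assumptions):

```python
# Sketch: run a workload under several operating parameters, record
# performance, and use the resulting profile to configure the pipeline
# for the next instance of the application.

import time

def run_workload(tile_size):
    # Hypothetical workload whose cost depends on an operating parameter.
    start = time.perf_counter()
    sum(i * i for i in range(tile_size * 10_000))
    return time.perf_counter() - start

configs = [1, 2, 4, 8]           # e.g. candidate tile sizes
profile = {}

# Training session: multiple executions under different parameters.
for cfg in configs:
    profile[cfg] = run_workload(cfg)

# Configure the pipeline for the next instance of the application.
best = min(profile, key=profile.get)
print(f"performance profile: {profile}")
print(f"configure pipeline with tile_size={best}")
```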
Abstract:
One embodiment provides for a method of performing machine learning operations for an autonomous vehicle, the method comprising determining that a computational workload is to be processed by a computing device of the autonomous vehicle, determining a first latency to a remote datacenter via an autonomous vehicle network, dispatching at least a first portion of the computational workload for processing via the remote datacenter when the first latency is below a threshold associated with the computational workload, determining a second latency to an autonomous vehicle within range of a wireless network device in response to a determination that the first latency is above the threshold associated with the computational workload, and dispatching at least a second portion of the computational workload to that autonomous vehicle in response to determining that the second latency is below the threshold associated with the computational workload.
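The dispatch policy reduces to a small decision procedure. A minimal sketch, assuming latencies have already been measured (the function name and millisecond values are illustrative):

```python
# Sketch of the latency-driven dispatch policy: prefer the remote
# datacenter, fall back to a nearby vehicle, otherwise run locally.

def dispatch(latency_budget_ms, datacenter_latency_ms, peer_latency_ms):
    """Return where the workload (or a portion of it) should run."""
    if datacenter_latency_ms < latency_budget_ms:
        return "dispatch first portion to remote datacenter"
    if peer_latency_ms < latency_budget_ms:
        return "dispatch second portion to nearby autonomous vehicle"
    return "process locally on the vehicle's own compute"

print(dispatch(latency_budget_ms=20,
               datacenter_latency_ms=45, peer_latency_ms=8))
# -> dispatch second portion to nearby autonomous vehicle
```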
Abstract:
An apparatus to facilitate memory tiling is disclosed. The apparatus includes a memory, one or more execution units (EUs) to execute a plurality of processing threads via access to the memory and tiling logic to apply a tiling pattern to memory addresses for data stored in the memory.
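A minimal sketch of one possible tiling pattern (the tile dimensions and the row-major-within-tile layout are assumptions; the abstract does not fix a particular pattern):

```python
# Sketch: remap a linear (x, y) address into a tiled layout so
# neighboring pixels land in the same tile of memory.

TILE_W, TILE_H, IMAGE_W = 4, 4, 16

def tiled_address(x, y):
    tiles_per_row = IMAGE_W // TILE_W
    tile_x, tile_y = x // TILE_W, y // TILE_H
    in_x, in_y = x % TILE_W, y % TILE_H
    tile_index = tile_y * tiles_per_row + tile_x
    return tile_index * (TILE_W * TILE_H) + in_y * TILE_W + in_x

# Two vertically adjacent pixels are 16 elements apart in a linear
# layout but only 4 apart in the tiled layout:
print(tiled_address(1, 0), tiled_address(1, 1))  # 1, 5
```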
Abstract:
Disclosed embodiments relate to apparatuses, systems, and methods for performing sort indexing and/or permutation using an index. An exemplary apparatus or processor comprises decode circuitry to decode an instruction, the instruction to include a first field to identify a location of a source vector, a second field to identify a location of a destination vector, and an opcode to indicate to execution circuitry to execute the decoded instruction to sort values of the source vector and store a result of the sort in the destination vector by generating, for each element of the source vector, an index value, and permuting the values of the elements of the source vector into the destination vector based upon the index values generated for those elements, and execution circuitry to execute the decoded instruction as indicated by the opcode.
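A minimal sketch of the sort-index-then-permute semantics in scalar form (the rank computation and stable tie-breaking are one plausible reading; the instruction's exact index encoding is not given in the abstract):

```python
# Sketch: compute, for each source element, the index (rank) it should
# occupy in sorted order, then scatter elements into the destination
# by those indices.

def sort_via_indices(src):
    # Index generation: element i's rank is the count of elements that
    # must precede it (ties broken by position, for stability).
    idx = [sum(1 for j, v in enumerate(src)
               if v < src[i] or (v == src[i] and j < i))
           for i in range(len(src))]
    # Permutation: place each element at its computed index.
    dst = [None] * len(src)
    for i, k in enumerate(idx):
        dst[k] = src[i]
    return idx, dst

indices, result = sort_via_indices([30, 10, 20, 10])
print(indices)  # [3, 0, 2, 1]
print(result)   # [10, 10, 20, 30]
```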