Abstract:
An example system on a chip (SoC) includes a processor, a cache, and a main memory. The SoC can include a first memory to store data in a memory line, wherein the memory line is set to an invalid state. The SoC can include a processor coupled to the first memory. The processor can determine that a data size of a first data set received from an application is within a data size range. The processor can determine that an aggregate data size of the first data set and a second data set received from the application is at least the same as the data size of the memory line. The processor can perform an invalid-to-modified (I2M) operation to change the memory line from the invalid state to a modified state. The processor can write the first data set and the second data set to the memory line.
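As a rough illustration of the write-combining logic this abstract describes, the C sketch below models the decision: if two partial data sets together cover a full memory line, perform the invalid-to-modified transition and write both sets without first reading the stale line. The line size, type names, and function are assumptions for illustration, not taken from the embodiment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 64u   /* assumed memory-line size in bytes */

enum line_state { STATE_INVALID, STATE_MODIFIED };

struct memory_line {
    enum line_state state;
    uint8_t data[CACHE_LINE_SIZE];
};

/* Combine two partial writes: if together they cover a full line,
 * transition the line Invalid -> Modified (the I2M step) and write
 * both data sets into it. Returns false if the line cannot be filled. */
static bool write_combined(struct memory_line *line,
                           const uint8_t *first, size_t first_len,
                           const uint8_t *second, size_t second_len)
{
    if (first_len > CACHE_LINE_SIZE)
        return false;                     /* first set alone exceeds the line */
    if (first_len + second_len < CACHE_LINE_SIZE)
        return false;                     /* not enough data to fill the line */

    line->state = STATE_MODIFIED;         /* I2M: claim the line as modified */
    memcpy(line->data, first, first_len);
    memcpy(line->data + first_len, second, CACHE_LINE_SIZE - first_len);
    return true;
}
```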
Abstract:
Embodiments detailed herein relate to matrix operations, and in particular to the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in the form of decode circuitry to decode an instruction having fields for an opcode, a destination matrix operand identifier, and source memory information, and execution circuitry to execute the decoded instruction to load groups of strided data elements from memory into configured rows of the identified destination matrix operand.
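A software sketch of what such a strided tile load does, assuming illustrative tile dimensions (the names and limits below are not from the embodiment): each configured row is read as a group of contiguous elements, with successive rows separated by a stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative tile (2-D register): at most 16 rows of 64 bytes each;
 * the configured row count and row width select how much is loaded. */
struct tile {
    uint8_t data[16][64];
    uint8_t rows;           /* configured number of rows (<= 16)       */
    uint8_t bytes_per_row;  /* configured bytes per row  (<= 64)       */
};

/* Load one group of contiguous elements per configured row; consecutive
 * rows start `stride` bytes apart in memory. */
static void tile_load_strided(struct tile *t, const uint8_t *base, size_t stride)
{
    for (uint8_t r = 0; r < t->rows; r++) {
        const uint8_t *src = base + (size_t)r * stride;
        for (uint8_t b = 0; b < t->bytes_per_row; b++)
            t->data[r][b] = src[b];
    }
}
```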
Abstract:
Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, are discussed, wherein a tile is a set of 2-dimensional registers.
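The sketch below illustrates, in C, one plausible reading of the configuration step: a descriptor is fetched from the memory address and latched as per-tile row counts and row widths. The descriptor layout and field names are assumptions for illustration, not the architectural format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory description of a tile configuration. */
struct tile_config_desc {
    uint8_t  palette_id;            /* which register palette to use */
    uint8_t  reserved[15];
    uint16_t bytes_per_row[8];      /* per-tile row width in bytes   */
    uint8_t  rows[8];               /* per-tile row count            */
};

/* Processor-side tile state set by the configuration instruction. */
struct tile_config {
    uint16_t bytes_per_row[8];
    uint8_t  rows[8];
    uint8_t  configured;
};

/* Execute side of the instruction: read the description at the given
 * memory address and latch it as the processor's tile configuration. */
static void load_tile_config(struct tile_config *cfg, const void *mem_addr)
{
    struct tile_config_desc desc;
    memcpy(&desc, mem_addr, sizeof(desc));
    memcpy(cfg->bytes_per_row, desc.bytes_per_row, sizeof(cfg->bytes_per_row));
    memcpy(cfg->rows, desc.rows, sizeof(cfg->rows));
    cfg->configured = 1;
}
```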
Abstract:
A method of analyzing aborts of transactional execution transactions. Starting a transactional execution transaction with a first logical processor. Performing, with a second logical processor, store to memory instructions, while the first logical processor is performing the transactional execution transaction. Capturing memory addresses of, and instruction pointer values associated with, at least a sample of the store to memory instructions. Performing, with the second logical processor, a first store to memory instruction to a first memory address, which is to cause the transactional execution transaction to abort. Capturing the first memory address. Determining an instruction pointer value associated with the first store to memory instruction by correlating at least the captured first memory address with the captured memory addresses of said at least the sample of the store to memory instructions.
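The correlation step at the end of the method can be pictured with a small C sketch; the record layout and function name below are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One sampled store performed by the second logical processor. */
struct store_sample {
    uintptr_t addr;   /* memory address written           */
    uintptr_t ip;     /* instruction pointer of the store */
};

/* After the transaction aborts, correlate the captured aborting address
 * with the sampled stores to recover the instruction pointer of the
 * store that caused the abort. Returns 0 if no sample matches. */
static uintptr_t find_aborting_ip(const struct store_sample *samples,
                                  size_t count, uintptr_t abort_addr)
{
    for (size_t i = 0; i < count; i++)
        if (samples[i].addr == abort_addr)
            return samples[i].ip;
    return 0;
}
```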
Abstract:
Methods and apparatuses relating to memory corruption detection are described. In one embodiment, a hardware processor includes an execution unit to execute an instruction to request access to a block of a memory through a pointer to the block of the memory, and a memory management unit to allow access to the block of the memory when a memory corruption detection value in the pointer is validated with a memory corruption detection value in the memory for the block, wherein a position of the memory corruption detection value in the pointer is selectable between a first location and a second, different location.
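A minimal C sketch of the pointer-validation idea, assuming illustrative bit positions and field width for the memory corruption detection (MCD) value; the real encoding is not specified here.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCD_BITS     6u
#define MCD_MASK     ((1u << MCD_BITS) - 1u)
#define MCD_POS_HIGH 58u   /* first selectable location (upper pointer bits) */
#define MCD_POS_LOW  48u   /* second, different selectable location          */

/* Extract the MCD value from the pointer at the selected position. */
static unsigned mcd_from_pointer(uint64_t ptr, bool use_high_position)
{
    unsigned shift = use_high_position ? MCD_POS_HIGH : MCD_POS_LOW;
    return (unsigned)((ptr >> shift) & MCD_MASK);
}

/* Access is allowed only when the value carried in the pointer matches
 * the value stored in memory for the block being accessed. */
static bool mcd_access_allowed(uint64_t ptr, unsigned block_mcd, bool use_high)
{
    return mcd_from_pointer(ptr, use_high) == block_mcd;
}
```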
Abstract:
In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing efficient communication between caches in a hierarchical caching design. For example, in one embodiment, such means may include an integrated circuit having a data bus; a lower level cache communicably interfaced with the data bus; a higher level cache communicably interfaced with the data bus; one or more data buffers; and one or more dataless buffers. The data buffers in such an embodiment are communicably interfaced with the data bus, and each of the one or more data buffers has a buffer memory to buffer a full cache line, one or more control bits to indicate the state of the respective data buffer, and an address associated with the full cache line. The dataless buffers in such an embodiment are incapable of storing a full cache line and have one or more control bits to indicate the state of the respective dataless buffer and an address for an inter-cache transfer line associated with the respective dataless buffer. In such an embodiment, inter-cache transfer logic is to request the inter-cache transfer line from the higher level cache via the data bus and is to further write the inter-cache transfer line into the lower level cache from the data bus.
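The distinction between the two buffer types can be sketched as data structures; the field names and line size below are assumptions, chosen only to show that a dataless buffer carries state bits and an address but no cache-line payload.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 64u   /* assumed cache-line size */

/* A data buffer can hold a full cache line in flight on the data bus. */
struct data_buffer {
    uint64_t address;                  /* address of the buffered line     */
    uint8_t  control;                  /* state bits (valid, pending, ...) */
    uint8_t  line[CACHE_LINE_BYTES];   /* full cache-line payload          */
};

/* A dataless buffer tracks an inter-cache transfer line without storing
 * its payload; the line itself moves between caches over the data bus. */
struct dataless_buffer {
    uint64_t address;                  /* address of the transfer line     */
    uint8_t  control;                  /* state bits only, no payload      */
};
```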
Abstract:
An apparatus and method are described for efficiently transferring data from a producer core to a consumer core within a central processing unit (CPU). For example, one embodiment of a method for transferring a chunk of data from a producer core of the CPU to a consumer core of the CPU comprises: writing data to a buffer within the producer core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the producer core and the consumer core; and upon the consumer core detecting that data is available in the cache, providing the data to the consumer core from the cache upon receipt of a read signal from the consumer core.
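A software analogy of the producer side, assuming an illustrative threshold for the designated amount of data; the shared-cache structure and flag are stand-ins for the hardware mechanism, not its implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EVICT_THRESHOLD 64u            /* assumed designated amount of data */

struct shared_cache {                  /* cache visible to both cores       */
    uint8_t line[EVICT_THRESHOLD];
    volatile int data_ready;
};

struct producer_buffer {
    uint8_t bytes[EVICT_THRESHOLD];
    size_t  fill;
};

/* Producer side: accumulate writes in the core-local buffer; once the
 * designated amount has been written, "evict" the buffer into the cache
 * shared with the consumer core and flag that data is available. */
static void producer_write(struct producer_buffer *buf, struct shared_cache *cache,
                           const uint8_t *data, size_t len)
{
    while (len > 0) {
        size_t n = EVICT_THRESHOLD - buf->fill;
        if (n > len)
            n = len;
        memcpy(buf->bytes + buf->fill, data, n);
        buf->fill += n;
        data += n;
        len -= n;

        if (buf->fill == EVICT_THRESHOLD) {          /* eviction cycle */
            memcpy(cache->line, buf->bytes, EVICT_THRESHOLD);
            cache->data_ready = 1;
            buf->fill = 0;
        }
    }
}
```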
Abstract:
A processing system includes a processing core to execute a virtual machine (VM) comprising a guest operating system (OS) and a memory management unit, communicatively coupled to the processing core, comprising a storage device to store an extended page table entry (EPTE) comprising a mapping from a guest physical address (GPA) associated with the guest OS to an identifier of a memory frame, a first plurality of access right flags associated with accessing the memory frame in a first page mode referenced by an attribute of a memory page identified by the GPA, and a second plurality of access right flags associated with accessing the memory frame in a second page mode referenced by the attribute of the memory page identified by the GPA.
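One way to picture the dual access-right sets is the C sketch below; the field widths and flag values are illustrative, not the architectural EPTE layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* One extended page table entry: a GPA-to-frame mapping plus two sets of
 * access rights, selected by a page-mode attribute of the guest page. */
struct epte {
    uint64_t frame_id;        /* identifier of the mapped memory frame    */
    uint8_t  rights_mode0;    /* R/W/X flags used in the first page mode  */
    uint8_t  rights_mode1;    /* R/W/X flags used in the second page mode */
};

#define RIGHT_READ  0x1u
#define RIGHT_WRITE 0x2u
#define RIGHT_EXEC  0x4u

/* Check an access against the rights set selected by the page mode. */
static bool epte_access_ok(const struct epte *e, bool second_mode, uint8_t request)
{
    uint8_t rights = second_mode ? e->rights_mode1 : e->rights_mode0;
    return (rights & request) == request;
}
```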
Abstract:
An apparatus and method are described for at-retirement re-execution of faulting operations. For example, one embodiment of a processor comprises: an out-of-order engine to schedule and dispatch operations to an execution unit, at least some of the operations comprising load operations to load data from a system memory and store operations to store data to the system memory; a first circuit to determine whether a current load/store operation is at retirement; a second circuit to cause logging circuitry and/or fault registers to be active when a load/store operation has been dispatched at retirement, wherein upon detection of a fault condition associated with the load/store operation, data associated with the fault is to be written to the logging circuitry and/or fault registers, the second circuit to cause the logging circuitry and/or fault registers to be inactive if the load/store operation has not been dispatched at retirement.
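The behaviour of the second circuit can be sketched in C as a gating flag around fault logging; the structure and function names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct fault_log {
    bool     active;       /* logging enabled only for at-retirement ops */
    uint64_t fault_addr;   /* address recorded when a fault is detected  */
    uint32_t fault_code;
};

/* Enable the logging state only when the load/store is dispatched at
 * retirement; leave it inactive for speculative dispatches. */
static void on_dispatch(struct fault_log *log, bool at_retirement)
{
    log->active = at_retirement;
}

/* Record fault data only while logging is active. */
static void on_fault(struct fault_log *log, uint64_t addr, uint32_t code)
{
    if (!log->active)
        return;            /* not dispatched at retirement: do not log */
    log->fault_addr = addr;
    log->fault_code = code;
}
```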
Abstract:
An apparatus and method are described for efficiently transferring data from a core of a central processing unit (CPU) to a graphics processing unit (GPU). For example, one embodiment of a method comprises: writing data to a buffer within the core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the core and the GPU; setting an indication to indicate to the GPU that data is available in the cache; and upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.
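As a rough software analogy of the consumer (GPU) side, the C sketch below polls the indication and then reads the chunk from the shared cache; the chunk size, flag, and names are assumptions, and the busy-wait stands in for whatever detection mechanism the hardware uses.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK_BYTES 64u                 /* assumed designated amount of data */

struct cpu_gpu_cache {                  /* cache visible to both CPU core and GPU  */
    uint8_t chunk[CHUNK_BYTES];
    volatile int data_available;        /* indication set after the eviction cycle */
};

/* GPU (consumer) side: wait for the indication, then read the chunk out of
 * the shared cache once the CPU core's eviction cycle has deposited it. */
static void gpu_read_chunk(struct cpu_gpu_cache *cache, uint8_t *dst)
{
    while (!cache->data_available)
        ;                               /* wait for the CPU-side indication */
    memcpy(dst, cache->chunk, CHUNK_BYTES);
    cache->data_available = 0;          /* consume the indication */
}
```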