Abstract:
Disclosed embodiments relate to a variable format, variable sparsity matrix multiplication (VFVSMM) instruction. In one example, a processor includes fetch and decode circuitry to fetch and decode a VFVSMM instruction specifying locations of A, B, and C matrices having (M×K), (K×N), and (M×N) elements, respectively, execution circuitry, responsive to the decoded VFVSMM instruction, to: route each row of the specified A matrix, staggering subsequent rows, into corresponding rows of a (M×N) processing array, and route each column of the specified B matrix, staggering subsequent columns, into corresponding columns of the processing array, wherein each of the processing units is to generate K products of A-matrix elements and matching B-matrix elements having a same row address as a column address of the A-matrix element, and to accumulate each generated product with a corresponding C-matrix element.
Abstract:
A signal comprising a first edge and a second edge is received. The first edge of the signal is synchronized with a first clock and the synchronized first edge of the signal is passed to an output. The synchronization results in a delay of the first edge of the signal. The second edge of the signal is passed to the output. The passed second edge of the signal has a delay that is less than the delay of the first edge of the signal by at least one clock cycle of the first clock.
Abstract:
Described is an apparatus (e.g., a router) which comprises: multiple ports; and a plurality of crossbar circuits arranged such that at least one crossbar circuit receives all interconnects associated with a data bit of the multiple ports and is operable to re-route signals on those interconnects.
Abstract:
A router of a network-on-chip receives delay information associated with a plurality of links of the network-on-chip. The router determines at least one link of a data path based on the delay information.
Abstract:
A packet-switched request from a first router of a network-on-chip is received. The packet-switched request is generated by source logic of the network-on-chip. Circuit-switched data associated with the packet switched request is also received. The circuit-switched data is stored by a storage element. The circuit-switched data is sent towards destination logic identified in the packet-switched request.
Abstract:
Embodiments of the present invention may provide methods and circuits for energy efficient floating point multiply and/or add operations. A variable precision floating point circuit may determine the certainty of the result of a multiply-add floating point calculation in parallel with the floating-point calculation. The variable precision floating point circuit may use the certainty of the inputs in combination with information from the computation, such as, binary digits that cancel, normalization shifts, and rounding, to perform a calculation of the certainty of the result. A floating point multiplication circuit may determine whether a lowest portion of a multiplication result could affect the final result and may induce a replay of the multiplication operation when it is determined that the result could affect the final result.