-
公开(公告)号:US12299940B2
公开(公告)日:2025-05-13
申请号:US17854310
申请日:2022-06-30
Applicant: Intel Corporation
Inventor: Sreenivas Kothandaraman , Stephen Junkins , Srihari Pratapa , Prasoonkumar Surti
Abstract: Interleaving of variable bitrate streams for GPU implementations is described. An example of an apparatus includes one or more processors including a graphic processor, the graphics processor including a super-compression encoder pipeline to provide variable width interleaved coding; and memory for storage of data, wherein the graphics processor is to perform parallel dictionary encoding on a bitstream of symbols one of multiple workgroups, the workgroup to employ a plurality of encoders to generate a plurality of token-streams of variable lengths; create a histogram including at least tokens from the plurality of token-streams for the workgroup to generate an optimized entropy code; entropy code each of the plurality of token-streams for the workgroup into an encoded bitstream; and variably interleave the encoded bitstreams to generate an interleaved bitstream and bookkeep a size of the interleaved bitstream.
-
公开(公告)号:US12223682B2
公开(公告)日:2025-02-11
申请号:US17357038
申请日:2021-06-24
Applicant: Intel Corporation
Inventor: Stephen Junkins , Sreenivas Kothandaraman , Prasoonkumar Surti , Srihari Pratapa , William Hux , John Feit
Abstract: Variable width interleaved coding for graphics processing is described. An example of an apparatus includes one or more processors including a graphic processor; and memory for storage of data including data for graphics processing, wherein the graphics processor includes an encoder pipeline to provide variable width interleaved coding and a decoder pipeline to decode the variable width interleaved coding, and wherein the encoder pipeline is to receive a plurality of bitstreams from workgroups; perform parallel entropy encoding on the bitstreams to generate a plurality of encoded bitstreams for each of the workgroups; perform variable interleaving of the bitstreams for each workgroup based at least in part on data requirements for decoding received from the decoder pipeline; and compact outputs for each of the workgroups into a contiguous stream of interleaved data.
-
公开(公告)号:US20250117360A1
公开(公告)日:2025-04-10
申请号:US18931412
申请日:2024-10-30
Applicant: Intel Corporation
Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
-
公开(公告)号:US10657618B2
公开(公告)日:2020-05-19
申请号:US16441499
申请日:2019-06-14
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , James A. Valerio , David Puffer , Abhishek R. Appu , Stephen Junkins
IPC: G06T1/60 , G06T1/20 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888
Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
-
公开(公告)号:US20220414053A1
公开(公告)日:2022-12-29
申请号:US17304678
申请日:2021-06-24
Applicant: Intel Corporation
Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
-
公开(公告)号:US10949945B2
公开(公告)日:2021-03-16
申请号:US16872046
申请日:2020-05-11
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , James A. Valerio , David Puffer , Abhishek R. Appu , Stephen Junkins
IPC: G06T1/60 , G06T1/20 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888
Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
-
公开(公告)号:US20200342564A1
公开(公告)日:2020-10-29
申请号:US16872046
申请日:2020-05-11
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , James A. Valerio , David Puffer , Abhishek R. Appu , Stephen Junkins
IPC: G06T1/20 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06T1/60
Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
-
公开(公告)号:US12174783B2
公开(公告)日:2024-12-24
申请号:US17304678
申请日:2021-06-24
Applicant: Intel Corporation
Inventor: Jorge Parra , Wei-yu Chen , Kaiyu Chen , Varghese George , Junjie Gu , Chandra Gurram , Guei-Yuan Lueh , Stephen Junkins , Subramaniam Maiyuran , Supratim Pal
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
-
公开(公告)号:US11436695B2
公开(公告)日:2022-09-06
申请号:US17200581
申请日:2021-03-12
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , James A. Valerio , David Puffer , Abhishek R. Appu , Stephen Junkins
IPC: G06T1/60 , G06T1/20 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888
Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
-
公开(公告)号:US20210272231A1
公开(公告)日:2021-09-02
申请号:US17200581
申请日:2021-03-12
Applicant: Intel Corporation
Inventor: Joydeep Ray , Altug Koker , James A. Valerio , David Puffer , Abhishek R. Appu , Stephen Junkins
IPC: G06T1/20 , G06F12/0811 , G06F12/0815 , G06F12/0831 , G06F12/0888 , G06T1/60
Abstract: One embodiment provides for a general-purpose graphics processing device comprising a general-purpose graphics processing compute block to process a workload including graphics or compute operations, a first cache memory, and a coherency module enable the first cache memory to coherently cache data for the workload, the data stored in memory within a virtual address space, wherein the virtual address space shared with a separate general-purpose processor including a second cache memory that is coherent with the first cache memory.
-
-
-
-
-
-
-
-
-