TEXTURING/SHADING IN A GPU PIPELINE BYPASSING BILINEAR FILTER

    Publication number: US20240273805A1

    Publication date: 2024-08-15

    Application number: US18642375

    Application date: 2024-04-22

    CPC classification number: G06T15/005 G06T15/04

    Abstract: A method of operation of a texturing/shading unit in a GPU pipeline is used for efficient convolution operations. The method uses texture hardware to collectively fetch all the texels required to calculate properties for a group of output pixels without any duplication. The method then bypasses bilinear filter hardware in the texture hardware and passes the fetched and unfiltered texel data from the texture hardware unit to shader hardware in the texturing/shading unit. The shader hardware uses the fetched texel data to perform a plurality of convolution operations to calculate the properties of each of the output pixels.
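The shared-fetch idea in this abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware method: it assumes a 2x2 group of output pixels, a square odd-sized kernel, and clamp-to-edge addressing, and the names `fetch_texel_block` and `convolve_pixel_group` are hypothetical. For a 3x3 kernel, the group's combined footprint is 4x4 = 16 texels, compared with 4 x 9 = 36 texel reads if each output pixel fetched its own neighbourhood.

```python
def fetch_texel_block(texture, x0, y0, width, height):
    """Read each texel of the block exactly once, unfiltered.

    Clamp-to-edge addressing is an assumption made for this sketch.
    """
    max_y = len(texture) - 1
    max_x = len(texture[0]) - 1
    return [
        [texture[min(max(y, 0), max_y)][min(max(x, 0), max_x)]
         for x in range(x0, x0 + width)]
        for y in range(y0, y0 + height)
    ]


def convolve_pixel_group(texture, kernel, out_x, out_y, group_w=2, group_h=2):
    """Compute one kernel-weighted sum per output pixel from a shared block.

    The kernel is applied without flipping (cross-correlation), which equals
    convolution for symmetric kernels.
    """
    k = len(kernel)   # square, odd-sized kernel assumed
    r = k // 2
    # Union of the footprints of all output pixels in the group.
    block = fetch_texel_block(texture, out_x - r, out_y - r,
                              group_w + k - 1, group_h + k - 1)
    results = []
    for gy in range(group_h):
        row = []
        for gx in range(group_w):
            acc = 0.0
            for ky in range(k):
                for kx in range(k):
                    acc += kernel[ky][kx] * block[gy + ky][gx + kx]
            row.append(acc)
        results.append(row)
    return results


if __name__ == "__main__":
    tex = [[y * 4 + x for x in range(4)] for y in range(4)]
    box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(convolve_pixel_group(tex, box, 1, 1))
```

In this sketch the "shader" stage is the nested loop over the kernel, which only ever reads from the block that was fetched once.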

    APPLYING TEXTURE PROCESSING TO A BLOCK OF FRAGMENTS IN A GRAPHICS PROCESSING UNIT

    Publication number: US20240273669A1

    Publication date: 2024-08-15

    Application number: US18394274

    Application date: 2023-12-22

    CPC classification number: G06T1/60 G06T1/20 G06T11/001

    Abstract: A method and graphics processing unit (GPU) are provided for applying texture processing to a block of fragments, each of the fragments being associated with a texture coordinate for each of a plurality of dimensions of a texture. A fragment processing unit of the GPU detects that the texture coordinates for the fragments of the block are axis-aligned, and in response to detecting that the texture coordinates for the fragments of the block are axis-aligned, sends a reduced set of texture coordinates to a texture processing unit of the GPU. The texture processing unit: (i) processes the reduced set of texture coordinates to generate texel addresses of texels to be fetched, (ii) fetches texels using the generated texel addresses, (iii) determines a processed value for each of the fragments of the block based on the fetched texels, and (iv) outputs the processed values.
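A rough software analogue of the detection step is sketched below. The abstract does not define "axis-aligned" in detail, so this sketch assumes it means that every fragment in a column of the block shares one `u` and every fragment in a row shares one `v`. The function name `reduce_if_axis_aligned` and the `coords[row][col] = (u, v)` layout are hypothetical.

```python
def reduce_if_axis_aligned(coords):
    """Return (us, vs) when the block is axis-aligned, otherwise None.

    coords[row][col] is the (u, v) texture coordinate of one fragment.
    us holds one u per column and vs one v per row, so a W x H block is
    described by W + H values instead of 2 * W * H.
    Exact equality is used here; real hardware may compare differently.
    """
    us = [u for (u, _v) in coords[0]]     # candidate u per column
    vs = [row[0][1] for row in coords]    # candidate v per row
    for r, row in enumerate(coords):
        for c, (u, v) in enumerate(row):
            if u != us[c] or v != vs[r]:
                return None               # fall back to the full coordinate set
    return us, vs


if __name__ == "__main__":
    aligned = [[(0.25, 0.5), (0.75, 0.5)],
               [(0.25, 1.5), (0.75, 1.5)]]
    print(reduce_if_axis_aligned(aligned))
```

Returning `None` models the case where the fragment processing unit would have to send the full per-fragment coordinates.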

    Method and apparatus for rendering a computer generated image

    Publication number: US12051159B2

    Publication date: 2024-07-30

    Application number: US18205121

    Application date: 2023-06-02

    Inventor: Simon Fenney

    CPC classification number: G06T17/20 G06T9/00 G06T15/005

    Abstract: A method and apparatus for rendering a computer-generated image using a stencil buffer is described. The method divides an arbitrary closed polygonal contour into first and higher level primitives, where first level primitives correspond to contiguous vertices in the arbitrary closed polygonal contour and higher level primitives correspond to the end vertices of consecutive primitives of the immediately preceding primitive level. The method reduces the level of overdraw when rendering the arbitrary polygonal contour using a stencil buffer compared to other image space methods. A method of producing the primitives in an interleaved order, with second and higher level primitives being produced before the final first level primitives of the contour, is described which improves cache hit rate by reusing more vertices between primitives as they are produced.
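One way to read the level structure described in this abstract is sketched below. It is an interpretation, not the patented procedure: it assumes the primitives are triangles given as vertex-index triples, that a leftover vertex at the end of a level is carried to the next level unchanged, and it does not model the interleaved production order. The name `build_primitive_levels` is hypothetical.

```python
def build_primitive_levels(n):
    """Split a closed contour with n vertices into levels of triangles.

    Level 1 triangles use contiguous vertices (0,1,2), (2,3,4), ...
    Each higher level joins the end vertices of consecutive triangles
    from the level below. Returns a list of levels, each a list of
    index triples; the total triangle count is n - 2.
    """
    ring = list(range(n))
    levels = []
    while len(ring) >= 3:
        triangles = []
        next_ring = [ring[0]]
        i = 0
        while i + 2 < len(ring):
            triangles.append((ring[i], ring[i + 1], ring[i + 2]))
            next_ring.append(ring[i + 2])   # keep only the end vertex
            i += 2
        if i + 1 < len(ring):
            next_ring.append(ring[i + 1])   # leftover vertex carried over
        levels.append(triangles)
        ring = next_ring
    return levels


if __name__ == "__main__":
    for depth, tris in enumerate(build_primitive_levels(6), start=1):
        print(depth, tris)
```

In a stencil-based fill, each triangle would typically toggle the stencil value of the pixels it covers, so that regions covered an odd number of times end up inside the contour; that rendering step is outside this sketch.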

    SMALL MULTIPLIER AFTER INITIAL APPROXIMATION FOR OPERATIONS WITH INCREASING PRECISION

    Publication number: US20240241695A1

    Publication date: 2024-07-18

    Application number: US18395836

    Application date: 2023-12-26

    Inventor: Leonard Rarick

    Abstract: In an aspect, a processor includes circuitry for iterative refinement approaches, e.g., Newton-Raphson, to evaluating functions, such as square root, reciprocal, and for division. The circuitry includes circuitry for producing an initial approximation, which can include a LookUp Table (LUT). The LUT may produce an output that (with implementation-dependent processing) forms an initial approximation of a value, with a number of bits of precision. A limited-precision multiplier multiplies that initial approximation with another value; an output of the limited-precision multiplier goes to a full-precision multiplier circuit that performs remaining multiplications required for iteration(s) in the particular refinement process being implemented. For example, in division, the output being calculated is for a reciprocal of the divisor. The full-precision multiplier circuit requires a first number of clock cycles to complete, and both the small multiplier and the initial approximation circuitry complete within the first number of clock cycles.
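The numerical scheme behind this abstract, a table lookup followed by Newton-Raphson refinement, can be shown with a short floating-point sketch. It does not model the multiplier widths or clock cycles of the claimed circuit. The 4-bit table size, the restriction of the divisor to [1, 2), and the names `RECIP_LUT` and `reciprocal` are assumptions chosen for illustration.

```python
LUT_BITS = 4  # assumed table size: 16 entries

# Each entry approximates 1/d at the midpoint of its sub-interval of [1, 2).
RECIP_LUT = [1.0 / (1.0 + (i + 0.5) / (1 << LUT_BITS))
             for i in range(1 << LUT_BITS)]


def reciprocal(d, iterations=3):
    """Approximate 1/d for d in [1, 2) by Newton-Raphson refinement.

    The update x <- x * (2 - d * x) roughly squares the relative error
    on every step, so precision increases with each iteration.
    """
    if not 1.0 <= d < 2.0:
        raise ValueError("this sketch expects a divisor in [1, 2)")
    index = int((d - 1.0) * (1 << LUT_BITS))   # top fraction bits of d
    x = RECIP_LUT[index]                       # low-precision first guess
    for _ in range(iterations):
        # The first product d * x involves only the short table value,
        # which is where a small multiplier could be used in hardware.
        x = x * (2.0 - d * x)
    return x


def divide(a, d, iterations=3):
    """Divide by multiplying the dividend by the refined reciprocal."""
    return a * reciprocal(d, iterations)


if __name__ == "__main__":
    print(reciprocal(1.5), divide(3.0, 1.5))
```

With a 16-entry table the first guess has a relative error of at most about 1/32, so three iterations bring the error down to roughly the 2^-40 range before floating-point rounding.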

    RETRIEVING A BLOCK OF DATA ITEMS IN A PROCESSOR

    Publication number: US20240233064A1

    Publication date: 2024-07-11

    Application number: US18394904

    Application date: 2023-12-22

    CPC classification number: G06T1/20 G06T11/001 G06T2210/52

    Abstract: A method and processor for retrieving a block of data items, each being associated with a coordinate for each of dimensions of a stored data array. A data processing unit detects that the coordinates are axis-aligned. In response to detecting that the coordinates are axis-aligned, the following are sent to a data load unit: only one coordinate for a first dimension for each line of data items aligned in the first dimension within the block, and only one coordinate for a second dimension for each line of data items aligned in the second dimension within the block, the second dimension being orthogonal to the first dimension. The data load unit: (i) processes the coordinates to generate addresses of data array elements to be fetched from the stored data array, (ii) fetches data array elements using the generated addresses, (iii) determines data item values based on the fetched data array elements, and (iv) outputs the data item values.
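The load-unit side of this abstract, steps (i) to (iv), can be sketched as follows. This is an illustration under simplifying assumptions: integer coordinates, a row-major array stored as a flat list, and no filtering, so each data item value is just the fetched element. The name `load_block` and its parameters are hypothetical.

```python
def load_block(data, stride, xs, ys):
    """Fetch a block of items from a flat row-major array.

    xs holds one x coordinate per column of the block and ys one y
    coordinate per row, i.e. the reduced coordinate set sent for an
    axis-aligned block. stride is the number of elements per array row.
    """
    # (i) expand the reduced coordinates into one address per data item
    addresses = [[y * stride + x for x in xs] for y in ys]
    # (ii) fetch the array elements at those addresses
    fetched = [[data[a] for a in row] for row in addresses]
    # (iii) + (iv) with no filtering assumed, the fetched elements are
    # the output values
    return fetched


if __name__ == "__main__":
    array = list(range(16))          # a 4 x 4 array, row-major
    print(load_block(array, 4, xs=[1, 2], ys=[0, 3]))
```

Each row offset `y * stride` is computed from a single `y`, which is the saving that sending one coordinate per line makes possible.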

    IMPLEMENTING NEURAL NETWORKS IN HARDWARE

    Publication number: US20240232596A1

    Publication date: 2024-07-11

    Application number: US18393320

    Application date: 2023-12-21

    Inventor: Xiran Huang

    CPC classification number: G06N3/063

    Abstract: Methods of implementing a neural network in hardware, the neural network including a plurality of layers and the layers being grouped into a plurality of layer groups, each layer group comprising one or more layers of the neural network that are processed in a single pass through the hardware. The layer groups are grouped into a plurality of tile groups, each tile group comprising a set of layer groups that are evaluated when executing the neural network. The method comprises pre-fetching a portion of the input data for a first layer group in a tile group into a buffer slot in on-chip memory; and subsequently releasing the buffer slot after output data for the first layer group has been written to memory.
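The buffer-slot lifecycle mentioned in this abstract (pre-fetch into a slot, then release once the first layer group's output has been written) can be modelled with a small bookkeeping sketch. It only tracks slot ownership and event order; it does not move data or model the hardware. The class `SlotPool`, the function `run_tile_group`, and the string labels are all hypothetical.

```python
class SlotPool:
    """Tracks which on-chip buffer slots are free or held."""

    def __init__(self, num_slots):
        self.free = list(range(num_slots))
        self.held = {}   # slot index -> label of the data it holds

    def acquire(self, label):
        slot = self.free.pop(0)   # raises IndexError if no slot is free
        self.held[slot] = label
        return slot

    def release(self, slot):
        del self.held[slot]
        self.free.append(slot)


def run_tile_group(pool, layer_groups, log):
    """Evaluate the layer groups of one tile group in order.

    Input for the first layer group is pre-fetched into a slot, and the
    slot is released only after that group's output has been written.
    """
    first = layer_groups[0]
    slot = pool.acquire(f"input:{first}")
    log.append(f"prefetch {first} -> slot {slot}")
    for position, group in enumerate(layer_groups):
        log.append(f"execute {group}")
        log.append(f"write output {group}")
        if position == 0:
            pool.release(slot)
            log.append(f"release slot {slot}")


if __name__ == "__main__":
    events = []
    run_tile_group(SlotPool(2), ["LG0", "LG1"], events)
    print("\n".join(events))
```

Releasing the slot right after the first group's output is written makes it available again while the remaining layer groups of the tile group are still executing.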
