Abstract:
Multiple parallel passive threads of instructions coordinate access to shared resources using "active" and "proactive" semaphores. The active semaphores send messages to execution and/or control circuitry to cause the state of a thread to change. A thread can be placed in an inactive state by a thread scheduler in response to an unresolved dependency, which can be indicated by a semaphore. A thread state variable corresponding to the dependency is used to indicate that the thread is in inactive mode. When the dependency is resolved a message is passed to control circuitry causing the dependency variable to be cleared. In response to the cleared dependency variable the thread is placed in an active state. Execution can proceed on the threads in the active state. A proactive semaphore operates in a similar manner except that the semaphore is configured by the thread dispatcher before or after the thread is dispatched to the execution circuitry for execution.
Abstract:
A technique to enable information sharing among agents within different cache coherency domains. In one embodiment, a graphics device may use one or more caches used by one or more processing cores to store or read information, which may be accessed by one or more processing cores in a manner that does not affect programming and coherency rules pertaining to the graphics device.
Abstract:
According to one embodiment, a computer system is disclosed that includes a memory and a memory controller coupled to the memory. The memory controller includes an arbitration unit that may be programmed to operate according to a first arbitration mode or a second arbitration mode. The computer system also includes a first device and a second device coupled to the arbitration unit. According to a further embodiment, the first device is assigned a higher priority classification than the second device for accessing the memory while the arbitration unit is operating according to the first arbitration mode. In addition, the first device and the second device are assigned equal priority classifications for accessing the memory while the arbitration unit is operating according to the second arbitration mode.
Abstract:
Systems and methods may provide a graphics processor that may identify operating conditions under which certain floating point instructions may utilize power to fewer hardware resources compared to when the instructions are executing under other operating conditions. The operating conditions may be determined by examining operands used in a given instruction, including the relative magnitudes of the operands and whether the operands may be taken as equal to certain defined values. The floating point instructions may include instructions for an addition operation, a multiplication operation, a compare operation, and/or a fused multiply -add operation.
Abstract:
Active and/or proactive semaphore mechanisms and thread synchronization techniques can be applied to various visual and graphical processing techniques.
Abstract:
In some embodiments, a method includes receiving a request to generate a thread and supplying a request to a queue in response at least to the received request. The method may further include fetching a plurality of instructions in response at least in part to the request supplied to the queue and executing at least one of the plurality of instructions. In some embodiments, an apparatus includes a storage medium having stored therein instructions that when executed by a machine result in the method. In some embodiments, an apparatus includes circuitry to receive a request to generate a thread and to queue a request to generate a thread in response at least to the received request. In some embodiments, a system includes circuitry to receive a request to generate a thread and to queue a request to generate a thread in response at least to the received request, and a memory unit to store at least one instruction for the thread.
Abstract:
A first processor 101, such as a central processing unit (CPU), is in a cache coherency domain with a first set of coherency rules 109. Another processor 105, such as a graphics processing unit (GPU), is in a different coherency domain 111 with a second set of coherency rules. The first processor has a level 1 processor cache 103 (L1 cache) and a lower level processor cache 107. The second processor has a level 1 graphics cache 104 (L1 cache) and a lower level cache 108. The first processor uses the first set of coherency rules with the lower level graphics cache. The second processor uses the second set of coherency rules with the lower level graphics cache. The lower level graphics cache may be a mid-level or last level cache. The first processor may snoop the lower level graphics cache.
Abstract:
Systems and methods may provide a graphics processor that may identify operating conditions under which certain floating point instructions may utilize power to fewer hardware resources compared to when the instructions are executing under other operating conditions. The operating conditions may be determined by examining operands used in a given instruction, including the relative magnitudes of the operands and whether the operands may be taken as equal to certain defined values. The floating point instructions may include instructions for an addition operation, a multiplication operation, a compare operation, and/or a fused multiply-add operation.
Abstract:
In accordance with some embodiments, a scatter/gather memory approach may be enabled that is exposed or backed by system memory and uses conventional tags and addresses. Thus, such a technique may be more amenable to conventional software developers and their conventional techniques.