Abstract:
PROBLEM TO BE SOLVED: To provide a method and a device for reducing memory latency in a software application. SOLUTION: A performance analysis tool 208 profiles the resource usage of the software application 210 and identifies regions of the software application 210 that experience performance bottlenecks. Compiler-runtime instructions are generated within the software application to create and manage a helper thread. The helper thread prefetches data in the identified regions of the software application experiencing performance bottlenecks. A counting mechanism is inserted into the helper thread and into the main thread to help ensure that the prefetched data is not evicted from the cache before the main thread can take advantage of it.
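As a rough illustration of the prefetching idea (not the patented implementation), the C++ sketch below runs a helper thread a fixed distance ahead of the main thread; main_pos, run_ahead, and the use of the GCC/Clang __builtin_prefetch builtin are all illustrative assumptions.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<size_t> main_pos{0};  // index the main thread is currently consuming

// Helper thread: stays at most run_ahead elements in front of the main thread
// so prefetched cache lines are still resident when the main thread arrives.
void helper_prefetch(const std::vector<int>& data, size_t run_ahead) {
    for (size_t i = 0; i < data.size(); ++i) {
        while (i > main_pos.load(std::memory_order_relaxed) + run_ahead)
            std::this_thread::yield();
        __builtin_prefetch(&data[i]);  // GCC/Clang builtin
    }
}

// Main thread: does the memory-bound work and publishes its position.
long consume(const std::vector<int>& data) {
    long sum = 0;
    for (size_t i = 0; i < data.size(); ++i) {
        sum += data[i];
        main_pos.store(i, std::memory_order_relaxed);
    }
    return sum;
}
```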
Abstract:
PROBLEM TO BE SOLVED: To provide a mechanism for scheduling user-level threads so that they can execute on a processor that is not directly managed by an OS. SOLUTION: User-level threads on a first instruction sequencer are managed in response to executing user-level instructions on a second instruction sequencer that is under the control of an application-level program. A first user-level thread runs on the second instruction sequencer and contains one or more user-level instructions. A first user-level instruction has at least (1) a field that references one or more instruction sequencers, or (2) an implicit reference via a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
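The mechanism described is a hardware/ISA feature, so real code cannot reproduce it; the hedged C++ model below merely mimics a user-level instruction carrying a field that references target instruction sequencers. UserLevelInstr, target_sequencers, and UserLevelScheduler are invented stand-ins, not the patented encoding.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

struct UserLevelInstr {
    uint16_t target_sequencers;  // bit mask: the field referencing sequencers
    std::function<void()> code;  // work the user-level thread should run
};

// Runs under application control, not the OS, echoing the abstract's scheme.
class UserLevelScheduler {
    std::map<int, std::vector<UserLevelInstr>> queues_;
public:
    void dispatch(const UserLevelInstr& in) {
        for (int s = 0; s < 16; ++s)
            if (in.target_sequencers & (1u << s))
                queues_[s].push_back(in);  // hand the work to that sequencer
    }
};
```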
Abstract:
Methods and apparatus to provide loop parallelization based on loop splitting and/or an index array are described. In one embodiment, one or more split loops corresponding to an original loop are generated based on mis-speculation information. In another embodiment, a plurality of subloops is generated from an original loop based on an index array. Other embodiments are also described.
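A hedged sketch of the index-array variant, under the assumption that the compiler has grouped iteration indices into conflict-free sets so each subloop can run in parallel; the functions original and split and the grouping itself are illustrative, not the claimed transformation.

```cpp
#include <vector>

// Original loop: x[idx[i]] may alias across iterations, blocking parallelism.
void original(std::vector<double>& x, const std::vector<int>& idx) {
    for (size_t i = 0; i < idx.size(); ++i)
        x[idx[i]] += 1.0;
}

// Split form: each group of iteration indices is internally conflict-free,
// so every subloop can execute as a parallel loop (compile with -fopenmp).
void split(std::vector<double>& x, const std::vector<std::vector<int>>& groups) {
    for (const auto& g : groups) {
        #pragma omp parallel for
        for (long j = 0; j < static_cast<long>(g.size()); ++j)
            x[g[j]] += 1.0;
    }
}
```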
Abstract:
Methods to improve compiler optimization are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing a transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method further includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.
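A minimal sketch of the described advice flow, with a toy stand-in for the intermediate representation; IR, speculate_transform, and the cost field are assumptions for illustration, not a real compiler API.

```cpp
#include <iostream>

struct IR { int cost; };  // stand-in intermediate representation with a cost model

// Speculative transformation, e.g. assuming two pointers never alias.
IR speculate_transform(const IR& ir) {
    return IR{ir.cost - 10};
}

// Emit advice only when the speculation improved the compilation result.
void compile_with_advice(const IR& ir) {
    IR transformed = speculate_transform(ir);
    if (transformed.cost < ir.cost)
        std::cout << "advice: annotate pointers 'restrict' to enable "
                     "the speculated optimization\n";
}
```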
Abstract:
Methods and apparatus for reducing memory latency in a software application are disclosed. A disclosed system uses one or more helper threads to prefetch variables for a main thread to reduce performance bottlenecks due to memory latency and/or cache misses. A performance analysis tool profiles the software application's resource usage and identifies areas in the software application that experience performance bottlenecks. Compiler-runtime instructions are generated in the software application to create and manage the helper thread. The helper thread prefetches data in the identified areas. A counting mechanism is inserted into the helper thread and another into the main thread to coordinate their execution and to help ensure the prefetched data is not evicted from the cache before the main thread can take advantage of it.
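One plausible reading of the counting mechanism, sketched with two atomic counters that keep the helper thread within a bounded window of the main thread; helper_count, main_count, and kWindow are illustrative names and values, not the patented scheme.

```cpp
#include <atomic>
#include <thread>

std::atomic<long> helper_count{0}, main_count{0};
constexpr long kWindow = 64;  // assumed cache-friendly run-ahead distance

// Called once per prefetched element by the helper thread.
void helper_step() {
    while (helper_count.load() - main_count.load() >= kWindow)
        std::this_thread::yield();  // too far ahead: wait so lines aren't evicted
    helper_count.fetch_add(1);
}

// Called once per consumed element by the main thread.
void main_step() {
    main_count.fetch_add(1);
}
```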
Abstract:
Embodiments of the present invention provide a method, apparatus and system which may include splitting a dependency chain into a set of reduced-width dependency chains; mapping one or more dependency chains onto one or more clustered dependency chain processors, wherein an issue-width of one or more of the clusters is adapted to accommodate a size of the dependency chains; and/or processing in parallel a plurality of dependency chains of a trace. Other embodiments are described and claimed.
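Dependency-chain mapping is a microarchitectural technique, so the C++ below is only an analogy: a greedy first-fit heuristic (an assumption, not the claimed method) assigns each chain of a trace to the narrowest cluster whose issue width accommodates it.

```cpp
#include <algorithm>
#include <vector>

struct Chain   { int width; };                  // max instructions live per cycle
struct Cluster { int issue_width; std::vector<Chain> mapped; };

void map_chains(const std::vector<Chain>& chains, std::vector<Cluster>& clusters) {
    // Try narrow clusters first so wide ones stay free for wide chains.
    std::sort(clusters.begin(), clusters.end(),
              [](const Cluster& a, const Cluster& b) {
                  return a.issue_width < b.issue_width;
              });
    for (const Chain& c : chains)
        for (Cluster& cl : clusters)
            if (c.width <= cl.issue_width) {    // first (narrowest) fit
                cl.mapped.push_back(c);
                break;
            }
}
```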
Abstract:
Methods and apparatuses for compiler-created helper threads for multithreading are described herein. In one embodiment, an exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads (loads that are likely to suffer cache misses during execution of the main thread), analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, which are speculatively executed in parallel with the main thread to perform one or more tasks for that region. Other methods and apparatuses are also described.
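A small sketch of how the delinquent-load identification step might look given profile data; LoadSite, the 20% miss-rate threshold, and the profile format are all assumptions rather than the described process.

```cpp
#include <string>
#include <vector>

struct LoadSite { std::string label; long misses, executions; };

// Flag loads whose miss rate exceeds an assumed cutoff: these are the
// "delinquent loads" that would trigger helper-thread generation.
std::vector<LoadSite> delinquent(const std::vector<LoadSite>& profile,
                                 double threshold = 0.2) {
    std::vector<LoadSite> out;
    for (const auto& s : profile)
        if (s.executions > 0 &&
            static_cast<double>(s.misses) / s.executions > threshold)
            out.push_back(s);  // candidate region for a helper thread
    return out;
}
```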
Abstract:
A method and apparatus for compiling a source program are described. Multiple predetermined sequences within the source program are located. A start code is inserted into the source program before the first instruction of each predetermined sequence. An invocation code is inserted into the source program before the start code; the invocation code addresses the start code and transfers each sequence to a system for execution. Finally, a stop code is inserted into the source program after the last instruction of each sequence, the stop code signaling the system to stop execution of the sequence.
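To make the placement concrete, the runnable sketch below marks where the three codes would land around one located sequence; invoke_accelerator, seq_start, and seq_stop are hypothetical markers standing in for the actual inserted code.

```cpp
#include <iostream>

void invoke_accelerator() { std::cout << "invocation code: hand off\n"; }
void seq_start()          { std::cout << "start code\n"; }
void seq_stop()           { std::cout << "stop code: halt sequence\n"; }

int main() {
    invoke_accelerator();  // inserted before the start code
    seq_start();           // inserted before the sequence's first instruction
    /* ... predetermined instruction sequence ... */
    seq_stop();            // inserted after the sequence's last instruction
    return 0;
}
```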
Abstract:
The embodiments described herein are generally directed to improvements addressing power, latency, bandwidth, and/or performance issues related to GPU processing/caching. According to one embodiment, a system includes a producer intellectual property (IP) (e.g., a media IP), a compute core (e.g., a GPU or an AI-specific core of the GPU), and a streaming buffer logically interposed between the producer IP and the compute core. The producer IP is operable to consume data from memory and output the results to the streaming buffer. The compute core is operable to perform AI inference processing based on the data from the streaming buffer and to output the results of the AI inference processing to memory.
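As a software analogy only (the patent describes hardware), a bounded producer/consumer queue below stands in for the streaming buffer between the producer IP and the compute core; StreamingBuffer, its tile type, and its depth are assumptions.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

class StreamingBuffer {
    std::queue<std::vector<float>> q_;
    std::mutex m_;
    std::condition_variable cv_;
    const size_t capacity_ = 8;  // assumed buffer depth
public:
    // Producer IP path: block when the buffer is full, then publish a tile.
    void produce(std::vector<float> tile) {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return q_.size() < capacity_; });
        q_.push(std::move(tile));
        cv_.notify_all();
    }
    // Compute core path: block until data arrives, then consume it for inference.
    std::vector<float> consume() {
        std::unique_lock<std::mutex> l(m_);
        cv_.wait(l, [&] { return !q_.empty(); });
        auto t = std::move(q_.front());
        q_.pop();
        cv_.notify_all();
        return t;
    }
};
```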