-
Publication No.: US09250877B2
Publication date: 2016-02-02
Application No.: US14033306
Filing date: 2013-09-20
Applicant: Cray Inc.
Inventor: Heidi Poxon, John Levesque, Luiz DeRose, Brian H. Johnson
CPC classification number: G06F8/4443, G06F8/30, G06F8/314, G06F8/423, G06F8/443, G06F8/456, G06F11/3404, G06F11/3419, G06F11/3452
Abstract: A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.
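The workflow in this abstract (rank loops by measured cost, then derive data-sharing attributes and emit a directive) can be illustrated with a minimal sketch. It is not the patented tool: the names `LoopProfile`, `rank_loops`, `classify` and `directive` are hypothetical, the access traces are assumed to be supplied by hand rather than derived from inlined source, and only scalars are handled. A scalar that is read before it is written in an iteration is reported as unresolved, since it may be a reduction or a loop-carried dependence that the programmer has to inspect.

```python
from dataclasses import dataclass


@dataclass
class LoopProfile:
    """Hypothetical per-loop statistics collected from an instrumented run."""
    name: str
    seconds: float  # time spent inside the loop
    trips: int      # number of iterations executed


def rank_loops(profiles):
    """Order loops so the most expensive candidates come first."""
    return sorted(profiles, key=lambda p: p.seconds, reverse=True)


def classify(accesses, index_var):
    """Assign a data-sharing attribute to each scalar in a loop body.

    `accesses` maps a variable name to its ordered reads ('r') and
    writes ('w') within one iteration (an assumed, simplified trace).
    """
    attrs = {}
    for var, ops in accesses.items():
        if var == index_var:
            attrs[var] = "private"      # the loop index is per-thread
        elif "w" not in ops:
            attrs[var] = "shared"       # read-only data can be shared
        elif ops[0] == "w":
            attrs[var] = "private"      # written before read: a temporary
        else:
            attrs[var] = "unresolved"   # read then written: needs review
    return attrs


def directive(attrs):
    """Build an OpenMP directive from the resolved attributes."""
    clauses = []
    for kind in ("private", "shared"):
        names = sorted(v for v, a in attrs.items() if a == kind)
        if names:
            clauses.append(f"{kind}({', '.join(names)})")
    return "#pragma omp parallel for " + " ".join(clauses)
```

In this sketch the unresolved variables are deliberately left out of the generated directive, so the programmer has to settle them (for example with a `reduction` clause) before the directive is used.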
-
Publication No.: US09946654B2
Publication date: 2018-04-17
Application No.: US15335041
Filing date: 2016-10-26
Applicant: Cray Inc.
Inventor: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
IPC: G06F12/00, G06F12/0862, G06F12/1045
CPC classification number: G06F12/0862, G06F12/1054, G06F2212/602, G06F2212/68
Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
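The issue, pause and resume cycle described in this abstract can be modeled in a few lines. The following is a software sketch only, not the claimed hardware method. It assumes that addresses are block indices, that the distance is measured from the most recently read block to the last prefetched block, and that the resume criterion is that distance falling to a threshold. The class and parameter names (`OutstandingRequestBuffer`, `max_distance`, `resume_distance`) are hypothetical.

```python
class OutstandingRequestBuffer:
    """Toy model of an ORB that drives a paced stream of prefetches."""

    def __init__(self, address, blocks, degree, max_distance, resume_distance):
        self.address = address            # next block to prefetch
        self.blocks = blocks              # blocks remaining to be retrieved
        self.degree = degree              # requests issued per batch
        self.max_distance = max_distance
        self.resume_distance = resume_distance  # assumed resume criterion
        self.last_read = address - 1      # nothing read by the program yet
        self.in_flight = set()            # requests awaiting a response
        self.issued = []                  # every block requested so far
        self.paused = False

    def _issue(self):
        count = min(self.degree, self.blocks)
        self.in_flight = set(range(self.address, self.address + count))
        self.issued.extend(sorted(self.in_flight))

    def _distance(self):
        # Gap between the last prefetched block and the last block read.
        return (self.address - 1) - self.last_read

    def start(self):
        if self.blocks > 0:
            self._issue()

    def on_prefetch_response(self, block):
        if block not in self.in_flight:
            return
        self.in_flight.remove(block)
        if self.in_flight:
            return                        # wait for the rest of the batch
        # Whole batch answered: advance the address, shrink the count.
        count = min(self.degree, self.blocks)
        self.address += count
        self.blocks -= count
        if self.blocks == 0:
            return
        if self._distance() >= self.max_distance:
            self.paused = True            # too far ahead of the reader
        else:
            self._issue()

    def on_read(self, block):
        self.last_read = max(self.last_read, block)
        if self.paused and self._distance() <= self.resume_distance:
            self.paused = False
            self._issue()
```

With `degree=2`, `max_distance=4` and `resume_distance=2`, this model pauses after four blocks have been fetched without any reads and issues the next batch once the reader has consumed two of them.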
-
Publication No.: US20180074963A1
Publication date: 2018-03-15
Application No.: US15335041
Filing date: 2016-10-26
Applicant: Cray Inc.
Inventor: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
IPC: G06F12/0862, G06F12/1045
CPC classification number: G06F12/0862, G06F12/1054, G06F2212/602, G06F2212/68
Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
-
Publication No.: US20190042435A1
Publication date: 2019-02-07
Application No.: US15913749
Filing date: 2018-03-06
Applicant: Cray Inc.
Inventor: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
IPC: G06F12/0862, G06F12/1045
Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
-
Publication No.: US10185659B2
Publication date: 2019-01-22
Application No.: US15374114
Filing date: 2016-12-09
Applicant: Cray, Inc.
Inventor: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
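One way to picture the planning step in this abstract is a greedy placement over sampled memory requests. The sketch below is an illustration under stated assumptions, not the patented system: each sample is assumed to be an `(address, hot)` pair where `hot` marks a request taken during a high memory utilization portion of the run, usage is a plain sample count, and HBM is filled in descending order of that count. The function names `attribute_usage` and `allocation_plan` are hypothetical.

```python
from bisect import bisect_right


def attribute_usage(ds_map, samples):
    """Count hot-phase samples falling inside each data structure.

    ds_map:  {name: (start_address, size_in_bytes)}, the data structure map
    samples: iterable of (address, hot) pairs from a profiling run
    """
    ranges = sorted((start, start + size, name)
                    for name, (start, size) in ds_map.items())
    starts = [r[0] for r in ranges]
    usage = {name: 0 for name in ds_map}
    for addr, hot in samples:
        if not hot:
            continue                      # only high-utilization phases count
        i = bisect_right(starts, addr) - 1
        if i >= 0 and addr < ranges[i][1]:
            usage[ranges[i][2]] += 1
    return usage


def allocation_plan(ds_map, usage, hbm_capacity):
    """Greedily place the most heavily used structures in HBM."""
    plan = {}
    free = hbm_capacity
    for name in sorted(ds_map, key=lambda n: usage[n], reverse=True):
        size = ds_map[name][1]
        if usage[name] > 0 and size <= free:
            plan[name] = "HBM"
            free -= size
        else:
            plan[name] = "LBM"            # unused, or does not fit
    return plan
```

Ranking by raw sample count is only one possible policy. Ranking by samples per byte would favor small, heavily used structures and may make better use of a limited HBM capacity.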
-
Publication No.: US20180322064A1
Publication date: 2018-11-08
Application No.: US16034216
Filing date: 2018-07-12
Applicant: Cray, Inc.
Inventor: Heidi Lynn Poxon, William Homer, David W. Oehmke, Luiz DeRose, Clayton D. Andreasen, Sanyam Mehta
IPC: G06F12/0871, G06F12/02
Abstract: A system is provided for allocating memory for data of a program for execution by a computer system with a multi-tier memory that includes LBM and HBM. The system accesses a data structure map that maps data structures of the program to the memory addresses within an address space of the program to which the data structures are initially allocated. The system executes the program to collect statistics relating to memory requests and memory bandwidth utilization of the program. The system determines an extent to which each data structure is used by a high memory utilization portion of the program based on the data structure map and the collected statistics. The system generates a memory allocation plan that favors allocating data structures in HBM based on the extent to which the data structures are used by a high memory utilization portion of the program.
-
Publication No.: US20160110174A1
Publication date: 2016-04-21
Application No.: US14978211
Filing date: 2015-12-22
Applicant: Cray Inc.
Inventor: Heidi Poxon, John Levesque, Luiz DeRose, Brian H. Johnson
CPC classification number: G06F8/4443, G06F8/30, G06F8/314, G06F8/423, G06F8/443, G06F8/456, G06F11/3404, G06F11/3419, G06F11/3452
Abstract: A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.
-
Publication No.: US10761820B2
Publication date: 2020-09-01
Application No.: US14978211
Filing date: 2015-12-22
Applicant: Cray Inc.
Inventor: Heidi Poxon, John Levesque, Luiz DeRose, Brian H. Johnson
Abstract: A parallelization assistant tool system to assist in parallelization of a computer program is disclosed. The system directs the execution of instrumented code of the computer program to collect performance statistics information relating to execution of loops within the computer program. The system provides a user interface for presenting to a programmer the performance statistics information collected for a loop within the computer program so that the programmer can prioritize efforts to parallelize the computer program. The system generates inlined source code of a loop by aggressively inlining functions substantially without regard to compilation performance, execution performance, or both. The system analyzes the inlined source code to determine the data-sharing attributes of the variables of the loop. The system may generate compiler directives to specify the data-sharing attributes of the variables.
-
Publication No.: US20190163637A9
Publication date: 2019-05-30
Application No.: US15913749
Filing date: 2018-03-06
Applicant: Cray Inc.
Inventor: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
IPC: G06F12/0862, G06F12/1045
Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.
-
Publication No.: US10303610B2
Publication date: 2019-05-28
Application No.: US15913749
Filing date: 2018-03-06
Applicant: Cray Inc.
Inventor: Sanyam Mehta, James Robert Kohn, Daniel Jonathan Ernst, Heidi Lynn Poxon, Luiz DeRose
IPC: G06F12/00, G06F12/0862, G06F12/1045, G06F12/0886
Abstract: A method for prefetching data into a cache is provided. The method allocates an outstanding request buffer (“ORB”). The method stores in an address field of the ORB an address and a number of blocks. The method issues prefetch requests for a degree number of blocks starting at the address. When a prefetch response is received for all the prefetch requests, the method adjusts the address of the next block to prefetch and adjusts the number of blocks remaining to be retrieved and then issues prefetch requests for a degree number of blocks starting at the adjusted address. The prefetching pauses when a maximum distance between the reads of the prefetched blocks and the last prefetched block is reached. When a read request for a prefetched block is received, the method resumes prefetching when a resume criterion is satisfied.