-
Publication No.: US09740529B1
Publication date: 2017-08-22
Application No.: US14560509
Filing date: 2014-12-04
Applicant: The MathWorks, Inc.
Inventor: Chun-Yu Shei, Girish Venkataramani
CPC classification number: G06F9/5027, G06F8/4441, H03K19/17776
Abstract: A system and method for optimizing a system design that includes two or more components, where at least one component is to be implemented using a constrained resource. From an initial schedule, the resource having a longest span time between a start busy time slot and an end busy time slot is identified. The schedule for the other resources is then also extended to the span time. The resulting design can be made synchronous by inserting up-sampler and down-sampler function blocks before and after any strongly connected components.
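The span-time step in the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the schedule representation (a dict mapping each resource name to its busy time slots), the inclusive span definition, and the function names `span_time`, `longest_span_resource`, and `extend_to_span` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def span_time(busy_slots: List[int]) -> int:
    """Span from the first busy slot to the last busy slot (inclusive; an assumption)."""
    return max(busy_slots) - min(busy_slots) + 1


def longest_span_resource(schedule: Dict[str, List[int]]) -> str:
    """Name of the resource whose busy slots cover the longest span."""
    return max(schedule, key=lambda name: span_time(schedule[name]))


def extend_to_span(schedule: Dict[str, List[int]]) -> Dict[str, Tuple[int, int]]:
    """Give every resource a (start, end) window as long as the longest span."""
    target = span_time(schedule[longest_span_resource(schedule)])
    return {
        name: (min(slots), min(slots) + target - 1)
        for name, slots in schedule.items()
    }


# Hypothetical initial schedule: resource name -> busy time slots.
schedule = {"multiplier": [0, 2, 5], "adder": [1, 2], "ram": [3]}
windows = extend_to_span(schedule)
```

Here the hypothetical `multiplier` resource has the longest span (slots 0 through 5), so every other resource's window is stretched to the same length.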
-
Publication No.: US10949182B2
Publication date: 2021-03-16
Application No.: US15816377
Filing date: 2017-11-17
Applicant: The MathWorks, Inc.
Inventor: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan
IPC: G06F8/41
Abstract: Systems and methods generate code from a source program where the generated code may be compiled and executed on a Graphics Processing Unit (GPU). A parallel loop analysis check may be performed on regions of the source program identified for parallelization. One or more optimizations also may be applied to the source program that convert mathematical operations into a parallel form. The source program may be partitioned into segments for execution on a host and a device. Kernels may be created for the segments to be executed on the device. The size of the kernels may be determined, and memory transfers between the host and device may be optimized.
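One way to picture the host/device partitioning and transfer optimization mentioned in the abstract above is the sketch below. It is an illustrative assumption, not the method claimed in the patent: the `Segment` type, the `plan_transfers` function, and the rule "copy a variable only when the executing side lacks an up-to-date copy" are all hypothetical. The sketch also assumes every variable a segment reads is either a program input or written by an earlier segment.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Segment(NamedTuple):
    target: str        # "host" or "device"
    reads: Set[str]    # variables the segment reads
    writes: Set[str]   # variables the segment writes


def plan_transfers(segments: List[Segment], inputs: Set[str]) -> List[Tuple[int, str, str]]:
    """Return (segment_index, variable, direction) copies needed before each segment."""
    # Track which side(s) currently hold an up-to-date copy of each variable.
    valid: Dict[str, Set[str]] = {v: {"host"} for v in inputs}
    copies: List[Tuple[int, str, str]] = []
    for i, seg in enumerate(segments):
        for v in sorted(seg.reads):
            if seg.target not in valid[v]:
                direction = "host_to_device" if seg.target == "device" else "device_to_host"
                copies.append((i, v, direction))
                valid[v].add(seg.target)
        for v in seg.writes:
            # A write invalidates the copy on the other side.
            valid[v] = {seg.target}
    return copies


# Two consecutive device segments followed by a host segment (hypothetical program).
segments = [
    Segment("device", {"a"}, {"b"}),
    Segment("device", {"a", "b"}, {"c"}),
    Segment("host", {"c"}, set()),
]
copies = plan_transfers(segments, inputs={"a"})
```

In this example `a` is copied to the device once and reused by the second device segment, and `b` never leaves the device, so only two transfers are planned.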
-
Publication No.: US20180136912A1
Publication date: 2018-05-17
Application No.: US15816606
Filing date: 2017-11-17
Applicant: The MathWorks, Inc.
Inventor: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan, Yaohung Tsai
CPC classification number: G06F8/35, G06F8/20, G06F8/30, G06F9/44563, G06N3/04, G06N3/0454, G06N3/0481, G06N3/08, G06N3/10, G06N3/105
Abstract: Systems and methods may automatically generate code for deep learning networks. The systems and methods may provide a code generation framework for generating target specific code. The code generation framework may include one or more predefined class hierarchies for constructing objects of the generated code. The objects of the class hierarchies may provide an interface to predefined libraries of deep learning functions optimized for use on a target platform. The systems and methods may perform one or more optimizations on the code being generated.
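The idea of a class hierarchy that fronts target-specific libraries can be sketched roughly as below. Everything here is hypothetical and chosen only for illustration: the class names `LayerImpl`, `GpuLibImpl`, and `CpuLibImpl`, the `generate` function, and the emitted call strings do not come from the patent or from any real library.

```python
from abc import ABC, abstractmethod
from typing import List


class LayerImpl(ABC):
    """Common interface: each target supplies its own library call per layer type."""

    @abstractmethod
    def emit_call(self, layer_type: str) -> str:
        ...


class GpuLibImpl(LayerImpl):
    def emit_call(self, layer_type: str) -> str:
        # Hypothetical GPU-optimized library entry point.
        return f"gpu_lib_{layer_type}_forward();"


class CpuLibImpl(LayerImpl):
    def emit_call(self, layer_type: str) -> str:
        # Hypothetical CPU-optimized library entry point.
        return f"cpu_lib_{layer_type}_forward();"


def generate(network: List[str], impl: LayerImpl) -> List[str]:
    """Emit one target-specific call per layer, in network order."""
    return [impl.emit_call(layer) for layer in network]


network = ["conv", "relu", "fc"]
gpu_code = generate(network, GpuLibImpl())
cpu_code = generate(network, CpuLibImpl())
```

The generator code stays the same for both targets; only the object passed in decides which library the emitted calls refer to.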
-
Publication No.: US10157045B2
Publication date: 2018-12-18
Application No.: US15816606
Filing date: 2017-11-17
Applicant: The MathWorks, Inc.
Inventor: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan, Yaohung Tsai
Abstract: Systems and methods may automatically generate code for deep learning networks. The systems and methods may provide a code generation framework for generating target specific code. The code generation framework may include one or more predefined class hierarchies for constructing objects of the generated code. The objects of the class hierarchies may provide an interface to predefined libraries of deep learning functions optimized for use on a target platform. The systems and methods may perform one or more optimizations on the code being generated.
-
Publication No.: US20180157471A1
Publication date: 2018-06-07
Application No.: US15816377
Filing date: 2017-11-17
Applicant: The MathWorks, Inc.
Inventor: Girish Venkataramani, Rama P. Kokku, Jayaprabha Shankar, James L. Brock, Chun-Yu Shei, Vijaya Raghavan
IPC: G06F8/41
CPC classification number: G06F8/452, G06F8/4434, G06F8/4441, G06F8/445, G06F8/456, G06F8/458
Abstract: Systems and methods generate code from a source program where the generated code may be compiled and executed on a Graphics Processing Unit (GPU). A parallel loop analysis check may be performed on regions of the source program identified for parallelization. One or more optimizations also may be applied to the source program that convert mathematical operations into a parallel form. The source program may be partitioned into segments for execution on a host and a device. Kernels may be created for the segments to be executed on the device. The size of the kernels may be determined, and memory transfers between the host and device may be optimized.
-