METHODS AND APPARATUS TO IMPROVE RUNTIME PERFORMANCE OF SOFTWARE EXECUTING ON A HETEROGENEOUS SYSTEM

    Publication No.: US20190317880A1

    Publication date: 2019-10-17

    Application No.: US16455486

    Filing date: 2019-06-27

    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to improve runtime performance of software executing on a heterogeneous system. An example apparatus includes a feedback interface to collect a performance characteristic of the heterogeneous system associated with a compiled version of a block of code at a first runtime, the compiled version executed according to a function designating successful execution of the compiled version on the heterogeneous system, the heterogeneous system including a first processing element and a second processing element different than the first processing element; a performance analyzer to determine a performance delta based on the performance characteristic and the function; and a machine learning modeler to, prior to a second runtime, adjust a cost model of the first processing element based on the performance delta, the adjusted cost model to cause a reduction in the performance delta to improve runtime performance of the heterogeneous system.
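
The feedback loop in this abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the linear `CostModel`, the `learning_rate` parameter, and the assumption that the success function's target runtime equals the cost model's own prediction.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical cost model: predicted runtime = scale * work_units."""
    scale: float

    def predict(self, work_units: float) -> float:
        return self.scale * work_units


def performance_delta(measured_runtime: float, target_runtime: float) -> float:
    """Gap between the observed runtime and the target runtime."""
    return measured_runtime - target_runtime


def adjust_cost_model(model: CostModel, work_units: float, delta: float,
                      learning_rate: float = 0.5) -> CostModel:
    """Nudge the scale so the next prediction absorbs part of the delta."""
    return CostModel(scale=model.scale + learning_rate * delta / work_units)


# First runtime: the model predicts a runtime, then a longer one is measured.
model = CostModel(scale=1.0)
work_units = 10.0
target = model.predict(work_units)  # assumed target taken from the prediction
delta = performance_delta(measured_runtime=14.0, target_runtime=target)

# Before the second runtime: adjust the cost model based on the delta.
adjusted = adjust_cost_model(model, work_units, delta)
remaining = performance_delta(14.0, adjusted.predict(work_units))
```

With these assumed numbers, the remaining delta after one adjustment is smaller than the original one, which is the reduction the abstract describes.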

    Methods and apparatus for runtime multi-scheduling of software executing on a heterogeneous system

    Publication No.: US10908884B2

    Publication date: 2021-02-02

    Application No.: US16455379

    Filing date: 2019-06-27

    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed for runtime scheduling of software executing on a heterogeneous system. An example apparatus includes a compilation auto-scheduler to generate, in response to a variant compiler generating a representation of an algorithm in a domain-specific language (DSL), a schedule based on configurations for processing elements of the heterogeneous system, the processing elements including at least a first and a second processing element, the variant compiler to compile variant binaries based on the schedule, each of the variant binaries associated with the algorithm in the DSL, the variant binaries including a first variant binary corresponding to the first processing element and a second variant binary corresponding to the second processing element, and an application compiler to generate a fat binary including a runtime scheduler to select one or more of the variant binaries to execute a workload based on the schedule.
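
As a rough illustration of the runtime selection step, the sketch below models a fat binary as a table of per-element variants plus a scheduler that picks one at run time. The `FatBinary` class, the variant functions, and the reading of "schedule" as a simple preference order are all assumptions made for this example, not details from the patent.

```python
from typing import Callable, Dict, List, Set, Tuple

Variant = Callable[[List[int]], List[int]]


def cpu_variant(data: List[int]) -> List[int]:
    # Hypothetical variant compiled for the first processing element.
    return [x * 2 for x in data]


def gpu_variant(data: List[int]) -> List[int]:
    # Stand-in for a second variant; it runs on the host in this sketch.
    return [x * 2 for x in data]


class FatBinary:
    """Bundles variants of one algorithm with a simple runtime scheduler."""

    def __init__(self, variants: Dict[str, Variant], schedule: List[str]) -> None:
        self.variants = variants
        self.schedule = schedule  # assumed: processing elements in preference order

    def run(self, workload: List[int], available: Set[str]) -> Tuple[str, List[int]]:
        # Pick the first scheduled element that is available and has a variant.
        for element in self.schedule:
            if element in available and element in self.variants:
                return element, self.variants[element](workload)
        raise RuntimeError("no variant matches an available processing element")


fat_binary = FatBinary({"cpu": cpu_variant, "gpu": gpu_variant}, ["gpu", "cpu"])
chosen, result = fat_binary.run([1, 2, 3], available={"cpu"})
```

Here the scheduler falls back to the CPU variant because the preferred element is not in the available set.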

    METHODS AND APPARATUS TO OPTIMIZE EXECUTION OF A MACHINE LEARNING MODEL

    Publication No.: US20190325314A1

    Publication date: 2019-10-24

    Application No.: US16456863

    Filing date: 2019-06-28

    Abstract: Methods, apparatus, systems and articles of manufacture to optimize execution of a machine learning model are disclosed. An example apparatus includes a quantizer to quantize a layer of a model based on an execution constraint, the layer of the model represented by a matrix. A packer is to pack the quantized layer of the matrix to create a packed layer represented by a packed matrix, the packed matrix having non-zero values of the matrix grouped together along at least one of a row or a column of the matrix. A blocker is to block the packed layer into a blocked layer by dividing the non-zero values in the packed matrix into blocks. A fuser is to fuse the blocked layer into a pipeline. A packager is to package the pipeline into a binary.
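
The first three stages (quantize, pack, block) can be sketched on a small matrix. This is an illustrative reading only: the symmetric `bits`-wide quantization, the row-reordering pack, and the fixed-size row blocks are assumptions, and the fuse and package stages are not shown.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def quantize(matrix: List[List[float]], bits: int) -> Matrix:
    """Symmetric quantization to signed integers of the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for row in matrix for v in row)
    if max_abs == 0:
        return [[0 for _ in row] for row in matrix]
    return [[round(v * qmax / max_abs) for v in row] for row in matrix]


def pack_rows(matrix: Matrix) -> Tuple[Matrix, List[int]]:
    """Move rows that contain non-zero values to the top (stable order)."""
    nonzero = [i for i, row in enumerate(matrix) if any(row)]
    zero = [i for i, row in enumerate(matrix) if not any(row)]
    order = nonzero + zero
    return [matrix[i] for i in order], order


def block_rows(matrix: Matrix, rows_per_block: int) -> List[Matrix]:
    """Split into row blocks and keep only blocks holding non-zero values."""
    blocks = [matrix[i:i + rows_per_block]
              for i in range(0, len(matrix), rows_per_block)]
    return [b for b in blocks if any(any(row) for row in b)]


weights = [[0.1, 0.0],
           [0.9, -1.0],
           [0.0, 0.1],
           [1.0, 0.0]]
quantized = quantize(weights, bits=3)          # small values round to zero
packed, order = pack_rows(quantized)           # non-zero rows grouped first
blocks = block_rows(packed, rows_per_block=2)  # all-zero block is dropped
```

In this example the coarse quantization zeroes out two rows, packing groups the remaining rows together, and blocking leaves a single dense block to process.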
