-
1.
Publication number: US20240005135A1
Publication date: 2024-01-04
Application number: US18135958
Filing date: 2023-04-18
Applicant: Intel Corporation
Inventor: Avishaii Abuhatzera , Om Ji Omer , Ritwika Chowdhury , Lance Hacking
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
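Illustrative sketch (not taken from the patent): the abstract does not state the encoding, so the Python below assumes signed 8-bit inputs and a hypothetical split into a signed high digit and a signed low digit with x = 16*hi + lo. The names `reencode` and `dot_low_precision` are invented for this example. It shows how small-magnitude inputs yield a zero high digit, so a sparsity check can skip the corresponding partial products.

```python
def reencode(x: int) -> tuple[int, int]:
    """Split a signed 8-bit integer into signed digits (hi, lo), x == 16*hi + lo.

    Assumed encoding for illustration only; the patent abstract does not
    specify the actual re-encoding scheme.
    """
    if not -128 <= x <= 127:
        raise ValueError("x must fit in a signed 8-bit integer")
    lo = ((x + 8) % 16) - 8   # signed low digit in [-8, 7]
    hi = (x - lo) // 16       # signed high digit in [-8, 8]; 0 when -8 <= x <= 7
    return hi, lo


def dot_low_precision(a: list[int], b: list[int]) -> tuple[int, int]:
    """Dot product of two int8 vectors built from low-precision digit products.

    Returns (result, skipped), where skipped counts the partial products
    that were bypassed because one of their digits was zero.
    """
    total = 0
    skipped = 0
    for x, y in zip(a, b):
        xh, xl = reencode(x)
        yh, yl = reencode(y)
        # x*y = 256*xh*yh + 16*xh*yl + 16*xl*yh + xl*yl
        for da, db, shift in ((xh, yh, 8), (xh, yl, 4), (xl, yh, 4), (xl, yl, 0)):
            if da == 0 or db == 0:
                skipped += 1          # sparsity check: no multiply needed
                continue
            total += (da * db) << shift
    return total, skipped
```

In this assumed encoding the high digit reaches 8 for inputs of 120 and above, one past the usual signed 4-bit range; how real hardware would handle that case is not covered by the abstract.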
-
2.
Publication number: US09189302B2
Publication date: 2015-11-17
Application number: US14167688
Filing date: 2014-01-29
Applicant: Intel Corporation
Inventor: Lance Hacking
CPC classification number: G06F9/542 , G06F11/30 , G06F11/3466 , G06F2201/86 , G06F2201/88 , G06F2201/885
Abstract: A technique to monitor events within a computer system or integrated circuit. In one embodiment, a software-accessible event monitoring storage and hardware-specific monitoring logic are selectable and their corresponding outputs may be monitored by accessing a counter to count events corresponding to each of software-accessible storage and hardware-specific monitoring logic.
-
3.
Publication number: US20200226203A1
Publication date: 2020-07-16
Application number: US16833210
Filing date: 2020-03-27
Applicant: Intel Corporation
Inventor: Biji George , Om Ji Omer , Dipan Kumar Mandal , Cormac Brick , Lance Hacking , Sreenivas Subramoney , Belliappa Kuttanna
IPC: G06F17/16
Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
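Illustrative sketch (an assumed reading of the abstract, not the patented design): the Python below models one pass through a multiplier array in which row k receives a broadcast operand `a_row[k]`, each multiplier (k, j) receives its own unicast operand `b[k][j]`, and a per-column adder sums the products. The function names `array_step` and `matmul` are hypothetical.

```python
def array_step(a_row: list[int], b: list[list[int]]) -> list[int]:
    """One pass through a rows x cols multiplier array with column adders."""
    rows, cols = len(b), len(b[0])
    # Multiplier (k, j): broadcast operand a_row[k] times unicast operand b[k][j].
    products = [[a_row[k] * b[k][j] for j in range(cols)] for k in range(rows)]
    # Column adder j sums the products of every row in column j.
    return [sum(products[k][j] for k in range(rows)) for j in range(cols)]


def matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Full matrix product: one array pass per row of the left-hand matrix."""
    return [array_step(a_row, b) for a_row in a]
```

Each pass produces one row of the result, so multiplying an M x K matrix by a K x N matrix takes M passes through a K x N array in this model.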
-
4.
Publication number: US09372768B2
Publication date: 2016-06-21
Application number: US14141099
Filing date: 2013-12-26
Applicant: Intel Corporation
Inventor: Jeremy Conner , Sabar Souag , Karunakara Kotary , Victor Ruybalid , Noel Eck , Ramana Rachakonda , Sankaran Menon , Lance Hacking
CPC classification number: G06F11/2268 , G06F11/26 , G06F11/3024 , G06F11/3082 , G06F11/3086 , G06F11/3476 , G06F11/3656 , G06F2201/835 , G06F2201/865
Abstract: Techniques of debugging a computing system are described herein. The techniques may include generating debug data at agents in the computing system. The techniques may include recording the debug data at a storage element, wherein the storage element is disposed in a non-core portion of the circuit interconnect accessible to the agents.
-
5.
Publication number: US20240220785A1
Publication date: 2024-07-04
Application number: US18408716
Filing date: 2024-01-10
Applicant: Intel Corporation
Inventor: Gautham Chinya , Huichu Liu , Arnab Raha , Debabrata Mohapatra , Cormac Brick , Lance Hacking
CPC classification number: G06N3/063 , G06F9/3814 , G06F9/3877 , G06F9/4498 , G06F9/5027 , G06N5/04
Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
-
6.
Publication number: US11907827B2
Publication date: 2024-02-20
Application number: US16456707
Filing date: 2019-06-28
Applicant: Intel Corporation
Inventor: Gautham Chinya , Huichu Liu , Arnab Raha , Debabrata Mohapatra , Cormac Brick , Lance Hacking
CPC classification number: G06N3/063 , G06F9/3814 , G06F9/3877 , G06F9/4498 , G06F9/5027 , G06N5/04
Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
-
7.
Publication number: US20200320375A1
Publication date: 2020-10-08
Application number: US16909295
Filing date: 2020-06-23
Applicant: Intel Corporation
Inventor: Avishaii Abuhatzera , Om Ji Omer , Ritwika Chowdhury , Lance Hacking
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
-
8.
Publication number: US12288153B2
Publication date: 2025-04-29
Application number: US18408716
Filing date: 2024-01-10
Applicant: Intel Corporation
Inventor: Gautham Chinya , Huichu Liu , Arnab Raha , Debabrata Mohapatra , Cormac Brick , Lance Hacking
Abstract: Methods and systems include a neural network system that includes a neural network accelerator. The neural network accelerator includes multiple processing engines coupled together to perform arithmetic operations in support of an inference performed using the deep neural network system. The neural network accelerator also includes a schedule-aware tensor data distribution circuitry or software that is configured to load tensor data into the multiple processing engines in a load phase, extract output data from the multiple processing engines in an extraction phase, reorganize the extracted output data, and store the reorganized extracted output data to memory.
-
9.
Publication number: US11714998B2
Publication date: 2023-08-01
Application number: US16909295
Filing date: 2020-06-23
Applicant: Intel Corporation
Inventor: Avishaii Abuhatzera , Om Ji Omer , Ritwika Chowdhury , Lance Hacking
CPC classification number: G06N3/063 , G06N3/0454 , G06N3/088
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor is to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
-
10.
Publication number: US20230004430A1
Publication date: 2023-01-05
Application number: US17856968
Filing date: 2022-07-02
Applicant: Intel Corporation
Inventor: Richard Richmond , Eric Luk , Lingdan Zeng , Lance Hacking , Alessandro Palla , Mohamed Elmalaki , Sara Almalih
Abstract: Technology for estimating neural network (NN) power profiles includes obtaining a plurality of workloads for a compiled NN model, the plurality of workloads determined for a hardware execution device, determining a hardware efficiency factor for the compiled NN model, and generating, based on the hardware efficiency factor, a power profile for the compiled NN model on one or more of a per-layer basis or a per-workload basis. The hardware efficiency factor can be determined based on a hardware efficiency measurement and a hardware utilization measurement, and can be determined on a per-workload basis. A configuration file can be provided for generating the power profile, and an output visualization of the power profile can be generated. Further, feedback information can be generated to perform one or more of selecting a hardware device, optimizing a breakdown of workloads, optimizing a scheduling of tasks, or confirming a hardware device design.