Abstract:
Systems and methods may configure a programmable logic device to efficiently run a deep learning (DL) network. Architecture code and algorithmic code may be generated. The architecture code may define convolutional and fully connected processor cores structured to run the layers of a Deep Neural Network (DNN). The processor cores may be interconnected by a First In First Out (FIFO) memory. The architecture code may also define stride-efficient memories for implementing convolution. The algorithmic code may include configuration instructions for running the DNN's layers at the processor cores. The algorithmic code may also include a schedule for executing the configuration instructions on the processor cores, for moving network parameters to the processor cores, and for transferring outputs between the layers.
Abstract:
A method may include receiving functional model information regarding a set of functional blocks associated with a functional model. The functional model may include a streaming algorithm for exchanging streaming data. The method may include receiving architectural model information regarding physical devices included in a target device from a hardware-software co-design platform. The physical devices may include a software based processing device and a hardware based processing device. The method may include mapping the functional blocks to the physical devices to allow the streaming data to be communicated between the software based processing device and the hardware based processing device. The method may include generating a streaming interface to model communication of the streaming data between the software based processing device and the hardware based processing device. The method may include generating computer code for implementing the functional model on the target device and outputting the computer code.
Abstract:
Systems and methods automatically optimize a hardware description generated from a graphical model. The system may include an optimizer. The optimizer may add a serializer component and a deserializer component to the model. The serializer component may receive parallel data and may produce serial data. The serializer may introduce one or more idle cycles into the serial data being produced. The deserializer component may receive serial data and may produce parallel data. The serializer and deserializer components may receive and generate control signals. The control signals may include a valid signal for indicating valid data elements of the serial and parallel data, and a start signal for indicating the beginning of a new frame or cycle when constructing parallel data from serial data.
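The serializer and deserializer behavior described in this abstract can be illustrated with a small software model. This is a minimal sketch, not the patented implementation. The function names `serialize` and `deserialize`, the `idle_cycles` parameter, and the `(data, valid, start)` tuple layout are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# One serial "beat" per clock cycle: (data, valid, start).
# This tuple layout is an assumption made for illustration only.
Beat = Tuple[int, bool, bool]


def serialize(frames: Iterable[List[int]], idle_cycles: int = 1) -> Iterator[Beat]:
    """Turn parallel frames into a serial stream, one element per cycle.

    `valid` marks real data elements; `start` marks the first element of
    each frame. After each frame, `idle_cycles` beats with valid=False
    are emitted to model the idle cycles the serializer may introduce.
    """
    for frame in frames:
        for i, value in enumerate(frame):
            yield (value, True, i == 0)
        for _ in range(idle_cycles):
            yield (0, False, False)  # idle cycle: data is a don't-care


def deserialize(beats: Iterable[Beat]) -> List[List[int]]:
    """Rebuild parallel frames from a serial stream using valid/start."""
    frames: List[List[int]] = []
    current = None
    for data, valid, start in beats:
        if not valid:
            continue  # skip idle cycles
        if start:
            if current is not None:
                frames.append(current)  # close the previous frame
            current = []
        if current is not None:
            current.append(data)
    if current is not None:
        frames.append(current)
    return frames


if __name__ == "__main__":
    frames = [[1, 2, 3], [4, 5, 6]]
    stream = list(serialize(frames, idle_cycles=2))
    print(deserialize(stream) == frames)
```

In this model the deserializer relies only on the control signals, so the number of idle cycles can change without affecting the reconstructed frames.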
Abstract:
A device is configured to receive optimization information associated with a model, determine an amount of delay to be inserted into the model, and determine a sampling factor by which a first data rate associated with a signal is to be modified into a second data rate. The device is configured to determine a region of interest, insert an upsampling block that upsamples the signal entering the region of interest based on the sampling factor, and insert a downsampling block, associated with a unit of delay, which downsamples the signal exiting the region of interest based on the sampling factor. The device is configured to convert the unit of delay into a fast delay block, corresponding to the amount of delay, and insert the fast delay block in the region of interest. The device is configured to generate code associated with the model, and provide the code.
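One way to read the transformation in this abstract is that a unit delay at the slow rate can be replaced by several delays at a faster rate inside the region of interest without changing what is observed at the slow rate. The sketch below checks that equivalence numerically. It assumes sample-repeat upsampling and a zero initial condition, and the helper names `upsample`, `delay`, and `downsample` are hypothetical. The abstract does not specify these details.

```python
from typing import List


def upsample(signal: List[int], factor: int) -> List[int]:
    """Repeat each sample `factor` times (assumed upsampling scheme)."""
    return [s for s in signal for _ in range(factor)]


def delay(signal: List[int], n: int, initial: int = 0) -> List[int]:
    """Delay by n samples, padding with `initial`, keeping the length."""
    return ([initial] * n + list(signal))[: len(signal)]


def downsample(signal: List[int], factor: int) -> List[int]:
    """Keep every `factor`-th sample, starting with the first."""
    return list(signal[::factor])


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    factor = 3  # sampling factor (assumed value)

    # Reference: one unit of delay at the original (slow) rate.
    slow = delay(x, 1)

    # Region of interest at the fast rate: the unit delay becomes
    # `factor` fast-rate delays between the upsampler and downsampler.
    fast = downsample(delay(upsample(x, factor), factor), factor)

    print(slow == fast)
```

Under these assumptions the `factor` fast-rate delay stages are what a tool could distribute through the region of interest as pipeline registers.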
Abstract:
A device generates a model associated with a multi-rate system. The multi-rate system includes a system associated with a clock rate and a sample rate, and the clock rate is greater than the sample rate. The device identifies the clock rate of the multi-rate system based on the model, and identifies a portion, of the model, associated with the sample rate. The device applies clock rate pipelining to adjust the sample rate associated with the portion of the model so that the sample rate substantially equals the clock rate, and generates code associated with the model and the applied clock rate pipelining.