NAGIOS: RODERIC FUNCIONANDO

A Novel Systolic Parallel Hardware Architecture for the FPGA Acceleration of Feedforward Neural Networks

Repositori DSpace/Manakin

Valencià Castellano

IMPORTANT: Aquest repositori està en una versió antiga des del 3/12/2023. La nova instal.lació está en https://roderic.uv.es/

A Novel Systolic Parallel Hardware Architecture for the FPGA Acceleration of Feedforward Neural Networks

Mostra el registre complet de l'element

Visualització (12.60Mb)

Medus, Leandro Daniel; Iakymchuk, Taras; Francés Villora, José Vicente; Bataller Mompean, Manuel; Rosado Muñoz, Alfredo

Aquest document és un/a article, creat/da en: 2019

New chips for machine learning applications appear, they are tuned for a specific topology, being efficient by using highly parallel designs at the cost of high power or large complex devices. However, the computational demands of deep neural networks require flexible and efficient hardware architectures able to fit different applications, neural network types, number of inputs, outputs, layers, and units in each layer, making the migration from software to hardware easy. This paper describes novel hardware implementing any feedforward neural network (FFNN): multilayer perceptron, autoencoder, and logistic regression. The architecture admits an arbitrary input and output number, units in layers, and a number of layers. The hardware combines matrix algebra concepts with serial-parallel computation. It is based on a systolic ring of neural processing elements (NPE), only requiring as many NPEs as neuron units in the largest layer, no matter the number of layers. The use of resources grows linearly with the number of NPEs. This versatile architecture serves as an accelerator in real-time applications and its size does not affect the system clock frequency. Unlike most approaches, a single activation function block (AFB) for the whole FFNN is required. Performance, resource usage, and accuracy for several network topologies and activation functions are evaluated. The architecture reaches 550 MHz clock speed in a Virtex7 FPGA. The proposed implementation uses 18-bit fixed point achieving similar classification performance to a floating point approach. A reduced weight bit size does not affect the accuracy, allowing more weights in the same memory. Different FFNN for Iris and MNIST datasets were evaluated and, for a real-time application of abnormal cardiac detection, a x256 acceleration was achieved. The proposed architecture can perform up to 1980 Giga operations per second (GOPS), implementing the multilayer FFNN of up to 3600 neurons per layer in a single chip. The architecture can be extended to bigger capacity devices or multi-chip by the simple NPE ring extension.

Veure al catàleg Trobes