Accelerator Architecture For Secure And Energy Efficient Machine Learning

Accelerator Architecture for Secure and Energy Efficient Machine Learning
Author: Mohammad Hossein Samavatian
Publisher:
Total Pages: 0
Release: 2022
Genre: Computer architecture
ISBN:

ML applications are driving the next computing revolution, and in this context both performance and security are crucial. We propose hardware/software co-design solutions that address both.

First, we propose RNNFast, an accelerator for Recurrent Neural Networks (RNNs). RNNs are particularly well suited to machine learning problems in which context is important, such as language translation. RNNFast leverages an emerging class of non-volatile memory called domain-wall memory (DWM), which we show is very well suited to RNN acceleration due to its very high density and low read/write energy. RNNFast is efficient and highly scalable, with a flexible mapping of logical neurons to RNN hardware blocks, and the accelerator is designed to minimize data movement by closely interleaving DWM storage and computation. Compared with a state-of-the-art GPGPU, RNNFast delivers 21.8x higher performance with 70x lower energy.

Second, we bring ML security into ML accelerator design for greater efficiency and robustness. Deep Neural Networks (DNNs) are employed in an increasing number of applications, some of which are safety-critical. Unfortunately, DNNs are known to be vulnerable to so-called adversarial attacks, and existing defenses generally carry high overhead; some require attack-specific re-training of the model or careful tuning to adapt to different attacks. We show that these approaches, while successful for a range of inputs, are insufficient against stronger, high-confidence adversarial attacks. To address this, we propose HASI and DNNShield, two hardware-accelerated defenses that adapt the strength of the response to the confidence of the adversarial input. Both techniques rely on approximation or random noise deliberately introduced into the model: HASI injects noise directly into the model at inference time, while DNNShield uses dynamic, random sparsification of the DNN model to achieve inference approximation efficiently and with fine-grained control over the approximation error. Both techniques compare the output distribution of noisy/sparsified inference against a baseline output to detect adversarial inputs. We show an adversarial detection rate of 86% on VGG16 and 88% on ResNet50, exceeding the detection rate of state-of-the-art approaches at much lower overhead. Finally, we demonstrate a software/hardware-accelerated FPGA prototype that reduces the performance impact of HASI and DNNShield relative to software-only CPU and GPU implementations.
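The detection idea is compact enough to sketch. The following is a minimal, illustrative PyTorch sketch of noise-based adversarial detection, not the authors' implementation: it perturbs the input rather than the model internals (HASI injects noise into the model itself), and every function name and threshold here is an assumption.

```python
import torch

def detect_adversarial(model, x, n_runs=8, noise_std=0.05, agree_thresh=0.6):
    """Flag x (a batch of one input) as adversarial when noisy re-inferences
    disagree with the clean prediction too often. All hyperparameter values
    are illustrative assumptions, not the dissertation's settings."""
    model.eval()
    with torch.no_grad():
        baseline = model(x).argmax(dim=-1)          # clean prediction
        agreements = 0
        for _ in range(n_runs):
            # Simplification: HASI perturbs the model at inference; here we
            # perturb the input to keep the sketch self-contained.
            noisy_out = model(x + noise_std * torch.randn_like(x))
            agreements += int((noisy_out.argmax(dim=-1) == baseline).item())
    # Adversarial inputs tend to be unstable under noise: low agreement.
    return agreements / n_runs < agree_thresh
```

An adaptive defense in the spirit of HASI/DNNShield would additionally scale n_runs or noise_std with the model's confidence in x; the fixed values above sidestep that for brevity.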


Compact and Fast Machine Learning Accelerator for IoT Devices
Author: Hantao Huang
Publisher: Springer
Total Pages: 149
Release: 2018-12-07
Genre: Technology & Engineering
ISBN: 9811333238

This book presents the latest techniques for machine learning based data analytics on IoT edge devices. A comprehensive literature review on neural network compression and machine learning accelerators is presented, covering both algorithm-level and hardware-architecture optimization. Coverage focuses on shallow and deep neural networks, with real applications in smart buildings. The authors also discuss hardware architecture design, covering both CMOS-based computing systems and the emerging Resistive Random-Access Memory (RRAM) based systems. Detailed case studies for smart buildings, such as indoor positioning, energy management and intrusion detection, are also presented.


Energy-efficient ASIC Accelerators for Machine/deep Learning Algorithms
Author: Minkyu Kim
Publisher:
Total Pages: 120
Release: 2019
Genre: Algorithms
ISBN:

In this work, to reduce computation without accuracy degradation, an energy-efficient deep convolutional neural network (DCNN) accelerator is proposed based on a novel conditional computing scheme that integrates convolution with the subsequent max-pooling operations. In this way, the total number of bit-wise convolutions can be reduced by ~2x without affecting the output feature values. This work also develops an optimized dataflow that exploits sparsity, maximizes data re-use and minimizes off-chip memory access, improving upon existing hardware designs; total off-chip memory access is reduced by 2.12x. In post-layout simulation in 40nm, the proposed DCNN accelerator achieves a peak 7.35 TOPS/W for VGG-16. A number of recent efforts have attempted to design custom inference engines based on various approaches, including systolic architectures, near-memory processing, and in-memory computing; this work presents a comprehensive comparison of these approaches in a unified framework. It also presents a proposed energy-efficient in-memory computing accelerator for deep neural networks (DNNs) that integrates many instances of in-memory computing macros with an ensemble of peripheral digital circuits, supporting configurable multi-bit activations and large-scale DNNs seamlessly while substantially improving chip-level energy efficiency. The proposed accelerator is fully designed in 65nm, demonstrating ultralow
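The conditional convolution/max-pooling fusion can be illustrated with a small NumPy sketch. This is my own reconstruction of the idea under simplifying assumptions (unsigned 8-bit activations, non-negative weights), not the thesis's microarchitecture: partial sums from the most significant bits bound each pooling candidate, and candidates that provably cannot win the max-pool skip the rest of their bit-wise work, leaving the output exact.

```python
import numpy as np

def conv_maxpool_conditional(patches, weights, msb_bits=4, total_bits=8):
    """patches: (4, K) array of the four flattened input patches feeding one
    2x2 max-pooling window; weights: (K,) flattened kernel. Both are assumed
    to be unsigned total_bits-bit integers (a simplifying assumption)."""
    shift = total_bits - msb_bits
    # Phase 1: cheap partial sums from the MSBs of the activations only.
    msb_part = ((patches >> shift) @ weights) << shift
    # Largest value the remaining LSBs could still add to any candidate.
    lsb_max = int(((1 << shift) - 1) * weights.sum())
    best_lb = int(msb_part.max())
    # Candidates whose upper bound cannot reach the best lower bound can
    # never win the pool, so their remaining bit-wise work is skipped.
    survivors = np.flatnonzero(msb_part + lsb_max >= best_lb)
    full = patches[survivors] @ weights        # finish only the survivors
    return int(full.max())
```

With 8-bit activations and a 4-bit first phase, pooling losers skip roughly half of their bit-wise work, which is consistent with the ~2x reduction reported above.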


Hardware Accelerators for Machine Learning: From 3D Manycore to Processing-in-Memory Architectures
Author: Aqeeb Iqbal Arka
Publisher:
Total Pages: 0
Release: 2022
Genre: Machine learning
ISBN:

Big data applications such as deep learning and graph analytics require hardware platforms that are energy-efficient yet computationally powerful. 3D manycore architectures are key to efficiently executing such compute- and data-intensive applications. Through-silicon-via (TSV) based 3D manycore systems are a promising solution in this direction, as they enable the integration of disparate heterogeneous computing cores on a single system. Recent industry trends show the viability of 3D integration in real products (e.g., the Intel Lakefield SoC, the AMD Radeon R9 Fury X graphics card, and the Xilinx Virtex-7 2000T/H580T). However, the achievable performance of conventional TSV-based 3D systems is ultimately bottlenecked by the horizontal wires (the wires in each planar die), and current TSV 3D architectures also suffer from thermal limitations. Hence, TSV-based architectures do not realize the full potential of 3D integration. Monolithic 3D (M3D) integration, a breakthrough technology for achieving "More Moore and More Than Moore," opens up the possibility of designing cores and associated network routers across multiple layers by utilizing monolithic inter-tier vias (MIVs), thereby reducing the effective wire length. Compared to TSV-based 3D ICs, M3D offers the "true" benefits of the vertical dimension for system integration: an MIV is over 100x smaller than a TSV. However, designing these new architectures often involves optimizing multiple conflicting objectives (e.g., performance and thermal) due to the presence of a mix of computing elements and communication methodologies, each with different requirements for high performance. To overcome the difficult optimization challenges posed by the large design space and the complex interactions among the heterogeneous components (CPU, GPU, last-level cache, etc.) in an M3D-based manycore chip, machine learning algorithms can be explored as a promising solution.

The first part of this dissertation focuses on the design of high-performance and energy-efficient architectures for big-data applications, enabled by M3D vertical integration and data-driven machine learning algorithms. As an example, we consider heterogeneous manycore architectures with CPUs, GPUs, and caches as the hardware platform for this part of the work. The disparate nature of these processing elements introduces conflicting design requirements that need to be satisfied simultaneously. Moreover, the on-chip traffic patterns exhibited by different big-data applications (e.g., many-to-few-to-many traffic in CPU/GPU-based manycore architectures) need to be incorporated in the design process for an optimal power-performance trade-off. In this dissertation, we first design an M3D-enabled heterogeneous manycore architecture and demonstrate the efficacy of machine learning algorithms for efficiently exploring a large design space. For large design-space exploration problems, the proposed machine learning algorithm finds good solutions in significantly less time than existing state-of-the-art counterparts. However, the M3D-enabled heterogeneous manycore architecture is still limited by the inherent memory bandwidth bottlenecks of traditional von Neumann architectures.
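To make the role of machine learning in this design flow concrete, here is a minimal, hypothetical Python sketch of surrogate-guided design-space exploration. The dissertation does not specify this particular algorithm; the random-forest surrogate and all function names are my assumptions.

```python
import random
from sklearn.ensemble import RandomForestRegressor

def surrogate_guided_dse(design_space, evaluate, n_seed=50, n_rounds=10, top_k=20):
    """design_space: list of candidate configurations, each a tuple of numeric
    features (core counts, router widths, ...); evaluate: the expensive
    simulator call returning one figure of merit to maximize."""
    sampled = random.sample(design_space, n_seed)
    scores = [evaluate(c) for c in sampled]             # costly simulations
    for _ in range(n_rounds):
        surrogate = RandomForestRegressor().fit(sampled, scores)
        remaining = [c for c in design_space if c not in sampled]
        # Rank unexplored designs by predicted quality; simulate only the best.
        preds = surrogate.predict(remaining)
        ranked = [c for _, c in sorted(zip(preds, remaining), reverse=True)]
        for c in ranked[:top_k]:
            sampled.append(c)
            scores.append(evaluate(c))
    return max(zip(sampled, scores), key=lambda p: p[1])  # (config, score)
```

The point is that the expensive simulator is invoked only for the seed set and the designs the surrogate ranks highest, which is how ML-guided exploration cuts search time on very large design spaces.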
As a result, later in this dissertation we focus on Processing-in-Memory (PIM) architectures tailor-made to accelerate deep learning applications such as Graph Neural Networks (GNNs), since such architectures can achieve massive data parallelism and do not suffer from memory bandwidth-related issues. We choose GNNs as an example workload because they are more complex than traditional deep learning applications: they simultaneously exhibit attributes of both deep learning and graph computations, and are hence both compute- and data-intensive in nature. The high amount of data movement required by GNN computation poses a challenge to conventional von Neumann architectures (such as CPUs, GPUs, and heterogeneous systems-on-chip (SoCs)), which have limited memory bandwidth. Hence, we propose the use of PIM-based non-volatile memory such as Resistive Random Access Memory (ReRAM). We leverage the efficient matrix operations enabled by ReRAMs and design manycore architectures that can facilitate the unique computation and communication needs of large-scale GNN training. We then exploit various techniques, such as regularization methods, to further accelerate GNN training on ReRAM-based manycore systems. Finally, we streamline the GNN training process by reducing the amount of redundant information in both the GNN model and the input graph.

Overall, this work focuses on the design challenges of high-performance and energy-efficient manycore architectures for machine learning applications. We propose novel architectures that use M3D or ReRAM-based PIM to accelerate such applications, and we focus on hardware/software co-design to ensure the best possible performance.
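The ReRAM primitive that makes this attractive is the in-situ matrix-vector multiply. Below is an idealized NumPy sketch of one crossbar tile, with quantized conductance states standing in for programmed cells; real arrays add nonidealities (wire resistance, ADC precision) that this deliberately ignores.

```python
import numpy as np

def crossbar_mvm(weight_tile, v_in, levels=16):
    """Idealized ReRAM tile: weights are programmed into `levels` discrete
    conductance states, the input vector is applied as word-line voltages,
    and each bit-line current is an analog dot product (Ohm's law for the
    multiply, Kirchhoff's current law for the accumulate)."""
    w_max = float(np.abs(weight_tile).max())
    if w_max == 0.0:
        return np.zeros(weight_tile.shape[1])
    # Quantize weights to the available conductance levels.
    g = np.round(weight_tile / w_max * (levels - 1)) / (levels - 1) * w_max
    return v_in @ g   # column currents ~ sum_i v_i * g_ij
```

A GNN layer such as H' = A(HW) then decomposes into tiles of such multiplies, one tile per crossbar, which is why ReRAM-based PIM suits the dense weight transforms and, with suitable mappings, the sparse aggregation in GNN training.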


Hardware Accelerators in Data Centers
Author: Christoforos Kachris
Publisher: Springer
Total Pages: 280
Release: 2018-08-21
Genre: Technology & Engineering
ISBN: 3319927922

This book provides readers with an overview of the architectures, programming frameworks, and hardware accelerators for typical cloud computing applications in data centers. The authors present the most recent and promising solutions, using hardware accelerators to provide high throughput, reduced latency and higher energy efficiency compared to current servers based on commodity processors. Readers will benefit from state-of-the-art information regarding application requirements in contemporary data centers, computational complexity of typical tasks in cloud computing, and a programming framework for the efficient utilization of the hardware accelerators.


Research Infrastructures for Hardware Accelerators
Author: Yakun Sophia Shao
Publisher: Springer Nature
Total Pages: 85
Release: 2022-05-31
Genre: Technology & Engineering
ISBN: 3031017501

Hardware acceleration in the form of customized datapath and control circuitry tuned to specific applications has gained popularity for its promise to utilize transistors more efficiently. Historically, the computer architecture community has focused on general-purpose processors, and extensive research infrastructure has been developed to support research efforts in this domain. Envisioning future computing systems with a diverse set of general-purpose cores and accelerators, computer architects must add accelerator-related research infrastructures to their toolboxes to explore future heterogeneous systems. This book serves as a primer for the field, as an overview of the vast literature on accelerator architectures and their design flows, and as a resource guidebook for researchers working in related areas.


Efficient Machine Learning Acceleration at the Edge
Author: Wojciech Romaszkan
Publisher:
Total Pages: 0
Release: 2023
Genre:
ISBN:

My thesis is the result of a confluence of several trends that have emerged in recent years. First, the rapid proliferation of deep learning across the application and hardware landscapes is creating an immense demand for computing power. Second, the waning of Moore's Law is paving the way for domain-specific acceleration as a means of delivering performance improvements. Third, deep learning's inherent error tolerance is reviving long-forgotten approximate computing paradigms. Fourth, latency, energy, and privacy considerations are increasingly pushing deep learning towards edge inference, with its stringent deployment constraints. Together, these trends have created a unique, once-in-a-generation opportunity for accelerated widespread adoption of new classes of hardware and algorithms, provided they can deliver fast, efficient, and accurate deep learning inference within a tight area and energy envelope.

One approach towards efficient machine learning acceleration that I have explored attempts to push neural network model size to its absolute minimum. 3PXNet (Pruned, Permuted, Packed XNOR Networks) combines two widely used model compression techniques, binarization and sparsity, to deliver usable models with sizes down to single kilobytes. It uses an innovative combination of weight permutation and packing to create structured sparsity that can be implemented efficiently in both software and hardware. 3PXNet has been deployed as an open-source library targeting microcontroller-class devices, with various software optimizations further improving runtime and storage requirements.

The second line of work I have pursued is the application of stochastic computing (SC), an approximate, stream-based computing paradigm that enables extremely area-efficient implementations of basic arithmetic operations such as multiplication and addition. SC has been enjoying a renaissance over the past few years due to its unique synergy with deep learning. On the one hand, SC makes it possible to implement an extremely dense multiply-accumulate (MAC) computational fabric well suited to computing large linear algebra kernels, which are the bread and butter of deep neural networks. On the other hand, those neural networks exhibit immense approximation tolerance, making SC a viable implementation candidate. However, several issues need to be solved to make SC acceleration of neural networks feasible: the area efficiency comes at the cost of long stream-processing latency, and the cost of converting between fixed-point and stochastic representations can cancel out the gains from computation efficiency if not managed correctly. These issues raise the question of how to design an accelerator architecture that best exploits SC's benefits while minimizing its shortcomings.

To address this, I proposed the ACOUSTIC (Accelerating Convolutional Neural Networks through Or-Unipolar Skipped Stochastic Computing) architecture and its extension, GEO (Generation and Execution Optimized Stochastic Computing Accelerator for Neural Networks). ACOUSTIC maximizes SC's compute density to amortize conversion costs and memory accesses, delivering system-level reductions in inference energy and latency; it has been taped out and demonstrated in silicon in a 14nm fabrication process. GEO addresses some of ACOUSTIC's shortcomings: through the introduction of a near-memory computation fabric, GEO enables a more flexible selection of dataflows.
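For the first line of work, the kernel that makes single-kilobyte models usable is the classic XNOR-popcount dot product that packed binarized networks such as 3PXNet build on. A minimal sketch of that standard BNN arithmetic (not the 3PXNet library code):

```python
def xnor_dot(a_packed, b_packed, n_bits):
    """a_packed, b_packed: ints holding n_bits packed binary weights or
    activations, with bit value 1 encoding +1 and 0 encoding -1.
    Returns the integer dot product in [-n_bits, +n_bits]."""
    mask = (1 << n_bits) - 1
    matches = ~(a_packed ^ b_packed) & mask   # XNOR: 1 where signs agree
    pop = bin(matches).count("1")             # popcount of agreements
    return 2 * pop - n_bits                   # agreements minus disagreements

# Signs agree in 2 of 4 positions here, so the dot product is 2 - 2 = 0.
assert xnor_dot(0b1011, 0b1101, 4) == 0
```

Packing 32 or 64 weights per machine word turns each group of MACs into one XNOR and one popcount instruction, which is what makes microcontroller-class inference viable.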
GEO also introduces a novel progressive buffering scheme, unique to SC, that lowers the reliance on high memory bandwidth. Overall, my work approaches accelerator design from a systems perspective, which sets it apart from most recent SC publications targeting point improvements in the computation itself. As an extension of the above line of work, I have explored the combination of SC and sparsity, both to target new classes of applications and to unlock further benefits. I proposed the first SC accelerator that supports weight sparsity, SASCHA (Sparsity-Aware Stochastic Computing Hardware Architecture for Neural Network Acceleration), which improves performance on pruned neural networks while maintaining throughput when processing dense ones; SASCHA solves a series of unique, non-trivial challenges in combining SC with sparsity. I have also designed SCIMITAR, an architecture for accelerating object tracking with event-based cameras. Event-based cameras are relatively new imaging devices that transmit information only about pixels that have changed in brightness, resulting in very high input sparsity. SCIMITAR combines SC with computing-in-memory (CIM) and, through a series of architectural optimizations, takes advantage of this new data format to deliver low-latency object detection for tracking applications.
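All of the SC designs above (ACOUSTIC, GEO, SASCHA, SCIMITAR) build on the same unipolar primitive, sketched below: a value in [0, 1] becomes a random bitstream, and a single AND gate multiplies two streams. The latency/accuracy trade-off that drives these architectures is visible directly, since accuracy improves only with stream length.

```python
import random

def to_stream(p, length=1024):
    """Encode p in [0, 1] as a random bitstream with P(bit = 1) = p."""
    return [1 if random.random() < p else 0 for _ in range(length)]

def sc_multiply(p_a, p_b, length=1024):
    a, b = to_stream(p_a, length), to_stream(p_b, length)
    prod = [x & y for x, y in zip(a, b)]   # one AND gate per bit position
    return sum(prod) / length              # decode: density of ones

# Approximately 0.5 * 0.4 = 0.2; the stochastic error shrinks as the stream
# grows, which is exactly the stream-processing latency cost described above.
print(sc_multiply(0.5, 0.4))
```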


Efficient Processing of Deep Neural Networks
Author: Vivienne Sze
Publisher: Springer Nature
Total Pages: 254
Release: 2022-05-31
Genre: Technology & Engineering
ISBN: 3031017668

This book provides a structured treatment of the key principles and techniques for enabling efficient processing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this accuracy comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics—such as energy-efficiency, throughput, and latency—without sacrificing accuracy or increasing hardware costs are critical to enabling the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas.


Energy Efficiency and Robustness of Advanced Machine Learning Architectures
Author: Alberto Marchisio
Publisher:
Total Pages: 0
Release: 2024-11-26
Genre: Computers
ISBN: 9781032855509

This book tackles the twin challenges of energy efficiency and robustness by exploiting the unique features of advanced ML models, and investigates cross-layer concepts and techniques that engage both hardware- and software-level methods to build robust and energy-efficient architectures for these advanced ML networks.


Design and Generation of Efficient Hardware Accelerators for Sparse and Dense Tensor Computations
Author: Nitish Kumar Srivastava
Publisher:
Total Pages: 140
Release: 2020
Genre:
ISBN:

Tensor algebra lives at the heart of big data applications. Where classical machine learning techniques such as embedding generation in recommender systems, dimensionality reduction and latent Dirichlet allocation make use of multi-dimensional tensor factorizations, deep learning techniques such as convolutional neural networks, recurrent neural networks and graph learning use tensor computations primarily in the form of matrix-matrix and matrix-vector multiplications. The tensor computations used in many of these fields operate on sparse data, where most of the elements are zeros. Traditionally, tensor computations have been performed on CPUs and GPUs, both of which have low energy efficiency, as they allocate excessive hardware resources to flexibly support various workloads. However, with the end of Moore's law and Dennard scaling, one can no longer expect more and faster transistors for the same dollar and power budget. This has led to an ever-growing need for energy-efficient and high-performance hardware, and has resulted in a recent surge of interest in application-specific, domain-specific and behavior-specific accelerators, which sacrifice generality for higher performance and energy efficiency.

In this dissertation, I explore hardware specialization for tensor computations by building programmable accelerators. A central theme is determining common spatial optimizations, computation and memory access patterns, and building efficient storage formats and hardware for tensor computations. First, I present T2S-Tensor, a language and compilation framework for productively generating high-performance systolic arrays for dense tensor computations. Then I present a versatile accelerator, Tensaurus, that can accelerate both dense and mixed sparse-dense tensor computations; here, I also introduce a new sparse storage format that allows accessing sparse data in a vectorized and streaming fashion, achieving high memory bandwidth utilization for sparse tensor kernels. Finally, I present a novel sparse-sparse matrix multiplication accelerator, MatRaptor, designed using a row-wise product approach. I also show how these hardware specialization techniques outperform CPUs, GPUs and state-of-the-art accelerators in both energy efficiency and performance.
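As an illustration of the row-wise product approach underlying MatRaptor, here is a small Python sketch of sparse-sparse matrix multiplication (SpGEMM). The dict-of-dicts format stands in for the hardware's sparse storage and is an assumption for readability, not the accelerator's actual format.

```python
def spgemm_rowwise(A_rows, B_rows):
    """Row-wise product SpGEMM: each nonzero A[i,k] scales row k of B, and
    the scaled rows are merged into output row i.
    A_rows, B_rows: sparse matrices as {row: {col: value}} dicts."""
    C = {}
    for i, a_row in A_rows.items():
        acc = {}                                   # accumulator for C[i, :]
        for k, a_ik in a_row.items():              # each nonzero in row i of A
            for j, b_kj in B_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj   # merge partial rows
        if acc:
            C[i] = acc
    return C

# Example: A = [[2,0],[0,3]], B = [[0,1],[4,0]] gives C = [[0,2],[12,0]].
print(spgemm_rowwise({0: {0: 2}, 1: {1: 3}}, {0: {1: 1}, 1: {0: 4}}))
# -> {0: {1: 2}, 1: {0: 12}}
```

Row-wise product reads A and B strictly row by row and accumulates one output row at a time, giving streaming-friendly memory access, which is the property such accelerators exploit.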