18728

Posts

Jan, 27

AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Deep Neural Networks

Typically, Ultra-deep neural network(UDNN) tends to yield high-quality model, but its training process is usually resource intensive and time-consuming. Modern GPU’s scarce DRAM capacity is the primary bottleneck that hinders the trainability and the training efficiency of UDNN. In this paper, we present "AccUDNN", an accelerator that aims to make the utmost use of finite […]
Jan, 27

Efficient Implementation and Optimization of Geometric Multigrid Operations in the LIFT Framework

Geometric Multigrid (GMG) is an efficient method to solve partial differential equations. It consists of four operations (smooth, residual, restrict and prolongate), which are applied iteratively. All four operations are stencil computations and hence benefit from being executed on many-core architectures like GPUs. Programs executed on a GPU are usually written using low-level programming approaches […]
Jan, 27

Direct N-body code on low-power embedded ARM GPUs

This work arises on the environment of the ExaNeSt project aiming at design and development of an exascale ready supercomputer with low energy consumption profile but able to support the most demanding scientific and technical applications. The ExaNeSt compute unit consists of densely-packed low-power 64-bit ARM processors, embedded within Xilinx FPGA SoCs. SoC boards are […]
Jan, 20

Exploring FPGA-specific Optimizations for Irregular OpenCL Applications

OpenCL is emerging as a high-level hardware description language to address the productivity challenges of developing applications on FPGAs. Unlike traditional hardware description languages (HDLs), OpenCL provides an abstract interface to facilitate high productivity, enabling end users to rapidly describe the required computations, including parallelism and data movement, to create custom hardware accelerators for their […]
Jan, 20

AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning

The performance of the code generated by a compiler depends on the order in which the optimization passes are applied. In the context of high-level synthesis, the quality of the generated circuit relates directly to the code generated by the front-end compiler. Unfortunately, choosing a good order–often referred to as the phase-ordering problem–is an NP-hard […]
Jan, 20

Automatic acceleration of Numpy applications on GPUs and multicore CPUs

Frameworks like Numpy are a popular choice for application developers from varied fields such as image processing to bio-informatics to machine learning. Numpy is often used to develop prototypes or for deployment since it provides efficient implementation for operations involving arrays. Such an approach requires every operation to be executed eagerly. The result of each […]
Jan, 20

A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities

With the rapid development of in-depth learning, neural network and deep learning algorithms have been widely used in various fields, e.g., image, video and voice processing. However, the neural network model is getting larger and larger, which is expressed in the calculation of model parameters. Although a wealth of existing efforts on GPU platforms currently […]
Jan, 20

Tango: A Deep Neural Network Benchmark Suite for Various Accelerators

Deep neural networks (DNNs) have been proving the effectiveness in various computing fields. To provide more efficient computing platforms for DNN applications, it is essential to have evaluation environments that include assorted benchmark workloads. Though a few DNN benchmark suites have been recently released, most of them require to install proprietary DNN libraries or resource-intensive […]
Jan, 13

Vulkan 1.1.97 – A Specification (with all registered Vulkan extensions)

This document, referred to as the "Vulkan Specification" describes the Vulkan Application Programming Interface (API). Vulkan is a C99 API designed for explicit control of low-level graphics and compute functionality. The Vulkan specification is intended for use by both implementors of the API and application developers seeking to make use of the API, forming a […]
Jan, 13

Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task based parallel applications, that allows the execution of OpenCl kernels on different compute devices. However, it […]
Jan, 13

Exact Selectivity Computation for Modern In-Memory Database Query Optimization

Selectivity estimation remains a critical task in query optimization even after decades of research and industrial development. Optimizers rely on accurate selectivities when generating execution plans. They maintain a large range of statistical synopses for efficiently estimating selectivities. Nonetheless, small errors — propagated exponentially — can lead to severely sub-optimal plans—especially, for complex predicates. Database […]
Jan, 13

BitCracker: BitLocker meets GPUs

BitLocker is a full-disk encryption feature available in recent Windows versions. It is designed to protect data by providing encryption for entire volumes and it makes use of a number of different authentication methods. In this paper we present a solution, named BitCracker, to attempt the decryption, by means of a dictionary attack, of memory […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: