12980

Posts

Oct, 18

Cholla : A New Massively-Parallel Hydrodynamics Code For Astrophysical Simulation

We present Cholla (Computational Hydrodynamics On ParaLLel Architectures), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind (CTU) algorithm, a variety of exact and approximate Riemann solvers, and […]
Oct, 18

Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition

Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks. To achieve a further performance improvement, in this research, deep extensions on LSTM are investigated considering that deep hierarchical model has turned out to be more efficient than a shallow one. Motivated by previous […]
Oct, 16

The Distribution of OpenCL Kernel Execution Across Multiple Devices

Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framework, can program individual CPUs or GPUs; however, distributing a problem across multiple devices is more difficult. This thesis contributes three OpenCL runtimes that automatically distribute a problem across multiple devices: DualCL and m2sOpenCL, which distribute tasks across a single system’s […]
Oct, 16

OpenCL Implementation of Montgomery Multiplication on FPGA

Galois Field arithmetic has been used very frequently in popular security and error-correction applications. Montgomery multiplication is among the suitable methods used for accelerating modular multiplication, which is the most time consuming basic arithmetic operation. Montgomery multiplication is also suitable to be implemented in parallel. OpenCL, which is a portable, heterogeneous and parallel programming framework, […]
Oct, 16

Parallel Programming and Compressed Material Data for an Eulerian Code

We describe the problem of iterating over mesh zones and iterating over material data within a zone, in the context of relatively new compute architectures. We present an example for how this can be done in a way that is portable across parallel programming environments and can be made to perform well. We offer a […]
Oct, 16

Multi-GPU Based Lattice Boltzmann Method for Hemodynamic Simulation in Patient-Specific Cerebral Aneurysm

Conducting lattice Boltzmann method on GPU has been proved to be an effective manner to gain a significant performance benefit, thus the GPU or multi-GPU based lattice Boltzmann method is considered as a promising and competent candidate in the study of large-scale complex fluid flows. In this work, a multi-GPU based lattice Boltzmann algorithm coupled […]
Oct, 16

Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units

We revisit the implementation of iterative solvers on discrete graphics processing units and demonstrate the benefit of implementations using extensive kernel fusion for pipelined formulations over conventional implementations of classical formulations. The proposed implementations with both CUDA and OpenCL are freely available in ViennaCL and achieve up to three-fold performance gains when compared to other […]
Oct, 16

5th International Conference on Bioscience, Biochemistry and Bioinformatics, ICBBB 2015

Last Round Deadline: 2014-11-15 Publication: Submitted conference papers will be reviewed by technical committees of the Conference.ICBBB 2015 papers will be published in: *WIT Transactions on Biomedicine and Health (ISSN: 1743-3525), all the papers published by WIT Press which will be indexed by EI Compendex and SCOPUS. *International Journal of Bioscience, Biochemistry and Bioinformatics (IJBBB, […]
Oct, 14

A Case Study of OpenCL on an Android Mobile GPU

An observation in supercomputing in the past decade illustrates the transition of pervasive commodity products being integrated with the world’s fastest system. Given today’s exploding popularity of mobile devices, we investigate the possibilities for high performance mobile computing. Because parallel processing on mobile devices will be the key element in developing a mobile and computationally […]
Oct, 14

Synthetic Aperture Radar imaging on a CUDA-enabled mobile platform

This paper presents the details of a Synthetic Aperture Radar (SAR) imaging on the smallest CUDA-capable platform available, the Jetson TK1. The results indicate that GPU accelerated embedded platforms have considerable potential for this type of workload and in conjunction with low power consumption, light weight and standard programming tools, could open new horizons in […]
Oct, 14

A Complete and Efficient CUDA-Sharing Solution for HPC Clusters

In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than […]
Oct, 14

Random Address Permute-Shift Technique for the Shared Memory on GPUs

The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of memory access to the shared memory of a streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks that constitute a shared memory, and w threads in a warp try to access them at the same time. However, […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: