This paper presents an implementation of different matrix-matrix multiplication routines in OpenCL. We build on the high-performance GEMM (GEneral Matrix-Matrix Multiply) implementation from our previous work to implement the other matrix-matrix multiply routines in Level-3 BLAS (Basic Linear Algebra Subprograms). The other routines include SYMM (Symmetric Matrix-Matrix Multiply), SYRK (Symmetric Rank-K Update), SYR2K (Symmetric […]

October 29, 2014 by hgpu
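
As an illustration of the idea in the entry above (building another Level-3 BLAS routine on top of GEMM), here is a minimal pure-Python sketch of SYRK, which computes C := alpha * A * A^T + beta * C and stores only one triangle of C. The function names `gemm`, `transpose`, and `syrk_lower` are hypothetical helpers chosen for this sketch; it is not the paper's OpenCL implementation and makes no attempt at performance.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for row-major lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(m)]
        for i in range(n)
    ]


def transpose(a):
    """Return the transpose of a row-major matrix."""
    return [list(row) for row in zip(*a)]


def syrk_lower(alpha, a, beta, c):
    """SYRK expressed through one GEMM call (hypothetical helper).

    Only the lower triangle of c (including the diagonal) is updated;
    entries above the diagonal are returned unchanged.
    """
    full = gemm(alpha, a, transpose(a), beta, c)
    n = len(c)
    return [
        [full[i][j] if j <= i else c[i][j] for j in range(n)]
        for i in range(n)
    ]


a = [[1, 2], [3, 4]]
c = [[1, 99], [0, 1]]  # 99 sits above the diagonal and must stay untouched
result = syrk_lower(1, a, 1, c)
```

This naive version computes the full product and discards the upper triangle; it shows only the mathematical relationship between the two routines.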

The particle-mesh spreading operation maps a value at an arbitrary particle position to contributions at regular positions on a mesh. This operation is often used when a calculation involving irregular positions is to be performed in Fourier space. We study several approaches for particle-mesh spreading on GPUs. A central concern is the use of […]

October 29, 2014 by hgpu
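
The spreading operation described in the entry above can be sketched in one dimension. This is a minimal serial example, assuming linear (two-point) weights and a periodic mesh; the function name `spread_linear_1d` and its parameters are hypothetical, and the paper's GPU approaches are not reproduced here.

```python
import math


def spread_linear_1d(positions, values, n_cells, spacing=1.0):
    """Spread particle values onto a periodic 1D mesh with linear weights.

    Each particle contributes to its two nearest mesh points, weighted by
    proximity, so the total of the spread values is conserved.
    """
    mesh = [0.0] * n_cells
    for x, v in zip(positions, values):
        s = x / spacing              # position in units of mesh spacing
        left = math.floor(s)         # index of the mesh point to the left
        frac = s - left              # fractional distance past that point
        mesh[left % n_cells] += v * (1.0 - frac)
        mesh[(left + 1) % n_cells] += v * frac
    return mesh


mesh = spread_linear_1d([0.25, 3.5], [1.0, 2.0], n_cells=4)
```

Here the particle at 3.5 wraps around the periodic boundary, so half of its value lands on mesh point 3 and half on mesh point 0.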

Today, heterogeneous computing has truly reshaped the way scientists think about and approach high-performance computing (HPC). Hardware accelerators such as general-purpose graphics processing units (GPUs) and the Intel Many Integrated Core (MIC) architecture continue to make inroads in accelerating large-scale scientific applications. These advancements, however, introduce new sets of challenges to the scientific community such as: selection […]

October 29, 2014 by hgpu

We describe the neural-network training framework used in the Kaldi speech recognition toolkit, which is geared towards training DNNs with large amounts of training data using multiple GPU-equipped or multi-core machines. In order to be as hardware-agnostic as possible, we needed a way to use multiple machines without generating excessive network traffic. Our method is […]

October 29, 2014 by hgpu

Graphics Processing Units (GPUs) are highly parallel shared memory microprocessors, and as such, they are prone to the same concurrency considerations as their traditional multicore CPU counterparts. In this thesis, we consider shared memory consistency, i.e. what values a read can return when it is issued concurrently with writes on current GPU hardware. While memory consistency has been […]

October 27, 2014 by hgpu

Using GPUs as general-purpose processors has revolutionized parallel computing by offering, for a large and growing set of algorithms, massive data-parallelization on desktop machines. As an obstacle to widespread adoption, programming GPUs has remained difficult due to the need for low-level control of the hardware to achieve good performance. This paper suggests a programming […]

October 27, 2014 by hgpu

The existing matrix palette algorithms for skeletal animation are accelerated by GPGPU techniques based on GLSL or CUDA. Because GLSL is an extension of the OpenGL graphics library, it couples rendering and computation closely, which makes it inconvenient to reuse, while CUDA is designed only for NVIDIA GPUs. In this paper GPGPU based […]

October 25, 2014 by hgpu

This work introduces a bilevel, stochastic optimization problem aimed at robust, regional evacuation network design and shelter location under uncertain hazards. A regional planner, acting as a Stackelberg leader, chooses among evacuation-route contraflow operation and shelter location to minimize the expected risk exposure to evacuees. Evacuees then seek an equilibrium with respect to risk exposure […]

October 25, 2014 by hgpu

Data compression is the process of representing information in a compact form, in order to reduce the storage requirements and, hence, communication bandwidth. It has been one of the critical enabling technologies for the ongoing digital multimedia revolution for decades. In the variable-length encoding (VLE) compression method, the most frequently occurring symbols are replaced by codes […]

October 25, 2014 by hgpu
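
To make the variable-length encoding idea in the entry above concrete, here is a minimal sketch using Huffman coding, one well-known VLE scheme in which frequent symbols receive shorter codes. The excerpt does not say which scheme the paper uses, so Huffman coding is an assumption made for illustration, and `huffman_code` and `encode` are hypothetical names.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a prefix-free code table {symbol: bitstring} for text."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, partial code table).
    # The unique tie_breaker keeps tuples comparable without comparing dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    """Replace every symbol with its bitstring."""
    return "".join(code[ch] for ch in text)


text = "aaaabbc"
code = huffman_code(text)
bits = encode(text, code)
```

For this input the most frequent symbol, `a`, gets a one-bit code, so the seven symbols encode to 10 bits in total.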

In recent years, the processing and exploration of time series has attracted noticeable interest. Growing volumes of data and the need for efficient processing have pushed research in new directions, including hardware-based solutions. Graphics Processing Units (GPUs) have significantly more applications than just rendering images. They are also used in general-purpose computing to solve […]

October 24, 2014 by hgpu

In this work, we present an extension of Gaussian process (GP) models with sophisticated parallelization and GPU acceleration. The parallelization scheme arises naturally from the modular computational structure w.r.t. datapoints in the sparse Gaussian process formulation. Additionally, the computational bottleneck is implemented with GPU acceleration for further speedup. Combining both techniques allows applying Gaussian […]

October 24, 2014 by hgpu