7397

Posts

Mar, 18

Improving Cache Locality for Ray Casting with CUDA

In this paper, we present an acceleration method for texture-based ray casting on the compute unified device architecture (CUDA) compatible graphics processing unit (GPU). Since ray casting is a memory-intensive application, our method increases the hit rate of the texture cache during rendering. To achieve this, our method dynamically selects the width and height of […]
Mar, 18

Towards user transparent parallel multimedia computing on GPU-clusters

The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of MMCA problems, the use of High Performance Computing (HPC) techniques is essential. As most MMCA researchers are not HPC experts, there is an urgent need […]
Mar, 18

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments

Lack of efficient and transparent interaction with GPU data in hybrid MPI GPU environments challenges GPU acceleration of largescale scientific and engineering computations. A particular challenge is the efficient transfer of noncontiguous data to and from GPU memory. MPI supports such transfers through the use of datatypes, however an efficient means of utilizing datatypes for […]
Mar, 18

Usable assembly language for GPUs: a success story

The NVIDIA compilers nvcc and ptxas leave the programmer with only very limited control over register allocation, register spills, instruction selection, and instruction scheduling. In theory a programmer can gain control by writing an entire kernel in van der Laan’s cudasm assembly language, but this requires tedious, error-prone tracking of register assignments. This paper introduces […]
Mar, 18

VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units

Graphics processing units (GPUs) have been widely used for general purpose computation acceleration. However, current programming models such as CUDA and OpenCL can support GPUs only on the local computing node, where the application execution is tightly coupled to the physical GPU hardware. In this work, we propose a virtual OpenCL (VOCL) framework to support […]
Mar, 16

Globally scheduled real-time multiprocessor systems with GPUs

Graphics processing units, GPUs, are powerful processors that can offer significant performance advantages over traditional CPUs. The last decade has seen rapid advancement in GPU computational power and generality. Recent technologies make it possible to use GPUs as co-processors to CPUs. The performance advantages of GPUs can be great, often outperforming traditional CPUs by orders […]
Mar, 16

CUDA 2D Stencil Computations for the Jacobi Method

We are witnessing the consolidation of the GPUs streaming paradigm in parallel computing. This paper explores stencil operations in CUDA to optimize on GPUs the Jacobi method for solving Laplace’s differential equation. The code keeps constant the access pattern through a large number of loop iterations, that way being representative of a wide set of […]
Mar, 16

Parallel Sparse Linear Algebra for Multi-core and Many-core Platforms: Parallel Solvers and Preconditioners

Partial differential equations are typically solved by means of finite difference, finite volume or finite element methods resulting in large, highly coupled, ill-conditioned and sparse (non-)linear systems. In order to minimize the computing time we want to exploit the capabilities of modern parallel architectures. The rapid hardware shifts from single core to multi-core and many-core […]
Mar, 16

Developing a CUDA solver for large sparse matrices for MARIN

This masters thesis has been written for the degree of Master of Science in Applied Mathematics at the faculty of Electrical Engineering, Mathematics and Computer Sciences of Delft University of Technology. The report ends a nine month internship carried out at Maritime Research Institute Netherlands (MARIN). MARIN supplies innovative products for the offshore industry and […]
Mar, 16

Multi-platform Linear Algebra

HiFlow3 is a multi-purpose finite element software providing powerful tools for efficient and accurate solution of a wide range of problems modeled by partial differential equations (PDEs). Based on object-oriented concepts and the full capabilities of C++ the HiFlow3 project follows a modular and generic approach for building efficient parallel numerical solvers. It provides highly […]
Mar, 15

On the Use of Small 2D Convolutions on GPUs

Computing many small 2D convolutions using FFTs is a basis for a large number of applications in many domains in science and engineering, among them electromagnetic diffraction modeling in physics. The GPU architecture seems to be a suitable architecture to accelerate these convolutions, but reaching high application performance requires substantial development time and non-portable optimizations. […]
Mar, 15

Iterative Statistical Kernels on Contemporary GPUs

We present a study of three important kernels that occur frequently in iterative statistical applications: Multi-Dimensional Scaling (MDS), PageRank, and K-Means. We implemented each kernel using OpenCL and evaluated their performance on NVIDIA Tesla and NVIDIA Fermi GPGPU cards using dedicated hardware, and in the case of Fermi, also on the Amazon EC2 cloud-computing environment. […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: