high performance computing on graphics processing units: hgpu.org

Posts

Apr, 1

Optimizing Smith-Waterman algorithm on Graphics Processing Unit

Local sequence alignment is an important task in bioinformatics. The most widely used algorithm, Smith-Waterman, has quadratic time complexity, which makes it time-consuming, especially for searches of large biological databases. Many attempts have been made to accelerate Smith-Waterman using parallel architectures. In this paper a parallel implementation of the Smith-Waterman algorithm will be presented. This […]

OpenCL
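The dynamic-programming recurrence behind the quadratic cost mentioned above can be sketched as follows. This is a minimal Python illustration, not the paper's OpenCL implementation. The scoring values (match 2, mismatch -1, linear gap -2) and the function name `smith_waterman_score` are assumptions chosen for the example. Cells are filled one anti-diagonal at a time, because every cell on an anti-diagonal depends only on the two previous anti-diagonals. This is a common way to expose the parallelism that GPU implementations exploit.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j]: best score of a local alignment ending at a[i-1] and b[j-1].
    # Row 0 and column 0 stay 0 (empty prefix).
    h = [[0] * cols for _ in range(rows)]
    best = 0
    # Sweep anti-diagonals d = i + j. All cells on one diagonal are independent
    # because they only read diagonals d-1 (up, left) and d-2 (diagonal).
    for d in range(2, rows + cols - 1):
        for i in range(max(1, d - cols + 1), min(rows - 1, d - 1) + 1):
            j = d - i
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap     # a[i-1] aligned against a gap in b
            left = h[i][j - 1] + gap   # b[j-1] aligned against a gap in a
            # Clamping at 0 lets an alignment start anywhere: this makes it local.
            h[i][j] = max(0, diag, up, left)
            best = max(best, h[i][j])
    return best


print(smith_waterman_score("GGACGTGG", "ACGT"))
```

Here `ACGT` occurs exactly inside the first sequence, so the score is that of four matches (8 with the assumed scoring). The table has (len(a)+1)·(len(b)+1) cells, which is where the quadratic time comes from; only the inner loop over one anti-diagonal is free of dependencies.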

Mar, 28

Programming Massively Parallel Architectures using MARTE: a Case Study

Nowadays, several industrial applications are being ported to parallel architectures. These applications take advantage of the potential parallelism provided by multi-core processors. Many-core processors, especially GPUs (Graphics Processing Units), have led the race in floating-point performance since 2003. While the performance improvement of general-purpose microprocessors has slowed significantly, GPUs have continued to […]

OpenCL

Mar, 18

Using Parallel Computing for the Display and Simulation of the Space Debris Environment

Parallelism is becoming the leading paradigm in today’s computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating […]

OpenCL

Mar, 17

Language virtualization for heterogeneous parallel computing

As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low-level programming models (e.g. OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition and machine-specific details. Most programmers are having a difficult time using these programming models […]

Mar, 9

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

Ocelot is a dynamic compilation framework designed to map the explicitly data parallel execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms. Ocelot includes a dynamic binary translator from Parallel Thread eXecution ISA (PTX) to many-core processors that leverages the Low Level Virtual Machine (LLVM) code generator to target x86 and other ISAs. […]

CUDA
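The idea of running CUDA's explicitly data-parallel model on a multithreaded CPU can be illustrated at the level of the execution model. The sketch below is hypothetical Python and does not reflect how Ocelot works internally, since Ocelot translates PTX through LLVM. The names `launch` and `vector_add_kernel` are invented for the example. Each thread block is handed to a host worker thread, and the threads of a block are serialized in a loop.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_add_kernel(block_idx, thread_idx, block_dim, a, b, out):
    """Per-thread body, written as a CUDA kernel would be (hypothetical)."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(out):  # guard for the partially filled last block
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args, workers=4):
    """Emulate kernel<<<grid_dim, block_dim>>>(*args) with host threads."""

    def run_block(block_idx):
        # The threads of one block run one after another on a single worker.
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)

    # Blocks are independent in CUDA's model, so workers may take them in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_block, range(grid_dim)))  # list() re-raises worker exceptions


n = 10
a = list(range(n))
b = [10 * x for x in range(n)]
out = [0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
print(out)
```

Serializing the threads of a block this way is only valid for kernels without intra-block barriers such as `__syncthreads()`. In CPython the global interpreter lock also prevents real speedup; the point is the mapping of the grid onto workers, not performance.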

Mar, 7

Object-oriented stream programming using aspects

High-performance parallel programs that efficiently utilize heterogeneous CPU+GPU accelerator systems require tuned coordination among multiple program units. However, using current programming frameworks such as CUDA leads to tangled source code that combines code for the core computation with that for device and computational kernel management, data transfers between memory spaces, and various optimizations. In this […]

CUDA

Mar, 6

Speculative Execution on Multi-GPU Systems

The lag of parallel programming models and languages behind the advance of heterogeneous many-core processors has left a gap between the computational capability of modern systems and the ability of applications to exploit them. Emerging programming models, such as CUDA and OpenCL, force developers to explicitly partition applications into components (kernels) and assign them to […]

CUDA

Mar, 2

Mixed-Precision GPU-Multigrid Solvers with Strong Smoothers

In this chapter, we present efficient fine-grained parallelization techniques for robust multigrid solvers, in particular for numerically strong, inherently sequential smoothing operators. We apply them to sparse ill-conditioned linear systems of equations that arise from grid-based discretization techniques like finite differences, volumes and elements. Our exemplary results demonstrate both the numerical and runtime performance of […]

CUDA
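The abstract does not say how the chapter parallelizes its smoothers, so the following only illustrates one standard technique and is not the authors' method: multicolor (here red-black) ordering of Gauss-Seidel for the 1D Poisson problem -u'' = f, discretized with the usual three-point stencil. Plain Gauss-Seidel is inherently sequential because each update reads the value its left neighbor has just written. Coloring odd and even points removes that dependency within each half-sweep. The function name and the model problem are assumptions chosen for the example.

```python
def red_black_gauss_seidel(u, f, h, sweeps):
    """Red-black Gauss-Seidel for -u'' = f on a uniform 1D grid with spacing h.

    u[0] and u[-1] hold fixed (Dirichlet) boundary values; interior points are
    updated in place with the three-point stencil
    u[i] = (u[i-1] + u[i+1] + h^2 f[i]) / 2.
    """
    n = len(u)
    for _ in range(sweeps):
        for first in (1, 2):  # odd ("red") points, then even ("black") points
            # Each point of one color reads only points of the other color,
            # so all updates in this loop are independent of each other.
            for i in range(first, n - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
    return u


# Model problem: -u'' = 2 on [0, 1], u(0) = u(1) = 0, 11 grid points.
points = 11
h = 1.0 / (points - 1)
x = [i * h for i in range(points)]
u = red_black_gauss_seidel([0.0] * points, [2.0] * points, h, sweeps=500)
```

For this model problem the discrete solution equals x(1 - x) at the grid points, so after enough sweeps `u` approaches those values. Each half-sweep is a single data-parallel loop, which maps naturally onto one GPU kernel launch per color.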

Mar, 1

GPU Computation Using Mathematica and CUDA webinar

The webinar will provide an overview and use cases for CUDA and OpenCL, as well as a tutorial on how to use CUDA from within Mathematica. Topics: Overview of GPU, CUDA, and OpenCL; Image Processing on the GPU; Programming the GPU Using Mathematica; GPU Programming Workflow within Mathematica.

Feb, 21

Data parallel loop statement extension to CUDA: GpuC

In recent years, Graphics Processing Units (GPUs) have emerged as a powerful accelerator for general-purpose computations. GPUs are attached to every modern desktop and laptop host CPU as graphics accelerators. GPUs have over a hundred cores with lots of parallelism. Initially, they were used only for graphics applications such as image processing and video games. […]

CUDA

Feb, 20

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs

In this paper we describe techniques for compiling fine-grained SPMD-threaded programs, expressed in programming models such as OpenCL or CUDA, to multicore execution platforms. Programs developed for manycore processors typically express finer thread-level parallelism than is appropriate for multicore platforms. We describe options for implementing fine-grained threading in software, and find that reasonable restrictions on […]

CUDA • OpenCL
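One well-known way to run fine-grained SPMD code on a multicore CPU is to replace the many logical threads of a block with loops, and to split those loops at every barrier so that all threads finish the code before the barrier before any thread continues. The abstract is truncated, so this is not necessarily the paper's exact scheme. The Python below hand-translates a hypothetical block-wise reversal kernel that stages data in shared memory. Both `reverse_blocks` and the kernel are invented for illustration, and the sketch assumes the input length is a multiple of the block size.

```python
def reverse_blocks(data, block_dim):
    """CPU translation of a block-wise reverse kernel that uses a barrier.

    Hypothetical CUDA-style per-thread body being translated:
        shared[tid] = data[base + tid]
        __syncthreads()
        out[base + tid] = shared[block_dim - 1 - tid]
    """
    assert len(data) % block_dim == 0, "sketch assumes full blocks"
    out = [None] * len(data)
    for block in range(len(data) // block_dim):
        base = block * block_dim
        shared = [None] * block_dim  # per-block "shared memory"
        # Region before the barrier: one loop over all logical threads.
        for tid in range(block_dim):
            shared[tid] = data[base + tid]
        # The barrier becomes the boundary between the two loops, so every
        # logical thread below sees all writes made above.
        for tid in range(block_dim):
            out[base + tid] = shared[block_dim - 1 - tid]
    return out


print(reverse_blocks([1, 2, 3, 4, 5, 6], 3))
```

Fusing the two loops into one would be wrong: logical thread 0 would read `shared[2]` before thread 2 had written it.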

Feb, 19

Decoupled Access/Execute Metaprogramming for GPU-Accelerated Systems

We describe the evaluation of several implementations of a simple image processing filter on an NVIDIA GTX 280 card. Our experimental results show that performance depends significantly on low-level details such as data layout and iteration space mapping which complicate code development and maintenance. We propose extending a CUDA or OpenCL like model with decoupled […]

CUDA • OpenCL
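The separation of data access from computation that this abstract argues for can be sketched in Python for a 3x3 box filter on a row-major image. The abstract is cut off before the proposed extension is described, so the names `clamped_neighborhood`, `box_average`, and `apply_filter`, the edge clamping, and the row-major layout are all assumptions. The access function alone knows the data layout and boundary handling, the execute function is pure arithmetic, and the driver owns the iteration-space mapping.

```python
from typing import List, Sequence


def clamped_neighborhood(width: int, height: int, x: int, y: int) -> List[int]:
    """Access: flat indices of the 3x3 window around (x, y), clamped at the edges."""
    indices = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            cx = min(max(x + dx, 0), width - 1)
            cy = min(max(y + dy, 0), height - 1)
            indices.append(cy * width + cx)  # row-major data layout
    return indices


def box_average(values: Sequence[float]) -> float:
    """Execute: pure arithmetic on the gathered values, no indexing."""
    return sum(values) / len(values)


def apply_filter(image, width, height, access, execute):
    """Driver: owns the iteration-space mapping (a simple row-by-row loop here)."""
    out = [0.0] * (width * height)
    for y in range(height):
        for x in range(width):
            gathered = [image[i] for i in access(width, height, x, y)]
            out[y * width + x] = execute(gathered)
    return out


# 3x3 image with a single bright pixel in the center.
spike = [0.0] * 9
spike[4] = 9.0
print(apply_filter(spike, 3, 3, clamped_neighborhood, box_average))
```

Because `box_average` never sees indices, the data layout or the traversal order in `apply_filter` could be changed (for example, to tiled iteration) without touching the computation.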