high performance computing on graphics processing units: hgpu.org

Posts

Oct, 11

Fast Surface Extraction and Visualization of Medical Images using OpenCL and GPUs

Marching Cubes (MC) is an algorithm that extracts surfaces from volumetric data. It is used extensively in visualization and analysis of medical data from modalities like CT and MR, often after a 3D segmentation of the interesting structures is performed. Traditional implementations of MC on modern CPUs are slow, using several seconds (even minutes) to […]

CUDA • OpenCL
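Marching Cubes parallelizes well because every cell of the volume is classified independently. The following minimal sketch of that per-cell step is written in plain Python rather than OpenCL. It compares the eight corner samples of a cell against an isovalue to form an 8-bit case index, and only cells whose index is neither 0 nor 255 are crossed by the surface. The corner ordering, the nested-list volume layout, and the names `cube_index` and `active_cells` are assumptions for illustration. A complete implementation would use the index to look up the standard edge and triangle tables, which are omitted here, and this is not the implementation described in the paper.

```python
from typing import List, Tuple

Volume = List[List[List[float]]]  # assumed layout: volume[z][y][x]

# Corner offsets (dx, dy, dz) in an assumed order; real MC tables fix one specific order.
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]


def cube_index(volume: Volume, x: int, y: int, z: int, iso: float) -> int:
    """8-bit case index for the cell whose lowest corner is (x, y, z).

    Bit i is set when corner i lies below the isovalue.
    """
    index = 0
    for bit, (dx, dy, dz) in enumerate(CORNERS):
        if volume[z + dz][y + dy][x + dx] < iso:
            index |= 1 << bit
    return index


def active_cells(volume: Volume, iso: float) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, z, case_index) for every cell the isosurface passes through.

    Each cell is handled independently, which maps naturally onto one GPU
    work-item per cell.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    cells = []
    for z in range(nz - 1):
        for y in range(ny - 1):
            for x in range(nx - 1):
                idx = cube_index(volume, x, y, z, iso)
                if idx not in (0, 255):  # all-inside or all-outside cells emit nothing
                    cells.append((x, y, z, idx))
    return cells
```

For example, a 3x3x3 volume that is 0.0 everywhere except 1.0 at the center voxel yields eight active cells at isovalue 0.5, since every cell touches the center.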

Oct, 6

Multi-core programming with OpenCL: performance and portability: OpenCL in a memory bound scenario

With the advent of multi-core processors, desktop computers have become multiprocessors that require parallel programming to be utilized efficiently. Efficient and portable parallel programming of future multi-core processors and GPUs is one of today’s most important challenges within computer science. Okuda Laboratory at The University of Tokyo in Japan focuses on solving engineering challenges with parallel […]

CUDA • OpenCL

Oct, 6

Accelerating a climate physics model with OpenCL

Open Computing Language (OpenCL) is fast becoming the standard for heterogeneous parallel computing. It is designed to run on CPUs, GPUs, and other accelerator architectures. By implementing a real-world application, a solar radiation model component widely used in climate and weather models, we show that the OpenCL multi-threaded programming and execution model can dramatically increase […]

OpenCL

Oct, 4

Comparing Parallel Simulation of Social Agents using Cilk and OpenCL

Recent advances in wireless/mobile communication and body-worn sensors, together with ambient intelligence and seamlessly integrated pervasive technology, have paved the way for applications operating on social signals, i.e., sensing and processing of group behavior, interpersonal relationships, or emotions. Thinking on a large scale, it should be apparent that modeling social systems allowing one to study […]

OpenCL

Oct, 4

Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance raytracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We […]

OpenCL

Oct, 3

Parallel SAT-Solving with OpenCL

In the last few decades, there have been substantial improvements in approaches to solving the Boolean satisfiability problem. Many of these improvements consisted of elaborating on existing algorithms. For complete solvers, this led to more efficient branching heuristics and the use of watched literals for unit propagation; incomplete solvers on the […]

OpenCL
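The watched-literal scheme mentioned above can be sketched as follows. This is a sequential Python illustration, not the paper's OpenCL solver. Clauses are assumed to be lists of DIMACS-style signed integers (`3` for x3, `-3` for its negation), and the names `value` and `unit_propagate` are hypothetical. Each clause watches two of its literals and is revisited only when one of them becomes false, which keeps unit propagation cheap.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional


def value(lit: int, assign: Dict[int, bool]) -> Optional[bool]:
    """Truth value of a signed literal, or None if its variable is unassigned."""
    v = assign.get(abs(lit))
    if v is None:
        return None
    return v if lit > 0 else not v


def unit_propagate(clauses: List[List[int]], assign: Dict[int, bool]) -> bool:
    """Extend `assign` by unit propagation with two watched literals.

    Returns False on a conflict. Clauses with two or more literals watch the
    literals in slots 0 and 1; clause lists are reordered in place so the
    watched literals stay in those slots. Both arguments are mutated.
    """
    watches: Dict[int, List[int]] = defaultdict(list)  # literal -> clause ids
    # Every literal that is already true still has to be propagated.
    queue = deque(var if val else -var for var, val in assign.items())
    for ci, clause in enumerate(clauses):
        if not clause:
            return False  # the empty clause can never be satisfied
        if len(clause) == 1:
            lit = clause[0]
            if value(lit, assign) is False:
                return False
            if value(lit, assign) is None:
                assign[abs(lit)] = lit > 0
                queue.append(lit)
        else:
            watches[clause[0]].append(ci)
            watches[clause[1]].append(ci)

    while queue:
        false_lit = -queue.popleft()  # this literal has just become false
        watching, watches[false_lit] = watches[false_lit], []
        for ci in watching:
            clause = clauses[ci]
            if clause[0] == false_lit:  # keep the false watch in slot 1
                clause[0], clause[1] = clause[1], clause[0]
            if value(clause[0], assign) is True:
                watches[false_lit].append(ci)  # clause satisfied; watch may stay
                continue
            for k in range(2, len(clause)):
                if value(clause[k], assign) is not False:
                    # Move the watch to a literal that is not false.
                    clause[1], clause[k] = clause[k], clause[1]
                    watches[clause[1]].append(ci)
                    break
            else:
                # No replacement: the clause is unit on clause[0], or in conflict.
                watches[false_lit].append(ci)
                if value(clause[0], assign) is False:
                    return False
                assign[abs(clause[0])] = clause[0] > 0
                queue.append(clause[0])
    return True
```

For instance, `unit_propagate([[1], [-1, 2, 3], [-2]], {})` returns `True` and leaves the assignment `{1: True, 2: False, 3: True}`, while adding the clause `[-3]` to that input produces a conflict.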

Oct, 3

Heterogeneous Computing with OpenCL

Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous […]

OpenCL

Oct, 3

An OpenCL Fast Fourier Transformation

This paper describes an implementation strategy in preparation for an OpenCL FFT implementation. The two factors most crucial to obtaining high performance for an FFT on a GPU, memory bandwidth and locality, are highlighted. Theoretical upper bounds for performance in terms of the locality factor are derived. An implementation […]

OpenCL
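To make the bandwidth and locality point concrete, here is a textbook iterative radix-2 Cooley-Tukey FFT in plain Python. It is a hedged illustration, not the implementation strategy from the paper. After a bit-reversal permutation, the transform runs log2(n) butterfly stages. Each stage reads and writes the whole array, and the distance between butterfly partners doubles from stage to stage. These two properties are why memory traffic and data locality, rather than arithmetic, tend to limit FFT performance on a GPU. The function name `fft` and the power-of-two length restriction are assumptions of this sketch.

```python
import cmath
from typing import List


def fft(x: List[complex]) -> List[complex]:
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(x)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages; every stage sweeps the full array once, and the partner
    # distance (size // 2) doubles each time, which hurts locality for large n.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        half = size // 2
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```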

Oct, 3

An Auto-tuning Solution to Data Streams Clustering in OpenCL

Due to its applicability to numerous types of data, including telephone records, web documents, and click streams, the data stream model has recently attracted attention. For analysis of such data, it is crucial to process the data in a single pass, or a small number of passes, using little memory. This paper provides an OpenCL […]

CUDA • OpenCL
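The single-pass, low-memory constraint described above can be illustrated with a sequential (MacQueen-style) online k-means, sketched below in Python. This is a swapped-in technique chosen for brevity, not the clustering algorithm or the auto-tuning scheme from the paper, and the name `online_kmeans` is hypothetical. Each point is read once, assigned to its nearest center, and folded into that center as a running mean. Memory therefore stays at O(k·d) however long the stream is.

```python
from typing import Iterable, List, Sequence, Tuple


def online_kmeans(stream: Iterable[Sequence[float]], k: int) -> Tuple[List[List[float]], List[int]]:
    """Single-pass k-means over a stream of d-dimensional points.

    Only k centers and k counts are kept. The first k points seed the centers;
    every later point moves its nearest center toward it by 1/count.
    """
    centers: List[List[float]] = []
    counts: List[int] = []
    for point in stream:
        if len(centers) < k:
            centers.append(list(point))
            counts.append(1)
            continue
        # Nearest center by squared Euclidean distance.
        best = min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
        counts[best] += 1
        step = 1.0 / counts[best]
        # Running-mean update, so no past points need to be stored.
        centers[best] = [q + step * (p - q) for p, q in zip(point, centers[best])]
    return centers, counts
```

Because the function only iterates over `stream` once, it works unchanged on a generator or any other one-shot iterator.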

Oct, 1

CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications

We propose a new transparent checkpoint/restart (CPR) tool, named CheCL, for high performance and dependable GPU computing. CheCL can perform CPR on an OpenCL application program without any modification and recompilation of its code. A conventional checkpointing system fails to checkpoint a process if the process uses OpenCL. Therefore, in CheCL, every API call is […]

OpenCL

Oct, 1

A Comprehensive Performance Comparison of CUDA and OpenCL

This paper presents a comprehensive performance comparison between CUDA and OpenCL. We have selected 16 benchmarks ranging from synthetic applications to real-world ones. We make an extensive analysis of the performance gaps taking into account programming models, optimization strategies, architectural details, and underlying compilers. Our results show that, for most applications, CUDA performs at most […]

CUDA • OpenCL

Sep, 27

A framework to implement a multifrontal scheme on GPU architectures with OpenCL

In this work we analyze an open-source multifrontal solver implementation (UMFPACK) and modify it to transfer the computation load onto an OpenCL device, typically a GPU. To achieve this, the dbOpenCL library has been created, which allows neat integration of OpenCL code into existing C or C++ code. An analysis and profiling […]

OpenCL