high performance computing on graphics processing units: hgpu.org

Posts

Nov, 18

Fast Gather-based Construction of Stereoscopic Images Using Reprojection

We developed a very fast reprojection technique to generate stereoscopic images from a 2D image with depth information. The technique is gather-based and therefore very fast on current graphics hardware. The depth information is sampled at a specific offset which provides the depth to reproject from the left or right camera to the center camera. […]

OpenGL

Nov, 18

Accelerating The Cloud with Heterogeneous Computing

Heterogeneous multiprocessors that combine multiple CPUs and GPUs on a single die are posed to become commonplace in the market. As seen recently from the high performance computing community, leveraging a GPU can yield performance increases of several orders of magnitude. We propose using GPU acceleration to greatly speed up cloud management tasks in VMMs. […]

OpenCL

Nov, 18

Auto-tunable GPU BLAS

OpenCL is fast becoming the preferred framework used to make programs for heterogeneous platforms consisting of at least one CPU and one or more accelerators. The GPU being readily available in almost all computers, it is the most common accelerator in use.Good libraries are important to reduce development time and to make particular development environments, […]

OpenCL

Nov, 18

The Multi2Sim Simulation Framework: A CPU-GPU Model for Heterogeneous Computing

Multi2Sim is a simulation framework for heterogeneous computing, including models for superscalar, multithreaded, multicore, and graphics processors. Multi2Sim is an application-only simulator, which allows one or more applications to be run on top of it without booting a guest operating system first. In this chapter, an introduction to Multi2Sim is presented, and it is shown […]

OpenCL

Nov, 18

Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units

This paper presents a number of algorithms to run the fast multipole method (FMM) on NVIDIA CUDA-capable graphical processing units (GPUs) (Nvidia Corporation, Sta. Clara, CA, USA). The FMM is a class of methods to compute pairwise interactions between N particles for a given error tolerance and with computational cost of O(N). The methods described […]

CUDA

Nov, 18

Neon: A Domain-Specific Programming Language for Image Processing

Neon is a high-level domain-specific programming language for writing efficient image processing programs which can run on either the CPU or the GPU. End users write Neon programs in a C# programming environment. When the Neon program is executed, our optimizing code generator outputs human-readable source files for either the CPU or GPU. These source […]

Nov, 17

Dax Toolkit: A Proposed Framework for Data Analysis and Visualization at Extreme Scale

Experts agree that the exascale machine will comprise processors that contain many cores, which in turn will necessitate a much higher degree of concurrency. Software will require a minimum of a 1,000 times more concurrency. Most parallel analysis and visualization algorithms today work by partitioning data and running mostly serial algorithms concurrently on each data […]

CUDA

Nov, 17

Compilation for Heterogeneous Computing: Automating Analyses, Transformations and Decisions

Hardware accelerators, such as fpga boards or gpu, are an interesting alternative or a valuable complement to classic multi-core processors for computational-intensive software. However it proves to be both costly and difficult to use legacy applications with these new heterogeneous targets. In particular, existing compilers are generally targeted toward code generation for sequential processors and […]

CUDA

Nov, 17

Programming Future Parallel Architectures with Haskell and Intel ArBB

New parallel architectures, such as Cell, Intel MIC, GPUs, and tiled architectures, enable high performance but are often hard to program. What is needed is a bridge between high-level programming models where programmers are most productive and modern parallel architectures. We propose that that bridge is Embedded Domain Specific Languages (EDSLs). One attractive target for […]

CUDA

Nov, 17

Scientific GPU Programming with Data-Flow Languages

Graphical Processing Units or GPUs are processors used primarily to render images from computer models for domains ranging from gaming to design engineering. As the generation of very accurate images often in real time is extremely computationally intensive, they have developed into extremely powerful processors. To achieve this they have relied on being able to […]

CUDA

Nov, 17

FPGA and ASIC Convergence

The growing demands on multimedia applications and high-speed high-quality telecommunication systems with real-time constrains oriented to portable, low power consumption, devices, have being driven technologies development, methodologies and design flows of embedded systems during the last years. Through the analysis of design methodologies and strategies facing multi-core, reconfigurability and power consumption challenges, this educational survey […]

Nov, 17

Characterization and Transformation of Unstructured Control Flow in GPU Applications

Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA and OpenCL. Although this technology is widely used, commodity GPUs use different schemes to implement it, and the performance limitations of these different schemes under real workloads […]

CUDA

•

OpenCL