high performance computing on graphics processing units: hgpu.org

Posts

Jan, 20

Automated Techniques for Enabling Efficient MPI Application Migration

Applications that use the MPI standard have additional dependencies related to the MPI implementation. When migrating an MPI code to a new computing site, the binary will not run if these dependencies are not resolved by properly configuring the new site. In this work, we present techniques that automatically resolve dependencies before runtime and enable […]

Jan, 20

Experiences in Teaching a Specialty Multicore Computing Course

We detail the design and experiences in delivering a specialty multicore computing course whose materials are openly available. The course ambitiously covers three multicore programming paradigms: shared memory (OpenMP), device (CUDA) and message passing (RCCE), and involves significant practical work on their respective platforms: an UltraSPARC T2, Fermi GPU and the Single-Chip Cloud Computer. Specialized […]

CUDA

Jan, 20

Stochastic Progressive Photon Mapping for Dynamic Scenes

Stochastic Progressive Photon Mapping (SPPM) is a method to simulate consistent global illumination. It is especially useful for complicated light paths like caustics seen through a glass surface. Until now, SPPM could only be applied to static scenes, and noise-free images require hours to compute. Our approach is to extend this method to […]

Jan, 20

An Energy-Efficient Heterogeneous System for Embedded Learning and Classification

Embedded learning applications in automobiles, surveillance, robotics, and defense are computationally intensive and process large amounts of real-time data. Systems for such workloads must meet stringent performance constraints within limited power budgets. High-performance central processing units (CPUs) and graphics processing units (GPUs) cannot be used in an embedded platform because of their power demands. […]

Jan, 20

Parallel Volume Rendering for Large Scientific Data

Data sets of immense size are regularly generated by large-scale computing resources. Even with more traditional methods of acquiring volume data, such as MRI and CT scanners, data that is too large to be visualized effectively on standard workstations is now commonplace. One solution to this problem is to employ a ‘visualization cluster,’ […]

OpenGL

Jan, 20

Leveraging on High-Performance Computing and Cloud Technologies in Digital Libraries: A Case Study

With the emergence of high-performance computing instances in the cloud, massive-scale computation has become available to virtually every organization. Digital libraries typically employ a data-intensive infrastructure, but given the resources, advanced services based on data and text mining could be developed. A fundamental issue is the ease of development and integration of such services. […]

Jan, 20

Efficient fMRI Analysis and Clustering on GPUs

Graphics processing units (GPUs) have traditionally been used to accelerate only parts of the graphics pipeline. The emergence of modern GPUs as highly parallel, multi-threaded, many-core processors capable of general-purpose computation has opened the door to a new form of heterogeneous computing where the GPU and CPU can […]

CUDA

Jan, 20

Coherent Spatiotemporal Filtering, Upsampling and Rendering of RGBZ Videos

Sophisticated video processing effects require both image and geometry information. We explore the possibility of augmenting a video camera with a recent infrared time-of-flight depth camera to capture high-resolution RGB and low-resolution, noisy depth at video frame rates. To turn such a setup into a practical RGBZ video camera, we develop efficient data filtering techniques […]

Jan, 19

Dynamically Managed Data for CPU-GPU Architectures

GPUs are flexible parallel processors capable of accelerating real applications. To exploit them, programmers must ensure a consistent program state between the CPU and GPU memories by managing data. Manually managing data is tedious and error-prone. In prior work on automatic CPU-GPU data management, alias analysis quality limits performance, and type-inference quality limits applicability. This […]

CUDA

Jan, 19

Combining approximate inference methods for efficient learning on large computer clusters

An important challenge in machine learning is to develop learning algorithms that can handle large amounts of data at a realistically large scale. This entails not only the development of algorithms that can be efficiently trained to infer parameters of the model in a given dataset, but also demands careful thought about the tools (both […]

OpenCL

Jan, 19

Accelerated Large-Scale Multiple Sequence Alignment

BACKGROUND: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl’s Law. This work is the first known to accelerate the third stage of […]

Jan, 19

A High Performance Parallel FDTD Method Enhanced By Using SSE Instruction Set

In this paper, we introduce a hardware acceleration technique for the parallel Finite Difference Time Domain (FDTD) method using the SSE (Streaming SIMD Extensions, where SIMD stands for Single Instruction, Multiple Data) instruction set. Applying the SSE instruction set to the parallel FDTD method significantly improves simulation performance. The benchmarks of the SSE […]
