high performance computing on graphics processing units: hgpu.org

Posts

Mar, 1

Optimization of Molecular Dynamics Simulation Code and Applications to Biomolecular Systems

The performance of molecular dynamics (MD) software such as GROMACS is limited by the software’s ability to perform force calculations. The largest part of this work is spent on nonbonded interactions, such as those between water molecules and between water molecules and solute. The determination of nonbonded interactions may account for over 90% of the total computation and real […]

CUDA • OpenCL
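
To illustrate why nonbonded interactions dominate the cost described in the excerpt above, here is a minimal sketch of an all-pairs force loop. It assumes a plain Lennard-Jones potential with no cutoff; the function name `lennard_jones_forces` and the parameters `epsilon` and `sigma` are hypothetical choices for this illustration and are not taken from the paper or from GROMACS, which uses cutoffs, neighbor lists, and SIMD/GPU kernels instead.

```python
def lennard_jones_forces(positions, epsilon=1.0, sigma=1.0):
    """Return per-particle force vectors from an all-pairs Lennard-Jones sum.

    Illustrative only: visits all n*(n-1)/2 pairs, so the cost grows
    quadratically with the number of particles.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Displacement vector pointing from particle j to particle i.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            sr6 = (sigma * sigma / r2) ** 3  # (sigma / r)^6
            # From U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6):
            # F_i = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * d
            scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += scale * d[k]
                forces[j][k] -= scale * d[k]  # Newton's third law
    return forces
```

Each pair is visited once and the force is applied to both particles with opposite signs. The inner loop over pairs is the part that production MD codes restructure for GPUs.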

Mar, 1

Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design

The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher’s flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the […]

CUDA

Mar, 1

GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models

Finite element schemes based on discontinuous Galerkin methods possess features amenable to massively parallel computing accelerated with general purpose graphics processing units (GPUs). However, the computational performance of such schemes strongly depends on their implementation. In the past, several implementation strategies have been proposed. They are based exclusively on specialized compute kernels tuned for each […]

CUDA

Feb, 25

GPU Robot Motion Planning using Semi-Infinite Nonlinear Programming

We propose a many-core GPU implementation of robotic motion planning formulated as a semi-infinite optimization program. Our approach computes the constraints and their gradients in parallel, and feeds the result to a nonlinear optimization solver running on the CPU. To ensure the continuous satisfaction of our constraints, we use polynomial approximations over time intervals. Because […]

CUDA

Feb, 25

Auto-Vectorizing a Large-scale Production Unstructured-mesh CFD Application

For modern x86-based CPUs with increasingly longer vector lengths, achieving good vectorization has become very important for gaining higher performance. Using very explicit SIMD vector programming techniques has been shown to give near-optimal performance; however, they are difficult to implement for all classes of applications, particularly ones with very irregular memory accesses and […]

CUDA

Feb, 25

DIANNE: Distributed Artificial Neural Networks for the Internet of Things

Nowadays, artificial neural networks are widely used to accurately classify and recognize patterns. An interesting application area is the Internet of Things (IoT), where physical things are connected to the Internet and generate a huge amount of sensor data that can be used for a myriad of new, pervasive applications. Neural networks’ ability to comprehend […]

CUDA

Feb, 25

ANTS2 package: simulation and experimental data processing for Anger camera type detectors

ANTS2 is a simulation and data processing package developed for position-sensitive detectors with Anger camera type readout. The simulation module of ANTS2 is based on the ROOT package from CERN, which is used to store the detector geometry and to perform 3D navigation. The module is capable of simulating particle sources, performing particle tracking, generating […]

CUDA

Feb, 25

Parallel Approaches to Shortest-Path Problems for Multilevel Heterogeneous Computing

Many graph algorithms have been proposed to solve the problem of finding shortest paths between nodes in a graph. These problems are considered among the fundamental combinatorial optimization problems. They have many applications, such as car/robot navigation systems, traffic simulations, the tramp steamer problem, courier-scheduling optimization, Internet route planners, web searching, or exploiting arbitrage opportunities in currency […]

CUDA

Feb, 23

The 3rd Int. Conference on Robotics and Mechatronics (ICROM), 2016

★ Place: Quality Hotel, Singapore, 201 Balestier Road, Singapore 329926 | Tel: (65) 6355 9988 | Fax: (65) 6255 0998 ★★ KEYNOTE ★★ 1. Prof. Hubert Roth, Siegen University, Germany 2. Prof. Shujiro Dohta, Okayama University of Science, Japan 3. Prof. Wei-Hsin Liao, Chinese University of Hong Kong, Hong Kong 4. Prof. Zhang Shanyong, Sam, Nanyang Technological University, Singapore ★★ All […]

Feb, 23

First Int. Workshop on Pattern Recognition (IWPR 2016), 2016

Publication: Submitted and accepted papers will be published by SPIE. Indexing: Scopus, Ei Compendex, ISI, Inspec, Google Scholar. Sponsored by: University of Toyama, Japan; Hosei University, Japan; Kogakuin University, Japan; Teikyo University, Japan; North Carolina Agricultural and Technical State University, USA; Hainan University, China. Keynote Speakers: Prof. Chiharu Ishii, Hosei University, Japan; Prof. Genci Capi, University […]

Feb, 23

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue", to a specific device for the entire program. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort […]

OpenCL

Feb, 23

VirtCL: a framework for OpenCL device abstraction and management

The interest in using multiple graphics processing units (GPUs) to accelerate applications has increased in recent years. However, the existing heterogeneous programming models (e.g., OpenCL) abstract details of GPU devices at the per-device level and require programmers to explicitly schedule their kernel tasks on a system equipped with multiple GPU devices. Unfortunately, multiple applications running […]

OpenCL

* * *

Recent source codes

Specx: Speculative task-based runtime system

Mutual-Supervised Learning for Sequential-to-Parallel Code Translation

KISim: Kubernetes Intelligent Scheduling Simulator

Hardware Compute Partitioning on NVIDIA GPUs for Composable Systems

Efficient GPU Implementation of Multi-Precision Integer Division

ParEval: A Parallel Code Evaluation Benchmark

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

exa-AMD: Exascale Accelerated Materials Discovery

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler
