high performance computing on graphics processing units: hgpu.org

Posts

Jul, 16

Accelerating Eulerian Fluid Simulation With Convolutional Networks

Real-time simulation of fluid and smoke is a long-standing problem in computer graphics, where state-of-the-art approaches require large compute resources, making real-time applications often impractical. In this work, we propose a data-driven approach that combines the approximation power of deep-learning methods with the precision of standard fluid solvers to obtain both fast and highly […]

CUDA

Jul, 13

GPU Accelerated Discrete Element Method (DEM) Molecular Dynamics for Conservative, Faceted Particle Simulations

Faceted shapes, such as polyhedra, are commonly found in systems of nanoscale, colloidal, and granular particles. Many interesting physical phenomena, like crystal nucleation and growth, vacancy motion, and glassy dynamics are challenging to model in these systems because they require detailed dynamical information at the individual particle level. Within the granular materials community, the Discrete […]

CUDA

Jul, 13

Survey of Domain-Specific Languages for FPGA Computing

High-performance FPGA programming has typically been the exclusive domain of a small band of specialized hardware developers. They are capable of reasoning about implementation concerns at the register-transfer level (RTL), which is analogous to assembly-level programming in software. Sometimes these developers are required to push further down to manage even lower levels of abstraction closer […]

Jul, 13

OpenFace: A general-purpose face recognition library with mobile applications

Cameras are becoming ubiquitous in the Internet of Things (IoT) and can use face recognition technology to improve context. There is a large accuracy gap between today’s publicly available face recognition systems and the state-of-the-art private face recognition systems. This paper presents our OpenFace face recognition library that bridges this accuracy gap. We show that […]

CUDA

Jul, 13

LU, QR, and Cholesky factorizations: Programming Model, Performance Analysis and Optimization Techniques for the Intel Knights Landing Xeon Phi

A wide variety of heterogeneous compute resources, ranging from multicore CPUs to GPUs and coprocessors, are available to modern computers, making it challenging to design unified numerical libraries that efficiently and productively use all these varied resources. For example, in order to efficiently use Intel’s Knights Landing (KNL) processor, the next generation of Xeon Phi architectures, […]

Jul, 13

The Vectorization of the Tersoff Multi-Body Potential: An Exercise in Performance Portability

Molecular dynamics simulations, an indispensable research tool in computational chemistry and materials science, consume a significant portion of the supercomputing cycles around the world. We focus on multi-body potentials and aim at achieving performance portability. Compared with well-studied pair potentials, multi-body potentials deliver increased simulation accuracy but are too complex for effective compiler optimization. Because […]

CUDA

Jul, 11

Using Deep Convolutional Neural Networks in Monte Carlo Tree Search

Deep Convolutional Neural Networks have revolutionized Computer Go. Large networks have emerged as state-of-the-art models for move prediction and are used not only as stand-alone players but also inside Monte Carlo Tree Search to select and bias moves. Using neural networks inside the tree search is a challenge due to their slow execution time even […]

CUDA

Jul, 11

Comparing Parallel Hardware Architectures for Visually Guided Robot Navigation

Local visual homing methods are a family of algorithms for visually guided navigation on mobile robots. Within this family, the so-called Min-Warping algorithm yields very precise results but is rather compute-intensive. For this reason, we developed several implementations of this algorithm for different parallel hardware architectures (multi-core CPUs with SIMD extensions, GPUs, and FPGAs) to arrive […]

OpenCL

Jul, 11

Large Scale GPU Accelerated PPMLR-MHD Simulations for Space Weather Forecast

PPMLR-MHD is a new magnetohydrodynamics (MHD) model used to simulate the interactions of the solar wind with the magnetosphere, which has been shown to be the key element of the space weather cause-and-effect chain from the Sun to Earth. Compared to existing MHD methods, PPMLR-MHD offers the advantage of high-order spatial accuracy and […]

CUDA

Jul, 11

Deep Learning for Mortgage Risk

This paper analyzes multi-period mortgage risk at loan and pool levels using an unprecedented dataset of over 120 million prime and subprime mortgages originated across the United States between 1995 and 2014, which includes the individual characteristics of each loan, monthly updates on loan performance over the life of a loan, and a number of […]

CUDA

Jul, 11

Fast Predictive Image Registration

We present a method to predict image deformations based on patch-wise image appearance. Specifically, we design a patch-based deep encoder-decoder network which learns the pixel/voxel-wise mapping between image appearance and registration parameters. Our approach can predict general deformation parameterizations; however, we focus on the large deformation diffeomorphic metric mapping (LDDMM) registration model. By predicting the […]

CUDA

Jul, 11

[Serbian] The Methods and Procedures for Accelerating Operations and Queries in Large Database Systems and Data Warehouse (Big Data Systems)

The research topic of this doctoral thesis is the possibility of establishing a model for big data systems with corresponding software-hardware architectures to support sensor networks and IoT devices. The developed model is based on energy-efficient, heterogeneous, massively parallelised SoC hardware platforms, with the support of software application architecture (such as OpenCL) for […]

OpenCL
