high performance computing on graphics processing units: hgpu.org

Posts

Oct, 13

Adaptive Simulation of Large-Scale Ocean Surface

Physically-driven methods of simulating fluid dynamics and frequency-based ocean surface synthesis methods are of long-standing interest for the field of computer graphics. However, they have been historically used separately or without any interaction between them. This thesis focuses on the possibility of combining the approaches into one adaptive solution by proposing methods for unified surface […]

CUDA

Oct, 13

A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection

The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of […]

CUDA

Oct, 13

Optimal Piecewise Linear Function Approximation for GPU-based Applications

Many computer vision and human-computer interaction applications developed in recent years need to evaluate complex and continuous mathematical functions as an essential step toward proper operation. However, rigorous evaluation of such functions often implies a very high computational cost, unacceptable in real-time applications. To alleviate this problem, functions are commonly approximated by simpler piecewise-polynomial […]

CUDA

Oct, 13

Fast and Accurate Poisson Denoising with Optimized Nonlinear Diffusion

The degradation of the acquired signal by Poisson noise is a common problem for various imaging applications, such as medical imaging, night vision and microscopy. Up to now, many state-of-the-art Poisson denoising techniques have concentrated mainly on achieving utmost performance, with little consideration for computational efficiency. Therefore, in this study we aim to propose an […]

Oct, 11

The 4th International Conference on Control, Robotics and Informatics (ICCRI), 2015

The 4th International Conference on Control, Robotics and Informatics (ICCRI 2015) will be held in Tokyo, Japan, on December 26-27, 2015. The conference is sponsored by the Science and Engineering Institute and The University of Texas at Dallas, USA. For more information, visit http://www.iccri.org/. (SCOPUS & Ei Compendex) ICCRI 2015 conference proceedings will be published by […]

Oct, 11

IEEE International Conference on Big Data Analysis (ICBDA), 2016

Dear Scholars and Researchers, warmest greetings from ICBDA 2016! This is the conference committee of the 2016 IEEE International Conference on Big Data Analysis (ICBDA 2016). We are very pleased to announce that ICBDA 2016 will be held in Hangzhou, China, on March 12-14, 2016. Publication: After a careful reviewing process, all accepted papers after proper registration and […]

Oct, 11

6th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB), 2016

General Introduction
• Index: All the accepted papers will be published in the volume of MATEC Web of Conferences (ISSN: 2261-236X), which is indexed by Ei Compendex, Inspec, DOAJ, CPCI (Web of Science) and Scopus.
• Famous professors as Keynote Speakers: Prof. Orawan Siriratpiriya, Chulalongkorn University (ARRIC), Thailand (the oldest university considered the most prestigious […]

Oct, 11

Meta-programming and Multi-stage Programming for GPGPUs

GPGPUs and other accelerators are becoming a mainstream asset for high-performance computing. Raising the programmability of such hardware is essential to enable users to discover, master and subsequently use accelerators in day-to-day simulations. Furthermore, tools for high-level programming of parallel architectures are becoming a great way to simplify the exploitation of such systems. For this […]

CUDA

Oct, 11

GPU Accelarated Multi-Block Lattice Boltzmann Solver for Viscous Flow Problems

We developed a lattice Boltzmann solver for low Reynolds number flow problems. We then modified it to run on a Graphical Processing Unit using Compute Unified Device Architecture, a parallel computing platform and programming model created by NVIDIA. Comparison of the results that we obtained on Graphical […]

CUDA

Oct, 11

Performance Analysis of an Astrophysical Simulation Code on the Intel Xeon Phi Architecture

We have developed the astrophysical simulation code XFLAT to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both CPU and Xeon Phi co-processors based on the Intel Many Integrated Core Architecture (MIC). We analyze the performance of XFLAT […]

Oct, 11

Accelerating the D3Q19 Lattice Boltzmann Model with OpenACC and MPI

Multi-GPU implementations of the Lattice Boltzmann method are of practical interest as they allow the study of turbulent flows on large-scale simulations at high Reynolds numbers. Although programming GPUs, and in general power-efficient accelerators, typically guarantees high performance, the lack of portability in their low-level programming models implies significant efforts for maintainability and porting of […]

Oct, 11

GPU acceleration of preconditioned solvers for ill-conditioned linear systems

In this work we study implementations of deflation and preconditioning techniques for solving ill-conditioned linear systems using iterative methods. Solving such systems can be a time-consuming process because of the jumps in the coefficients due to large differences in material properties. We have developed implementations of the iterative methods with these preconditioning techniques on […]
