
Posts

Apr, 7

Multi-Lingual Speech Recognition with Low-Rank Multi-Task Deep Neural Networks

Multi-task learning (MTL) for deep neural network (DNN) multilingual acoustic models has been shown to be effective for learning parameters that are common or shared among multiple languages [1, 2]. In the MTL paradigm, the number of parameters in the output layer is large and scales with the number of languages used in training. This output […]
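To see why the output layer dominates the parameter count, note that a rank-r factorization W ≈ A·B replaces d_hidden × d_out weights with r·(d_hidden + d_out). A quick Python sketch with illustrative sizes (not the paper's actual configuration):

```python
def lowrank_param_count(d_hidden, d_out, rank):
    """Compare parameters of a full output layer W (d_hidden x d_out)
    against its rank-r factorization W ~= A @ B, where A is
    (d_hidden x rank) and B is (rank x d_out). Sizes are illustrative."""
    full = d_hidden * d_out
    factored = d_hidden * rank + rank * d_out
    return full, factored

# Example: a 1024-unit hidden layer feeding 10,000 multilingual output
# targets, factored at rank 64, shrinks the layer by roughly 14x.
full, factored = lowrank_param_count(1024, 10000, 64)
```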
Apr, 7

Flexible, Fast and Accurate Sequence Alignment Profiling on GPGPU with PaSWAS

MOTIVATION: Obtaining large-scale sequence alignments in a fast and flexible way is an important step in the analysis of next-generation sequencing data. Applications based on the Smith-Waterman (SW) algorithm are often not fast enough, limited to dedicated tasks, or not sufficiently accurate due to statistical issues. Current SW implementations that run on […]
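The Smith-Waterman recurrence at the heart of such tools fills a score matrix H in which H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1], floored at zero. A minimal scoring-only Python sketch (the match/mismatch/gap values here are illustrative, not PaSWAS defaults, and real implementations also do traceback):

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of strings a and b
    using the Smith-Waterman dynamic-programming recurrence."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of an alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```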
Apr, 7

On Password Guessing with GPUs and FPGAs

Passwords are still by far the most widely used form of user authentication, for applications ranging from online banking and corporate network access to storage encryption. Password guessing thus poses a serious threat to a multitude of applications. Modern password hashes are specifically designed to slow down guessing attacks. However, having exact measures for the […]
Apr, 4

OmpSs task offload

Exascale performance requires a level of energy efficiency achievable only with specialized hardware. Hence, to build a general-purpose HPC system with exascale performance, different types of processors, memory technologies and interconnection networks will be necessary. Heterogeneous hardware is already present in some top supercomputer systems that are composed of different compute nodes, which at […]
Apr, 4

Reduction of a Symmetrical Matrix to Tridiagonal Form on GPUs

Many eigenvalue and eigenvector algorithms begin by reducing the input matrix to tridiagonal form. A tridiagonal matrix has non-zero elements only on its main diagonal and the two diagonals directly adjacent to it. Reducing a matrix to tridiagonal form is an iterative process that uses Jacobi rotations to reduce […]
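The structural condition described above (non-zeros confined to the main diagonal and its two immediate neighbors) can be stated directly as a predicate on the indices. A small Python checker for dense matrices, as an illustration:

```python
def is_tridiagonal(m):
    """True if square matrix m (list of rows) has non-zero elements only
    on its main diagonal and the two diagonals directly adjacent to it,
    i.e. m[i][j] == 0 whenever |i - j| > 1."""
    n = len(m)
    return all(m[i][j] == 0
               for i in range(n)
               for j in range(n)
               if abs(i - j) > 1)
```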
Apr, 4

An Effective Model of CPU/GPU Collaborative Computing in GPU Clusters

Remote procedure call (RPC) is a simple, transparent and useful paradigm for communication between two processes across a network. The compute unified device architecture (CUDA) programming toolkit and runtime enhance the programmability of the graphics processing unit (GPU) and make it more versatile in high-performance computing. Current research mainly focuses on the […]
Apr, 4

The Design and Implementation of a Verification Technique for GPU Kernels

We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed synchronous, delayed visibility (SDV) semantics, which captures the execution of a GPU kernel by multiple groups of threads. The […]
Apr, 4

Using OpenCL to Implement Median Filtering and RSA Algorithms: Two GPGPU Application Case Studies

Graphics processing units (GPUs) and their development tools have advanced recently, and industry has become more interested in using them. Among the several development frameworks for GPUs, OpenCL provides a programming environment for writing portable code that can run in parallel. This report describes two case studies of algorithm implementations in OpenCL. The first algorithm is […]
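The first case study, median filtering, has a compact reference implementation that a parallel kernel can be checked against. A minimal 1-D CPU version in Python (the report's kernels are OpenCL; the clamped-edge window handling here is an illustrative choice, not taken from the report):

```python
def median_filter_1d(signal, radius=1):
    """Replace each sample with the median of its (2*radius+1)-wide
    neighborhood; the window is clamped at the signal edges. Serves as
    a plain CPU reference for a parallel median-filter kernel."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = sorted(signal[lo:hi])
        out.append(window[len(window) // 2])  # upper median on even windows
    return out

# Impulse noise in the interior is removed:
# median_filter_1d([5, 5, 100, 5, 5]) -> [5, 5, 5, 5, 5]
```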
Apr, 1

Distributed wideband software-defined radio receiver for heterogeneous systems

Recent years have seen an increasing need for computationally efficient implementation of software-defined radio (SDR) systems. Given the limitations of a typical SDR application running on a single machine, we present a distributed SDR system using high-performance techniques. To split a digital signal into multiple channels, we use an efficient digital signal processing technique: a […]
Apr, 1

Generating Null Models for Large-Scale Networks on GPU

A network generated by randomly rewiring the edges of an original network under some constraint conditions is called a null model of the original network. It is a useful tool for revealing mechanisms that affect the topology of networks. As networks grow in scale, the time needed to generate null models increases. How […]
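A common constraint for such rewiring is preserving every node's degree, achieved by repeated double-edge swaps: (a,b),(c,d) → (a,d),(c,b). A hypothetical serial Python sketch of this idea (function name and rejection rules are illustrative; the paper's GPU generator and its exact constraints may differ):

```python
import random

def degree_preserving_null_model(edges, swaps, seed=0):
    """Randomize an undirected edge list with double-edge swaps,
    which keep every node's degree fixed. Swaps that would create a
    self-loop or a duplicate edge are rejected and retried."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    done, attempts = 0, 0
    while done < swaps and attempts < 100 * (swaps + 1):
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if new1 in set(edges) or new2 in set(edges):
            continue  # swap would create a multi-edge
        edges[i], edges[j] = new1, new2
        done += 1
    return edges
```

Because each accepted swap removes and adds one incident edge per endpoint, the degree sequence of the result always matches the original's.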
Apr, 1

Microbranching in mode-I fracture using large scale simulations of amorphous and perturbed lattice models

We study the high-velocity regime mode-I fracture instability using large scale simulations. At large driving displacements, the pattern of a single, steady-state crack that propagates in the midline of the sample breaks down, and small microbranches start to appear near the main crack. Some of the features of those microbranches have been reproduced qualitatively in […]
Apr, 1

Separable projection integrals for higher-order correlators of the cosmic microwave sky: Acceleration by factors exceeding 100

We study the optimisation and porting of the "Modal" code on Intel(R) Xeon(R) processors and/or Intel(R) Xeon Phi(TM) coprocessors using methods which should be applicable to more general compute bound codes. "Modal" is used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum of the cosmic microwave […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
