high performance computing on graphics processing units: hgpu.org

Posts

Oct, 26

22nd ACME Conference on Computational Mechanics

The purpose of this conference is to share state-of-the-art research findings and experience across the full range of Computational Mechanics. The conference organising committee is particularly keen to encourage the participation of young researchers, including PhD students and research assistants. The Conference will emphasize recent developments in the field of Computational Mechanics through a […]

Oct, 26

Scalable Simulation of Tsunamis Generated by Submarine Landslides on GPU clusters

In this work we describe a GPU implementation, written with the CUDA framework over structured meshes, of a first-order two-layer Savage-Hutter type model introduced by E. D. Fernandez-Nieto et al. in 2008 to simulate tsunamis generated by underwater landslides. We also describe an extension of this implementation which exploits the parallel power of a GPU […]

CUDA

Oct, 26

A New Approach of Performance Analysis of Certain Graph Algorithms

Computer-network problems often require searching for one node from another and finding a path between the two. To solve these problems we use graph algorithms. Solving them manually takes a lot of time and knowledge. For this purpose graph algorithms were devised and solving these problems became easier but the […]

CUDA

Oct, 26

A Parallel Depth-aided Exemplar-based Inpainting for Real-time View Synthesis on GPU

Synthesizing new images from a given image pair and the corresponding depth maps is an essential function for many 3D video applications. Exemplar-based inpainting methods have been proposed in recent years to restore newly synthesized images by strategically filling the missing pixels which don’t have any references due to occlusion. Due to the […]

CUDA

Oct, 25

A Datalog Engine for GPUs

We present the design and evaluation of a Datalog engine for execution in Graphics Processing Units (GPUs). The engine evaluates recursive and non-recursive Datalog queries using a bottom-up approach based on typical relational operators. It includes a memory management scheme that automatically swaps data between memory in the host platform (a multicore) and memory in […]

CUDA

Oct, 25

Online Performance Projection for Clusters with Heterogeneous GPUs

We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which […]

OpenCL

Oct, 25

An Empirical Study of Intel Xeon Phi

With at least 50 cores, Intel Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two cache levels, and very fast interconnections, the Xeon Phi can get a theoretical peak of 1000 GFLOPs and over 240 GB/s. These numbers, as well as its flexibility – it can be used both as a coprocessor […]

Oct, 25

GGAS: Global GPU Address Spaces for Efficient Communication in Heterogeneous Clusters

Modern GPUs are powerful high-core-count processors, which are no longer used solely for graphics applications, but are also employed to accelerate computationally intensive general-purpose tasks. For utmost performance, GPUs are distributed throughout the cluster to process parallel programs. In fact, many recent high-performance systems in the TOP500 list are heterogeneous architectures. Despite being highly effective […]

CUDA

Oct, 25

Efficient SDS Simulations on Multi-GPU Nodes of XSEDE High-end Clusters

Efficiently studying Sodium Dodecyl Sulfate (SDS) molecules’ formations in the presence of different molar concentrations on high-end GPU clusters whose nodes share accelerators exposes us to several challenges, including the need to dynamically adapt the job lengths. Neither virtualization nor lightweight OS solutions can easily support generality, portability, and maintainability in concert. Our solution complements […]

CUDA

Oct, 24

Parallel GPU algorithms for alternate-triangular finite difference schemes

Parallel algorithms for modern high performance computing systems are required for fast modelling of high dimensional convection-diffusion processes in air. Such algorithms, designed for alternate-triangular finite difference splitting schemes applied to the convection-diffusion equation, have been considered. An algorithm for single GPU systems and an algorithm for clusters with graphical processors have been described, algorithms’ performance […]

OpenCL

Oct, 24

Modeling system for GPU parallel tasks performance simulation

A flexible and extensible simulation tool architecture, called gpusim, is proposed for heterogeneous grid systems with graphics accelerators. The tool is based on the open source Java framework GridSim. The adequacy of the models has been checked, and an initial investigation of them performed, using known examples of parallel computation problems. The tool allows choosing the optimal setting parameters […]

CUDA

Oct, 24

A Framework for Management of Distributed Data Processing and Event Selection for the Icecube Neutrino Observatory

IceCube is a one-gigaton neutrino detector designed to detect high-energy cosmic neutrinos. It is located at the geographic South Pole and was completed at the end of 2010. Simulation and data processing for IceCube requires a significant amount of computational power. We describe the design and functionality of IceProd, a management system based on Python, […]

CUDA
