high performance computing on graphics processing units: hgpu.org

Posts

Dec, 24

Toolchain for programming, simulating and studying the XMT many-core architecture

Explicit Multi-Threading (XMT) is a general-purpose many-core computing platform built around the vision of a 1000-core chip that is easy to program without compromising performance. This paper presents a publicly available toolchain for XMT, complete with a highly configurable cycle-accurate simulator and an optimizing compiler. The XMT toolchain has matured […]

Dec, 24

High Performance and Scalable Radix Sorting: A case study of implementing dynamic parallelism for GPU computing

The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors had been perceived as challenging targets for problems with dynamic and global data dependences, such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for […]

CUDA
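The paper's own algorithms are not reproduced here. As background, GPU radix sorts are commonly organized as three data-parallel steps per digit: a histogram of digit values, an exclusive prefix sum (scan) over that histogram, and a stable scatter. A minimal sequential sketch of that structure follows, assuming unsigned 32-bit keys; the function name `radix_sort_u32` and the 4-bit digit width are illustrative choices, not taken from the paper.

```python
def radix_sort_u32(keys, bits_per_pass=4):
    """LSD radix sort for unsigned 32-bit integer keys (illustrative sketch).

    Each pass handles one `bits_per_pass`-bit digit using the three steps a
    GPU implementation would parallelize: histogram, exclusive scan, scatter.
    """
    if 32 % bits_per_pass != 0:
        raise ValueError("bits_per_pass must divide 32")
    if any(k < 0 or k >= 1 << 32 for k in keys):
        raise ValueError("keys must be unsigned 32-bit integers")
    radix = 1 << bits_per_pass
    mask = radix - 1
    out = list(keys)
    for shift in range(0, 32, bits_per_pass):
        # 1. Histogram: count the keys that have each digit value.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: first output slot of each digit bucket.
        offsets = [0] * radix
        running = 0
        for d in range(radix):
            offsets[d] = running
            running += counts[d]
        # 3. Stable scatter: keys with equal digits keep their previous
        #    relative order, which is what makes LSD-first ordering correct.
        scattered = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            scattered[offsets[d]] = k
            offsets[d] += 1
        out = scattered
    return out


print(radix_sort_u32([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

On a GPU, the histogram and scatter are typically computed per thread block and the scan combines block-local counts into global offsets; the sequential loops above only show the data dependences between the three steps.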

Dec, 24

Efficient Use of In-Game Ray-Tracing Techniques

Ray tracing is a computationally demanding image generation technique capable of creating photo-realistic images. Because of its high demand for computational power, ray tracing is not used for real-time applications; however, given the massive parallel capabilities of current Graphics Processing Units, and the fact that ray tracing is well suited to parallel processing, the use of […]

CUDA • OpenGL

Dec, 24

Sorting on GPUs for large scale datasets: A thorough comparison

Although sorting has been studied extensively, it remains a challenge, particularly when we consider the implications of novel processor technologies such as manycores (e.g. GPUs, Cell/BE, multicore). In this paper, we compare different algorithms for sorting integers on stream multiprocessors and discuss their viability on large datasets […]

CUDA

Dec, 24

An Experiment in Parallelizing the Fast Fourier Transform

We present the parallel implementation of two new algorithms developed for the discrete cosine transform. These algorithms support the new interleaved fast Fourier transform method. Our techniques were implemented using the MPI standard library and executed on a variety of hardware for comparison. The results indicate a promising new direction in the search for efficient […]

CUDA

Dec, 24

Quasi-maximum Accuracy Floating-point Computations with GPGPU for Applications in Digital Signal Processing

This paper presents the idea of using two accumulators to improve the precision of floating-point computations on graphics processing units (GPUs) for applications in digital signal processing. The increase in precision does not require any increase in the length of the data words. This is particularly important […]

CUDA
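The abstract does not describe how the two accumulators are combined, so the scheme below is not the paper's method. It shows one well-known two-accumulator technique, Kahan compensated summation, in which a second accumulator captures the low-order bits that each rounded addition discards. The function names `naive_sum` and `kahan_sum` are hypothetical, and Python floats are IEEE double precision rather than the single precision often used on GPUs.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation in one floating-point register."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated summation with two accumulators (illustrative sketch).

    `total` holds the running sum; `comp` holds the rounding error lost in the
    previous addition, which is fed back in before the next one. The data
    words themselves stay ordinary doubles.
    """
    total = 0.0
    comp = 0.0
    for v in values:
        y = v - comp            # re-inject the error lost in the last step
        t = total + y           # rounded addition; low bits of y may be lost
        comp = (t - total) - y  # recover (negated) what was just lost
        total = t
    return total


values = [1.0] + [1e-16] * 1_000_000
print(naive_sum(values))  # 1.0: each 1e-16 is below half an ulp of 1.0
print(kahan_sum(values), math.fsum(values))  # both close to 1.0000000001
```

Here `math.fsum` serves only as an accurate reference; the compensated loop needs no wider data type, which matches the abstract's point that precision improves without lengthening the data words.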

Dec, 24

Using Artificial Intelligence in Computational Games

This paper presents the conceptual definition of a framework that helps in building intelligent games by making it easier to insert Artificial Intelligence techniques into a game. The requirements analysis of the main AI techniques is presented, as well as the preliminary results of this framework.

CUDA

Dec, 24

Scalable multi-GPU implementation of the MAGFLOW simulator

We have developed a robust and scalable multi-GPU (Graphics Processing Unit) version of the cellular-automaton-based MAGFLOW lava simulator. The cellular automaton is partitioned into strips that are assigned to different GPUs, with minimal overlap. For each GPU, a host thread is launched to manage allocation, deallocation, data transfer and kernel launches; the main host thread […]

CUDA
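The strip partitioning described above can be illustrated with a small sequential sketch. MAGFLOW's actual lava-flow rule, halo width, and per-GPU host threads are not reproduced: `step` is a hypothetical stand-in rule (each cell becomes the mean of itself and its in-grid 4-neighbours), and each "GPU" is simulated by computing on its own strip plus one overlapping ghost row on each interior side. The check at the end is that strip-wise updates match a single-domain update.

```python
def strip_bounds(n_rows, n_strips, halo=1):
    """Split rows [0, n_rows) into contiguous strips.

    Returns (own_start, own_end, ext_start, ext_end) per strip; the extended
    range adds up to `halo` overlapping ghost rows on each interior side.
    """
    base, extra = divmod(n_rows, n_strips)
    bounds, start = [], 0
    for s in range(n_strips):
        end = start + base + (1 if s < extra else 0)
        bounds.append((start, end, max(0, start - halo), min(n_rows, end + halo)))
        start = end
    return bounds


def step(grid):
    """Hypothetical CA rule: mean of a cell and its in-grid 4-neighbours."""
    rows, cols = len(grid), len(grid[0])
    new = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            nbrs = [grid[i][j]]
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < rows and 0 <= jj < cols:
                    nbrs.append(grid[ii][jj])
            new[i][j] = sum(nbrs) / len(nbrs)
    return new


def step_by_strips(grid, n_strips):
    """One update computed strip by strip, as each GPU would on its slab."""
    out = []
    for own_start, own_end, ext_start, ext_end in strip_bounds(len(grid), n_strips):
        local = step(grid[ext_start:ext_end])  # strip plus ghost rows
        offset = own_start - ext_start
        out.extend(local[offset:offset + (own_end - own_start)])  # owned rows only
    return out


def run(grid, steps, n_strips):
    for _ in range(steps):
        # Re-slicing the assembled grid each step stands in for the halo
        # exchange, i.e. copying boundary rows between neighbouring GPUs.
        grid = step_by_strips(grid, n_strips)
    return grid
```

Because a one-cell neighbourhood only reaches one row beyond a strip, one ghost row per side is enough for one step; a wider stencil, or several steps between exchanges, would need a wider halo.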

Dec, 24

Image processing algorithm optimization with CUDA for Pure Data

Image-processing production lines featuring industrial vision are becoming more and more widespread. This kind of automation needs systems able to capture images, analyze them and learn from them in order to take appropriate action. These processes are often heavy and are applied to high-definition images at high frame rates. Powerful computers are thus needed to follow […]

CUDA

Dec, 24

Simbuca, using a graphics card to simulate Coulomb interactions in a Penning trap

In almost all cases, N-body simulations are limited by the available computation time. Coulomb interaction calculations scale as O(N^2), where N is the number of particles. Approximation methods that reduce the computation time to O(N log N) already exist, although calculating the interactions still dominates the total simulation time. We present Simbuca, a simulation package for thousands of […]

CUDA
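To make the O(N^2) scaling concrete, here is a minimal direct-summation sketch of pairwise Coulomb forces. Simbuca's own GPU implementation is not reproduced; the function name `coulomb_forces` is hypothetical, and SI units are assumed (positions in metres, charges in coulombs, forces in newtons). Every pair (i, j) is visited once, so the work is N(N-1)/2 interactions.

```python
import math

# Coulomb constant k_e = 1 / (4 pi eps0), in N m^2 / C^2.
K_E = 8.9875517923e9


def coulomb_forces(positions, charges):
    """Direct-summation Coulomb forces on N point charges (O(N^2) sketch).

    positions: list of (x, y, z) tuples; charges: list of floats.
    Returns a list of [fx, fy, fz] per particle.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(i + 1, n):
            xj, yj, zj = positions[j]
            dx, dy, dz = xi - xj, yi - yj, zi - zj
            r2 = dx * dx + dy * dy + dz * dz
            # F_ij = k_e q_i q_j (r_i - r_j) / |r_i - r_j|^3
            s = K_E * charges[i] * charges[j] / (r2 * math.sqrt(r2))
            for axis, d in enumerate((dx, dy, dz)):
                # Newton's third law: j feels the equal and opposite force.
                forces[i][axis] += s * d
                forces[j][axis] -= s * d
    return forces


# Two +1 C charges 1 m apart on the x axis repel each other.
print(coulomb_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0]))
```

The symmetric update halves the arithmetic on a CPU. GPU versions more commonly assign one thread per particle i and loop over all j, computing each pair twice so that no two threads write to the same force entry.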

Dec, 23

GPU-based parallel computing for the simulation of complex multibody systems with unilateral and bilateral constraints: an overview

This work reports on advances in large-scale multibody dynamics simulation facilitated by the use of the Graphics Processing Unit (GPU). A description of the GPU execution model and its memory spaces is provided to illustrate its potential for parallel scientific computing. The equations of motion associated with the dynamics of large systems of rigid bodies […]

CUDA

Dec, 23

SIMD Floating Point Extension for Ray Tracing

In the last decade, graphics capabilities have become very important in the mobile market. As a result, low-power embedded solutions for mobile devices have been developed to run computationally intensive graphics applications, which make extensive use of floating-point calculations. The work proposed in this thesis targets the extension of the Silicon Hive […]

CUDA


Recent source codes

XaaS containers

microSYCL: SYCL micro-benchmarks repository

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management
