high performance computing on graphics processing units: hgpu.org

Posts

Mar, 20

Mersenne Twister Random Number Generation on FPGA, CPU and GPU

Random number generation is a very important operation in computational science, e.g. in Monte Carlo simulation methods. It is also a computationally intensive operation, especially for high-quality random number generation. In this paper, we present the design and implementation of a parallel version of one of the most widely used random number generators, namely […]

CUDA

Mar, 20

Performance Evaluation of the NVIDIA GeForce 8800 GTX GPU for Machine Learning

NVIDIA have released a new platform (CUDA) for general-purpose computing on their graphics processing units (GPUs). This paper evaluates the use of this platform for statistical machine learning applications. The transfer rates to and from the GPU are measured, as is the performance of matrix-vector operations on the GPU. An implementation of a sparse […]

CUDA

Mar, 20

Solving Large Regression Problems using an Ensemble of GPU-accelerated ELMs

This paper presents an approach for performing regression on large data sets in reasonable time. The main component of the approach is speeding up the slowest operation of the algorithm by running it on the Graphics Processing Unit (GPU) of the video card instead of the processor (CPU). The experiments show […]

CUDA

Mar, 20

Joint Forces: From Multithreaded Programming to GPU Computing

Desktop software developers’ interest in graphics hardware is increasing as a result of modern graphics cards’ capabilities to act as compute devices that augment the main processor. This capability means parallel computing is no longer a dedicated task for the CPU. A trend toward heterogeneous computing combines the main processor and graphics processing unit (GPU). […]

Mar, 19

Mean Shift Parallel Tracking on GPU

We propose a parallel Mean Shift (MS) tracking algorithm on the Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA). The traditional MS algorithm uses a large number of color histogram bins, typically 16x16x16, which makes parallel implementation infeasible. We thus employ K-Means clustering to partition the object color space, which enables us to represent color […]

CUDA

Mar, 19

Method for simulation of coastal terrain on GPU

Shaders on the GPU are widely used to model coastal terrain, but the resulting terrains are highly similar and fail to capture the differences between coastal features. To overcome this disadvantage, we present a new terrain modeling method based on a sketch map. By specifying the coastal feature type, the proposed […]

Mar, 19

Exploring utilisation of GPU for database applications

This study explores possible applications of GPU technology for accelerating database access. We use an n-gram-based approximate text search engine as a test bed for GPU-based acceleration algorithms. Two solutions for query processing, a hybrid CPU/GPU algorithm and a pure GPU algorithm, are studied and compared with the baseline CPU […]

CUDA

Mar, 19

Efficient and Quality Contouring Algorithms on the GPU

Interactive isosurface extraction has recently become possible through successful efforts to map algorithms such as Marching Cubes (MC) and Marching Tetrahedra (MT) to modern Graphics Processing Unit (GPU) architectures. Other isosurfacing algorithms, however, are not so easily portable to GPUs, either because they involve more complex operations or because they are not based on discrete […]

Mar, 19

Implementing a GPU Programming Model on a non-GPU Accelerator Architecture

Parallel codes are written primarily for the purpose of performance. It is highly desirable that parallel codes be portable between parallel architectures without significant performance degradation or code rewrites. While performance portability and its limits have been studied thoroughly on single processor systems, this goal has been less extensively studied and is more difficult to […]

CUDA

Mar, 19

Scalable Multi Agent Simulation on the GPU

We present a unique and elegant graphics hardware realization of multi-agent simulation. Specifically, we adapted Velocity Obstacles, which are well suited to parallel computation on single instruction, multiple thread (SIMT) architectures. We explore hash-based nearest-neighbor search to considerably optimize the algorithm when mapped onto the GPU. Moreover, to alleviate inefficiencies of agent […]

CUDA

Mar, 19

Password Recovery for RAR Files Using CUDA

Driven by the insatiable demand for real-time graphics, especially from the computer games market, the Graphics Processing Unit (GPU) has become a major source of computing power in recent years, as GPU performance is surpassing that of contemporary CPUs. This paper presents our study on how to efficiently recover the passwords for encrypted RAR […]

CUDA

Mar, 19

Password recovery for encrypted ZIP archives using GPUs

Protecting data with passwords in documents such as DOC and PDF files or RAR and ZIP archives has been shown to be weak against dictionary attacks. The time needed to recover the passwords of such documents depends mainly on two factors: the size of the password search space and the computing power of the underlying system. In this paper, we […]

CUDA

Recent source codes

SimSYCL: Synchronous, single-threaded, library-only SYCL implementation for debugging and verification

GPU plugin for PySCF

QArray

Celerity: High-level C++ for Accelerator Clusters

gpu_tracker: Context manager and CLI that tracks the computational-resource-usage of a code block or shell command, particularly the GPU usage

CIFAR-10 Airbench: 94% on CIFAR-10 in 3.29 seconds

LOOPer: a polyhedral compiler for expressing fast and portable data parallel algorithms

OpenMC Monte Carlo Code

Polygeist: C/C++ frontend for MLIR

Parallel Gaussian process with kernel approximation in CUDA
