Posts

Feb, 11

Scalability of Self-organizing Maps on a GPU cluster using OpenCL and CUDA

We evaluate a novel implementation of a Self-Organizing Map (SOM) on a Graphics Processing Unit (GPU) cluster. Using various combinations of OpenCL, CUDA, and two different graphics cards, we demonstrate the scalability of the SOM implementation on one to eight GPUs. Results indicate that while the algorithm scales well with the number of training samples […]
Feb, 10

Automatic Performance Optimization in ViennaCL for GPUs

Highly parallel computing architectures such as graphics processing units (GPUs) pose several new challenges for scientific computing that were absent on single-core CPUs. In particular, the transition from existing serial code to parallel code for GPUs often requires considerable effort. The Vienna Computing Library (ViennaCL) presented in the beginning of this […]
Feb, 10

Customizing Instruction Set Extensible Reconfigurable Processors using GPUs

Many reconfigurable processors allow their instruction sets to be tailored to the performance requirements of target applications. This flexibility to add custom instructions has made them very popular in recent years. However, most design automation algorithms for instruction set customization (like enumerating and selecting the optimal set of custom instructions) are computationally […]
Feb, 10

Ensemble K-means on multi-core architectures

Ensemble problems use multiple models generated from a data set to improve correctness and ensure faster convergence. The use of multiple models makes ensemble problems computationally intensive. In this paper, we explore the parallelization of ensemble problems on modern multicore hardware such as CPUs and GPUs. We use the K-means clustering algorithm as a case […]
Feb, 10

Implementing Molecular Dynamics on Hybrid High Performance Computers – Particle-Particle Particle-Mesh

The use of accelerators such as graphics processing units (GPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high-performance computers, machines with nodes containing more than one type of floating-point processor (e.g. CPU and GPU), are now becoming more […]
Feb, 10

Real-Time SAH BVH Construction for Ray Tracing Dynamic Scenes

This work aims to develop effective algorithms for building full SAH BVH trees on the GPU in real time. We assume that all scene objects are represented by a set of triangles (the so-called "triangle soup"), while arbitrary changes to the geometry are allowed […]
Feb, 9

Accelerating H.264 Advanced Video Coding with GPU/CUDA Technology

With the rise of streaming media on the Internet and the YouTube revolution, the demand for online video is costing companies a significant amount of bandwidth. To reduce the resources needed for streaming media, video compression removes redundant information while minimizing the loss in quality perceived by a human audience. In response to the need […]
Feb, 9

Parallel Semi-Implicit Time Integrators

In this paper, we further develop a family of parallel time integrators known as Revisionist Integral Deferred Correction methods (RIDC) to allow for the semi-implicit solution of time-dependent PDEs. Additionally, we show that our semi-implicit RIDC algorithm can harness the computational potential of multiple general-purpose graphics processing units (GPGPUs) by utilizing existing CUBLAS […]
Feb, 9

The Boat Hull Model: Adapting the Roofline Model to Enable Performance Prediction for Parallel Computing

Multi-core and many-core have been major trends for the past six years and are expected to continue for the next decades. With these trends in parallel computing, it becomes increasingly difficult to decide which architecture to run a given application on. In this work, we use an algorithm classification to predict performance prior to algorithm […]
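For context, the standard Roofline model that the Boat Hull model adapts bounds the attainable performance of a kernel by the lesser of the machine's peak compute rate and its peak memory bandwidth multiplied by the kernel's operational intensity (FLOPs per byte moved). This is the textbook Roofline bound, not the Boat Hull extension itself; the symbols below are chosen for illustration, and the numbers in the worked line are hypothetical.

```latex
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; B_{\text{peak}} \cdot I\right)

% Hypothetical example: P_peak = 1000 GFLOP/s, B_peak = 100 GB/s, I = 2 FLOP/byte
P_{\text{attainable}} = \min(1000,\; 100 \cdot 2) = 200~\text{GFLOP/s} \quad \text{(memory-bound)}
```

A kernel whose intensity lies to the left of the ridge point I = P_peak / B_peak (here 10 FLOP/byte) is limited by bandwidth; to the right, it is limited by compute.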
Feb, 9

CudaRF: A CUDA-based Implementation of Random Forests

Machine learning algorithms are frequently applied in data mining applications. Many of the tasks in this domain concern high-dimensional data. Consequently, these tasks are often complex and computationally expensive. This paper presents a GPU-based parallel implementation of the Random Forests algorithm. In contrast to previous work, the proposed algorithm is based on the compute unified […]
Feb, 9

Real-time simulation of a spiking neural network model of the basal ganglia circuitry using general purpose computing on graphics processing units

Real-time simulation of a biologically realistic spiking neural network is necessary for evaluation of its capacity to interact with real environments. However, the real-time simulation of such a neural network is difficult due to its high computational costs that arise from two factors: (1) vast network size and (2) the complicated dynamics of biologically realistic […]
Feb, 8

Auto-Generation and Auto-Tuning of 3D Stencil Codes on GPU Clusters

This paper develops and evaluates search and optimization techniques for auto-tuning 3D stencil (nearest-neighbor) computations on GPUs. Observations indicate that parameter tuning over a search space is necessary for heterogeneous GPUs to achieve optimal performance. Our proposed framework takes a concise specification of stencil behavior from the user as a single formula, […]
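To illustrate what a 3D nearest-neighbor stencil computes, here is a minimal sketch, assuming the common 7-point stencil (center plus six face neighbors). It is not necessarily one of the paper's benchmarks, and the function name `stencil7` and the coefficients `c0`/`c1` are hypothetical. A GPU auto-tuner would search over launch and tiling parameters for a kernel doing this same per-point update.

```python
from typing import List

Grid = List[List[List[float]]]


def stencil7(u: Grid, c0: float, c1: float) -> Grid:
    """Apply one sweep of a 7-point nearest-neighbor stencil to the interior of u.

    Boundary points are copied unchanged. c0 weights the center point and
    c1 weights each of the six face neighbors (hypothetical coefficients).
    """
    nx, ny, nz = len(u), len(u[0]), len(u[0][0])
    # Start from a deep copy so boundary values are preserved as-is.
    out = [[row[:] for row in plane] for plane in u]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                # Sum of the six face neighbors along x, y and z.
                neighbors = (u[i - 1][j][k] + u[i + 1][j][k]
                             + u[i][j - 1][k] + u[i][j + 1][k]
                             + u[i][j][k - 1] + u[i][j][k + 1])
                out[i][j][k] = c0 * u[i][j][k] + c1 * neighbors
    return out


if __name__ == "__main__":
    # With c0 = -6 and c1 = 1 this is the discrete Laplacian, which is zero
    # for a linear field such as u = i + j + k.
    n = 5
    field = [[[float(i + j + k) for k in range(n)] for j in range(n)] for i in range(n)]
    print(stencil7(field, -6.0, 1.0)[2][2][2])
```

Choosing c0 = -6 and c1 = 1 turns the update into the discrete Laplacian, a convenient sanity check: the interior result is 0 for any linear field.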

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors