Posts

Apr, 25

An Efficient Work-Distribution Strategy for Gridding Radio-Telescope Data on GPUs

This paper presents a novel work-distribution strategy for GPUs that efficiently convolves radio-telescope data onto a grid, one of the most time-consuming processing steps in creating a sky image. Unlike existing work-distribution strategies, this strategy keeps the number of device-memory accesses low without incurring the overhead of sorting or searching within telescope data. Performance measurements […]
Apr, 25

The Bones Source-to-Source Compiler Manual

Recent advances in multi-core and many-core processors require programmers to exploit an increasing amount of parallelism from their applications. Data-parallel languages such as CUDA and OpenCL make it possible to take advantage of such processors, but still require a large amount of effort from programmers. To address the challenge of parallel programming, we introduce […]
Apr, 25

Comparison of Different Parallel Implementations of the 2+1-Dimensional KPZ Model and the 3-Dimensional KMC Model

We show that efficient simulations of the Kardar-Parisi-Zhang interface growth in 2 + 1 dimensions and of the 3-dimensional Kinetic Monte Carlo of thermally activated diffusion can be realized both on GPUs and modern CPUs. In this article we present results of different implementations on GPUs using CUDA and OpenCL and also on CPUs using […]
Apr, 21

Multicore Processing for Classification and Clustering Algorithms

Data mining algorithms such as classification and clustering are the future of computation, although they require multidimensional data processing. People are using multicore processors with GPUs. Most programming languages do not provide multiprocessing facilities, which wastes processing resources. Clustering and classification algorithms are more resource-intensive. In this paper we have shown strategies […]
Apr, 19

Algorithm Construction for GPGPU

Today every personal computer and almost every work-related computer has a GPU powerful enough to be used as a supplementary computational device. One framework that enables this use is OpenCL. We asked how one writes efficient algorithms on these GPGPU devices. We found that there are two major ways to run […]
Apr, 18

Maximize Performance on GPUs Using the Rake-based Optimization: A Case Study

In this paper, we analyze the trade-offs encountered when minimizing the total execution time using rake-based applications on GPUs. We use clustering data streams as a case study and present a rake-based implementation for it that is more efficient in terms of memory usage. In order to maximize performance for different problem sizes and […]
Apr, 17

Auto-tuning interactive ray tracing using an analytical GPU architecture model

This paper presents a method for auto-tuning interactive ray tracing on GPUs using a hardware model. Getting full performance from modern GPUs is a challenging task. Workloads which require a guaranteed performance over several runs must select parameters for the worst performance of all runs. Our method uses an analytical GPU performance model to predict […]
Apr, 16

Fast GPU-based fluid simulations using SPH

Graphics Processing Units (GPUs) are massive floating-point stream processors, and through the recent development of tools such as CUDA and OpenCL it has become possible to fully utilize them for scientific computing. We have developed an open-source CUDA-based acceleration framework for 3D Computational Fluid Dynamics (CFD) using Smoothed Particle Hydrodynamics (SPH). This paper describes the […]
Apr, 16

Heterogeneous Highly Parallel Implementation of Matrix Exponentiation Using GPU

The vision of a supercomputer on every desk can be realized with powerful, highly parallel CPUs, GPUs, or APUs. Graphics processors, once specialized for graphics applications only, are now used for computationally intensive general-purpose applications. GFLOP and TFLOP performance, once very expensive, has become very cheap with GPGPUs. Current […]
Apr, 10

Hadoop+Aparapi: Making heterogenous MapReduce programming easier

Lately, programmers have started to take advantage of the GPU capabilities of cloud-based machines. Using the GPUs can decrease the number of nodes required to perform the computation by increasing the productivity per node. We combine Hadoop, a widely used MapReduce framework, with Aparapi, a new Java-to-OpenCL conversion tool from AMD. We propose an easy-to-use API which […]
Apr, 9

A Study of Productivity and Performance of Modern Vector Processors

This bachelor thesis carries out a case study describing the performance and productivity of modern vector processors such as graphics processing units (GPUs) and central processing units (CPUs) based on three different computational routines arising from a magnetoencephalography application. I apply different programming paradigms to these routines targeting either the CPU or the GPU. Furthermore, […]
Mar, 31

Nested Data-Parallelism on the GPU

Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs, but, because of their Single-Instruction-Multiple-Data (SIMD) architecture, they are hard to program. Most of the programs ported to GPUs thus far use traditional data-level parallelism, performing only operations that operate uniformly over vectors. Porting algorithms that do […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
