Posts
Mar, 19
Spatial Join with R-Tree on Graphics Processing Units
Spatial operations such as spatial join combine two objects based on spatial predicates. Spatial join differs from relational join because the objects are multidimensional and the join takes a long time to execute. Recently, many researchers have sought methods to reduce this execution time, and parallel spatial join is one such method. Comparison between […]
Mar, 19
Analysis of the Performance of the Fish School Search Algorithm Running in Graphic Processing Units
Fish School Search (FSS) is a computational intelligence technique invented by Bastos-Filho and Lima-Neto in 2007 and first presented in Bastos-Filho et al. (2008). FSS was conceived to solve search problems and it is based on the social behavior of schools of fish. In the FSS algorithm, the search space is bounded and each possible […]
Mar, 19
GPU Enhanced Simulation of Angiogenesis
In this paper we present the use of graphics processing units to accelerate the most time-consuming stages of a simulation of angiogenesis and tumor growth. By using advanced CUDA mechanisms such as shared memory, textures and atomic operations, we managed to speed up the CUDA kernels by a factor of 57. However, in […]
Mar, 19
Parallelization of Particle Filter Algorithms
This paper presents the parallelization of the particle filter algorithm in a single-target video tracking application. We demonstrate the process by which we parallelized the particle filter algorithm, beginning with a MATLAB implementation. The final CUDA program provided a speedup of approximately 71x over the initial MATLAB implementation.
Mar, 19
Computational Intelligence on Consumer Games and Graphics Hardware CIGPU-2012
The fifth International workshop and tutorial on Computational Intelligence on Consumer Games and Graphics Hardware (CIGPU 2012) will be held as a hybrid special session of the IEEE WCCI 2012 conference in Brisbane, 10-15 June 2012. WCCI 2012, the IEEE world congress on computational intelligence, joins together three international conferences: IJCNN 2012, FUZZ-IEEE 2012 and […]
Mar, 18
Improving Cache Locality for Ray Casting with CUDA
In this paper, we present an acceleration method for texture-based ray casting on the compute unified device architecture (CUDA) compatible graphics processing unit (GPU). Since ray casting is a memory-intensive application, our method increases the hit rate of the texture cache during rendering. To achieve this, our method dynamically selects the width and height of […]
Mar, 18
Towards user transparent parallel multimedia computing on GPU-clusters
The research area of Multimedia Content Analysis (MMCA) considers all aspects of the automated extraction of knowledge from multimedia archives and data streams. To satisfy the increasing computational demands of MMCA problems, the use of High Performance Computing (HPC) techniques is essential. As most MMCA researchers are not HPC experts, there is an urgent need […]
Mar, 18
Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments
Lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments challenges GPU acceleration of large-scale scientific and engineering computations. A particular challenge is the efficient transfer of noncontiguous data to and from GPU memory. MPI supports such transfers through the use of datatypes; however, an efficient means of utilizing datatypes for […]
Mar, 18
Usable assembly language for GPUs: a success story
The NVIDIA compilers nvcc and ptxas leave the programmer with only very limited control over register allocation, register spills, instruction selection, and instruction scheduling. In theory a programmer can gain control by writing an entire kernel in van der Laan’s cudasm assembly language, but this requires tedious, error-prone tracking of register assignments. This paper introduces […]
Mar, 18
VOCL: An Optimized Environment for Transparent Virtualization of Graphics Processing Units
Graphics processing units (GPUs) have been widely used for general purpose computation acceleration. However, current programming models such as CUDA and OpenCL can support GPUs only on the local computing node, where the application execution is tightly coupled to the physical GPU hardware. In this work, we propose a virtual OpenCL (VOCL) framework to support […]
Mar, 16
Globally scheduled real-time multiprocessor systems with GPUs
Graphics processing units (GPUs) are powerful processors that can offer significant performance advantages over traditional CPUs. The last decade has seen rapid advancement in GPU computational power and generality. Recent technologies make it possible to use GPUs as co-processors to CPUs. The performance advantages of GPUs can be great, often outperforming traditional CPUs by orders […]
Mar, 16
CUDA 2D Stencil Computations for the Jacobi Method
We are witnessing the consolidation of the GPU streaming paradigm in parallel computing. This paper explores stencil operations in CUDA to optimize the Jacobi method on GPUs for solving Laplace’s differential equation. The code keeps the access pattern constant through a large number of loop iterations, making it representative of a wide set of […]