Posts
Jun, 11
Range query processing in a multi-GPU environment
Similarity search has been widely studied in recent years, as it can be applied to several fields such as content-based search in multimedia objects, text retrieval or computational biology. These applications usually work on very large databases that are often indexed off-line to accelerate online searches. However, to maintain an […]
Jun, 11
CUDAICA: GPU optimization of Infomax-ICA EEG analysis
In recent years Independent Component Analysis (ICA) has become a standard method for identifying relevant dimensions of the data in neuroscience. ICA is a very reliable method for analyzing data, but it is computationally very costly. This cost makes the use of ICA for on-line data analysis, as required in brain-computer interfaces, almost completely prohibitive. We […]
Jun, 11
Solving the Ghost-Gluon System of Yang-Mills Theory on GPUs
We solve the ghost-gluon system of Yang-Mills theory using Graphics Processing Units (GPUs). Working in Landau gauge, we use the Dyson-Schwinger formalism for the mathematical description as this approach is well-suited to directly benefit from the computing power of the GPUs. With the help of a Chebyshev expansion for the dressing functions and a subsequent […]
Jun, 10
Using the GPGPU for Scaling Up Mining Software Repositories
The Mining Software Repositories (MSR) field integrates and analyzes data stored in repositories such as source control and bug repositories to support practitioners. Given the abundance of repository data, scaling up MSR analyses has become a major challenge. Recently, researchers have experimented with conventional techniques like a super-computer or cloud computing, but these are either […]
Jun, 10
Point to point processing of digital images using parallel computing
This paper presents an approach to the point-to-point processing of digital images using parallel computing, particularly for grayscale conversion, brightening, darkening, thresholding and contrast change. The point-to-point technique applies a transformation to each pixel of an image concurrently rather than sequentially. This approach used CUDA as a parallel programming tool on a GPU in order […]
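To illustrate what a point-to-point operation is, here is a minimal sequential sketch in Python. It is not the paper's CUDA code; the function names, the luminance weights and the contrast pivot of 128 are assumptions chosen for illustration. The key property is that each output pixel depends only on the matching input pixel, which is why every pixel can be handed to its own GPU thread.

```python
# Illustrative sketch only: sequential point operations on an 8-bit image
# stored as a list of rows. All names here are hypothetical.

def clamp(value):
    """Round and clamp an intensity to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))

def grayscale(rgb):
    """Convert one (r, g, b) pixel to gray with common luminance weights (an assumption)."""
    r, g, b = rgb
    return clamp(0.299 * r + 0.587 * g + 0.114 * b)

def brighten(pixel, offset):
    """Add an offset; a negative offset darkens."""
    return clamp(pixel + offset)

def threshold(pixel, cutoff):
    """Map a pixel to white if it reaches the cutoff, else black."""
    return 255 if pixel >= cutoff else 0

def contrast(pixel, factor):
    """Stretch intensities around the mid-gray pivot 128 (an assumption)."""
    return clamp((pixel - 128) * factor + 128)

def apply_point_op(image, op):
    """Apply op to every pixel independently; no pixel reads its neighbors."""
    return [[op(p) for p in row] for row in image]

image = [[0, 100], [200, 255]]
brighter = apply_point_op(image, lambda p: brighten(p, 60))
binary = apply_point_op(image, lambda p: threshold(p, 128))
stretched = apply_point_op(image, lambda p: contrast(p, 2.0))
```

In a GPU version, the body of `apply_point_op` would typically become a kernel in which each thread computes one output pixel.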
Jun, 10
CUDA Kernel Design for GPU-Based Beam Dynamics Simulations
Efficient implementation of general purpose particle tracking on GPUs can result in significant performance benefits to large scale particle tracking and tracking-based accelerator optimization simulations. We present our work on accelerating Argonne National Lab’s accelerator simulation code ELEGANT [1, 2] using CUDA-enabled GPUs [3]. In particular, we provide an overview of beamline elements ported to […]
Jun, 10
S-buffer: Sparsity-aware Multi-fragment Rendering
This work introduces S-buffer, an efficient and memory-friendly GPU-accelerated A-buffer architecture for multi-fragment rendering. Memory is organized into variable contiguous regions for each pixel, thus avoiding the limitations of linked-list and fixed-array techniques. S-buffer exploits fragment distribution for precise allocation of the needed storage and pixel sparsity (empty pixel ratio) for computing the memory offsets […]
Jun, 10
Measuring the Impact of Configuration Parameters in CUDA Through Benchmarking
The choice of threadblock size and shape is one of the most important user decisions when a parallel problem is coded to run on GPU architectures. In fact, threadblock configuration has a significant impact on the global performance of the program. Unfortunately, the programmer does not have enough information about the subtle interactions between this choice of […]
Jun, 9
Scaling Fast Multipole Methods up to 4000 GPUs
The Fast Multipole Method (FMM) is a hierarchical N-body algorithm with linear complexity, high arithmetic intensity, high data locality, hierarchical communication patterns, and no global synchronization. The combination of these features allows the FMM to scale well on large GPU-based systems, and to use their compute capability effectively. We present a 1 PFlop/s […]
Jun, 9
Fast Morphological Image Processing Open-Source Extensions for GPU processing with CUDA
GPU architectures offer a significant opportunity for faster morphological image processing, and the NVIDIA CUDA architecture offers a relatively inexpensive and powerful framework for performing these operations. However, the generic morphological erosion and dilation operation in the CUDA NPP library is relatively naive, and its cost grows steeply with increasing structuring element size. The objective of […]
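To show why a naive erosion becomes expensive for large structuring elements, here is a minimal sequential sketch in Python. It is not the NPP implementation or the paper's code; the function name, the rectangular structuring element and the border handling (out-of-image neighbors are ignored) are assumptions. Every output pixel scans the full window, so the work per pixel grows with the window area.

```python
# Illustrative sketch only: naive grayscale erosion with a rectangular
# structuring element. Names and border handling are hypothetical choices.

def erode(image, se_height, se_width):
    """Replace each pixel with the minimum over its se_height x se_width window."""
    rows, cols = len(image), len(image[0])
    half_h, half_w = se_height // 2, se_width // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            smallest = image[r][c]
            # Full window scan: se_height * se_width reads per output pixel.
            for dr in range(-half_h, se_height - half_h):
                for dc in range(-half_w, se_width - half_w):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        smallest = min(smallest, image[rr][cc])
            out[r][c] = smallest
    return out

image = [[5, 5, 5, 5],
         [5, 1, 5, 5],
         [5, 5, 5, 5]]
eroded = erode(image, 3, 3)
```

Dilation is the same loop with `max` in place of `min`.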
Jun, 9
Autotuning Stencil-Based Computations on GPUs
Finite-difference, stencil-based discretization approaches are widely used in the solution of partial differential equations describing physical phenomena. Newton-Krylov iterative methods commonly used in stencil-based solutions generate matrices that exhibit diagonal sparsity patterns. To exploit these structures on modern GPUs, we extend the standard diagonal sparse matrix representation and define new matrix and vector data types […]
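For readers unfamiliar with diagonal sparse storage, here is a minimal sketch of a matrix-vector product in the standard diagonal (DIA) layout, written in Python. It shows only the baseline format, not the extended representation the authors define; the function name and the row-indexed storage convention are assumptions.

```python
# Illustrative sketch only: standard DIA storage, not the paper's extension.
# offsets[d] is the diagonal's offset (0 = main, +1 = first superdiagonal).
# data[d][i] holds the entry in row i, column i + offsets[d] (row-indexed
# convention, an assumption; some libraries index by column instead).

def dia_matvec(offsets, data, x):
    """Compute y = A @ x for a square matrix A stored by diagonals."""
    n = len(x)
    y = [0.0] * n
    for offset, diagonal in zip(offsets, data):
        for i in range(n):
            j = i + offset
            if 0 <= j < n:  # skip padding entries that fall outside the matrix
                y[i] += diagonal[i] * x[j]
    return y

# 1D three-point stencil (second difference) on 4 grid points.
offsets = [-1, 0, 1]
data = [[-1.0] * 4, [2.0] * 4, [-1.0] * 4]
y = dia_matvec(offsets, data, [1.0, 2.0, 3.0, 4.0])
```

Because each row touches the same small set of diagonals, the inner loop has a regular access pattern, which is what makes this layout attractive on GPUs.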
Jun, 9
Encapsulated synchronization and load-balance in heterogeneous programming
Programming models and techniques to exploit parallelism in accelerators, such as GPUs, are different from those used in traditional parallel models for shared- or distributed-memory systems. It is a challenge to blend different programming models to coordinate and exploit devices with very different characteristics and computational power. This paper presents a new extensible framework model […]