Posts
Dec, 19
Speed Records for NTRU
In this paper, NTRUEncrypt is implemented for the first time on a GPU using the CUDA platform. As is shown, this operation lends itself perfectly to parallelization and performs extremely well compared to ECC and RSA at similar security levels, giving speedups of around three to five orders of magnitude. The focus is on achieving […]
Dec, 18
Accelerating S3D: A GPGPU Case Study
The graphics processor (GPU) has evolved into an appealing choice for high performance computing due to its superior memory bandwidth, raw processing power, and flexible programmability. As such, GPUs represent an excellent platform for accelerating scientific applications. This paper explores a methodology for identifying applications which present significant potential for acceleration. In particular, this work […]
Dec, 18
Accelerating Regularized Iterative CT Reconstruction on Commodity Graphics Hardware (GPU)
Iterative reconstruction algorithms augmented with regularization can produce high-quality reconstructions from few views and even in the presence of significant noise. In this paper we focus on the particularities associated with the GPU acceleration of these algorithms. First, we introduce the idea of using exhaustive benchmark tests to determine the optimal settings of various parameters in […]
Dec, 18
Long time-scale simulations of in vivo diffusion using GPU hardware
To address the problem of performing long time simulations of biochemical pathways under in vivo cellular conditions, we have developed a lattice-based, reaction-diffusion model that uses the graphics processing unit (GPU) as a computational co-processor. The method has been specifically designed from the beginning to take advantage of the GPU’s capacity to perform massively parallel […]
Dec, 18
Large-scale FFT on GPU clusters
A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g., matrix multiplication and LINPACK) and bandwidth-intensive tasks with data locality (e.g., finite-difference simulation). Bandwidth-intensive tasks such as large-scale FFTs without data locality are harder to accelerate, as the bottleneck often lies with the PCI between […]
Dec, 18
Shader Performance Analysis on a Modern GPU Architecture
This paper presents an analysis of the performance of the shader processing units in a modern graphics processing unit (GPU) architecture using real graphics applications. The architecture of a modern GPU is described, and a simulator and associated framework used to evaluate the architecture are introduced. The paper analyses the effects on performance of different […]
Dec, 18
GPU clusters for high-performance computing
Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenges. In this paper, we present our efforts to address some of the challenges with building and running GPU clusters in HPC environments. We touch upon such issues as balanced cluster […]
Dec, 18
Accelerating Template-Based Matching on the GPU for AR Applications
Recently, researchers have shown that it is possible to use GPU hardware for image processing and computer vision algorithms. We have been exploring how to use GPU hardware to improve marker-based tracking for AR applications. In this paper we describe our findings and the issues we explored in the context of a standard fiducial tracking pipeline. We […]
Dec, 18
Accelerating SQL Database Operations on a GPU with CUDA
Prior work has shown dramatic acceleration for various database operations on GPUs, but only using primitives that are not part of conventional database languages such as SQL. This paper implements a subset of the SQLite command processor directly on the GPU. This dramatically reduces the effort required to achieve GPU acceleration by avoiding the need […]
Dec, 18
Efficient, High-Quality Bayer Demosaic Filtering on GPUs
This paper describes a series of optimizations for implementing the high-quality Malvar-He-Cutler Bayer demosaicing filter on a GPU in OpenGL. Applying this filter is the first step in most video-processing pipelines but is generally considered too slow for real time on a CPU. The optimized implementation contains 66% fewer ALU operations than a direct GPU […]
Dec, 18
GPU-based Island Model for Evolutionary Algorithms
The island model for evolutionary algorithms makes it possible to delay the global convergence of the evolution process and to encourage diversity. However, solving large, time-intensive combinatorial optimization problems with the island model requires a large amount of computational resources. GPU computing has recently emerged as a powerful way to harness these resources. In this paper, […]
Dec, 18
Accelerating K-Means on the Graphics Processor via CUDA
In this paper an optimized k-means implementation on the graphics processing unit (GPU) is presented. NVIDIA's compute unified device architecture (CUDA), available from the G80 GPU family onwards, is used as the programming environment. Emphasis is placed on optimizations directly targeted at this architecture to best exploit the computational capabilities available. Additionally, drawbacks and limitations […]