Posts
Nov, 18
Accelerating convolutions on the sphere with hybrid GPU/CPU kernel splitting
We present a general method for accelerating, by more than an order of magnitude, the convolution of a pixelated function on the sphere with a radially-symmetric kernel. Our method splits the kernel into a compact real-space component and a compact spherical-harmonic-space component that can then be convolved in parallel using an inexpensive commodity GPU and […]
Nov, 18
Accelerating SystemC Simulations using GPUs
Recent developments in graphics processing unit (GPU) technology have invigorated an interest in using GPUs for accelerating the simulation of SystemC models. SystemC is extensively used for design space exploration and early performance analysis of hardware systems. SystemC’s reference implementation provides only a single-threaded simulation kernel. However, modern computing platforms offer substantially […]
Nov, 18
Exploiting Coarse-grained Parallelism in B+ Tree Searches on an APU
B+ tree structured index searches are among the fundamental database operations, and hence accelerating them is essential. GPUs provide a compelling mix of performance per watt and performance per dollar, and thus are an attractive platform for accelerating B+ tree searches. However, tree search on discrete GPUs presents significant challenges for acceleration due to […]
Nov, 16
Parallel implementations of the MinMin heterogeneous computing scheduler in GPU
This work presents parallel implementations of the MinMin scheduling heuristic for heterogeneous computing using Graphics Processing Units, in order to improve its computational efficiency. The experimental evaluation of the four proposed MinMin variants demonstrates that a significant reduction in computing time can be attained, making it possible to tackle large scheduling scenarios in reasonable execution times.
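For readers unfamiliar with the heuristic, the following is a minimal sequential sketch of MinMin in Python, not any of the four GPU variants from the paper. The function name `min_min` and the input `etc` (a table of expected execution times, one row per task and one column per machine) are assumptions made for illustration. At each step the task with the smallest achievable completion time is assigned to the machine that gives that time.

```python
def min_min(etc):
    """Sequential MinMin sketch.

    etc[t][m] is the (hypothetical) expected time to run task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine becomes free
    unassigned = set(range(len(etc)))
    assignment = {}

    while unassigned:
        best = None                     # (completion_time, task, machine)
        # Find the task/machine pair with the minimum completion time.
        for t in sorted(unassigned):
            for m in range(n_machines):
                completion = ready[m] + etc[t][m]
                if best is None or completion < best[0]:
                    best = (completion, t, m)
        completion, t, m = best
        assignment[t] = m
        ready[m] = completion           # the chosen machine is busy until then
        unassigned.remove(t)

    return assignment, max(ready)


if __name__ == "__main__":
    # Three tasks, two machines (made-up timings).
    print(min_min([[3, 5], [2, 4], [6, 1]]))
```

The double loop over tasks and machines is repeated once per task, so it dominates the running time on large scenarios; it is presumably this search that a parallel implementation would target, though the sketch makes no claim about how the paper does it.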
Nov, 16
Accelerating Fully Homomorphic Encryption Using GPU
As a major breakthrough, in 2009 Gentry introduced the first plausible construction of a fully homomorphic encryption (FHE) scheme. FHE allows the evaluation of arbitrary functions directly on encrypted data on untrusted servers. In 2010, Gentry and Halevi presented the first FHE implementation on an IBM x3500 server. However, this implementation remains impractical due to […]
Nov, 16
Use of CUDA for the Continuous Space Language Model
The training phase of the Continuous Space Language Model (CSLM) was implemented in NVIDIA's hardware/software architecture, Compute Unified Device Architecture (CUDA). Implementation was accomplished using a combination of CUBLAS library routines and CUDA kernel calls on three different CUDA-enabled devices of varying compute capability and a time savings over the traditional CPU approach […]
Nov, 16
CUDA and OpenCL Implementations of 3D CT Reconstruction for Biomedical Imaging
Biomedical image reconstruction applications with large datasets can benefit from acceleration. Graphics Processing Units (GPUs) are particularly useful in this context as they can produce high-fidelity images rapidly. An image algorithm to reconstruct cone-beam computed tomography (CT) using two-dimensional projections is implemented using GPUs. The implementation takes slices of the target, weighs the projection data […]
Nov, 16
Computer Vision and Image Segmentation Implemented on GPU Using Compute Unified Device Architecture as Applied on Quality Inspection of Pre-etched Printed Circuit Board
Computer vision and image processing continue to expand their areas of application. Traditionally, this technology was hosted by the sequential processing paradigm of a Central Processing Unit (CPU). For several years, this implementation has limited the usefulness of devices that are capable of parallel processing. At the same time, it has been […]
Nov, 15
Device specialization in heterogeneous multi-GPU environments
In the last few years there has been much activity towards coupling CPUs and GPUs in order to get the most from CPU-GPU heterogeneous systems. One of the main problems that prevent these systems from being exploited in a device-aware manner is the CPU-GPU communication bottleneck, which often makes it impossible to produce code more efficient […]
Nov, 15
Resolving the conflict between generality and plausibility in verified computation
The area of proof-based verified computation (outsourced computation built atop probabilistically checkable proofs and cryptographic machinery) has lately seen renewed interest. Although recent work has made great strides in reducing the overhead of naive applications of the theory, these schemes still cannot be considered practical. The core issue is that the work for the prover […]
Nov, 15
Fast 3D Structure Localization in Medical Volumes using CUDA-enabled GPUs
Effective and fast localization of anatomical structures is a crucial first step towards automated analysis of medical volumes. In this paper, we propose an iterative approach for structure localization in medical volumes based on the adaptive bandwidth mean-shift algorithm for object detection (ABMSOD). We extend and tune the ABMSOD algorithm, originally used to detect 2D […]
Nov, 15
Accelerating the Gillespie Exact Stochastic Simulation Algorithm Using Hybrid Parallel Execution on Graphics Processing Units
The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple […]