Posts
Mar, 27
Adaptive Row-grouped CSR Format for Storing of Sparse Matrices on GPU
We present a new adaptive format for storing sparse matrices on the GPU. We compare it with several other formats, including CUSPARSE, which is today probably the best choice for processing sparse matrices on the GPU in CUDA. Contrary to CUSPARSE, which works with the common CSR format, our new format requires conversion. However, multiplication of sparse-matrix and […]
Mar, 26
OpenMPC: Extended OpenMP for Efficient Programming and Tuning on GPUs
General-Purpose Graphics Processing Units (GPGPUs) provide inexpensive, high performance platforms for compute-intensive applications. However, their programming complexity poses a significant challenge to developers. Even though the CUDA (Compute Unified Device Architecture) programming model offers better abstraction, developing efficient GPGPU code is still complex and error-prone. This paper proposes a directive-based, high-level programming model, called OpenMPC, […]
Mar, 26
Massively Parallel Localization of Pulsed Signal Transitions Using a GPU
Computer clock speeds, which had been increasing tremendously over the years, are now slowing down and have reached their saturation limit. To overcome this saturation, optimization techniques are being aggressively developed to get more work done in each clock cycle, favoring parallel computing and concurrent programming. […]
Mar, 26
A Parallel Access Method for Spatial Data Using GPU
Spatial access methods (SAMs) are used for information retrieval in large spatial databases. Many SAMs use sequential tree structures to search for the result set of spatial data contained in a given query region. To improve SAM performance, this paper proposes a parallel method using the GPU. Since […]
Mar, 26
GPUstore: Harnessing GPU Computing for Storage Systems in the OS Kernel
Many storage systems include computationally expensive components. Examples include encryption for confidentiality, checksums for integrity, and error correcting codes for reliability. As storage systems become larger, faster, and serve more clients, the demands placed on their computational components increase and they can become performance bottlenecks. Many of these computational tasks are inherently parallel: they can […]
Mar, 26
Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting
Modern systems-on-chip are evolving towards complex and heterogeneous platforms with general-purpose processors coupled with massively parallel many-core accelerator fabrics (e.g. embedded GPUs). Platform developers are looking for efficient full-system simulators capable of simulating complex applications, middleware and operating systems on these heterogeneous targets. Unfortunately, current virtual platforms are not able to tackle the complexity […]
Mar, 23
Volume-preserving FFD for programmable graphics hardware
Free-Form Deformation (FFD) is a well-established technique for deforming arbitrary object shapes in space. Although more recent deformation techniques have been introduced, among them skeleton-based deformation and cage-based deformation, the simple and versatile nature of FFD is a strong advantage and justifies its presence in today's leading commercial geometric modeling and animation software systems. […]
Mar, 23
Accelerating large-scale simulations of cortical neuronal network development
Cultured dissociated cortical cells grown into networks on multi-electrode arrays are used to investigate neuronal network development, activity, plasticity, response to stimuli, the effects of pharmacological agents, etc. We made a computational model of such a neuronal network and studied the interplay of individual neuron activity, cell culture development, and network behavior. For small networks […]
Mar, 23
CUDA implementation of Wagener’s 2D convex hull PRAM algorithm
This paper describes a CUDA implementation of Wagener’s PRAM convex hull algorithm in two dimensions. It is presented in Knuth’s literate programming style.
Mar, 23
Advanced Programming Platform for efficient use of Data Parallel Hardware
Graphics processing units (GPUs) have evolved from specialized hardware capable of rendering high-quality graphics in games into commodity hardware for effectively processing blocks of data in parallel. This evolution is particularly interesting for scientific groups, which traditionally use mainly the CPU as a workhorse and can now profit from the […]
Mar, 23
A Co-Prime Blur Scheme for Data Security in Video Surveillance
This paper presents a novel Coprime Blurred Pair (CBP) model for visual data-hiding for security in camera surveillance. While most previous approaches have focused on completely encrypting the video stream, we introduce a spatial encryption scheme by blurring the image/video contents to create a CBP. Our goal is to obscure detail in public video streams […]
Mar, 22
Impact of asynchronism on GPU accelerated parallel iterative computations
We study the impact of asynchronism on parallel iterative algorithms in the particular context of local clusters of workstations including GPUs. The test application is a classical 3D advection-diffusion-reaction PDE problem. We propose an asynchronous version of a previously developed PDE solver using GPUs for the inner computations. The algorithm is tested with […]