Posts
Jan, 17
Solving dense linear systems on platforms with multiple hardware accelerators
In a previous PPoPP paper we showed how the FLAME methodology, combined with the SuperMatrix runtime system, yields a simple yet powerful solution for programming dense linear algebra operations on multicore platforms. In this paper we provide further evidence that this approach solves the programmability problem for this domain by targeting a more complex architecture, […]
Jan, 17
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications
In this paper, a tool named CheCUDA is designed to checkpoint CUDA applications that use GPUs as accelerators. As existing checkpoint/restart implementations do not support checkpointing the GPU status, CheCUDA hooks a subset of the basic CUDA driver API calls in order to record the status changes in main memory. At checkpointing, CheCUDA stores the […]
Jan, 17
Evaluating GPUs for network packet signature matching
Modern network devices employ deep packet inspection to enable sophisticated services such as intrusion detection, traffic shaping, and load balancing. At the heart of such services is a signature matching engine that must match packet payloads to multiple signatures at line rates. However, the recent transition to complex regular-expression-based signatures coupled with ever-increasing network […]
Jan, 17
Acceleration of Acoustic Emission Signal Processing Algorithms using CUDA Standard
Offline processing of acoustic emission (AE) signal waveforms recorded during a long-term AE monitoring session is a challenging problem in the AE testing area, because today’s AE systems can work with up to hundreds of channels and are able to process tens of thousands of AE events per second. The […]
Jan, 17
Real-Time Non-rigid Registration of Medical Images on a Cooperative Parallel Architecture
The unacceptable execution time of non-rigid registration (NRR) often presents a major obstacle to its routine clinical use. Parallel computing is an effective way to accelerate NRR. However, development of efficient parallel NRR codes is a very challenging task. One desirable approach is to map the existing sequential algorithm to the parallel architecture to gain speedup […]
Jan, 17
Parallelization Strategies for Ant Colony Optimisation on GPUs
Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic for the solution of a wide variety of problems. As a population-based algorithm, its computation is intrinsically massively parallel, and it is therefore theoretically well-suited for implementation on Graphics Processing Units (GPUs). The ACO algorithm comprises two main stages: tour construction and pheromone update. The […]
Jan, 16
Interactive visual analysis of contrast-enhanced ultrasound data based on local neighborhood statistics
Contrast-enhanced ultrasound (CEUS) has recently become an important technology for lesion detection and characterization in cancer diagnosis. CEUS is used to investigate the perfusion kinetics in tissue over time, which relates to tissue vascularization. In this paper we present a pipeline that enables interactive visual exploration and semi-automatic segmentation and classification of CEUS data. For […]
Jan, 16
An OpenCL framework for heterogeneous multicores with local memory
In this paper, we present the design and implementation of an Open Computing Language (OpenCL) framework that targets heterogeneous accelerator multicore architectures with local memory. The architecture consists of a general-purpose processor core and multiple accelerator cores that typically do not have any cache. Each accelerator core, instead, has a small internal local memory. Our […]
Jan, 16
Using generalized ensemble simulations and Markov state models to identify conformational states
Part of understanding a molecule’s conformational dynamics is mapping out the dominant metastable, or long lived, states that it occupies. Once identified, the rates for transitioning between these states may then be determined in order to create a complete model of the system’s conformational dynamics. Here we describe the use of the MSMBuilder package (now […]
Jan, 16
MPI-CUDA parallelization of a finite-strip program for geometric nonlinear analysis: A hybrid approach
A finite-strip geometric nonlinear analysis is presented for elastic problems involving folded-plate structures. Compared with the standard finite-element method, its main advantages are in data preparation, program complexity, and execution time. The finite-strip method, which satisfies the von Karman plate equations in the nonlinear elastic range, leads to the coupling of all harmonics. However, coupling […]
Jan, 16
A symbolic verifier for CUDA programs
We present a preliminary automated verifier, based on mechanical decision procedures, which is able to prove functional correctness of CUDA programs and is guaranteed to detect bugs such as race conditions. We also employ a symbolic partial order reduction (POR) technique to mitigate the interleaving explosion problem.
Jan, 16
Daubechies wavelets for high performance electronic structure calculations: The BigDFT project
In this contribution we describe in detail a Density Functional Theory method based on a Daubechies wavelet basis set, named BigDFT. Thanks to the properties of wavelets, this code shows systematic convergence, very good performance, and excellent efficiency for parallel calculations. The operations of the BigDFT code are also well suited for GPU […]