Posts
Jul, 15
Near-LSPA Performance at MSA Complexity
The tradeoff between error-correcting performance and numerical complexity of LDPC decoding algorithms is a well-known problem. In this paper we characterize the previously unreported error-floor performance of the Self-Corrected Min-Sum algorithm for long DVB-S2 codes. We developed a massively parallel simulation using GPUs, which allowed a comprehensive BER characterization in both the waterfall and […]
Jul, 14
Equilibrium and Non-Equilibrium Ising Models by Means of PCA
We propose a unified approach to reversible and irreversible PCA dynamics, and we show that for 1D and 2D nearest-neighbour Ising systems with periodic boundary conditions we are able to compute the stationary measure of the dynamics even when the latter is irreversible. We also show how, according to [DPSS12], the […]
Jul, 14
Benchmarking Intel Xeon Phi to Guide Kernel Design
With a minimum of 50 cores, Intel’s Xeon Phi is a true many-core architecture. Featuring fairly powerful cores, two levels of cache, and a very fast interconnect, the Xeon Phi can achieve a theoretical peak of 1000 GFLOPS and over 240 GB/s. These numbers, as well as its flexibility – it can be used […]
Jul, 13
The CUDA Handbook: A Comprehensive Guide to GPU Programming
The CUDA Handbook begins where CUDA by Example (Addison-Wesley, 2011) leaves off, discussing CUDA hardware and software in greater detail and covering both CUDA 5.0 and Kepler. Every CUDA developer, from the casual to the most sophisticated, will find something here of interest and immediate usefulness. Newer CUDA developers will see how the hardware processes […]
Jul, 13
Identifying the Key Features of Intel Xeon Phi: A Comparative Approach
With the growing diversity of many-core processors, it is becoming more and more difficult to guarantee performance portability with a unified programming model. The main reason lies in architectural disparities: CPUs and GPUs, for example, have different architectural features, which leads to different performance optimization techniques. Thus, it is of great […]
Jul, 13
Optimized MFCC Feature Extraction on GPU
In this paper, we update our previous research on Mel-Frequency Cepstral Coefficient (MFCC) feature extraction [1] and describe the optimizations required to improve throughput on Graphics Processing Units (GPUs). We not only demonstrate that the feature extraction process is suitable for GPUs and that a substantial reduction in computation time can be obtained by performing […]
Jul, 13
GPU Simulation of Radiation in Matter
Parallel programming on GPUs is introduced in the context of simulating collision energy loss and bremsstrahlung for charged particles propagating in matter. The employed Monte Carlo methods and the involved physics are presented, followed by an introduction to the concepts of GPU parallel programming for the Nvidia CUDA architecture. The simulations implemented in C++ and […]
Jul, 13
An acceleration of the algorithm for the nurse rerostering problem on a graphics processing unit
This paper deals with solving the Nurse Rerostering Problem (NRRP) with a parallel algorithm on a Graphics Processing Unit (GPU). The problem concerns rescheduling human resources in healthcare when a roster is disrupted by unexpected circumstances. Our aim is to solve the NRRP in parallel to shorten the required computation time […]
Jul, 12
Parallel Graph Processing on Graphics Processors Made Easy
This paper demonstrates Medusa, a programming framework for parallel graph processing on graphics processors (GPUs). Medusa enables developers to leverage the massive parallelism and other hardware features of GPUs by writing sequential C/C++ code for a small set of APIs. This simplifies the implementation of parallel graph processing on the GPU. The runtime system of […]
Jul, 12
OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures
Driven by the rapid hardware development of parallel CPU/GPU architectures, we have witnessed the emergence of relational query processing techniques and implementations on those architectures. However, most of these implementations are not portable across architectures, because they are usually developed from scratch and target a specific architecture. This paper proposes a kernel-adapter based design […]
Jul, 12
Fast PCA-Based Face Recognition on GPUs
Face recognition is very important in many applications, including surveillance and biometrics. Fast face recognition is required to train or test on more images, or to increase the resolution of an input image for better recognition accuracy. Meanwhile, Graphics Processing Units (GPUs) have become widely available, offering the opportunity […]
Jul, 12
Hidden Surface Removal Using BSP Tree with CUDA
A Binary Space Partitioning (BSP) tree can be used for hidden surface removal. To hide invisible surfaces, all surfaces are sorted in back-to-front or front-to-back order. Traversing a BSP tree to obtain a back-to-front ordering of faces requires a computation at every node of the tree, and this computation can be performed in parallel. […]
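Here is a minimal sketch of the textbook back-to-front (painter's) BSP traversal. It is not the paper's CUDA implementation, whose details the excerpt does not give. It assumes each node stores a splitting plane n · x = offset, the faces lying in that plane (as plain labels), and the indices of its front and back children in a flat array. The hypothetical `classify_all` step performs the per-node calculation the abstract mentions: the viewer-side test at each node is independent of every other node, so it is the part that could be mapped to one GPU thread per node. The ordering pass then stays sequential.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BSPNode:
    normal: Vec3                  # splitting-plane normal n; plane is n . x = offset
    offset: float
    faces: List[str]              # faces lying in this node's plane (labels only)
    front: Optional[int] = None   # child index on the side n . x > offset
    back: Optional[int] = None    # child index on the side n . x < offset


def classify_all(nodes: Sequence[BSPNode], eye: Vec3) -> List[bool]:
    """Phase 1: independent per-node side test (True if the eye is in front).

    Each entry depends only on its own node, so this is the data-parallel step.
    """
    return [sum(n * e for n, e in zip(nd.normal, eye)) - nd.offset >= 0 for nd in nodes]


def back_to_front(nodes: Sequence[BSPNode], root: int, eye: Vec3) -> List[str]:
    """Phase 2: emit faces farthest-first using the precomputed side tests."""
    in_front = classify_all(nodes, eye)
    order: List[str] = []

    def visit(i: Optional[int]) -> None:
        if i is None:
            return
        nd = nodes[i]
        # The subtree on the opposite side of the plane from the eye is farther away.
        far, near = (nd.back, nd.front) if in_front[i] else (nd.front, nd.back)
        visit(far)
        order.extend(nd.faces)
        visit(near)

    visit(root)
    return order


# Three parallel walls: A at x = 0 (root), C at x = 2 (front), B at x = -2 (back).
nodes = [
    BSPNode((1.0, 0.0, 0.0), 0.0, ["A"], front=1, back=2),
    BSPNode((1.0, 0.0, 0.0), 2.0, ["C"]),
    BSPNode((1.0, 0.0, 0.0), -2.0, ["B"]),
]
print(back_to_front(nodes, 0, (5.0, 0.0, 0.0)))   # ['B', 'A', 'C']
print(back_to_front(nodes, 0, (-5.0, 0.0, 0.0)))  # ['C', 'A', 'B']
```

Drawing faces in the returned order lets nearer faces overwrite farther ones, so no depth buffer is needed. Reversing the order gives the front-to-back variant.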