Posts
Apr, 25
High Performance Computing with GPUs
A project was undertaken to improve the performance of a traditional CPU-based sequential program by modifying it for parallel execution in a GPU environment. The project had two key goals: a speedup of at least 1.5x, and preservation of the program’s accuracy and integrity. Deal.II, a differential equations analysis library, […]
Apr, 25
Directive-based Approach to Heterogeneous Computing
The main result of my Ph.D. dissertation was accULL, an implementation of the OpenACC standard. This implementation is based on two pieces of software I designed, YaCF (Yet Another Compiler Framework) and Frangollo. YaCF is essentially a Python source-to-source (StS) toolkit, heavily based on the pycparser project. It uses the C99 frontend with some extensions to […]
Apr, 25
Faster Upper Body Pose Estimation and Recognition Using CUDA
Image processing techniques can be very time-consuming when applied sequentially on the Central Processing Unit (CPU). Many applications require processing to take place in real time. The Upper Body Pose Estimation and Recognition system developed by Achmed and Connan has been shown to be 88% accurate, but runs slower than real time on the CPU. This […]
Apr, 25
3rd International Conference on High Performance Computing, HPC-UA 2013
Prospective authors are invited to submit extended abstracts, full papers or poster presentations on topics related to: development, benchmarking, and administration of HPC systems; programming of HPC systems, scalability of algorithms and programs, and heterogeneous programming with GP-GPU, FPGA, and other accelerators; HPC in grids, clouds, and distributed computing systems; application of HPC in science and industry. […]
Apr, 23
Reduce, Reuse, Recycle (R^3): a Design Methodology for Sparse Matrix Vector Multiplication on Reconfigurable Platforms
Sparse Matrix Vector Multiplication (SpMV) is an important computational kernel in many scientific computing applications. Pipelining multiply-accumulate operations shifts SpMV from a compute-bound kernel to an I/O-bound kernel. In this paper, we propose a design methodology and hardware architecture for SpMV that seeks to utilize system memory bandwidth as efficiently as possible, by […]
Apr, 23
Parallel Computing for Accelerated Texture Classification with Local Binary Pattern Descriptors using OpenCL
In this paper, a novel parallelized implementation of rotation-invariant texture classification on heterogeneous computing platforms, such as the CPU and Graphics Processing Unit (GPU), is proposed. A complete model of the LBP operator, as well as of its improved variants Complete Local Binary Patterns (CLBP) and Multi-scale Local Binary Patterns (MLBP), has been developed on a […]
Apr, 23
Solving Wave Equations on Unstructured Geometries
Waves are all around us, be it in the form of sound, electromagnetic radiation, water waves, or earthquakes. Their study is an important basic tool across engineering and science disciplines. Every wave solver used for the computational study of waves faces a trade-off between two figures of merit: its computational speed and its accuracy. Discontinuous Galerkin […]
Apr, 23
GPU Scripting and Code Generation with PyCUDA
High-level scripting languages are in many ways polar opposites to GPUs. GPUs are highly parallel, subject to hardware subtleties, and designed for maximum throughput, and they offer a tremendous advance in the performance achievable for a significant number of computational problems. On the other hand, scripting languages such as Python favor ease of use over […]
Apr, 23
SW# – GPU enabled exact alignments on genome scale
Sequence alignment is one of the oldest and most famous problems in bioinformatics. Even after 45 years, for one reason or another, this problem is still relevant; current solutions are trade-offs between execution time, memory consumption and accuracy. We propose SW#, a new CUDA GPU-enabled and memory-efficient implementation of dynamic programming algorithms […]
Apr, 22
GPU-based Implementation of 128-bit Secure Eta Pairing Over a Binary Field
Eta pairing on a supersingular elliptic curve over the binary field F_{2^1223} used to offer 128-bit security, and has been studied extensively for efficient implementations. In this paper, we report our GPU-based implementations of this algorithm on an NVIDIA Tesla C2050 platform. We propose efficient parallel implementation strategies for multiplication, square, square root and inverse […]
Apr, 22
Automatic Parallelization of a Gap Model using Java and OpenCL
Nowadays, scientists are often disappointed by the outcome when parallelizing their simulations, in spite of all the tools at their disposal. They often invest much time and money, and do not obtain the expected speed-up. This can stem from many factors, ranging from a poor choice of parallel architecture to a model that simply does not […]
Apr, 22
On the Efficacy of GPU-Integrated MPI for Scientific Applications
Scientific computing applications are quickly adapting to leverage the massive parallelism of GPUs in large-scale clusters. However, the current hybrid programming models require application developers to explicitly manage the disjoint host and GPU memories, thus reducing both efficiency and productivity. Consequently, GPU-integrated MPI solutions, such as MPI-ACC and MVAPICH2-GPU, have been developed that provide unified […]