Posts
Oct, 7
Numerical integration on GPUs for higher order finite elements
The paper considers the problem of implementing numerical integration routines for higher order finite element approximations on graphics processors. The design of suitable GPU kernels is investigated in the context of general purpose integration procedures, as well as particular example applications. The most important characteristic of the problem investigated is the large variation of […]
Oct, 5
Measurements of performance of hardware and general purpose classical molecular dynamics simulation software
This note presents different measurements of hardware and software performance in classical molecular dynamics (CMD) simulations from 2001 through 2010, obtained from published literature and the internet. Opinion articles by CMD researchers point out that tools developed during that decade to set up CMD simulations barely increased human productivity. Massively parallel hardware and CMD software running […]
Oct, 5
Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs
General purpose computing on GPUs (GPGPU) can enable significant performance and energy improvements for certain classes of applications. However, current GPGPU programming models, such as CUDA and OpenCL, are only accessible by systems experts through low-level C/C++ APIs. In contrast, large numbers of programmers use high-level languages, such as Java, due to their productivity advantages […]
Oct, 5
Parametric GPU Code Generation for Affine Loop Programs
Partitioning a parallel computation into finitely sized chunks for effective mapping onto a parallel machine is a critical concern for source-to-source compilation. In the context of OpenCL and CUDA, this translates to the definition of a uniform hyper-rectangular partitioning of the parallel execution space where each partition is subject to a fine-grained distribution of resources […]
Oct, 5
Clock Math – A System for Solving SLEs Exactly
In this paper, we present a GPU-accelerated hybrid system that solves ill-conditioned systems of linear equations (SLEs) exactly. Here, "exactly" means without rounding errors, which is achieved by using integer arithmetic. First, we scale floating-point numbers up to integers, then we solve dozens of SLEs in different modular arithmetics, and then we assemble the sub-solutions using the Chinese remainder […]
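The multi-modular idea in this abstract can be illustrated with a small CPU-only sketch. It is not the paper's implementation: it assumes the coefficients have already been scaled to integers, it uses three fixed, well-known primes rather than "dozens" of moduli, and the helper names (`solve_mod`, `crt`, `solve_exact`) are hypothetical. For each prime it solves the system modulo that prime, then recombines the determinant and the scaled solution with the Chinese remainder theorem and returns exact fractions.

```python
from fractions import Fraction

def solve_mod(A, b, p):
    """Gauss-Jordan elimination mod prime p.
    Returns (det(A) mod p, x mod p), or None if A is singular mod p."""
    n = len(A)
    M = [[a % p for a in row] + [bi % p] for row, bi in zip(A, b)]
    det = 1
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c]), None)
        if piv is None:
            return None
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            det = -det % p              # a row swap flips the sign
        det = det * M[c][c] % p
        inv = pow(M[c][c], -1, p)       # modular inverse (Python 3.8+)
        M[c] = [v * inv % p for v in M[c]]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(vr - f * vc) % p for vr, vc in zip(M[r], M[c])]
    return det, [M[r][n] for r in range(n)]

def crt(residues, moduli):
    """Combine residues for pairwise coprime moduli into one value mod their product."""
    x, m = 0, 1
    for r, p in zip(residues, moduli):
        t = (r - x) * pow(m, -1, p) % p
        x += m * t
        m *= p
    return x, m

def symmetric(x, m):
    """Map x from [0, m) to the symmetric range so negative values are recovered."""
    return x - m if x > m // 2 else x

def solve_exact(A, b, primes=(998244353, 1000000007, 1000000009)):
    """Exact rational solution of the integer system A x = b.
    Assumption: the product of the primes exceeds twice the largest
    |det(A)| and |det(A) * x_i|; a real system would check a bound."""
    used, dets, scaled = [], [], []
    for p in primes:
        res = solve_mod(A, b, p)
        if res is None:
            continue                    # unlucky prime, skip it
        d, x = res
        used.append(p)
        dets.append(d)
        scaled.append([d * xi % p for xi in x])  # det * x is an integer vector
    if not used:
        raise ValueError("matrix is singular modulo every prime tried")
    det, m = crt(dets, used)
    det = symmetric(det, m)
    return [Fraction(symmetric(crt(col, used)[0], m), det)
            for col in zip(*scaled)]
```

For example, `solve_exact([[2, 1], [1, 3]], [3, 5])` returns the fractions 4/5 and 7/5. Recovering `det(A)` and `det(A) * x` separately avoids rational reconstruction, since both are integers by Cramer's rule.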
Oct, 5
GPU Based Generation and Real-Time Rendering of Semi-Procedural Terrain Using Features
Generation and real-time rendering of terrain is a complex and multifaceted problem. Besides the obvious trade-offs between performance and quality, many different generation and rendering solutions exist. Different choices in implementation will result in very different visuals, usability and tools for generation. In this thesis, a fast and intuitive terrain generation method based on sketching […]
Oct, 4
Performance Portability Evaluation for OpenACC on Intel Knights Corner and Nvidia Kepler
OpenACC is a programming standard designed to simplify heterogeneous parallel programming by using directives. Since OpenACC can generate OpenCL and CUDA code, and running OpenCL on Intel Knights Corner is supported by the CAPS HMPP compiler, it is attractive to use OpenACC on hardware with different underlying microarchitectures. This paper studies how realistic it is to […]
Oct, 4
Facial Expression Recognition – Review
Expression recognition (happy, sad, disgust, surprise, angry, fear expressions) is an application of advanced object detection, pattern recognition and classification. Facial expression recognition techniques detect people's emotions from their facial expressions. This has found applications in technical fields such as human-computer interaction (HCI) and security monitoring. It generally requires fast processing and decision making. Therefore, […]
Oct, 4
Parallel Computing Using GPU for Efficient Traffic Simulation
Parallel computing is made possible by the multiple cores of the Graphics Processing Unit (GPU), thanks to modern programmable GPU models. This allows the use of parallel computing techniques to reduce the computation time of large-scale traffic simulations. This paper proposes the use of a multi-processor algorithm for creating efficient traffic […]
Oct, 4
Advanced Optimization Techniques for Sparse Grids on Modern Heterogeneous Systems
GPU-based heterogeneous systems provide a peak performance on the order of TFlop/s and an advantageous ratio between performance and energy consumption. However, reaching high performance on GPUs is often a difficult task. This thesis proposes advanced optimization techniques that allow for efficiently porting a set of sparse grid algorithms to GPUs. The performance obtained […]
Oct, 4
Cudagrind: A Valgrind Extension for CUDA
Valgrind, and specifically the included tool Memcheck, offers an easy and reliable way of checking the correctness of memory operations in programs. This works in a non-intrusive way: Valgrind translates the program into intermediate code and executes it on an emulated CPU. The heavyweight tool Memcheck uses this to keep a full shadow […]
Oct, 4
3D Non-Local Means denoising via multi-GPU
The Non-Local Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led to implementations on Graphics Processing Unit (GPU) architectures, which achieve reasonable running times by filtering 3D datasets slice by slice with a 2D NLM approach. Here we present a fully 3D NLM implementation on a multi-GPU architecture […]