Posts
Oct, 6
Divergence Analysis and Optimizations
The growing interest in GPU programming has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, the model also brings restrictions. In particular, processing elements (PEs) execute in lock-step, and may lose performance due to divergences caused by conditional branches. In face […]
Oct, 6
CUDA performance analyzer
GPGPU Computing using CUDA is rapidly gaining ground today. GPGPU has been brought to the masses through the ease of use of CUDA and ubiquity of graphics cards supporting the same. Although CUDA has a low learning curve for programmers familiar with standard programming languages like C, extracting optimum performance from it, through optimizations and […]
Oct, 6
Offloading Java to Graphics Processors
Massively-parallel graphics processors have the potential to offer high performance at low cost. However, at present such devices are largely inaccessible from higher-level languages such as Java. This work allows compilation from Java bytecode by making use of annotations to specify loops for parallel execution. Data copying to and from the GPU is handled automatically. […]
Oct, 6
Parallelisation of Java for Graphics Processors
The aim of the project was to allow extraction and compilation of Java virtual machine bytecode for parallel execution on graphics cards, specifically the NVIDIA CUDA framework, by both explicit and automatic means. The resulting compiler successfully extracts and compiles code from class files into CUDA C++ code, and outputs transformed classes that […]
Oct, 6
Multi-core programming with OpenCL: performance and portability: OpenCL in a memory bound scenario
With the advent of multi-core processors, desktop computers have become multiprocessors requiring parallel programming to be utilized efficiently. Efficient and portable parallel programming of future multi-core processors and GPUs is one of today’s most important challenges within computer science. Okuda Laboratory at The University of Tokyo in Japan focuses on solving engineering challenges with parallel […]
Oct, 6
Python for Development of OpenMP and CUDA Kernels for Multidimensional Data
Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet, each type of processor requires a specific organization of the application state in […]
Oct, 6
Accelerating a climate physics model with OpenCL
Open Computing Language (OpenCL) is fast becoming the standard for heterogeneous parallel computing. It is designed to run on CPUs, GPUs, and other accelerator architectures. By implementing a real-world application, a solar radiation model component widely used in climate and weather models, we show that the OpenCL multi-threaded programming and execution model can dramatically increase […]
Oct, 6
Static GPU threads and an improved scan algorithm
Current GPU programming systems automatically distribute the work on all GPU processors based on a set of fixed assumptions, e.g. that all tasks are independent from each other. We show that automatic distribution limits algorithmic design, and demonstrate that manual work distribution hardly adds any overhead. Our Scan+ algorithm is an improved scan relying on manual […]
Oct, 6
GPU-based single-cluster algorithm for the simulation of the Ising model
We present the GPU calculation with the compute unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the […]
Oct, 6
Connected-component identification and cluster update on graphics processing units
Cluster identification tasks occur in a multitude of contexts in physics and engineering such as, for instance, cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network analysis. While it has been shown that graphics processing units (GPUs) can result in speedups of two to three orders of magnitude as […]
Oct, 5
Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms
Recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. Such attributes have once again sparked an interest in creating learning algorithms that aspire to reverse-engineer many of the abilities of the brain. In this paper we describe a GPGPU-accelerated extension to an intelligent […]
Oct, 5
Democratic Population Decisions Result in Robust Policy-Gradient Learning: A Parametric Study with GPU Simulations
High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task, and moreover, architectural limitations raise the question of whether investing effort in this direction may be worthwhile. In this work, we use GPU […]

