
Posts

Oct 6

Parallelisation of Java for Graphics Processors

The aim of the project was to allow extraction and compilation of Java virtual machine bytecode for parallel execution on graphics cards, specifically under the NVIDIA CUDA framework, by both explicit and automatic means. The resulting compiler successfully extracts and compiles code from class files into CUDA C++, and outputs transformed classes that […]
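As a rough sketch of what such a translation produces, consider a simple data-parallel Java loop lowered to a CUDA C++ kernel. The code below is a hypothetical illustration of the general bytecode-to-GPU mapping, not output from the project's compiler:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical sketch: the kind of CUDA C++ a bytecode-to-GPU compiler
    // might emit for a data-parallel Java loop such as
    //   for (int i = 0; i < n; i++) out[i] = a[i] * b[i];
    __global__ void mapMultiply(const float *a, const float *b, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
        if (i < n) out[i] = a[i] * b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *out;
        cudaMallocManaged(&a, n * sizeof(float));   // unified memory keeps the
        cudaMallocManaged(&b, n * sizeof(float));   // sketch short; a compiler
        cudaMallocManaged(&out, n * sizeof(float)); // would manage copies itself
        for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }
        mapMultiply<<<(n + 255) / 256, 256>>>(a, b, out, n);
        cudaDeviceSynchronize();
        printf("out[0] = %f\n", out[0]);            // expect 2.0
        cudaFree(a); cudaFree(b); cudaFree(out);
        return 0;
    }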
Oct 6

Multi-core programming with OpenCL: performance and portability: OpenCL in a memory bound scenario

With the advent of multi-core processors, desktop computers have become multiprocessors that require parallel programming to be utilized efficiently. Efficient and portable parallel programming of future multi-core processors and GPUs is one of today’s most important challenges within computer science. Okuda Laboratory at The University of Tokyo in Japan focuses on solving engineering challenges with parallel […]
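To make "memory bound" concrete: in a kernel like the copy below, each thread performs one load and one store and essentially no arithmetic, so runtime is set by memory bandwidth rather than compute throughput. A minimal CUDA sketch that measures effective bandwidth (the paper itself uses OpenCL):

    #include <cstdio>
    #include <cuda_runtime.h>

    // A memory-bound kernel: one load and one store per element, no math.
    __global__ void copyKernel(const float *in, float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];
    }

    int main() {
        const int n = 1 << 24;
        float *in, *out;
        cudaMalloc(&in, n * sizeof(float));
        cudaMalloc(&out, n * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start); cudaEventCreate(&stop);
        cudaEventRecord(start);
        copyKernel<<<(n + 255) / 256, 256>>>(in, out, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double gb = 2.0 * n * sizeof(float) / 1e9;  // read + write of n floats
        printf("effective bandwidth: %.1f GB/s\n", gb / (ms / 1e3));
        cudaFree(in); cudaFree(out);
        return 0;
    }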
Oct 6

Python for Development of OpenMP and CUDA Kernels for Multidimensional Data

Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet, each type of processor requires a specific organization of the application state in […]
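The data-organization problem the abstract refers to can be seen in miniature with particle data: a structure-of-arrays (SoA) layout lets consecutive GPU threads read consecutive addresses (coalesced), while an array-of-structures layout strides them apart. A small illustrative CUDA sketch with hypothetical types, not from the paper:

    #include <cstdio>
    #include <cuda_runtime.h>

    struct ParticlesSoA {   // GPU-friendly: one array per field
        float *x, *y, *z;
    };

    __global__ void shiftX(ParticlesSoA p, float dx, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p.x[i] += dx;   // thread i touches p.x[i]: fully coalesced
    }

    int main() {
        const int n = 1 << 20;
        ParticlesSoA p;
        cudaMallocManaged(&p.x, n * sizeof(float));
        cudaMallocManaged(&p.y, n * sizeof(float));
        cudaMallocManaged(&p.z, n * sizeof(float));
        for (int i = 0; i < n; i++) p.x[i] = 0.0f;
        shiftX<<<(n + 255) / 256, 256>>>(p, 1.5f, n);
        cudaDeviceSynchronize();
        printf("p.x[42] = %f\n", p.x[42]);   // expect 1.5
        return 0;
    }

A CPU, by contrast, often prefers the array-of-structures form for cache locality within a single particle, which is exactly why each processor type wants its own organization of the application state.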
Oct 6

Accelerating a climate physics model with OpenCL

Open Computing Language (OpenCL) is fast becoming the standard for heterogeneous parallel computing. It is designed to run on CPUs, GPUs, and other accelerator architectures. By implementing a real-world application, a solar radiation model component widely used in climate and weather models, we show that the OpenCL multi-threaded programming and execution model can dramatically increase […]
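A common pattern for radiation components of this kind, sketched below in CUDA for brevity (the paper uses OpenCL), is one thread per vertical atmospheric column: columns are independent, while levels within a column are swept sequentially. The physics here is a stand-in:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void columnSweep(const float *flux_in, float *heating,
                                int ncols, int nlevels) {
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (col >= ncols) return;
        for (int k = 0; k < nlevels; k++) {      // serial sweep down the column
            int idx = k * ncols + col;           // level-major layout: coalesced
            heating[idx] = 0.1f * flux_in[idx];  // stand-in for the real physics
        }
    }

    int main() {
        const int ncols = 8192, nlevels = 60, n = ncols * nlevels;
        float *flux, *heat;
        cudaMallocManaged(&flux, n * sizeof(float));
        cudaMallocManaged(&heat, n * sizeof(float));
        for (int i = 0; i < n; i++) flux[i] = 1.0f;
        columnSweep<<<(ncols + 255) / 256, 256>>>(flux, heat, ncols, nlevels);
        cudaDeviceSynchronize();
        printf("heat[0] = %f\n", heat[0]);
        return 0;
    }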
Oct 6

Static GPU threads and an improved scan algorithm

Current GPU programming systems automatically distribute the work on all GPU processors based on a set of fixed assumptions, e.g. that all tasks are independent of each other. We show that automatic distribution limits algorithmic design, and demonstrate that manual work distribution hardly adds any overhead. Our Scan+ algorithm is an improved scan relying on manual […]
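The flavour of manual work distribution can be shown with a fixed-size grid that strides over the data, instead of launching one thread per element and letting the runtime distribute the work. This is only an illustrative reduction, not the paper's Scan+ algorithm:

    #include <cstdio>
    #include <cuda_runtime.h>

    // A fixed grid walks the data in grid-sized strides: the programmer, not
    // the runtime, decides how work maps onto the (static) set of threads.
    __global__ void staticStrideSum(const float *in, float *partial, int n) {
        float acc = 0.0f;
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += gridDim.x * blockDim.x)   // manual, static work distribution
            acc += in[i];
        atomicAdd(partial, acc);            // float atomics need sm_20 or newer
    }

    int main() {
        const int n = 1 << 22;
        float *in, *sum;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&sum, sizeof(float));
        for (int i = 0; i < n; i++) in[i] = 1.0f;
        *sum = 0.0f;
        staticStrideSum<<<64, 256>>>(in, sum, n);  // fixed grid, independent of n
        cudaDeviceSynchronize();
        printf("sum = %.0f\n", *sum);              // expect 4194304
        return 0;
    }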
Oct 6

GPU-based single-cluster algorithm for the simulation of the Ising model

We present the GPU calculation with the compute unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the […]
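As a loose illustration of the parallel step described here, the hypothetical kernel below examines all spins added to the cluster in the previous step in parallel and probabilistically attaches their aligned neighbours; the paper's quasi-block synchronization and its random number generation are not reproduced (a toy hash stands in for the RNG):

    #include <cstdio>
    #include <cuda_runtime.h>

    __device__ float hashRand(unsigned int seed) {   // toy stand-in RNG
        seed = seed * 1664525u + 1013904223u;
        return (seed & 0xFFFFFF) / 16777216.0f;
    }

    // One growth step: each thread takes one frontier spin and tries to add
    // its aligned neighbours with bond probability p. The host would gather
    // the newly added spins into the next frontier between launches.
    __global__ void growFrontier(const int *spin, int *inCluster,
                                 const int *frontier, int frontierSize,
                                 int L, float p, unsigned int step) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= frontierSize) return;
        int s = frontier[t];
        int x = s % L, y = s / L;
        int nbr[4] = { y * L + (x + 1) % L, y * L + (x + L - 1) % L,
                       ((y + 1) % L) * L + x, ((y + L - 1) % L) * L + x };
        for (int k = 0; k < 4; k++) {
            int m = nbr[k];
            if (spin[m] == spin[s] && hashRand(step ^ (s * 4 + k)) < p)
                atomicExch(&inCluster[m], 1);   // add aligned neighbour
        }
    }

    int main() {
        const int L = 64, N = L * L;
        int *spin, *inCluster, *frontier;
        cudaMallocManaged(&spin, N * sizeof(int));
        cudaMallocManaged(&inCluster, N * sizeof(int));
        cudaMallocManaged(&frontier, sizeof(int));
        for (int i = 0; i < N; i++) { spin[i] = 1; inCluster[i] = 0; }
        frontier[0] = N / 2;  inCluster[N / 2] = 1;      // seed spin
        growFrontier<<<1, 32>>>(spin, inCluster, frontier, 1, L, 0.6f, 1234u);
        cudaDeviceSynchronize();
        int added = 0;
        for (int i = 0; i < N; i++) added += inCluster[i];
        printf("cluster size after one step: %d\n", added);
        return 0;
    }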
Oct 6

Connected-component identification and cluster update on graphics processing units

Cluster identification tasks occur in a multitude of contexts in physics and engineering such as, for instance, cluster algorithms for simulating spin models, percolation simulations, segmentation problems in image processing, or network analysis. While it has been shown that graphics processing units (GPUs) can result in speedups of two to three orders of magnitude as […]
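One widely used GPU approach to this task is iterative label propagation: each site starts with a unique label and repeatedly adopts the smallest label among same-state neighbours until a full pass makes no change. A minimal CUDA sketch of that general technique (not the paper's implementation):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void propagate(const int *state, int *label, int *changed, int L) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= L || y >= L) return;
        int i = y * L + x, best = label[i];
        if (x > 0     && state[i - 1] == state[i]) best = min(best, label[i - 1]);
        if (x < L - 1 && state[i + 1] == state[i]) best = min(best, label[i + 1]);
        if (y > 0     && state[i - L] == state[i]) best = min(best, label[i - L]);
        if (y < L - 1 && state[i + L] == state[i]) best = min(best, label[i + L]);
        if (best < label[i]) { label[i] = best; *changed = 1; }
    }

    int main() {
        const int L = 256, N = L * L;
        int *state, *label, *changed;
        cudaMallocManaged(&state, N * sizeof(int));
        cudaMallocManaged(&label, N * sizeof(int));
        cudaMallocManaged(&changed, sizeof(int));
        for (int i = 0; i < N; i++) { state[i] = 0; label[i] = i; }  // one cluster
        dim3 block(16, 16), grid(L / 16, L / 16);
        do {                                   // iterate until a pass is quiet
            *changed = 0;
            propagate<<<grid, block>>>(state, label, changed, L);
            cudaDeviceSynchronize();
        } while (*changed);
        printf("label of last site: %d\n", label[N - 1]);   // expect 0
        return 0;
    }

Labels only ever decrease, so concurrent reads of stale values are harmless; any late change simply triggers another pass.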
Oct 5

Profiling Heterogeneous Multi-GPU Systems to Accelerate Cortically Inspired Learning Algorithms

Recent advances in neuroscientific understanding make parallel computing devices modeled after the human neocortex a plausible, attractive, fault-tolerant, and energy-efficient possibility. Such attributes have once again sparked an interest in creating learning algorithms that aspire to reverse-engineer many of the abilities of the brain. In this paper we describe a GPGPU-accelerated extension to an intelligent […]
Oct 5

Democratic Population Decisions Result in Robust Policy-Gradient Learning: A Parametric Study with GPU Simulations

High performance computing on the Graphics Processing Unit (GPU) is an emerging field driven by the promise of high computational power at a low cost. However, GPU programming is a non-trivial task, and architectural limitations raise the question of whether investing effort in this direction is worthwhile. In this work, we use GPU […]
Oct 5

Performance Analysis and Optimisation of the OP2 Framework on Many-core Architectures

This paper presents a benchmarking, performance analysis and optimisation study of the OP2 "active" library, which provides an abstraction framework for the parallel execution of unstructured mesh applications. OP2 aims to decouple the scientific specification of the application from its parallel implementation, and thereby achieve code longevity and near-optimal performance through re-targeting the application to […]
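The defining feature of such unstructured-mesh loops is indirection: per-edge work gathers data from endpoint nodes through a mapping array, which is what makes the access pattern irregular and hard to optimise automatically. The sketch below shows that access pattern in plain CUDA; it deliberately does not use the OP2 API:

    #include <cstdio>
    #include <cstring>
    #include <cuda_runtime.h>

    // Each edge reads its two endpoint nodes through an indirection map.
    __global__ void edgeLoop(const int *edge2node, const float *nodeVal,
                             float *edgeFlux, int nEdges) {
        int e = blockIdx.x * blockDim.x + threadIdx.x;
        if (e >= nEdges) return;
        int a = edge2node[2 * e], b = edge2node[2 * e + 1];   // indirection
        edgeFlux[e] = nodeVal[b] - nodeVal[a];                // per-edge kernel
    }

    int main() {
        const int nNodes = 5, nEdges = 4;
        int   h_map[2 * nEdges] = {0,1, 1,2, 2,3, 3,4};  // a simple chain mesh
        float h_val[nNodes]     = {0, 1, 4, 9, 16};
        int *map; float *val, *flux;
        cudaMallocManaged(&map, sizeof(h_map));
        cudaMallocManaged(&val, sizeof(h_val));
        cudaMallocManaged(&flux, nEdges * sizeof(float));
        memcpy(map, h_map, sizeof(h_map));
        memcpy(val, h_val, sizeof(h_val));
        edgeLoop<<<1, 32>>>(map, val, flux, nEdges);
        cudaDeviceSynchronize();
        for (int e = 0; e < nEdges; e++) printf("flux[%d] = %.0f\n", e, flux[e]);
        return 0;
    }

An abstraction layer like OP2 takes the mesh, the mapping, and the per-edge kernel as its specification and is then free to retarget this loop to CUDA, OpenMP, or MPI backends.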
Oct 5

GPU accelerated 2-D staggered-grid finite difference seismic modelling

The staggered-grid finite difference (FD) method demands significant computational capability and is inefficient for seismic wave modelling in 2-D viscoelastic media on a single PC. To improve computation speed, a graphics processing unit (GPU) accelerated method was proposed, since modern GPUs have become ubiquitous in desktop computers and offer excellent parallelism at a low cost-to-performance ratio. The […]
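For orientation, a staggered grid places velocities at half-offset positions between pressure nodes, so each velocity update differences two adjacent pressure values. A minimal second-order acoustic sketch in CUDA (the paper's scheme is viscoelastic and considerably more elaborate):

    #include <cstdio>
    #include <cuda_runtime.h>

    // One half of a leapfrog step: update horizontal velocity from the
    // pressure gradient. dt_rho_dx lumps dt / (rho * dx) into one factor.
    __global__ void updateVx(float *vx, const float *p, int nx, int nz,
                             float dt_rho_dx) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int k = blockIdx.y * blockDim.y + threadIdx.y;
        if (i >= nx - 1 || k >= nz) return;
        int idx = k * nx + i;
        // the velocity node sits between p[i] and p[i+1] on the staggered grid
        vx[idx] -= dt_rho_dx * (p[idx + 1] - p[idx]);
    }

    int main() {
        const int nx = 128, nz = 128, n = nx * nz;
        float *vx, *p;
        cudaMallocManaged(&vx, n * sizeof(float));
        cudaMallocManaged(&p, n * sizeof(float));
        for (int i = 0; i < n; i++) { vx[i] = 0.0f; p[i] = 0.0f; }
        p[(nz / 2) * nx + nx / 2] = 1.0f;                 // point source
        dim3 block(16, 16), grid(nx / 16, nz / 16);
        updateVx<<<grid, block>>>(vx, p, nx, nz, 0.5f);
        cudaDeviceSynchronize();
        printf("vx next to source: %f\n", vx[(nz / 2) * nx + nx / 2]);
        return 0;
    }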
Oct 5

Applying software-managed caching and CPU/GPU task scheduling for accelerating dynamic workloads

In this talk we address two problems frequently encountered by GPU developers: optimizing memory access for kernels with complex input-dependent access patterns, and mapping the computations to a GPU or a CPU in composite applications with multiple dependent kernels. Both require dynamic adaptation and tuning of execution policies to allow high performance for a wide […]
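The simplest form of software-managed caching on a GPU is staging reused data into shared memory once per block, then serving all subsequent reads from there. The sketch below illustrates the idea on a trivial stencil; the talk targets far harder, input-dependent access patterns:

    #include <cstdio>
    #include <cuda_runtime.h>

    // Block size is assumed to be 256 throughout this sketch.
    __global__ void blurTiled(const float *in, float *out, int n) {
        __shared__ float tile[258];               // 256 elements + 2 halo cells
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int t = threadIdx.x + 1;
        tile[t] = (i < n) ? in[i] : 0.0f;         // each thread stages one element
        if (threadIdx.x == 0)   tile[0]   = (i > 0)     ? in[i - 1] : 0.0f;
        if (threadIdx.x == 255) tile[257] = (i + 1 < n) ? in[i + 1] : 0.0f;
        __syncthreads();                          // tile is now the software cache
        if (i < n) out[i] = (tile[t - 1] + tile[t] + tile[t + 1]) / 3.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *in, *out;
        cudaMallocManaged(&in, n * sizeof(float));
        cudaMallocManaged(&out, n * sizeof(float));
        for (int i = 0; i < n; i++) in[i] = 3.0f;
        blurTiled<<<n / 256, 256>>>(in, out, n);
        cudaDeviceSynchronize();
        printf("out[100] = %f\n", out[100]);      // expect 3.0
        return 0;
    }

Each input element is read from global memory once but used three times, which is exactly the reuse a software-managed cache exists to capture.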
