
Posts

Oct, 7

A Framework for Automatic OpenMP Code Generation

Manually analyzing programs to detect parallelism is always a tedious task, and it becomes more complex when we deal with automatic parallelization. Frameworks such as OpenMP are available through which we can manually annotate the code to realize parallelism and take advantage of the underlying multi-core architecture. But the programmer’s life becomes simple […]
Oct, 7

Implementation of a High Throughput 3GPP Turbo Decoder on GPU

Turbo code is a computationally intensive channel code that is widely used in current and upcoming wireless standards. The general-purpose graphics processing unit (GPGPU) is a programmable commodity processor that achieves high computational performance by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of […]
Oct, 7

Run-time Reconfigurable Multiprocessors

The main advantage of multiprocessors is the speedup obtained from processor-level parallelism. Similarly, the advantage of reconfigurable architectures is the flexibility to adapt to specific applications. To benefit from both, we present a reconfigurable multiprocessor template, which combines the parallelism of multiprocessors with the flexibility of reconfigurable architectures. A fast, single cycle, […]
Oct, 6

Divergence Analysis and Optimizations

The growing interest in GPU programming has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, the model also brings restrictions. In particular, processing elements (PEs) execute in lock-step, and may lose performance due to divergences caused by conditional branches. In face […]
Oct, 6

CUDA performance analyzer

GPGPU computing using CUDA is rapidly gaining ground today. GPGPU has been brought to the masses through the ease of use of CUDA and the ubiquity of graphics cards that support it. Although CUDA has a gentle learning curve for programmers familiar with standard programming languages like C, extracting optimum performance from it, through optimizations and […]
Oct, 6

Offloading Java to Graphics Processors

Massively-parallel graphics processors have the potential to offer high performance at low cost. However, at present such devices are largely inaccessible from higher-level languages such as Java. This work allows compilation from Java bytecode by making use of annotations to specify loops for parallel execution. Data copying to and from the GPU is handled automatically. […]
Oct, 6

Parallelisation of Java for Graphics Processors

The aim of the project was to allow extraction and compilation of Java virtual machine bytecode for parallel execution on graphics cards, specifically the NVIDIA CUDA framework, by both explicit and automatic means. The resulting compiler successfully extracts and compiles code from class files into CUDA C++ code, and outputs transformed classes that […]
Oct, 6

Multi-core programming with OpenCL: performance and portability: OpenCL in a memory bound scenario

With the advent of multi-core processors, desktop computers have become multiprocessors that require parallel programming to be utilized efficiently. Efficient and portable parallel programming of future multi-core processors and GPUs is one of today’s most important challenges within computer science. Okuda Laboratory at The University of Tokyo in Japan focuses on solving engineering challenges with parallel […]
Oct, 6

Python for Development of OpenMP and CUDA Kernels for Multidimensional Data

Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet, each type of processor requires a specific organization of the application state in […]
Oct, 6

Accelerating a climate physics model with OpenCL

Open Computing Language (OpenCL) is fast becoming the standard for heterogeneous parallel computing. It is designed to run on CPUs, GPUs, and other accelerator architectures. By implementing a real-world application, a solar radiation model component widely used in climate and weather models, we show that the OpenCL multi-threaded programming and execution model can dramatically increase […]
Oct, 6

Static GPU threads and an improved scan algorithm

Current GPU programming systems automatically distribute the work across all GPU processors based on a set of fixed assumptions, e.g. that all tasks are independent of each other. We show that automatic distribution limits algorithmic design, and demonstrate that manual work distribution hardly adds any overhead. Our Scan+ algorithm is an improved scan relying on manual […]
Oct, 6

GPU-based single-cluster algorithm for the simulation of the Ising model

We present a GPU calculation using the compute unified device architecture (CUDA) for the Wolff single-cluster algorithm of the Ising model. Proposing an algorithm for a quasi-block synchronization, we realize the Wolff single-cluster Monte Carlo simulation with CUDA. We perform parallel computations for the newly added spins in the growing cluster. As a result, the […]

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
