Posts

Oct, 7

Heterogeneous NPACI-Rocks/MPI/CUDA distributed multi-GPGPU application for seeking counterexamples to Beal’s Conjecture: MPI/CUDA integration component

Beal’s Conjecture asserts that if A^x + B^y = C^z for integers A, B, C > 0 and integers x, y, z > 2, then A, B, and C share a common prime factor. While empirical computational studies by several researchers have established that Beal’s Conjecture holds for all A, B, C, x, y, z < 1000, the truth of the general conjecture remains […]
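
To make the search problem concrete, here is a minimal single-threaded sketch in Python of what a counterexample search for A^x + B^y = C^z (all exponents greater than 2) might look like. It is not the MPI/CUDA implementation the post describes; the function name `beal_counterexamples` and its parameters are hypothetical, chosen only for illustration. It relies on one observation: any prime dividing both A and B must also divide C, so a solution is a counterexample exactly when gcd(A, B) = 1.

```python
from math import gcd


def beal_counterexamples(max_base, max_exp):
    """Brute-force search for A^x + B^y = C^z with no common prime factor.

    Hypothetical helper for illustration only: bases run over 1..max_base
    and exponents over 3..max_exp.
    """
    # Index every perfect power C^z in range by its value.
    powers = {}
    for c in range(1, max_base + 1):
        for z in range(3, max_exp + 1):
            powers.setdefault(c ** z, []).append((c, z))

    hits = []
    for a in range(1, max_base + 1):
        for b in range(a, max_base + 1):
            # A prime shared by A and B also divides C, so such pairs
            # can never yield a counterexample; skip them.
            if gcd(a, b) != 1:
                continue
            for x in range(3, max_exp + 1):
                for y in range(3, max_exp + 1):
                    for c, z in powers.get(a ** x + b ** y, []):
                        hits.append((a, x, b, y, c, z))
    return hits


if __name__ == "__main__":
    # Small bounds keep the run short; a real search needs far larger ranges.
    print(beal_counterexamples(20, 5))
```

The dictionary lookup replaces an inner loop over C and z, which is the kind of table-based test that would have to be partitioned across nodes and GPUs in a distributed search. Since the abstract reports that the conjecture holds for all values below 1000, a run with small bounds like these should find nothing.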
Oct, 7

Hybrid coherence for scalable multicore architectures

This work describes a cache architecture and memory model for 1000+ core microprocessors. Our approach exploits workload characteristics and programming model assumptions to build a hybrid memory model that incorporates features from both software-managed coherence schemes and hardware cache coherence. The goal is to achieve the scalability found in compute accelerators, which support relaxed ordering […]
Oct, 7

Intel’s Array Building Blocks: A retargetable, dynamic compiler and embedded language

Our ability to create systems with large amounts of hardware parallelism is exceeding the average software developer’s ability to effectively program them. This is a problem that plagues our industry. Since the vast majority of the world’s software developers are not parallel programming experts, making it easy to write, port, and debug applications with sufficient […]
Oct, 7

A Framework for Automatic OpenMP Code Generation

It is always a tedious task to manually analyze and detect parallelism in programs. When we deal with autoparallelism, the task becomes more complex. Frameworks such as OpenMP are available, through which we can manually annotate the code to realize parallelism and take advantage of the underlying multi-core architecture. But the programmer’s life becomes simple […]
Oct, 7

Implementation of a High Throughput 3GPP Turbo Decoder on GPU

Turbo code is a computationally intensive channel code that is widely used in current and upcoming wireless standards. A general-purpose graphics processing unit (GPGPU) is a programmable commodity processor that achieves high computational performance by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that takes advantage of […]
Oct, 7

Run-time Reconfigurable Multiprocessors

The main advantage of multiprocessors is the performance speedup obtained with parallelism at the processor level. Similarly, flexibility for application-specific adaptation is the advantage of reconfigurable architectures. To benefit from both these architectures, we present a reconfigurable multiprocessor template, which combines the benefits of parallelism in multiprocessors and flexibility in reconfigurable architectures. A fast, single cycle, […]
Oct, 6

Divergence Analysis and Optimizations

The growing interest in GPU programming has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, the model also brings restrictions. In particular, processing elements (PEs) execute in lock-step, and may lose performance due to divergences caused by conditional branches. In face […]
Oct, 6

CUDA performance analyzer

GPGPU Computing using CUDA is rapidly gaining ground today. GPGPU has been brought to the masses through the ease of use of CUDA and ubiquity of graphics cards supporting the same. Although CUDA has a low learning curve for programmers familiar with standard programming languages like C, extracting optimum performance from it, through optimizations and […]
Oct, 6

Offloading Java to Graphics Processors

Massively-parallel graphics processors have the potential to offer high performance at low cost. However, at present such devices are largely inaccessible from higher-level languages such as Java. This work allows compilation from Java bytecode by making use of annotations to specify loops for parallel execution. Data copying to and from the GPU is handled automatically. […]
Oct, 6

Parallelisation of Java for Graphics Processors

The aim of the project was to allow extraction and compilation of Java virtual machine bytecode for parallel execution on graphics cards, specifically the NVIDIA CUDA framework, by both explicit and automatic means. The resulting compiler successfully extracts and compiles code from class files into CUDA C++ code, and outputs transformed classes that […]
Oct, 6

Multi-core programming with OpenCL: performance and portability: OpenCL in a memory bound scenario

With the advent of multi-core processors, desktop computers have become multiprocessors requiring parallel programming to be utilized efficiently. Efficient and portable parallel programming of future multi-core processors and GPUs is one of today’s most important challenges within computer science. Okuda Laboratory at The University of Tokyo in Japan focuses on solving engineering challenges with parallel […]
Oct, 6

Python for Development of OpenMP and CUDA Kernels for Multidimensional Data

Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet, each type of processor requires a specific organization of the application state in […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
