
Posts

Oct, 19

10×10: A General-purpose Architectural Approach to Heterogeneity and Energy Efficiency

Two decades of microprocessor architecture driven by quantitative 90/10 optimization has delivered an extraordinary 1000-fold improvement in microprocessor performance, enabled by transistor scaling which improved density, speed, and energy. Recent generations of technology have produced limited benefits in transistor speed and power, so as a result the industry has turned to multicore parallelism for performance […]
Oct, 19

Heterogeneous Accelerated Bioinformatics-Perspectives for Cancer Research

The demand for even higher performance in bioinformatics data analysis continues to grow rapidly as the volumes of data generated by next generation sequencing equipment soar. Traditional acceleration techniques historically used for faster bioinformatics application will individually be insufficient to meet the demand and increased analysis complexity, requiring an integrated heterogeneous accelerated computing environment. Current […]
Oct, 19

A Code Transformation Framework for Scientific Applications on Structured Grids

The combination of expert-tuned code expression and aggressive compiler optimizations is known to deliver the best achievable performance for modern multicore processors. The development and maintenance of these optimized code expressions is never trivial. Tedious and error-prone processes greatly decrease the code developer’s willingness to adopt manually-tuned optimizations. In this paper, we describe a pre-compilation […]
Oct, 19

Pricing composable contracts on the GP-GPU

We present a language for specifying stochastic processes, called SPL. We show that SPL can express the price of a range of financial contracts, including so called exotic options with path dependence and with multiple sources of uncertainty. Jones, Eber and Seward previously presented a language for writing down financial contracts in a compositional manner […]
Oct, 19

AeminiumGPU: A CPU-GPU Hybrid Runtime for the Aeminium Language

Given that CPU clock speeds are stagnating, programmers are resorting to parallelism to improve the performance of their applications. Although such parallelism has usually been attained using either multicore architectures, multiple CPUs and/or clusters of machines, the GPU has since been used as an alternative. GPUs are an interesting resource because they can provide much […]
Oct, 19

The GPU Computing Revolution: From Multi-Core CPUs To Many-Core Graphics Processors

Computer architectures are undergoing their most radical change in a decade. In the past, processor performance has been improved largely by increasing clock speed: the faster the clock speed, the faster a processor can execute instructions, and thus the greater the performance that is delivered to the end user. This drive to greater and greater […]
Oct, 19

GPU Parallel Collections For Scala

A decade ago, graphics processing units were used specifically for high-speed graphics. Of late, they are becoming more popular as general purpose parallel processors. With the release of CUDA, ATI Stream and OpenCL, programmers can now split their program execution between CPU and GPU, whenever appropriate, resulting in huge performance gain. The cost of […]
Oct, 18

Enabling Traceability in an MDE Approach to Improve Performance of GPU Applications

Graphics Processor Units (GPUs) are known for offering high performance and power efficiency for processing algorithms that suit well to their massively parallel architecture. Unfortunately, as parallel programming for this kind of architecture requires a complex distribution of tasks and data, developers find it difficult to implement their applications effectively. Although approaches based on source-to-source and model-to-source transformations have […]
Oct, 18

Characterization of FPGA-based High Performance Computers

As CPU clock frequencies plateau and the doubling of CPU cores per processor exacerbates the memory wall, hybrid core computing, utilizing CPUs augmented with FPGAs and/or GPUs, holds the promise of addressing high-performance computing demands, particularly with respect to performance, power and productivity. While traditional approaches to benchmark high-performance computers such as SPEC, took an […]
Oct, 18

Design and Performance of the OP2 Library for Unstructured Mesh Applications

OP2 is an "active" library framework for the solution of unstructured mesh applications. It aims to decouple the scientific specification of an application from its parallel implementation to achieve code longevity and near-optimal performance by re-targeting the back-end to different multi-core/many-core hardware. This paper presents the design of the OP2 code generation and compiler framework […]
Oct, 18

Radio astronomy beam forming on GPUs

In order to build the radio telescopes needed for the experiments planned for the years to come, it will be necessary to design computers capable of performing thousands more floating point operations per second than the actual most powerful computers of today, and do it in a very power efficient way. In this work we […]
Oct, 18

Real-Time Spherical Panorama Image Stitching Using OpenCL

This paper presents a webcam-based spherical coordinate conversion system using OpenCL massive parallel computing for panorama video image stitching. With multi-core architecture and its high-bandwidth data transmission rate of memory accesses, modern programmable GPU makes it possible to process multiple video images in parallel for real-time interaction. To get a panorama view of 360 degrees, […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
