Posts
Nov 18
Optimizing the multipole-to-local operator in the fast multipole method for graphical processing units
This paper presents a number of algorithms to run the fast multipole method (FMM) on NVIDIA CUDA-capable graphical processing units (GPUs) (NVIDIA Corporation, Santa Clara, CA, USA). The FMM is a class of methods for computing pairwise interactions between N particles to a given error tolerance at a computational cost of O(N). The methods described […]
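For context, the brute-force computation that the FMM approximates in O(N) time is the direct O(N²) pairwise sum. A minimal sketch (the 2-D point data and unit charges here are made up for illustration, not taken from the paper):

```python
import random

def direct_potentials(positions, charges):
    """Direct O(N^2) evaluation of pairwise 1/r potentials.

    This is the brute-force sum that the fast multipole method
    approximates in O(N) time for a given error tolerance.
    """
    n = len(positions)
    potentials = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            r = (dx * dx + dy * dy) ** 0.5
            potentials[i] += charges[j] / r
    return potentials

# Illustrative random particle set (not from the paper).
random.seed(0)
pts = [(random.random(), random.random()) for _ in range(100)]
q = [1.0] * 100
phi = direct_potentials(pts, q)
```

The FMM replaces the inner loop over all far-away particles with compressed multipole/local expansions; the multipole-to-local translation this paper optimizes is the dominant cost of that scheme.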
Nov 18
Neon: A Domain-Specific Programming Language for Image Processing
Neon is a high-level domain-specific programming language for writing efficient image processing programs which can run on either the CPU or the GPU. End users write Neon programs in a C# programming environment. When the Neon program is executed, our optimizing code generator outputs human-readable source files for either the CPU or GPU. These source […]
Nov 17
Dax Toolkit: A Proposed Framework for Data Analysis and Visualization at Extreme Scale
Experts agree that exascale machines will comprise processors containing many cores, which in turn will necessitate a much higher degree of concurrency. Software will require a minimum of 1,000 times more concurrency. Most parallel analysis and visualization algorithms today work by partitioning data and running mostly serial algorithms concurrently on each data […]
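The partition-then-run-serially pattern the abstract describes can be sketched in a few lines; the min/max kernel and thread pool below are illustrative assumptions, not part of the Dax Toolkit:

```python
from concurrent.futures import ThreadPoolExecutor

def serial_minmax(chunk):
    # The mostly-serial analysis kernel, applied to one partition.
    return min(chunk), max(chunk)

def partitioned_minmax(data, n_parts=4):
    """Partition the data, run the serial kernel concurrently on
    each partition, then reduce the partial results."""
    size = (len(data) + n_parts - 1) // n_parts
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(serial_minmax, chunks))
    return min(p[0] for p in partials), max(p[1] for p in partials)

result = partitioned_minmax(list(range(1000, 0, -1)))  # -> (1, 1000)
```

The abstract's point is that this coarse-grained decomposition does not expose enough concurrency for many-core exascale processors, which is what motivates a new framework.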
Nov 17
Compilation for Heterogeneous Computing: Automating Analyses, Transformations and Decisions
Hardware accelerators, such as FPGA boards or GPUs, are an interesting alternative or a valuable complement to classic multi-core processors for computation-intensive software. However, it proves to be both costly and difficult to use legacy applications with these new heterogeneous targets. In particular, existing compilers are generally targeted toward code generation for sequential processors and […]
Nov 17
Programming Future Parallel Architectures with Haskell and Intel ArBB
New parallel architectures, such as Cell, Intel MIC, GPUs, and tiled architectures, enable high performance but are often hard to program. What is needed is a bridge between the high-level programming models where programmers are most productive and modern parallel architectures. We propose that this bridge is Embedded Domain-Specific Languages (EDSLs). One attractive target for […]
Nov 17
Scientific GPU Programming with Data-Flow Languages
Graphical Processing Units, or GPUs, are processors used primarily to render images from computer models in domains ranging from gaming to design engineering. As generating highly accurate images, often in real time, is extremely computationally intensive, GPUs have developed into extremely powerful processors. To achieve this they have relied on being able to […]
Nov 17
FPGA and ASIC Convergence
The growing demands of multimedia applications and of high-speed, high-quality telecommunication systems with real-time constraints, oriented toward portable low-power devices, have been driving the development of embedded-systems technologies, methodologies, and design flows in recent years. Through an analysis of design methodologies and strategies facing multi-core, reconfigurability, and power-consumption challenges, this educational survey […]
Nov 17
Characterization and Transformation of Unstructured Control Flow in GPU Applications
Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA and OpenCL. Although this technology is widely used, commodity GPUs use different schemes to implement it, and the performance limitations of these different schemes under real workloads […]
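The divergent-control-flow mapping the abstract refers to is commonly implemented by predication: every lane of a SIMD group executes both sides of a branch, and a per-lane active mask selects which results commit. A toy simulation of that scheme (the lane values and functions below are made-up examples, and real GPUs use more elaborate reconvergence mechanisms):

```python
def simd_if_else(lanes, cond, then_fn, else_fn):
    """Execute a divergent if/else on a SIMD group via predication:
    both paths run in lockstep and an active mask picks, per lane,
    which result to keep -- so divergence serializes the two paths."""
    mask = [cond(v) for v in lanes]
    then_vals = [then_fn(v) for v in lanes]   # all lanes run 'then'
    else_vals = [else_fn(v) for v in lanes]   # all lanes run 'else'
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

out = simd_if_else([1, -2, 3, -4],
                   cond=lambda v: v > 0,
                   then_fn=lambda v: v * 10,
                   else_fn=lambda v: -v)
# out == [10, 2, 30, 4]
```

Because both paths execute regardless of the mask, deeply divergent or unstructured control flow pays for every path, which is the performance limitation the paper characterizes.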
Nov 17
Massive Image Editing on the Cloud
Processing massive imagery in a distributed environment currently requires the effort of a skilled team to efficiently handle communication, synchronization, faults, and data/process distribution. Moreover, these implementations are highly optimized for a specific system or cluster, so portability, or improved performance as systems improve, is rarely considered. Much like early GPU computing, cluster computing […]
Nov 17
Adaboost GPU-based Classifier for Direct Volume Rendering
In volume visualization, voxel visibility and materials are determined through interactive editing of a transfer function. In this paper, we present a two-level GPU-based labeling method that computes, at rendering time, a set of labeled structures using the Adaboost machine learning classifier. In a pre-processing step, Adaboost trains a binary classifier from […]
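For readers unfamiliar with Adaboost, a minimal sketch of the algorithm on 1-D threshold "stumps" follows. This is a toy binary classifier on made-up data, not the paper's voxel-labeling pipeline:

```python
import math

def train_adaboost(xs, ys, rounds=5):
    """Toy AdaBoost on 1-D data with threshold stumps.
    ys are +1/-1 labels; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    w = [1.0 / n] * n                      # uniform example weights
    model = []
    for _ in range(rounds):
        best = None
        for thr in xs:                     # pick the lowest-error stump
            for pol in (1, -1):
                preds = [pol if x >= thr else -pol for x in xs]
                err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol, preds)
        err, thr, pol, preds = best
        err = max(err, 1e-10)              # avoid log(0) on perfect stumps
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, pol))
        # Up-weight misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * y * p) for wi, y, p in zip(w, ys, preds)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def predict(model, x):
    score = sum(a * (p if x >= t else -p) for a, t, p in model)
    return 1 if score >= 0 else -1

# Illustrative, linearly separable toy data.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(xs, ys)
```

The paper's contribution is running the analogous classification on the GPU per voxel at rendering time; the training loop itself stays in a pre-processing step, as in this sketch.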
Nov 17
The role of GPU computing in medical image analysis and visualization
Medical image display and analysis continue to be among the most computationally demanding tasks facing modern computers. Recent advances in GPU architecture have enabled a new programming paradigm that utilizes the massively parallel computational capacity of GPUs for general-purpose computing. These parallel processors provide substantial performance benefits […]
Nov 17
Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs
The power of GPUs is giving rise to heterogeneous parallel computing, with new demands on programming environments, runtime systems, and tools to deliver high-performing applications. This paper studies the problems associated with performance measurement of heterogeneous machines with GPUs. A heterogeneous computation model and alternative host-GPU measurement approaches are discussed to set the stage for […]