Posts
May, 3
A GPU Tool for Efficient, Accurate, and Realistic Simulation of Cone Beam CT Projections
Simulation of x-ray projection images plays an important role in cone beam CT (CBCT) related research projects. A projection image contains primary signal, scatter signal, and noise. It is computationally demanding to perform accurate and realistic computations for all of these components. In this work, we develop a package on GPU, called gDRR, for the […]
May, 3
A Distributed GPU-based Framework for real-time 3D Volume Rendering of Large Astronomical Data Cubes
We present a framework to interactively volume-render three-dimensional data cubes using distributed ray-casting and volume bricking over a cluster of workstations powered by one or more graphics processing units (GPUs) and a multi-core CPU. The main design target for this framework is to provide an in-core visualization solution able to provide three-dimensional interactive views of […]
May, 3
Using high performance computing and Monte Carlo simulation for pricing American options
High performance computing (HPC) is a very attractive and relatively new area of research that gives promising results in many applications. In this paper, HPC is used for pricing American options. Although American options are very significant in computational finance, their valuation is very challenging, especially when the Monte Carlo simulation techniques are […]
May, 2
A Fair Comparison of Modern CPUs and GPUs Running the Genetic Algorithm under the Knapsack Benchmark
The paper introduces an optimized multicore CPU implementation of the genetic algorithm and compares its performance with a fine-tuned GPU version. The main goal is to show the true performance relation between modern CPUs and GPUs and eradicate some of the myths surrounding GPU performance. It is essential for the evolutionary community to provide the same […]
May, 2
Dynamic Kernel/Device Mapping Strategies for GPU-assisted HPC Systems
With their high computation throughput and outstanding performance-per-watt figures, graphics processing units (GPUs) are becoming increasingly important for high-performance computing (HPC) systems. The existing GPU execution environment restricts GPU usage to the local host node. This is suitable for standalone computer nodes, but becomes inefficient for HPC systems that consist of a large number of […]
May, 2
Diderot: A Parallel DSL for Image Analysis and Visualization
Research scientists and medical professionals use imaging technology, such as computed tomography (CT) and magnetic resonance imaging (MRI), to measure a wide variety of biological and physical objects. The increasing sophistication of imaging technology creates demand for equally sophisticated computational techniques to analyze and visualize the image data. Analysis and visualization codes are often crafted […]
May, 2
GPU Acceleration for the C++ Standard Template Library
Modern programmers must exploit parallelism for performance gains, possibly through the use of an attached or on-chip GPU. To take advantage of the GPU in C++ programs, the programmer must use either a new language (CUDA or OpenCL) or an external library (Thrust). Rather than requiring that programmers learn new tools, modify existing code, and […]
May, 2
Automatic NUMA Characterization using Cbench
Clusters of seemingly homogeneous compute nodes are increasingly heterogeneous within each node due to replication and distribution of node-level subsystems. This intra-node heterogeneity can adversely affect program execution performance by inflicting additional data-access costs when accessing non-local data. In this work-in-progress paper, we present extensions to the Cbench Scalable Testing Framework for analyzing main memory […]
May, 1
OpenCL and the 13 Dwarfs: A Work in Progress
In the past, evaluating the architectural innovation of parallel computing devices relied on a benchmark suite based on existing programs, e.g., EEMBC or SPEC. However, with the growing ubiquity of parallel computing devices, we argue that it is unclear how best to express parallel computation, and hence, a need exists to identify a higher level […]
May, 1
Graphics Processing Unit Audio Signals Processing in Pure Data and PdCUDA an Implementation with the CUDA Runtime API
The design of graphics processing unit (GPU) audio signals processing extensions to Pure Data (Pd) is discussed with attention to future growth in GPU computing and the complexity of programming a general solution. An implementation named PdCUDA is presented for use of GPU general programming capability for audio signals processing with Pd and the CUDA […]
May, 1
Optimized GPU simulation of continuous-spin glass models
We develop a highly optimized code for simulating the Edwards-Anderson Heisenberg model on graphics processing units (GPUs). Using a number of computational tricks such as tiling, data compression and appropriate memory layouts, the simulation code combining over-relaxation, heat bath and parallel tempering moves achieves a peak performance of 0.29 ns per spin update on realistic […]
May, 1
Random number generators for massively parallel simulations on GPU
High-performance streams of (pseudo) random numbers are crucial for the efficient implementation of countless stochastic algorithms, most importantly, Monte Carlo simulations and molecular dynamics simulations with stochastic thermostats. A number of implementations of random number generators have been discussed for GPU platforms before and some generators are even included in the CUDA supporting libraries. Nevertheless, […]