Posts
Feb, 21
Acceleration of Binomial Options Pricing via Parallelizing along time-axis on a GPU
Since the introduction of organized trading of options for commodities and equities, computing fair prices for options has been an important problem in financial engineering. A variety of numerical methods, including Monte Carlo methods, binomial trees, and numerical solution of stochastic differential equations, are used to compute fair prices. Traders and brokerage firms constantly strive […]
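The excerpt does not show the paper's time-axis parallelization, but the sequential baseline it targets is the standard binomial tree. A minimal Cox-Ross-Rubinstein sketch for a European call (illustrative only; parameter names are assumptions):

```python
import math

def binomial_call_price(s0, strike, rate, sigma, t, steps):
    """Price a European call with the Cox-Ross-Rubinstein binomial tree.

    Sequential baseline: each backward-induction level depends on the
    one after it, which is the dependence a time-axis parallelization
    must break.
    """
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))      # up factor
    d = 1.0 / u                              # down factor
    p = (math.exp(rate * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-rate * dt)

    # Payoffs at expiry, one per terminal node (j = number of up moves).
    values = [max(s0 * u**j * d**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Backward induction: discount expected value, one time level at a time.
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]
```

Within one time level the nodes are independent (easy data parallelism), but the levels themselves form a serial chain; parallelizing along that time axis is the harder problem the paper addresses.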
Feb, 21
A massively parallel framework using P systems and GPUs
Since the CUDA programming model opened GPUs to general-purpose computation, developers have been able to harness the power of GPUs (Graphics Processing Units) across many computational domains. Among these domains, P systems or membrane systems provide a high-level computational modeling framework that allows, in theory, polynomial-time solutions to NP-complete problems by […]
Feb, 21
GPU Acceleration of Equations Assembly in Finite Elements Method – Preliminary Results
The finite element method (FEM) is widely used for the numerical solution of partial differential equations. Two computationally expensive tasks have to be performed in FEM – equations assembly and solution of the system of equations. We present a mapping of the equations assembly problem for the St. Venant-Kirchhoff material to the GPU computation model and show results of its […]
Feb, 21
Data parallel loop statement extension to CUDA: GpuC
In recent years, Graphics Processing Units (GPUs) have emerged as powerful accelerators for general-purpose computations. GPUs ship as graphics accelerators alongside virtually every modern desktop and laptop host CPU. They have over a hundred cores and offer abundant parallelism. Initially, they were used only for graphics applications such as image processing and video games. […]
Feb, 21
APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters
We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for an RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low-latency, high-bandwidth PC cluster, the APEnet+ network, the new generation of our […]
Feb, 20
Final Project Implementing Extremely Randomized Trees in CUDA
In this paper, we present an implementation of extremely randomized trees (ERT), a supervised machine learning algorithm utilizing decision tree ensembles, in CUDA, NVIDIA’s GPU parallel programming extensions for C/C++. We describe the CUDA programming model and NVIDIA GPU architectures and explain the design tradeoffs that we made to exploit various forms of parallelism available […]
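The excerpt does not include the authors' CUDA code, but the distinguishing rule of extremely randomized trees is simple enough to sketch on the CPU: instead of exhaustively searching thresholds as classic CART does, each candidate split draws its threshold uniformly at random. The following is a minimal illustrative sketch, not the paper's implementation; `ert_random_split` and its scoring are assumptions for illustration.

```python
import random

def ert_random_split(samples, labels, n_candidates=5, rng=random):
    """Pick a split the Extra-Trees way: draw one uniformly random
    threshold per candidate feature and keep the candidate with the
    lowest weighted Gini impurity (no exhaustive threshold search)."""
    n_features = len(samples[0])

    def gini(idx):
        if not idx:
            return 0.0
        counts = {}
        for i in idx:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        total = len(idx)
        return 1.0 - sum((c / total) ** 2 for c in counts.values())

    best = None
    for _ in range(n_candidates):
        f = rng.randrange(n_features)
        lo = min(s[f] for s in samples)
        hi = max(s[f] for s in samples)
        if lo == hi:
            continue  # constant feature, no usable split
        thr = rng.uniform(lo, hi)
        left = [i for i, s in enumerate(samples) if s[f] < thr]
        right = [i for i, s in enumerate(samples) if s[f] >= thr]
        n = len(samples)
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[0]:
            best = (score, f, thr)
    return best  # (weighted impurity, feature index, threshold), or None
```

Because every candidate split is scored independently, an ensemble of such trees exposes parallelism at the tree, node, and candidate level, which is what makes the algorithm a natural fit for a GPU.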
Feb, 20
Architecting graphics processors for non-graphics compute acceleration
This paper discusses the emergence of graphics processing units (GPUs) that contain architecture features for accelerating non-graphics (or GPGPU) applications. It provides an introduction for those interested in undertaking research at the intersection of manycore computing and GPU architecture. First, the motivation for using GPUs for non-graphics processing rather than developing specialized hardware is outlined. […]
Feb, 20
Design Space Exploration for GPU-Based Architecture
Recent advances in Graphics Processing Units (GPUs) provide opportunities to exploit GPUs for non-graphics applications. Scientific computation is inherently parallel, making it a good candidate for utilizing the computing power of GPUs. This report investigates QR factorization, an important building block of scientific computation. We analyze different mapping methods of QR factorization on […]
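For readers unfamiliar with the building block being mapped: a QR factorization writes a matrix A as the product of an orthonormal Q and an upper-triangular R. As a reference point (not the report's GPU mapping), here is a plain modified Gram-Schmidt sketch using only Python lists:

```python
def qr_mgs(a):
    """QR factorization by modified Gram-Schmidt (plain lists, no NumPy).

    The per-column inner loops (normalization and projection updates)
    are elementwise-independent, which is one source of parallelism a
    GPU mapping can exploit.
    """
    m, n = len(a), len(a[0])
    q = [[a[i][j] for j in range(n)] for i in range(m)]  # working copy of A
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # r[k][k] is the norm of (the updated) column k; normalize it.
        r[k][k] = sum(q[i][k] ** 2 for i in range(m)) ** 0.5
        for i in range(m):
            q[i][k] /= r[k][k]
        # Remove column k's component from every later column.
        for j in range(k + 1, n):
            r[k][j] = sum(q[i][k] * q[i][j] for i in range(m))
            for i in range(m):
                q[i][j] -= r[k][j] * q[i][k]
    return q, r
```

The sequential dependence between columns, versus the independence within a column, is exactly the kind of tradeoff a design-space exploration of GPU mappings has to weigh.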
Feb, 20
Fast Exact String Matching on the GPU
We present a string-matching program that runs on the GPU. Our program, Cmatch, achieves a speedup of as much as 35x on a recent GPU over the equivalent CPU-bound version. String matching has a long history in computational biology with roots in finding similar proteins and gene sequences in a database of known sequences. The […]
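The excerpt does not show how Cmatch itself works, but the baseline it speeds up is easy to state: exact matching checks the pattern at every alignment of the text. A minimal sketch (hypothetical function name, not Cmatch's algorithm):

```python
def match_positions(text, pattern):
    """Exact string matching by testing every alignment independently.

    Each starting offset is an independent comparison - the
    embarrassingly parallel structure that lets a GPU kernel assign
    one thread per candidate offset.
    """
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if text[i:i + m] == pattern]
```

For biological sequence databases the text is long and the queries are many, so this per-offset independence, multiplied across queries, is where a GPU's throughput pays off.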
Feb, 20
Program Optimization Study on a 128-Core GPU
The newest generations of graphics processing unit (GPU) architecture, such as the NVIDIA GeForce 8-series, feature new interfaces that improve programmability and generality over previous GPU generations. Using NVIDIA’s Compute Unified Device Architecture (CUDA), the GPU is presented to developers as a flexible parallel architecture. This flexibility introduces the opportunity to perform a wide variety […]
Feb, 20
How GPUs Can Improve the Quality of Magnetic Resonance Imaging
In magnetic resonance imaging (MRI), non-Cartesian scan trajectories are advantageous in a wide variety of emerging applications. Advanced reconstruction algorithms that operate directly on non-Cartesian scan data using optimality criteria such as least-squares (LS) can produce significantly better images than conventional algorithms that apply a fast Fourier transform (FFT) after interpolating the scan data onto […]
Feb, 20
MCUDA: An Efficient Implementation of CUDA Kernels on Multi-cores
The CUDA programming model, which is based on an extended ANSI C language and a runtime environment, allows the programmer to explicitly specify data-parallel computation. NVIDIA developed CUDA to open the architecture of their graphics accelerators to more general applications, but did not provide an efficient mapping for executing the programming model on any […]
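MCUDA itself is a source-to-source C translator (with loop fission around synchronization points, which the sketch below ignores), but its core transformation is easy to illustrate: serialize the CUDA thread grid into ordinary nested loops over block and thread indices. A minimal sketch with hypothetical names, assuming a 1D grid and no `__syncthreads`:

```python
def run_kernel_serial(kernel, grid_dim, block_dim, *args):
    """Execute a CUDA-style kernel on one CPU core by iterating over
    every (block, thread) index pair - the thread-loop transformation
    at the heart of MCUDA-style translators (sync handling omitted)."""
    for block in range(grid_dim):
        for thread in range(block_dim):
            kernel(block, thread, block_dim, *args)

def saxpy_kernel(block_idx, thread_idx, block_dim, n, a, x, y, out):
    """Per-thread body: compute one element of out = a*x + y."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < n:  # bounds guard, as in the CUDA original
        out[i] = a * x[i] + y[i]
```

Because the kernel body only depends on its computed global index, the serial loop produces the same result as the parallel launch; a multicore mapping can then distribute the outer block loop across CPU threads.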