Posts
Mar, 18
A Fast GEMM Implementation On a Cypress GPU
We present benchmark results of optimized dense matrix multiplication kernels for the Cypress GPU. We write general matrix multiply (GEMM) kernels for single (SP), double (DP) and double-double (DDP) precision. Our SGEMM and DGEMM kernels achieve ~2 Tflop/s and ~470 Gflop/s, respectively. These results for SP and DP correspond to 73% and 87% of the theoretical […]
Mar, 18
Bump Mapping Unparametrized Surfaces on the GPU
Original bump mapping is only defined for surfaces with a known surface parametrization. In this paper a new method, for the GPU, is proposed which does not use such a given parametrization. To compute the perturbed normal the only inputs used are the surface position, the height value and the original normal. The method decouples […]
Mar, 18
Hardware Acceleration of EDA Algorithms: GPU Architecture and the CUDA Programming Model
In this chapter we discuss the programming environment and model for programming the NVIDIA GeForce 280 GTX GPU, NVIDIA Quadro 5800 FX, and NVIDIA GeForce 8800 GTS devices, which are the GPUs used in our implementations. We discuss the hardware model, memory model, and the programming model for these devices, in order to provide background for […]
Mar, 18
Scientific Computation Through a GPU
A personal computer’s graphics processing unit, or GPU, has attracted growing interest in the academic and research communities in recent months. This paper investigates current technology that enables a GPU to process and solve linear algebra computations, in particular, matrix operations. Matrix operations of linear algebra are the basis of scientific […]
Mar, 18
Real-time ray tracing of implicit surfaces on the GPU
Compact representation of geometry using a suitable procedural or mathematical model and a ray-tracing mode of rendering fit the programmable graphics processor units (GPUs) well. Several such representations including parametric and subdivision surfaces have been explored in recent research. The important and widely applicable category of the general implicit surface has received less attention. In […]
Mar, 18
FPGA vs. GPU for sparse matrix vector multiply
Sparse matrix-vector multiplication (SpMV) is a common operation in numerical linear algebra and is the computational kernel of many scientific applications. It is one of the original and perhaps most studied targets for FPGA acceleration. Despite this, GPUs, which have only recently gained both general-purpose programmability and native support for double precision floating-point arithmetic, are […]
Mar, 18
Using Parallel Computing for the Display and Simulation of the Space Debris Environment
Parallelism is becoming the leading paradigm in today’s computer architectures. In order to take full advantage of this development, new algorithms have to be specifically designed for parallel execution while many old ones have to be upgraded accordingly. One field in which parallel computing has been firmly established for many years is computer graphics. Calculating […]
Mar, 18
MrBayes on a Graphics Processing Unit
MOTIVATION: Bayesian phylogenetic inference can be used to propose a “tree of life” for a collection of species whose DNA sequences are known. While there are many packages available that implement Bayesian phylogenetic inference, such as the popular MrBayes, running these programs poses significant computational challenges. Parallelized versions of the Metropolis coupled Markov chain Monte […]
Mar, 18
2D and 3D level-set algorithms on GPU
Locating object boundaries and modeling shapes remain interesting and important tasks in many applications such as computer vision, object detection, image segmentation and tracking. In this paper we show the implementation of 2D and 3D algorithms based on level sets, using the advantages of today’s common GPUs. One main goal of this […]
Mar, 18
Directionally Unsplit Hydrodynamic Schemes with Hybrid MPI/OpenMP/GPU Parallelization in AMR
We present the implementation and performance of a class of directionally unsplit Riemann-solver-based hydrodynamic schemes on Graphics Processing Units (GPUs). These schemes, including the MUSCL-Hancock method, a variant of the MUSCL-Hancock method, and the corner-transport-upwind method, are embedded into the adaptive-mesh-refinement (AMR) code GAMER. Furthermore, a hybrid MPI/OpenMP model is investigated, which enables the full […]
Mar, 17
Accelerated ray tracing for radiotherapy dose calculations on a GPU
PURPOSE: The graphical processing unit (GPU) on modern graphics cards offers the possibility of accelerating arithmetically intensive tasks. By splitting the work into a large number of independent jobs, order-of-magnitude speedups are reported. In this article, the possible speedup of PLATO’s ray tracing algorithm for dose calculations using a GPU is investigated. METHODS: A GPU […]
Mar, 17
Task Scheduling of Parallel Processing in CPU-GPU Collaborative Environment
With the rapid development of the GPU (Graphics Processing Unit) in recent years, GPGPU (General-Purpose computation on GPU) has become an important technique in scientific research. However, the GPU might well be seen more as a cooperator than a rival to the CPU. Therefore, we focus on exploiting the power of CPU and GPU in solving generic problems […]