Posts
Sep, 14
Parallelize L-BFGS-B on the GPU
Nonlinear optimization is at the heart of many algorithms in engineering. Recently, the rise of general-purpose computing on graphics processing units (GPGPU) has made it promising to investigate how much optimization methods can be sped up by parallelization. While much has been done for simple optimization methods such as conjugate gradient, due to the strong dependencies contained, […]
Sep, 14
An Optimized Parallel IDCT on Graphics Processing Units
In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. By exploiting the fact that most of the IDCT input data for real videos consists of zero-valued coefficients, we create a new compacted data representation that allows for several optimizations. Experimental evaluations […]
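The idea of skipping zero coefficients can be illustrated with a small sketch. It assumes a hypothetical compacted layout (a 16-bit nonzero mask and an offset per 4x4 block, plus one flat array holding only the nonzero, already dequantized coefficients); this is not the paper's actual format, and the names `compact`, `expand` and `idct4x4` are illustrative. The 1-D butterfly is the H.264 4x4 integer inverse transform, and an all-zero block is returned without running it.

```python
from typing import List, Tuple

Block = List[int]  # 16 dequantized coefficients of a 4x4 block, row-major


def compact(blocks: List[Block]) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical compacted layout (not the paper's exact format):
    per block, a 16-bit mask of nonzero positions and an offset into
    one flat array that stores only the nonzero coefficients."""
    masks, offsets, values = [], [], []
    for block in blocks:
        offsets.append(len(values))
        mask = 0
        for i, c in enumerate(block):
            if c != 0:
                mask |= 1 << i
                values.append(c)
        masks.append(mask)
    return masks, offsets, values


def expand(mask: int, offset: int, values: List[int]) -> Block:
    """Rebuild the full 4x4 block from its mask and packed values."""
    block, k = [0] * 16, offset
    for i in range(16):
        if (mask >> i) & 1:
            block[i] = values[k]
            k += 1
    return block


def _butterfly(d: List[int]) -> List[int]:
    # One 1-D pass of the H.264 4x4 integer inverse transform
    e0, e1 = d[0] + d[2], d[0] - d[2]
    e2, e3 = (d[1] >> 1) - d[3], d[1] + (d[3] >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def idct4x4(mask: int, offset: int, values: List[int]) -> Block:
    if mask == 0:
        return [0] * 16  # all-zero block: residual is zero, skip the transform
    d = expand(mask, offset, values)
    rows = [_butterfly(d[4 * r:4 * r + 4]) for r in range(4)]  # horizontal pass
    cols = [_butterfly([rows[r][c] for r in range(4)]) for c in range(4)]  # vertical pass
    # cols[c][r] holds row r, column c; apply the final (x + 32) >> 6 rounding
    return [(cols[c][r] + 32) >> 6 for r in range(4) for c in range(4)]


blocks = [[0] * 16, [64] + [0] * 15]  # an empty block and a DC-only block
masks, offsets, values = compact(blocks)
print(masks, offsets, values)                 # [0, 1] [0, 0] [64]
print(idct4x4(masks[1], offsets[1], values))  # sixteen 1s
```

On a GPU, the mask would let a work-item detect empty blocks with a single comparison, and the packed array reduces the amount of coefficient data read from memory; whether the paper uses exactly this scheme is not stated in the excerpt.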
Sep, 14
Parallel Ray Tracing Simulations with MATLAB for Dynamic Lens Systems
Ray tracing simulations are required for investigating the dynamic behavior of optical systems. By means of image simulations, an exposed image can be generated. However, this requires a large number of rays that have to be traced through an optical system. Since all rays are independent of each other, they can be traced individually using […]
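The independence of rays is what makes the workload parallel. The sketch below is in Python rather than MATLAB and uses an assumed, deliberately simple optical system: paraxial ray-transfer steps through one thin lens of focal length `f`, with the image plane placed at the back focal plane. The helper names are hypothetical. Because `trace` reads nothing but its own ray, the rays can be handed to a pool of workers in any order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Ray = Tuple[float, float]  # (height y in mm, paraxial angle u in rad)


def propagate(ray: Ray, d: float) -> Ray:
    """Free-space propagation over distance d: height changes, angle does not."""
    y, u = ray
    return (y + d * u, u)


def thin_lens(ray: Ray, f: float) -> Ray:
    """Refraction at a thin lens of focal length f: angle changes, height does not."""
    y, u = ray
    return (y, u - y / f)


def trace(ray: Ray, f: float) -> Ray:
    """Trace one ray through the assumed system (lens, then distance f).
    It depends on no other ray, so rays can be traced concurrently."""
    return propagate(thin_lens(ray, f), f)


def trace_all(rays: List[Ray], f: float, workers: int = 4) -> List[Ray]:
    # Threads keep the demo self-contained; in CPython a ProcessPoolExecutor
    # (or a GPU) would be needed for real speedups on CPU-bound tracing.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace, rays, [f] * len(rays)))


rays = [(y0, 0.0) for y0 in (-2.0, -1.0, 1.0, 2.0)]  # collimated input rays
print(trace_all(rays, f=50.0))  # every height is ~0 at the back focal plane
```

For a dynamic lens, `f` would simply be updated between batches of rays; each batch remains embarrassingly parallel.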
Sep, 14
On the Validation and Applications of a Parallel Flexible Multi-Body Dynamics Implementation
This work discusses how a flexible body formalism, specifically, the Absolute Nodal Coordinate Formulation (ANCF), is combined with the Discrete Element Method (DEM) and the Newmark implicit integration method to address many-body dynamics problems; i.e., problems with hundreds of thousands of rigid and deformable bodies. DEM is used to model friction and contact between elements, […]
Sep, 14
Accelerating Sparse Matrix Kernels on Graphics Processing Units
Now that microprocessor clock speeds have levelled off, general-purpose computing on graphics processing units (GPGPU) has shown promise for the future of High Performance Computing (HPC). This can be largely attributed to performance per unit cost, performance per watt, and CUDA, a programming model for GPUs. For instance, for under $400, one […]
Sep, 13
Data Sorting Using Graphics Processing Units
Graphics processing units (GPUs) have been increasingly used for general-purpose computation in recent years. GPU-accelerated applications are found in both scientific and commercial domains. Sorting is considered one of the most important operations in many applications, so its efficient implementation is essential for overall application performance. This paper represents an effort […]
Sep, 13
GPU Fluid Simulation using Smoothed Particle Hydrodynamics
In this paper we present an overview of our implementation of a fluid simulation technique called "Smoothed Particle Hydrodynamics". Our implementation uses a hybrid CPU+GPU hash-based data structure to provide quick lookups of particle nearest neighbors and improve memory access patterns. In our discussion we begin with a brief overview of the Navier-Stokes equations […]
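Hash-based neighbor lookup in SPH is commonly built on a uniform grid whose cell size equals the smoothing length `h`. With that cell size, every particle within distance `h` lies in one of the 27 cells surrounding the query particle. The CPU-only sketch below is not the authors' hybrid CPU+GPU structure. It hashes cell coordinates into a fixed-size table using the large-prime XOR hash popularized by Teschner et al.; the table size and helper names are assumptions. Collisions are harmless because every candidate is still checked against the true distance.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
P1, P2, P3 = 73856093, 19349663, 83492791  # large primes for the XOR spatial hash


def cell_of(p: Vec3, h: float) -> Tuple[int, int, int]:
    """Integer grid cell containing point p, for cell size h."""
    return (math.floor(p[0] / h), math.floor(p[1] / h), math.floor(p[2] / h))


def cell_hash(c: Tuple[int, int, int], table_size: int) -> int:
    return ((c[0] * P1) ^ (c[1] * P2) ^ (c[2] * P3)) % table_size


def build_table(positions: List[Vec3], h: float, table_size: int) -> Dict[int, List[int]]:
    """Bucket every particle index by the hash of its grid cell."""
    table: Dict[int, List[int]] = defaultdict(list)
    for i, p in enumerate(positions):
        table[cell_hash(cell_of(p, h), table_size)].append(i)
    return table


def neighbors(i: int, positions: List[Vec3], table: Dict[int, List[int]],
              h: float, table_size: int) -> List[int]:
    """All j != i with |x_i - x_j| < h, found by scanning the 27 surrounding cells."""
    cx, cy, cz = cell_of(positions[i], h)
    seen, result = set(), []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                bucket = cell_hash((cx + dx, cy + dy, cz + dz), table_size)
                if bucket in seen:  # two cells may collide into one bucket
                    continue
                seen.add(bucket)
                for j in table.get(bucket, ()):
                    # the distance test also filters particles from colliding cells
                    if j != i and math.dist(positions[i], positions[j]) < h:
                        result.append(j)
    return sorted(result)


pts = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.9, 0.0)]
table = build_table(pts, 1.0, 1024)
print(neighbors(0, pts, table, 1.0, 1024))  # [1, 3]
```

GPU implementations often replace the dictionary with particles sorted by cell hash, so that particles in the same cell are contiguous in memory; that is one way such a structure can improve memory access patterns.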
Sep, 13
Exploring Heterogeneous Scheduling using the Task-Centric Programming Model
Computer architecture technology is moving towards more heterogeneous solutions, which will contain a number of processing units with different capabilities that may increase the performance of the system as a whole. However, with increased performance comes increased complexity; complexity that is now barely handled in homogeneous multiprocessing systems. The present study tries to solve a […]
Sep, 13
Multi-GPU implementation of the NICAM atmospheric model
Climate simulation models are used for a variety of scientific problems, and the accuracy of climate prognoses is mostly limited by the resolution of the models. Finer resolution yields more accurate prognoses but, at the same time, significantly increases computational complexity. This explains the increasing interest in High Performance Computing (HPC), and GPU […]
Sep, 13
Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU
The use of GPUs has been very beneficial in accelerating dense linear algebra (DLA) computational kernels. Many high-performance numerical libraries like CUBLAS, MAGMA, and CULA provide BLAS and LAPACK implementations on GPUs, as well as hybrid computations involving both CPUs and GPUs. GPUs usually achieve better performance than CPUs for compute-bound operations, especially those […]
Sep, 12
Hotspot Analysis Based Partial CUDA Acceleration of HMMER 3.0 on GPGPUs
With the introduction of many-core GPUs, there is widespread interest in using GPUs to accelerate non-graphics applications in bioinformatics, energy, finance, and several other research areas. Even though GPUs provide highly parallel processing capability, the communication interface between the CPU and GPU can become a performance bottleneck due to heavy data transfer. If data transfer […]
Sep, 12
Fast Determination of the Number of Endmembers for Real-Time Hyperspectral Unmixing on GPUs
Spectral unmixing is a very important task for remotely sensed hyperspectral data exploitation. It amounts to identifying a set of spectrally pure components (called endmembers) and their associated per-pixel coverage fractions (called abundances). A challenging problem in spectral unmixing is how to determine the number of endmembers in a given scene. Several automatic techniques exist […]