Posts
Sep, 15
NT-SIM: A Co-Simulator for Networked Signal Processing Applications
In networked signal processing systems, network nodes that perform embedded processing on sensory inputs and other data interact across wired or wireless communication networks. In such applications, the processing on individual network nodes can be described in terms of dataflow graphs. However, to analyze the correctness and performance of these applications, designers must understand the […]
Sep, 15
Real-time Kd-tree Based Importance Sampling of Environment Maps
We present a new real-time importance sampling algorithm for environment maps. Our method is based on representing environment maps using kd-tree structures, and generating samples with a single data lookup. An efficient algorithm has been developed for real-time image-based lighting applications. In this paper, we compare our algorithm with the inversion method [Fishman 1996]. We show […]
Sep, 15
Reducing thread divergence in a GPU-accelerated branch-and-bound algorithm
In this paper, we address the design and implementation of GPU-accelerated Branch-and-Bound (B&B) algorithms for solving Flow-shop scheduling optimization problems (FSP). Such applications are CPU-time-consuming and highly irregular. On the other hand, GPUs are massively multi-threaded accelerators that use the SIMD execution model. A major issue which arises when executing on GPU a B&B […]
Sep, 15
Efficient computation of condition estimates for linear least squares problems
Linear least squares (LLS) is a classical linear algebra problem in scientific computing, arising for instance in many parameter estimation problems. In addition to efficiently computing LLS solutions, an important issue is to assess the numerical quality of the computed solution. The notion of conditioning provides a theoretical framework that can be used to measure […]
Sep, 15
High-Throughput parallel blind Virtual Screening using BINDSURF
BACKGROUND: Virtual Screening (VS) methods can considerably aid clinical research, predicting how ligands interact with drug targets. Most VS methods suppose a unique binding site for the target, usually derived from the interpretation of the protein crystal structure. However, it has been demonstrated that in many cases, diverse ligands interact with unrelated parts of the […]
Sep, 14
Parallelize L-BFGS-B on the GPU
Nonlinear optimization is at the heart of many algorithms in engineering. Recently, due to the rise of general-purpose graphics processing units (GPGPUs), it is promising to investigate the performance improvement of optimization methods after parallelization. While much has been done for simple optimization methods such as conjugate gradient, due to the strong dependencies contained, […]
Sep, 14
An Optimized Parallel IDCT on Graphics Processing Units
In this paper we present an implementation of the H.264/AVC Inverse Discrete Cosine Transform (IDCT) optimized for Graphics Processing Units (GPUs) using OpenCL. By exploiting the fact that most of the input data of the IDCT for real videos are zero-valued coefficients, a new compacted data representation is created that allows for several optimizations. Experimental evaluations […]
Sep, 14
Parallel Ray Tracing Simulations with MATLAB for Dynamic Lens Systems
Ray tracing simulations are required for investigating the dynamical behavior of optical systems. By means of image simulations, an exposed image can be generated. However, this requires a high number of rays which have to be traced through an optical system. Since all rays are independent of each other, they can be traced individually using […]
Sep, 14
On the Validation and Applications of a Parallel Flexible Multi-Body Dynamics Implementation
This work discusses how a flexible body formalism, specifically, the Absolute Nodal Coordinate Formulation (ANCF), is combined with the Discrete Element Method (DEM) and the Newmark implicit integration method to address many-body dynamics problems; i.e., problems with hundreds of thousands of rigid and deformable bodies. DEM is used to model friction and contact between elements, […]
Sep, 14
Accelerating Sparse Matrix Kernels on Graphics Processing Units
Since microprocessor clock speeds have levelled off, general purpose computing on Graphics Processing Units (GPGPUs) has shown some promise for the future of High Performance Computing (HPC). This can be largely attributed to the performance per unit cost, performance per watt and CUDA, a programming model for GPUs. For instance, for under $400, one […]
Sep, 13
Data Sorting Using Graphics Processing Units
Graphics processing units (GPUs) have been increasingly used for general-purpose computation in recent years. GPU-accelerated applications are found in both scientific and commercial domains. Sorting is considered one of the most important operations in many applications, so its efficient implementation is essential for the overall application performance. This paper represents an effort […]
Sep, 13
GPU Fluid Simulation using Smoothed Particle Hydrodynamics
In this paper we present an overview of our implementation of a fluid simulation technique called "Smoothed Particle Hydrodynamics". Our implementation uses a hybrid CPU+GPU hash-based data structure to provide quick lookups of particle nearest neighbors and improve memory access patterns. In our discussion we begin with a brief overview of the Navier-Stokes equations […]