Posts
Oct, 18
Heterogeneous FDTD for Seismic Processing
In the early days of computing, scientific calculations were done by specialized hardware. More recently, increasingly powerful CPUs took over and have been dominant for a long time. Now, though, scientific computation is no longer confined to the general-purpose CPU. GPUs are specialized processors with their own memory hierarchy that require more effort to program, […]
Oct, 18
Efficient SVM Training Using Parallel Primal-Dual Interior Point Method on GPU
Training an SVM can be viewed as a Convex Quadratic Programming (CQP) problem, which becomes difficult to solve when dealing with large-scale data sets. Traditional methods for SVM training, such as Sequential Minimal Optimization (SMO), solve a sequence of small-scale sub-problems, which costs a large amount of […]
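For reference, the CQP the abstract refers to is the standard soft-margin SVM dual (a textbook form, not taken from this paper): maximize the margin objective over the Lagrange multipliers alpha, subject to box and equality constraints. SMO exploits this structure by fixing all but two multipliers and solving the resulting two-variable sub-problem analytically.

```latex
\max_{\alpha}\;\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
\qquad \text{s.t.} \quad 0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

An interior point method instead iterates toward the optimum of this whole problem at once, which is why its dense linear-algebra steps map well onto a GPU.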
Oct, 18
Using CUDA GPU to Accelerate the Ant Colony Optimization Algorithm
Graphics Processing Units (GPUs) have recently evolved into massively multi-core, fully programmable architectures. In the CUDA programming model, programmers can straightforwardly implement the parallel parts of a task on GPUs. The purpose of this paper is to accelerate Ant Colony Optimization (ACO) for Traveling Salesman Problems (TSP) with GPUs. In this paper, […]
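To make the abstract concrete, here is a minimal sequential Python sketch of one ACO iteration for the TSP: a single ant builds a tour using the usual pheromone/visibility rule, then pheromone evaporates and the tour deposits new pheromone. This is a generic textbook formulation, not the paper's CUDA implementation; parameter names (`alpha`, `beta`, `rho`) follow common ACO convention.

```python
import math
import random

def tour_length(tour, dist):
    # Total length of a closed tour over the distance matrix.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def ant_tour(dist, tau, alpha=1.0, beta=2.0, rng=random):
    # One ant constructs a tour: city j is chosen with probability
    # proportional to tau[i][j]^alpha * (1/dist[i][j])^beta.
    n = len(dist)
    tour = [0]
    unvisited = set(range(1, n))
    while unvisited:
        i = tour[-1]
        weights = [(j, (tau[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta))
                   for j in sorted(unvisited)]
        r = rng.random() * sum(w for _, w in weights)
        for j, w in weights:
            r -= w
            if r <= 0:
                break
        tour.append(j)
        unvisited.remove(j)
    return tour

def update_pheromone(tau, tours, dist, rho=0.5, q=1.0):
    # Evaporate, then let each tour deposit pheromone inversely
    # proportional to its length (shorter tours reinforce more).
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for tour in tours:
        deposit = q / tour_length(tour, dist)
        for i in range(len(tour)):
            a, b = tour[i], tour[(i + 1) % len(tour)]
            tau[a][b] += deposit
            tau[b][a] += deposit
```

On a GPU, the natural parallelization is one thread (or thread block) per ant during tour construction, with the pheromone update done as a separate parallel pass.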
Oct, 18
Dynamic Load Balancing in GPU-Based Systems – Early Experiments
The dynamic load-balancing framework in Charm++/AMPI, developed at the University of Illinois, is based on using processor virtualization to allow thread migration across processors. This framework has been successfully applied to many scientific applications in the past, such as BRAMS, NAMD, ChaNGa, and others. Most of these applications use only CPUs to perform their operations. […]
Oct, 17
Understanding and Modeling the Synchronization Cost in the GPU Architecture
Graphics Processing Units (GPUs) have grown increasingly popular for general-purpose computation. GPUs are massively parallel processors, which makes them a far better fit than the CPU for many algorithms. The drawback of using a GPU for a computation is that GPUs are much less efficient at […]
Oct, 17
Empirical performance modeling of GPU kernels using active learning
We focus on a design-of-experiments methodology for developing empirical performance models of GPU kernels. Recently, we developed an iterative active learning algorithm that adaptively selects parameter configurations in batches for concurrent evaluation on CPU architectures in order to build performance models over the parameter space. In this paper, we illustrate the adoption of the algorithm […]
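The batch-selection idea in the abstract can be sketched in a few lines. The snippet below uses greedy farthest-point selection over the parameter space as a simple diversity-based stand-in for the model-uncertainty criterion an active learner would use; it is an illustrative assumption, not the paper's actual algorithm.

```python
import math

def euclid(a, b):
    # Euclidean distance between two parameter configurations (tuples).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_batch(candidates, evaluated, k):
    # Greedily pick the k candidate configurations farthest from all
    # configurations already evaluated (and from each other), so each
    # batch probes under-explored regions of the parameter space.
    chosen = []
    pool = list(candidates)
    for _ in range(min(k, len(pool))):
        ref = evaluated + chosen
        best = max(pool, key=lambda c: min(euclid(c, r) for r in ref))
        chosen.append(best)
        pool.remove(best)
    return chosen
```

Each selected batch would then be evaluated concurrently (one kernel run per configuration), the performance model refit, and the loop repeated.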
Oct, 17
A Dynamic Resource Management System for Network-Attached Accelerator Clusters
Over the years, cluster systems have become increasingly heterogeneous as cluster nodes are equipped with one or more accelerators such as graphics processing units (GPUs). These devices are typically attached to a compute node via PCI Express. As a consequence, batch systems such as TORQUE/Maui and SLURM have been extended to be aware of those additional […]
Oct, 17
Real-time computation of interactive waves using the GPU
The Maritime Research Institute Netherlands (MARIN) supplies innovative products for the offshore industry and shipping companies. Among their products are highly realistic, real-time bridge simulators [2], see Figure 1. Currently, the waves are deterministic and are not affected by ships, moles, breakwaters, piers, or any other object. To bring the simulators to the next level, […]
Oct, 17
cudaMap: a GPU accelerated program for gene expression connectivity mapping
BACKGROUND: Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. […]
Oct, 15
Performance Comparison of GPU, DSP and FPGA implementations of image processing and computer vision algorithms in embedded systems
The objective of this thesis is to compare the suitability of FPGAs, GPUs and DSPs for digital image processing applications. Normalized cross-correlation is used as a benchmark, because this algorithm includes convolution, a common operation in image processing and elsewhere. Normalized cross-correlation is a template matching algorithm that is used to locate predefined objects in […]
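The benchmark the thesis names is easy to state in code. Below is a plain-Python sketch of zero-mean normalized cross-correlation and the brute-force template search over an image (lists of lists of grayscale values); the GPU, DSP, and FPGA implementations being compared parallelize exactly this loop nest. This is the standard textbook formulation, assumed rather than copied from the thesis.

```python
import math

def ncc(patch, template):
    # Zero-mean normalized cross-correlation between two equal-sized
    # patches. Result lies in [-1, 1]; 1.0 means a perfect match up to
    # brightness and contrast.
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) *
                    sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0

def match_template(image, template):
    # Slide the template over every position and return the top-left
    # coordinate with the highest NCC score (brute force).
    th, tw = len(template), len(template[0])
    best, best_pos = -2.0, (0, 0)
    for y in range(len(image) - th + 1):
        for x in range(len(image[0]) - tw + 1):
            patch = [row[x:x + tw] for row in image[y:y + th]]
            s = ncc(patch, template)
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos, best
```

Because every sliding-window position is independent, the search maps directly onto one GPU thread per position, onto DSP SIMD loops, or onto an FPGA pipeline.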
Oct, 15
Scaling Soft Matter Physics to Thousands of GPUs in Parallel
We describe a multi-GPU implementation of the Ludwig application, which specialises in simulating a variety of complex fluids via lattice Boltzmann fluid dynamics coupled to additional physics describing complex fluid constituents. We describe our methodology in augmenting the original CPU version with GPU functionality in a maintainable fashion. We present several optimisations that maximize […]
Oct, 15
Domain-Specific Languages for Heterogeneous Parallel Computing
The heterogeneous parallel computing era has been accompanied by an ever-increasing number of disparate programming models. As a result, improving performance via heterogeneous computing is currently very challenging for application programmers. Domain-specific languages (DSLs) are a potential solution to this problem, as they can provide productivity, performance, and portability within the confines of a specific […]