
Oct, 18

OpenACC-based Snow Simulation

In recent years, the GPU platform has risen in popularity in high performance computing due to its cost effectiveness and the high computing power offered through its many parallel cores. The GPU's computing power can be harnessed using the low-level GPGPU programming APIs CUDA and OpenCL. While both CUDA and OpenCL give the programmer fine-grained control […]
Oct, 18

Heterogeneous Clustering with Homogeneous Code: Accelerate MPI Applications Without Code Surgery Using Intel Xeon Phi Coprocessors

This paper reports on our experience with a heterogeneous cluster execution environment, in which a distributed parallel application utilizes two types of compute devices: those employing general-purpose processors, and those based on computing accelerators known as Intel Xeon Phi coprocessors. Unlike general-purpose graphics processing units (GPGPUs), Intel Xeon Phi coprocessors are able to execute native […]
Oct, 18

Towards Code Generation from Design Models for Embedded Systems on Heterogeneous CPU-GPU Platforms

The complexity of modern embedded systems is ever increasing and the selection of target platforms is shifting from homogeneous to more heterogeneous and powerful configurations. In our previous works, we exploited the power of model-driven techniques to deal with such complexity by enabling the automatic generation of full-fledged functional code from UML models enriched with […]
Oct, 18

Heterogeneous FTDT for Seismic Processing

In the early days of computing, scientific calculations were done by specialized hardware. More recently, increasingly powerful CPUs took over and have been dominant for a long time. Now though, scientific computation is not only for the general CPU environment anymore. GPUs are specialized processors with their own memory hierarchy requiring more effort to program, […]
Oct, 18

Efficient SVM Training Using Parallel Primal-Dual Interior Point Method on GPU

The training of an SVM can be viewed as a Convex Quadratic Programming (CQP) problem, which becomes difficult to solve when dealing with large-scale data sets. Traditional methods for SVM training, such as Sequential Minimal Optimization (SMO), solve a sequence of small-scale sub-problems, which costs a large amount of […]
Oct, 18

Using CUDA GPU to Accelerate the Ant Colony Optimization Algorithm

Graphics Processing Units (GPUs) have recently evolved into a massively multi-core and fully programmable architecture. In the CUDA programming model, programmers can readily implement the parallel parts of a task on GPUs. The purpose of this paper is to accelerate Ant Colony Optimization (ACO) for Traveling Salesman Problems (TSP) with GPUs. In this paper, […]
Oct, 18

Dynamic Load Balancing in GPU-Based Systems – Early Experiments

The dynamic load-balancing framework in Charm++/AMPI, developed at the University of Illinois, is based on using processor virtualization to allow thread migration across processors. This framework has been successfully applied to many scientific applications in the past, such as BRAMS, NAMD, ChaNGa, and others. Most of these applications use only CPUs to perform their operations. […]
Oct, 17

Understanding and Modeling the Synchronization Cost in the GPU Architecture

Graphics Processing Units (GPUs) have become increasingly popular for general-purpose computation. GPUs are massively parallel processors, which makes them a much better fit than the CPU for many algorithms. The drawback to using a GPU to do a computation is that they are much less efficient at […]
Oct, 17

Empirical performance modeling of GPU kernels using active learning

We focus on a design-of-experiments methodology for developing empirical performance models of GPU kernels. Recently, we developed an iterative active learning algorithm that adaptively selects parameter configurations in batches for concurrent evaluation on CPU architectures in order to build performance models over the parameter space. In this paper, we illustrate the adoption of the algorithm […]
Oct, 17

A Dynamic Resource Management System for Network-Attached Accelerator Clusters

Over the years, cluster systems have become increasingly heterogeneous by equipping cluster nodes with one or more accelerators such as graphics processing units (GPUs). These devices are typically attached to a compute node via PCI Express. As a consequence, batch systems such as TORQUE/Maui and SLURM have been extended to be aware of those additional […]
Oct, 17

Real-time computation of interactive waves using the GPU

The Maritime Research Institute Netherlands (MARIN) supplies innovative products for the offshore industry and shipping companies. Among their products are highly realistic, real-time bridge simulators [2], see Figure 1. Currently, the waves are deterministic and are not affected by ships, moles, breakwaters, piers, or any other object. To bring the simulators to the next level, […]
Oct, 17

cudaMap: a GPU accelerated program for gene expression connectivity mapping

BACKGROUND: Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors