Posts
Oct, 19
Multi-Scale Scheduling Techniques for Signal Processing Systems
A variety of hardware platforms for signal processing has emerged, from distributed systems such as Wireless Sensor Networks (WSNs) to parallel systems such as Multicore Programmable Digital Signal Processors (PDSPs), Multicore General Purpose Processors (GPPs), and Graphics Processing Units (GPUs) to heterogeneous combinations of parallel and distributed devices. When a signal processing application is implemented […]
Oct, 19
FlexGrip: A Soft GPGPU for FPGAs
Over the past decade, soft microprocessors and vector processors have been extensively used in FPGAs for a wide variety of applications. However, it is difficult to straightforwardly extend their functionality to support conditional and thread-based execution characteristic of general-purpose graphics processing units (GPGPUs) without recompiling FPGA hardware for each application. In this paper, we describe […]
Oct, 19
An Incompressible Navier-Stokes Equations Solver on the GPU Using CUDA
Graphics Processing Units (GPUs) have emerged as highly capable computational accelerators for scientific and engineering applications. Many reports claim orders-of-magnitude speedups over traditional Central Processing Units (CPUs), and interest in GPU computation is high in the computational world. In this thesis, the capability of using GPUs to accelerate the full […]
Oct, 19
Massively Parallel Jacobian Computation
The Jacobian evaluation problem is ubiquitous throughout scientific computing. In this article, the possibility of massively parallel computation of the Jacobian matrix is discussed. It is shown that the computation of the Jacobian matrix shares the same parallelism with the computation being differentiated, which suggests that once we know how to parallelize a computation, its Jacobian […]
Oct, 19
Efficient fine grained shared buffer management for multiple OpenCL devices
OpenCL programming provides full code portability between different hardware platforms, and can serve as a good programming candidate for heterogeneous systems, which typically consist of a host processor and several accelerators. However, to make full use of the computing capacity of such a system, programmers are requested to manage diverse OpenCL-enabled devices explicitly, including distributing […]
Oct, 19
Construction of a Virtual Cluster by Integrating PCI Pass-Through for GPU and InfiniBand Virtualization in Cloud
At present, NVIDIA’s CUDA lets programmers develop highly parallel applications. It relies on a few parallel constructs: hierarchical thread blocks, shared memory, and barrier synchronization. CUDA programs can achieve substantial acceleration. The graphics processor is able to play an important role in cloud computing in a cluster environment, because it […]
Oct, 19
Early Experiences in Running Many-Task Computing Workloads on GPGPUs
This work aims to enable Swift to efficiently use accelerators (such as NVIDIA GPUs) to further accelerate a wide range of applications. This work presents preliminary results on the costs associated with managing and launching concurrent kernels on NVIDIA Kepler GPUs. We expect our results to be applicable to several XSEDE resources, such as Forge, […]
Oct, 18
VDBSCAN+: Performance Optimization Based on GPU Parallelism
Spatial data mining techniques enable knowledge extraction from spatial databases. However, the high computational cost and the complexity of algorithms are some of the main problems in this area. This work proposes a new algorithm referred to as VDBSCAN+, which is derived from the algorithm VDBSCAN (Varied Density Based Spatial Clustering of Applications with Noise) […]
Oct, 18
Progressive Photon Mapping on GPUs
Physically based rendering using ray tracing is capable of producing realistic images of much higher quality than other methods. However, the computational costs associated with exploring all paths of light are huge; it can take hours to render high quality images of complex scenes. Using graphics processing units has emerged as a popular way to […]
Oct, 18
OpenACC-based Snow Simulation
In recent years, the GPU platform has risen in popularity in high performance computing due to its cost effectiveness and the high computing power offered through its many parallel cores. The GPU’s computing power can be harnessed using the low-level GPGPU programming APIs CUDA and OpenCL. While both CUDA and OpenCL give the programmer fine-grained control […]
Oct, 18
Heterogeneous Clustering with Homogeneous Code: Accelerate MPI Applications Without Code Surgery Using Intel Xeon Phi Coprocessors
This paper reports on our experience with a heterogeneous cluster execution environment, in which a distributed parallel application utilizes two types of compute devices: those employing general-purpose processors, and those based on computing accelerators known as Intel Xeon Phi coprocessors. Unlike general-purpose graphics processing units (GPGPUs), Intel Xeon Phi coprocessors are able to execute native […]
Oct, 18
Towards Code Generation from Design Models for Embedded Systems on Heterogeneous CPU-GPU Platforms
The complexity of modern embedded systems is ever increasing and the selection of target platforms is shifting from homogeneous to more heterogeneous and powerful configurations. In our previous work, we exploited the power of model-driven techniques to deal with such complexity by enabling the automatic generation of full-fledged functional code from UML models enriched with […]