Posts
Jun, 17
A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures
The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one […]
Jun, 17
An Improved Monte Carlo Ray Tracing for Large-Scale Rendering in Hadoop
Improving the performance of large-scale rendering requires not only a good view of the data structure but also less disk and network access, especially when the goal is realistic visual effects. This paper presents an optimization method for global illumination rendering of large datasets. We improved the previous rendering algorithm based on Monte Carlo ray […]
Jun, 17
A CUDA based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization
The Multidimensional Knapsack Problem (MKP) is a generalization of the basic Knapsack Problem, with two or more constraints. It is an important optimization problem with many real-life applications. To solve this NP-hard problem we use a metaheuristic algorithm based on ant colony optimization (ACO). Since several steps of the algorithm can be carried out concurrently, […]
Jun, 17
HAM – Heterogenous Active Messages for Efficient Offloading on the Intel Xeon Phi
The applicability of accelerators is limited by the attainable speed-up for the offloaded computations and by the offloading overheads. While GPU programming models like CUDA and OpenCL only allow optimising the application code and its speed-up, the available low-level APIs for the Intel Xeon Phi provide an opportunity to address the overheads, too. This work […]
Jun, 17
GPU Implementation of Bayesian Neural Network Construction for Data-Intensive Applications
We describe a graphics processing unit (GPU) implementation of the Hybrid Markov Chain Monte Carlo (HMC) method for training Bayesian Neural Networks (BNN). Our implementation uses NVIDIA’s parallel computing architecture, CUDA. We briefly review BNNs and the HMC method, describe our implementations, and give preliminary results.
Jun, 17
Synergia CUDA: GPU-accelerated accelerator modeling package
Synergia is a parallel, 3-dimensional space-charge particle-in-cell accelerator modeling code. We present our work porting the purely MPI-based version of the code to a hybrid of CPU and GPU computing kernels. The hybrid code uses the CUDA platform in the same framework as the pure MPI solution. We have implemented a lock-free collaborative charge-deposition algorithm […]
Jun, 16
Divide and Conquer G-Buffer Ray Tracing
Many real time computer graphics applications strive for realism, though they have difficulty achieving reflections that are fast, respond to scene changes, and work on a variety of surfaces. This thesis explores an alternative to existing techniques for real time reflections. Ray tracing, a slow technique that does well at physically modelling light, is combined […]
Jun, 16
An in-depth performance analysis of irregular workloads on VLIW APU
Heterogeneous multi-core architectures have a higher performance/power ratio than traditional homogeneous architectures. Due to their heterogeneity, these architectures support diverse applications but developing parallel algorithms on these architectures can be difficult. In implementing algorithms for heterogeneous systems, proprietary languages are often required, limiting portability. Although general purpose graphics processing units (GPUs) have shown great promise […]
Jun, 16
Improved Distance Weighted GPU-based 3D Ultrasound Reconstruction Methods
Ultrasound is a flexible medical imaging modality with many uses, one of them being intra-operative imaging for use in navigation. In order to obtain the highest possible spatial resolution and avoid big, clunky 3D ultrasound probes, reconstruction of many 2D ultrasound images obtained by a conventional 2D ultrasound probe with a tracking system attached has […]
Jun, 16
NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features
While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages. Although GPUs typically show deterministic behaviour in terms of latency in executing computational kernels as soon as data is available in their internal memories, assessment of real-time features […]
Jun, 16
Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels
Due to the diversity of processor architectures and application memory access patterns, the performance impact of using local memory in OpenCL kernels has become unpredictable. For example, enabling the use of local memory for an OpenCL kernel can be beneficial for the execution on a GPU, but can lead to performance losses when running on […]
Jun, 15
Toward OpenCL Automatic Multi-Device Support
To fully tap into the potential of today's heterogeneous machines, offloading parts of an application to accelerators is no longer sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. […]