Posts
Jun, 21
Continuous Representation of Projected Attribute Spaces of Multifields over Any Spatial Sampling
For the visual analysis of multidimensional data, dimension reduction methods are commonly used to project to a lower-dimensional visual space. In the context of multifields, i.e., volume data with a multidimensional attribute space, the spatial arrangement of the samples in the volumetric domain can be exploited to generate a Continuous Representation of the Projected Attribute […]
Jun, 19
Parallel Algorithms for Hybrid Multi-core CPU-GPU Implementations of Component Labelling in Critical Phase Models
Optimising the use of all the cores of a hybrid multi-core CPU and its accelerating GPUs is becoming increasingly important as such combined systems become widely available. We show how a complex interplay of cross-calling kernels and host components can be used to support good throughput performance on hybrid simulation tasks that have inherently serial […]
Jun, 19
Deep learning with COTS HPC systems
Scaling up deep learning algorithms has been shown to lead to increased performance in benchmark tasks and to enable discovery of complex high-level features. Recent efforts to train extremely large networks (with over 1 billion parameters) have relied on cloud-like computing infrastructure and thousands of CPU cores. In this paper, we present technical details and […]
Jun, 19
Megakernels Considered Harmful: Wavefront Path Tracing on GPUs
When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to the SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory […]
Jun, 19
Real-Time Geometry Decompression on Graphics Hardware
Real-Time Computer Graphics focuses on generating images fast enough to create the illusion of continuous motion. It is used in science, engineering, computer games, image processing, and design. Special-purpose graphics hardware, a so-called graphics processing unit (GPU), accelerates the image generation process substantially. Therefore, GPUs have become indispensable tools for Real-Time Computer Graphics. […]
Jun, 19
Parallel Asynchronous Modelization and Execution of Cholesky Algorithm using Petri Nets
Parallelization of algorithms with hard data dependencies suffers from a lack of task synchronization. Synchronous parallel versions are simple to model and program, but inefficient in terms of scalability and processor utilization. The same problem affects asynchronous versions with elemental static task scheduling. Efficient asynchronous algorithms implement out-of-order execution and are complex […]
Jun, 18
GPU Matrix Multiplication
Graphics Processing Units (GPUs) were developed originally to meet the computational needs of algorithms for rendering computer graphics. The rapid and enormous growth in sophistication of graphics applications such as computer games has resulted in the availability of GPUs that have hundreds of processors and peak performance near a teraflop and that sell for hundreds […]
Jun, 18
Sorting On A Graphics Processing Unit (GPU)
One of the very first GPU sorting algorithms, an adaptation of bitonic sort, was developed by Govindaraju et al. [12]. Since this algorithm was developed before the advent of CUDA, it was implemented using GPU pixel shaders. Zachmann et al. [13] improved on this sort algorithm by using Bitonic Trees to reduce the number […]
Jun, 18
Delaunay Triangulation in R3 on the GPU
The Delaunay triangulation of points in R3 is a fundamental computational geometry structure that is useful for representing and studying objects from the physical world. The 3D Delaunay triangulation has desirable qualities that make it useful in many applications like FEM, surface reconstruction and tessellating solids. Algorithms for 3D Delaunay have been devised that utilize […]
Jun, 18
A GPU Parallelized Spectral Method for Elliptic Equations
We design and implement the first polynomial-based spectral method on graphics processing units (GPUs). The key to success lies in the seamless integration of the matrix diagonalization technique and new generation CUDA tools. The method is applicable to elliptic equations with general boundary conditions in both 2-D and 3-D cases. We show remarkable speedups of […]
Jun, 18
Accelerating GPU Programs by Reducing Irregular Control Flow and Memory Access
The graphics processing unit (GPU) has recently come into use as a massively parallel processor to speed up general computation. However, the GPU can decrease the performance of irregular computation, because it is based on the single instruction, multiple data (SIMD) architecture. The irregular computations here are conditional branches and memory accesses, which vary the behavior […]
Jun, 17
Auto-Tunning of Data Communication on Heterogeneous Systems
Heterogeneous systems formed by traditional CPUs and compute accelerators, such as GPUs, are becoming widely used to build modern supercomputers. However, many different system topologies, i.e., how CPUs, accelerators, and I/O devices are interconnected, are being deployed. Each system organization presents different trade-offs when transferring data between CPUs, accelerators, and nodes within a cluster, requiring […]