12609

Posts

Jul, 30

Scaling Multifluid Compressible Fluid Dynamics to 700,000 cores, 1.5 Pflop/s, and a Trillion Grid Cells

We are using the Blue Waters system at NCSA to study compressible, turbulent mixing of gases in the deep interiors of stars and also in the context of inertial confinement fusion (ICF). In December, 2012, during the Blue Waters friendly user access period, we carried out a simulation of an ICF test problem on a […]
Jul, 30

Research on Parallel DVH Statistic Based on CUDA

Dose Volume Histogram(DVH) is necessary for evaluating radiotherapy planning. With the increase of patient CT slices and the development of intensity-modulated radiation therapy(IMRT) technology, statistical process of DVH requires a large number of cubic interpolation calculation, and the sequential single threaded DVH code on the CPU can not meet the real-time requirement. The paper presents […]
Jul, 30

A CUDA-Based Real Parameter Optimization Benchmark

Benchmarking is key for developing and comparing optimization algorithms. In this paper, a CUDA-based real parameter optimization benchmark (cuROB) is introduced. Test functions of diverse properties are included within cuROB and implemented efficiently with CUDA. Speedup of one order of magnitude can be achieved in comparison with CPU-based benchmark of CEC’14.
Jul, 29

Optimizing Lempel-Ziv Factorization for the GPU Architecture

Lossless data compression is used to reduce storage requirements, allowing for the relief of I/O channels and better utilization of bandwidth. The Lempel-Ziv lossless compression algorithms form the basis for many of the most commonly used compression schemes. General purpose computing on graphic processing units (GPGPUs) allows us to take advantage of the massively parallel […]
Jul, 29

Implicit Methods for Real-Time simulation of Interactive Waves

The project focuses on developing a simulator in which ships and waves interact. The new wave model is the Variational Boussinesq model (VBM). However, this new realistic model brings much more computation effort with it. The VBM mainly requires an unsteady state solver, that solves a coupled system of equations at each frame (20 fps). […]
Jul, 29

Parallel Worldline Numerics: Implementation and Error Analysis

We give an overview of the worldline numerics technique, and discuss the parallel CUDA implementation of a worldline numerics algorithm. In the worldline numerics technique, we wish to generate an ensemble of representative closed-loop particle trajectories, and use these to compute an approximate average value for Wilson loops. We show how this can be done […]
Jul, 29

Mixed-precision orthogonalization scheme and its case studies with CA-GMRES on a GPU

We propose a mixed-precision orthogonalization scheme that takes the input matrix in a standard 32 or 64-bit floating-point precision, but uses higher-precision arithmetics to accumulate its intermediate results. For the 64-bit precision, our scheme uses software emulation for the higher-precision arithmetics, and requires about 20x more computation but about the same amount of communication as […]
Jul, 29

Aristotle: A Performance Impact Indicator for the OpenCL Kernels Using Local Memory

Due to the increasing complexity of multi/many-core architectures (with their mix of caches and scratch-pad memories) and applications (with different memory access patterns), the performance of many workloads becomes increasingly variable. In this work, we address one of the main causes for this performance variability: the efficiency of the memory system. Specifically, based on an […]
Jul, 29

Course on Antenna Synthesis (with elements of GPU computing)

I’m pleased to announce the Course on Antenna Synthesis (with elements of GPU computing) organized in the framework of the European School of Antennas. The Course will take place at the Partenope Conference Center of the Università di Napoli Federico II, Napoli, Italy, on October 13-17, 2014. The Course faces three topics corresponding to the […]
Jul, 29

File I/O on Intel Xeon Phi Coprocessors: RAM disks, VirtIO, NFS and Lustre

The key innovation brought about by Intel Xeon Phi coprocessors is the possibility to port most HPC applications to manycore computing accelerators without code modification. One of the reasons why this is possible is support for file input/output (I/O) directly from applications running on coprocessors. These facilities allow seamless usage of manycore accelerators in common […]
Jul, 28

GPU Computing to Improve Game Engine Performance

Although the graphics processing unit (GPU) was originally designed to accelerate the image creation for output to display, today’s general purpose GPU (GPGPU) computing offers unprecedented performance by offloading computing-intensive portions of the application to the GPGPU, while running the remainder of the code on the central processing unit (CPU). The highly parallel structure of […]
Jul, 28

Computational investigation of intense short-wavelength laser interaction with rare gas clusters

Clusters of atoms have remarkable optical properties that were exploited since the antiquity. It was only during the late 20th century though that their production was better controlled and opened the door to a better understanding of matter. Lasers are the tool of choice to study these nanoscopic objects so scientists have been blowing clusters […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: