Posts

Jul, 29

Parallel Worldline Numerics: Implementation and Error Analysis

We give an overview of the worldline numerics technique, and discuss the parallel CUDA implementation of a worldline numerics algorithm. In the worldline numerics technique, we wish to generate an ensemble of representative closed-loop particle trajectories, and use these to compute an approximate average value for Wilson loops. We show how this can be done […]
Jul, 29

Mixed-precision orthogonalization scheme and its case studies with CA-GMRES on a GPU

We propose a mixed-precision orthogonalization scheme that takes the input matrix in a standard 32 or 64-bit floating-point precision, but uses higher-precision arithmetic to accumulate its intermediate results. For the 64-bit precision, our scheme uses software emulation for the higher-precision arithmetic, and requires about 20x more computation but about the same amount of communication as […]
Jul, 29

Aristotle: A Performance Impact Indicator for the OpenCL Kernels Using Local Memory

Due to the increasing complexity of multi/many-core architectures (with their mix of caches and scratch-pad memories) and applications (with different memory access patterns), the performance of many workloads becomes increasingly variable. In this work, we address one of the main causes for this performance variability: the efficiency of the memory system. Specifically, based on an […]
Jul, 29

Course on Antenna Synthesis (with elements of GPU computing)

I’m pleased to announce the Course on Antenna Synthesis (with elements of GPU computing), organized in the framework of the European School of Antennas. The Course will take place at the Partenope Conference Center of the Università di Napoli Federico II, Napoli, Italy, on October 13-17, 2014. The Course covers three topics corresponding to the […]
Jul, 29

File I/O on Intel Xeon Phi Coprocessors: RAM disks, VirtIO, NFS and Lustre

The key innovation brought about by Intel Xeon Phi coprocessors is the possibility to port most HPC applications to manycore computing accelerators without code modification. One of the reasons why this is possible is support for file input/output (I/O) directly from applications running on coprocessors. These facilities allow seamless usage of manycore accelerators in common […]
Jul, 28

GPU Computing to Improve Game Engine Performance

Although the graphics processing unit (GPU) was originally designed to accelerate the image creation for output to display, today’s general purpose GPU (GPGPU) computing offers unprecedented performance by offloading computing-intensive portions of the application to the GPGPU, while running the remainder of the code on the central processing unit (CPU). The highly parallel structure of […]
Jul, 28

Computational investigation of intense short-wavelength laser interaction with rare gas clusters

Clusters of atoms have remarkable optical properties that have been exploited since antiquity. Only during the late 20th century, though, did their production become better controlled, opening the door to a better understanding of matter. Lasers are the tool of choice to study these nanoscopic objects, so scientists have been blowing clusters […]
Jul, 28

Ship Detection from SAR Imagery Using CUDA and Performance Analysis of the System

The synthetic aperture radar (SAR) Ship Detection System (SDS) is an important application from the point of view of maritime security monitoring. It allows monitoring of traffic, fisheries, and naval warfare. Since full-resolution SAR images are heavily affected by the presence of speckle, ship detection algorithms generally employ speckle-reduced SAR images at the expense of a degradation […]
Jul, 28

Bayesian model comparison via sequential Monte Carlo

Sequential Monte Carlo (SMC) methods have been widely used for modern scientific computation. Bayesian model comparison has been successfully applied in many fields. Yet there has been little research on the use of SMC for the purpose of Bayesian model comparison. This thesis studies different SMC strategies for Bayesian model computation. In addition, various […]
Jul, 28

OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions

High-performance computing is based more and more on heterogeneous architectures, and GPGPUs have become one of the main integrated blocks in these, such as the recently emerged Mali GPU in embedded systems or the NVIDIA GPUs in HPC servers. In both GPGPUs, programming could become a hurdle that can limit their adoption, since the programmer has […]
Jul, 28

Understanding the ISA impact on GPU Architecture

The widespread acceptance of GPUs for parallel computation has created demand for general-purpose capabilities in GPUs. In response, industry is rapidly coming up with better architectures to support general-purpose processing on GPUs. NVIDIA has come up with the Tesla, Fermi and Kepler architectures. General Purpose Graphics Processing Units (GPGPU) are widely being […]
Jul, 28

Agent-based crowd simulation using GPU computing

The purpose of the research is to investigate agent-based approaches to virtual crowd simulation. Crowds are ubiquitous and are becoming an increasingly common phenomenon in modern society, particularly in urban settings. As such, crowd simulation systems are becoming increasingly popular in training simulations, pedestrian modelling, emergency simulations, and multimedia. One of the primary challenges in […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org