Posts
May, 17
Performance Evaluation of CPU-GPU communication Depending on the Characteristic of Co-Located Workloads
Many recent studies exploit the high computational performance of GPUs for complex computation and big data processing. The Tesla K20X, recently announced by NVIDIA, provides 3.95 TFLOPS of single-precision floating-point performance [1]. The performance of the K20X is 10 times higher than that of Intel’s high-end CPUs. Due to the high computational performance of GPUs, […]
May, 17
Making the case of GPUs in courses on computational physics
Most relatively modern desktop or even laptop computers contain a graphics card useful for more than showing colors on a screen. In this paper, we make a case for why you should learn enough about GPU (graphics processing unit) computing to use it as an accelerator for, or even a replacement for, your CPU code. We include an […]
May, 17
The 2013 International Workshop on Embedded Multicore Systems, ICPP-EMS 2013
ICPP-EMS 2013 is organized in conjunction with ICPP 2013, the 42nd International Conference on Parallel Processing. The 2013 International Workshop on Embedded Multicore Systems (ICPP-EMS 2013) will bring researchers and experts together to present and discuss the latest developments and technical solutions concerning various aspects of embedded multicore computing. ICPP-EMS 2013 seeks original unpublished papers […]
May, 17
3rd International Workshop on Embedded Multi-core Computing and Applications, EMCA 2013
In conjunction with the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013). The goal of this workshop is to provide a forum for researchers and practitioners to discuss and share their research and development experiences and outputs on massively parallel GPU platforms, multi-core systems, optimization techniques, parallel algorithm design, applications, […]
May, 17
IS&T/SPIE Electronic Imaging 2014
Mobile Computational Photography 2014, part of the program track on Mobile Imaging. This conference is intended to bring together world-class researchers and practitioners who develop and deploy imaging technologies to enable novel solutions for mobile photography. Submissions are accepted on theory, application, and experience. The scope of the conference includes: Cameras optical designs for ultra […]
May, 17
22nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2014
Special Session on GPU Computing. The Special Session on GPU Computing and Hybrid Computing aims at providing a forum for scientific researchers and engineers on hot topics related to GPU computing and hybrid computing, with special emphasis on applications, performance analysis, programming models and mechanisms for mapping codes. Topics of interest include, but are not […]
May, 16
Point Spread Function Estimation of Solar Surface Images with a Cooperative Particle Swarm Optimization on GPUs
We present a method for estimating the point spread function (PSF) of solar surface images acquired from ground telescopes and degraded by atmosphere. The estimation is done by retrieving the wavefront phase using a set of short exposures, the speckle reconstruction of the observed object and a PSF model parametrized by Zernike polynomials. Estimates of […]
May, 16
Extended Data Collection: Analysis of Cache Behavior and Performance of Different BVH Memory Layouts for Tracing Incoherent Rays
With CPUs moving towards many-core architectures and GPUs becoming more general purpose architectures, path tracing can now be well parallelized on commodity hardware. While parallelization is trivial in theory, properties of real hardware make efficient parallelization difficult, especially when tracing incoherent rays. We investigate how different bounding volume hierarchy (BVH) and node memory layouts as […]
May, 16
Analyzing Locality of Memory References in GPU Architectures
In this paper we advocate formal locality analysis on memory references of GPGPU kernels. We investigate the locality of reference at different cache levels in the memory hierarchy. At the L1 cache level, we look into the locality behavior at the warp-, the thread block- and the streaming multiprocessor-level. Using matrix multiplication as a case […]
May, 16
Stabilized Backward Diffusion for Partial Volume Correction
This paper proposes a novel algorithm for correcting the Partial Volume Effect in Positron Emission Tomography (PET) images, using registered Computed Tomography (CT) data to enhance the blurred PET image. The algorithm is based on a forward-and-backward anisotropic heat equation solver that deblurs the PET image along CT gradients. A forward diffusion force is also […]
May, 16
Gauge fixing in lattice QCD with multi-GPUs
Here we present the cuLGT code for gauge fixing in lattice gauge field theories with graphics processing units (GPUs). Implementations for SU(3) Coulomb, Landau and maximally Abelian gauge fixing are available, and the overrelaxation, stochastic relaxation and simulated annealing algorithms are supported. Performance results for single and multiple GPUs are given.
May, 15
CUDA implementation of the solution of a system of linear equations arising in an hp-Finite Element code
The finite element method (FEM) has proven to be one of the most efficient methods for solving differential equations. Because it is designed to run on different computer architectures, technological improvements have led over the years to the fast solution of larger and larger problems. Among these technological improvements, we emphasize the development of the GPU (graphics processing unit). Scientific programming in […]