Jul, 29

Optimizing Lempel-Ziv Factorization for the GPU Architecture

Lossless data compression is used to reduce storage requirements, relieving I/O channels and making better use of bandwidth. The Lempel-Ziv lossless compression algorithms form the basis for many of the most commonly used compression schemes. General-purpose computing on graphics processing units (GPGPU) allows us to take advantage of the massively parallel […]
Jul, 29

Implicit Methods for Real-Time simulation of Interactive Waves

The project focuses on developing a simulator in which ships and waves interact. The new wave model is the Variational Boussinesq model (VBM). However, this more realistic model requires considerably more computational effort. The VBM mainly requires an unsteady-state solver that solves a coupled system of equations at each frame (20 fps). […]
Jul, 29

Parallel Worldline Numerics: Implementation and Error Analysis

We give an overview of the worldline numerics technique, and discuss the parallel CUDA implementation of a worldline numerics algorithm. In the worldline numerics technique, we wish to generate an ensemble of representative closed-loop particle trajectories, and use these to compute an approximate average value for Wilson loops. We show how this can be done […]
Jul, 29

Mixed-precision orthogonalization scheme and its case studies with CA-GMRES on a GPU

We propose a mixed-precision orthogonalization scheme that takes the input matrix in standard 32- or 64-bit floating-point precision, but uses higher-precision arithmetic to accumulate its intermediate results. For the 64-bit precision, our scheme uses software emulation for the higher-precision arithmetic, and requires about 20x more computation but about the same amount of communication as […]
Jul, 29

Aristotle: A Performance Impact Indicator for the OpenCL Kernels Using Local Memory

Due to the increasing complexity of multi/many-core architectures (with their mix of caches and scratch-pad memories) and applications (with different memory access patterns), the performance of many workloads becomes increasingly variable. In this work, we address one of the main causes for this performance variability: the efficiency of the memory system. Specifically, based on an […]
Jul, 29

Course on Antenna Synthesis (with elements of GPU computing)

I’m pleased to announce the Course on Antenna Synthesis (with elements of GPU computing), organized in the framework of the European School of Antennas. The Course will take place at the Partenope Conference Center of the Università di Napoli Federico II, Napoli, Italy, on October 13-17, 2014. The Course covers three topics corresponding to the […]
Jul, 29

File I/O on Intel Xeon Phi Coprocessors: RAM disks, VirtIO, NFS and Lustre

The key innovation brought about by Intel Xeon Phi coprocessors is the possibility to port most HPC applications to manycore computing accelerators without code modification. One of the reasons why this is possible is support for file input/output (I/O) directly from applications running on coprocessors. These facilities allow seamless usage of manycore accelerators in common […]
Jul, 28

GPU Computing to Improve Game Engine Performance

Although the graphics processing unit (GPU) was originally designed to accelerate image creation for output to a display, today’s general-purpose GPU (GPGPU) computing offers unprecedented performance by offloading compute-intensive portions of the application to the GPGPU, while running the remainder of the code on the central processing unit (CPU). The highly parallel structure of […]
Jul, 28

Computational investigation of intense short-wavelength laser interaction with rare gas clusters

Clusters of atoms have remarkable optical properties that have been exploited since antiquity. Only in the late 20th century, though, did their production become better controlled, opening the door to a better understanding of matter. Lasers are the tool of choice to study these nanoscopic objects, so scientists have been blowing clusters […]
Jul, 28

Ship Detection from SAR Imagery Using CUDA and Performance Analysis of the System

A synthetic aperture radar (SAR) ship detection system (SDS) is an important application for maritime security monitoring. It allows monitoring of traffic, fisheries, and naval warfare. Since full-resolution SAR images are heavily affected by the presence of speckle, ship detection algorithms generally employ speckle-reduced SAR images at the expense of a degradation […]
Jul, 28

Bayesian model comparison via sequential Monte Carlo

Sequential Monte Carlo (SMC) methods have been widely used in modern scientific computation. Bayesian model comparison has been successfully applied in many fields. Yet there has been little research on the use of SMC for Bayesian model comparison. This thesis studies different SMC strategies for Bayesian model computation. In addition, various […]
Jul, 28

OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions

High-performance computing is based more and more on heterogeneous architectures, and GPGPUs have become one of their main integrated blocks, such as the recently emerged Mali GPU in embedded systems or the NVIDIA GPUs in HPC servers. In both GPGPUs, programming could become a hurdle that can limit their adoption, since the programmer has […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: one with two AMD graphics processing units, the other with one AMD and one nVidia graphics processing unit. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
