Jen-Cheng Huang
Architecture simulation is an important performance modeling approach. Modeling hardware components with sufficient detail helps architects to identify both hardware and software bottlenecks. However, the major issue of architectural simulation is the huge slowdown compared to native execution. The slowdown gets higher for the emerging workloads that feature high throughput and massive parallelism, such as […]
View View   Download Download (PDF)   
Junjie Lai
In this thesis work, we have mainly worked on two topics of GPU performance analysis. First, we have developed an analytical method and a timing estimation tool (TEG) to predict CUDA application’s performance for GT200 generation GPUs. TEG can predict GPU applications’ performance in cycle-approximate level. Second, we have developed an approach to estimate GPU […]
View View   Download Download (PDF)   
Go Ogiya, Masao Mori, Yohei Miki, Taisuke Boku, Naohito Nakasato
The discrepancy in the mass-density profile of dark matter halos between simulations and observations, the core-cusp problem, is a long-standing open question in the standard paradigm of cold dark matter cosmology. Here, we study the dynamical response of dark matter halos to oscillations of the galactic potential which are induced by a cycle of gas […]
View View   Download Download (PDF)   
Yuan Yuan, Rubao Lee, Xiaodong Zhang
Database community has made significant research efforts to optimize query processing on GPUs in the past few years. However, we can hardly find that GPUs have been truly adopted in major warehousing production systems. Preparing to merge GPUs to the warehousing systems, we have identified and addressed several critical issues in a three-dimensional study of […]
Vitaly Zakharenko
We present FusionSim, a modeling framework capable of cycle-accurate simulation of a complete x86-based computer system with (a) a CPU and a GPU on the same die, and (b) a CPU and a GPU connected as separate components. We use FusionSim to characterize the performance of the Rodinia benchmarks on fused and discrete systems. We […]
Amirsaman Farrokhpanah
A parallel GPU compatible Lagrangian mesh free particle solver for multiphase fluid flow based on SPH scheme is developed and used to capture the interface evolution during droplet impact. Surface tension is modeled employing the multiphase scheme of Hu et al. (2006). In order to precisely simulate the wetting phenomena, a method based on the […]
View View   Download Download (PDF)   
Joseph Issa
The change in processor architectures and 3D benchmarks makes performance characterization important for every processor and 3D application generation. Recent 3D applications require large amount of data to be processed by the GPU and the CPU. This leads to the importance in analyzing processor performance for different architectures and benchmarks so that benchmarks and processors […]
View View   Download Download (PDF)   
Jaewoong Sim, Aniruddha Dasgupta, Hyesoon Kim, and Richard Vuduc
Tuning code for GPGPU and other emerging many-core platforms is a challenge because few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this paper, we present a performance analysis framework that can help shed light on such bottlenecks for GPGPU applications. Although a handful of GPGPU profiling tools exist, most […]
View View   Download Download (PDF)   
Sunpyo Hong, Hyesoon Kim
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures to improve application performance is even more dif?cult. Current approaches rely on programmers […]
View View   Download Download (PDF)   
Jiayuan Meng, Vitali A. Morozov, Kalyan Kumaran, Venkatram Vishwanath, Thomas D. Uram
We propose GROPHECY, a GPU performance projection framework that can estimate the performance benefit of GPU acceleration without actual GPU programming or hardware. Users need only to skeletonize pieces of CPU code that are targets for GPU acceleration. Code skeletons are automatically transformed in various ways to mimic tuned GPU codes with characteristics resembling real […]
View View   Download Download (PDF)   
Michael Moeng, Sangyeun Cho, Rami Melhem
Software simulation is the primary tool used for evaluation of processor design. Simulation offers better accuracy than analytical models and is an important evaluation step before actually fabricating a chip. Unfortunately, simulator speeds are slow — a conventional cycle-accurate simulator will be unable to keep up with increasing core counts in modern processor design. Parallel […]
View View   Download Download (PDF)   
Chris McClanahan, Kent Czechowski, Casey Battaglino, Kartik Iyer, P.-K. Yeung, Richard Vuduc
We consider the problem of implementing scalable three-dimensional fast Fourier transforms with an eye toward future exascale systems comprised of graphics co-processor (GPUs) or other similarly high-density compute units. We describe a new software implementation; derive and calibrate a suitable analytical performance model; and use this model to make predictions about potential outcomes at exascale, […]
Page 1 of 212

* * *

* * *

Follow us on Twitter

HGPU group

1662 peoples are following HGPU @twitter

Like us on Facebook

HGPU group

337 people like HGPU on Facebook

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: