Jun, 18

Expansion Techniques for Collisionless Stellar Dynamical Simulations

We present GPU implementations of two fast force calculation methods, based on series expansions of the Poisson equation. One is the Self-Consistent Field (SCF) method, which is a Fourier-like expansion of the density field in some basis set; the other is the Multipole Expansion (MEX) method, which is a Taylor-like expansion of the Green’s function. […]
Jun, 17

On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures

With the advent of many-core computer architectures such as GPGPUs from NVIDIA and AMD, and more recently Intel’s Xeon Phi, ensuring performance portability of HPC codes is potentially becoming more complex. In this work we have focused on one important application area — structured grid codes — and investigated techniques for ensuring performance portability across […]
Jun, 17

A Portable OpenCL Lattice Boltzmann Code for Multi- and Many-core Processor Architectures

The architecture of high performance computing systems is becoming more and more heterogeneous, as accelerators play an increasingly important role alongside traditional CPUs. Programming heterogeneous systems efficiently is a complex task that often requires the use of specific programming environments. Programming frameworks supporting codes portable across different high performance architectures have recently appeared, but one […]
Jun, 17

An Improved Monte Carlo Ray Tracing for Large-Scale Rendering in Hadoop

Improving the performance of large-scale rendering requires not only a good view of the data structure but also reduced disk and network access, especially when aiming for realistic visual effects. This paper presents an optimization method for global illumination rendering of large datasets. We improved the previous rendering algorithm based on Monte Carlo ray […]
Jun, 17

A CUDA based Solution to the Multidimensional Knapsack Problem Using the Ant Colony Optimization

The Multidimensional Knapsack Problem (MKP) is a generalization of the basic Knapsack Problem, with two or more constraints. It is an important optimization problem with many real-life applications. To solve this NP-hard problem we use a metaheuristic algorithm based on ant colony optimization (ACO). Since several steps of the algorithm can be carried out concurrently, […]
Jun, 17

HAM – Heterogeneous Active Messages for Efficient Offloading on the Intel Xeon Phi

The applicability of accelerators is limited by the attainable speed-up for the offloaded computations and by the offloading overheads. While GPU programming models like CUDA and OpenCL only allow optimising the application code and its speed-up, the available low-level APIs for the Intel Xeon Phi also provide an opportunity to address the overheads. This work […]
Jun, 17

GPU Implementation of Bayesian Neural Network Construction for Data-Intensive Applications

We describe a graphics processing unit (GPU) implementation of the Hybrid Markov Chain Monte Carlo (HMC) method for training Bayesian Neural Networks (BNN). Our implementation uses NVIDIA’s parallel computing architecture, CUDA. We briefly review BNNs and the HMC method, describe our implementations, and give preliminary results.
Jun, 17

Synergia CUDA: GPU-accelerated accelerator modeling package

Synergia is a parallel, 3-dimensional space-charge particle-in-cell accelerator modeling code. We present our work porting the purely MPI-based version of the code to a hybrid of CPU and GPU computing kernels. The hybrid code uses the CUDA platform in the same framework as the pure MPI solution. We have implemented a lock-free collaborative charge-deposition algorithm […]
Jun, 16

Divide and Conquer G-Buffer Ray Tracing

Many real time computer graphics applications strive for realism, though they have difficulty achieving reflections that are fast, respond to scene changes, and work on a variety of surfaces. This thesis explores an alternative to existing techniques for real time reflections. Ray tracing, a slow technique that does well at physically modelling light, is combined […]
Jun, 16

An in-depth performance analysis of irregular workloads on VLIW APU

Heterogeneous multi-core architectures have a higher performance/power ratio than traditional homogeneous architectures. Due to their heterogeneity, these architectures support diverse applications but developing parallel algorithms on these architectures can be difficult. In implementing algorithms for heterogeneous systems, proprietary languages are often required, limiting portability. Although general purpose graphics processing units (GPUs) have shown great promise […]
Jun, 16

Improved Distance Weighted GPU-based 3D Ultrasound Reconstruction Methods

Ultrasound is a flexible medical imaging modality with many uses, one of them being intra-operative imaging for use in navigation. In order to obtain the highest possible spatial resolution and avoid big, clunky 3D ultrasound probes, reconstruction of many 2D ultrasound images obtained by a conventional 2D ultrasound probe with a tracking system attached has […]
Jun, 16

NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features

While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages. Although GPUs typically show deterministic behaviour in terms of latency in executing computational kernels as soon as data is available in their internal memories, assessment of real-time features […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
