Jerome Kieffer, Giannis Ashiotis
The pyFAI package has been designed to reduce X-ray diffraction images into powder diffraction curves to be further processed by scientists. This contribution describes how to convert an image into a radial profile using the NumPy package and how the process was accelerated using Cython. The algorithm was parallelised, needing a complete re-design to benefit from […]
A. C. Mallinson, D. A. Beckingsale, W. P. Gaudin, J. A. Herdman, S. A. Jarvis
Significantly increasing intra-node parallelism is widely recognised as being a key prerequisite for reaching exascale levels of computational performance. In future exascale systems it is likely that this performance improvement will be realised by increasing the parallelism available in traditional CPU devices and using massively-parallel hardware accelerators. The MPI programming model is starting to reach […]
Shuhao Zhang, Jiong He, Bingsheng He, Mian Lu
Driven by the rapid hardware development of parallel CPU/GPU architectures, we have witnessed emerging relational query processing techniques and implementations on those parallel architectures. However, most of those implementations are not portable across different architectures, because they are usually developed from scratch and target a specific architecture. This paper proposes a kernel-adapter based design […]
Simon John Pennycook
The gap between a supercomputer’s theoretical maximum ("peak") floating-point performance and that actually achieved by applications has grown wider over time. Today, a typical scientific application achieves only 5-20% of any given machine’s peak processing capability, and this gap leaves room for significant improvements in execution times. This problem is most pronounced for modern "accelerator" […]
S.J. Pennycook, S.D. Hammond, S.A. Wright, J.A. Herdman, I. Miller, S.A. Jarvis
This paper reports on the development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite. An account of the design decisions addressed during the development of this code is presented, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types. The […]
S. J. Pennycook, S. A. Jarvis
This paper investigates the development of a molecular dynamics code that is highly portable between architectures. Using OpenCL, we develop an implementation of Sandia’s miniMD benchmark that achieves good levels of performance across a wide range of hardware: CPUs, discrete GPUs and integrated GPUs. We demonstrate that the performance bottlenecks of miniMD’s short-range force calculation […]
Jerome Kieffer, Dimitrios Karkoulis
2D area detectors such as CCD or pixel detectors have become popular in the last 15 years for diffraction experiments (e.g. for WAXS, SAXS, single-crystal and powder diffraction (XRPD)). These detectors have a large sensitive area of millions of pixels with high spatial resolution. The software package pyFAI has been designed to reduce SAXS, WAXS […]
Ewa Niewiadomska-Szynkiewicz, Michal Marks, Jaroslaw Jantura, Mikolaj Podbielski
The main advantage of a distributed computing system over a standalone computer is the ability to share the workload between cores, processors and computers. In our paper we present a hybrid cluster system – a novel computing architecture with multi-core CPUs working together with many-core GPUs. It integrates two types of CPU, i.e., Intel and AMD […]
Michal Marks, Jaroslaw Jantura, Ewa Niewiadomska-Szynkiewicz, Przemyslaw Strzelczyk, Krzysztof Gozdz
This paper addresses issues associated with distributed computing systems and the application of mixed GPU&CPU technology to data encryption and decryption algorithms. We describe a heterogeneous cluster, HGCC, formed by two types of nodes: an Intel processor with an NVIDIA graphics processing unit and an AMD processor with an AMD graphics processing unit (formerly ATI), and a novel software […]
Ali Khajeh Saeed
Graphics processing units function well as high performance computing devices for scientific computing. The non-standard processor architecture and high memory bandwidth allow graphics processing units (GPUs) to provide some of the best performance in terms of FLOPS per dollar. Recently these capabilities became accessible for general purpose computations with the CUDA programming environment on NVIDIA […]
Girish Ravunnikutty, Rejith George Joseph, Sanjay Ranka, Alin Dobra
Ensemble problems use multiple models generated from a data set to improve the correctness and ensure faster convergence. The use of multiple models makes ensemble problems computationally intensive. In this paper, we explore the parallelization of ensemble problems on modern multicore hardware like CPUs and GPUs. We use the K-means clustering algorithm as a case […]
Cathal O'Broin, Lampros A. A. Nikolopoulos
Open Computing Language (OpenCL) is a parallel processing language that is ideally suited for running parallel algorithms on Graphics Processing Units (GPUs). In the present work we report the development of a generic parallel single-GPU code for the numerical solution of a system of first-order ordinary differential equations (ODEs) based on the OpenCL model. We […]

* * *


Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
