May, 20

A Performance and Scalability Analysis of the Tsunami Simulation EasyWave for Different Multi-Core Architectures and Programming Models

In this paper, the performance and scalability of different multi-core systems are experimentally evaluated for the tsunami simulation EasyWave. The target platforms include a standard Ivy Bridge Xeon processor, an Intel Xeon Phi accelerator card, and a GPU. OpenMP, MPI and CUDA were used to parallelize the program for these platforms. The absolute performance […]
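As an illustration of the kind of stencil kernel such tsunami codes parallelize (this is a generic 1D wave-equation update, not EasyWave's actual numerical scheme), a minimal sketch:

```python
# Illustrative only: a 1D leapfrog wave-equation stencil of the kind
# OpenMP/MPI/CUDA versions of such solvers would parallelize over.

def wave_step(u_prev, u, c2=0.25):
    """One leapfrog update of the 1D wave equation with fixed boundaries.
    c2 is the squared Courant number (c*dt/dx)**2."""
    n = len(u)
    u_next = [0.0] * n
    for i in range(1, n - 1):  # interior points; boundaries stay fixed
        u_next[i] = (2.0 * u[i] - u_prev[i]
                     + c2 * (u[i + 1] - 2.0 * u[i] + u[i - 1]))
    return u_next

if __name__ == "__main__":
    u0 = [0.0, 0.0, 1.0, 0.0, 0.0]   # initial displacement pulse
    print(wave_step(u0, u0))         # start from rest: u_prev == u
```

The inner loop has no cross-iteration dependencies, which is what makes it a natural candidate for an OpenMP `parallel for`, an MPI domain decomposition, or a CUDA thread-per-point mapping.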
May, 20

Physically Based Rendering: Implementation of Path Tracer

The main topic of this thesis was to implement a computer program that renders photorealistic images by simulating the laws of physics. In practice, the program builds an image by finding every possible path that a light ray can travel. The technique presented in this thesis naturally simulates many physical phenomena, such as reflections, […]
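As an illustration (not code from the thesis itself), the geometric primitive at the heart of any path tracer is the ray-primitive intersection test; a minimal ray-sphere version:

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Nearest positive intersection distance of a ray with a sphere,
    or None on a miss. direction is assumed to be normalized."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - 4.0 * c          # discriminant of the quadratic in t
    if disc < 0.0:
        return None                  # ray misses the sphere
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None    # only hits in front of the origin
```

A path tracer repeats this test (against every scene primitive) at each bounce of each sampled light path, which is why the method is so computationally demanding.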
May, 20

Kalman Filter Tracking on Parallel Architectures

Power density constraints are limiting the performance improvements of modern CPUs. To address this, we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore’s Law performance/price gains, it will be necessary to parallelize algorithms to […]
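For readers unfamiliar with the underlying filter, a minimal 1D predict-and-update cycle (a toy sketch only; track fitting uses the full multidimensional matrix form):

```python
def kalman_update(x, p, z, q=0.01, r=1.0):
    """One predict+update cycle of a 1D constant-state Kalman filter.
    x: state estimate, p: estimate variance, z: new measurement,
    q: process-noise variance, r: measurement-noise variance."""
    # Predict: the state model is identity here, so only variance grows.
    p = p + q
    # Update: the Kalman gain blends the prediction with the measurement.
    k = p / (p + r)
    x = x + k * (z - x)
    p = (1.0 - k) * p
    return x, p

if __name__ == "__main__":
    x, p = 0.0, 1.0
    for z in [1.1, 0.9, 1.0]:        # noisy measurements of a constant
        x, p = kalman_update(x, p, z)
    print(x, p)                       # estimate converges toward ~1.0
```

In tracking, this cycle is applied per detector layer per track candidate; the paper's challenge is running many such independent fits concurrently on parallel hardware.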
May, 20

U-Net: Convolutional Networks for Biomedical Image Segmentation

There is broad consensus that successful training of deep networks requires many thousands of annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a […]
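As a simple illustration of data augmentation (U-Net itself relies chiefly on elastic deformations; the flips and rotations below are a more basic variant):

```python
def rotate90(img):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Eight dihedral variants of one labeled image:
    4 rotations x optional horizontal flip."""
    variants = []
    cur = img
    for _ in range(4):
        variants.append(cur)
        variants.append([row[::-1] for row in cur])  # horizontal flip
        cur = rotate90(cur)
    return variants
```

Applying the identical transform to the image and its segmentation mask multiplies the effective training set by eight at no annotation cost, which is the efficiency argument the abstract makes.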
May, 20

An Efficient, Automatic Approach to High Performance Heterogeneous Computing

Users of heterogeneous computing systems face two problems: firstly, understanding the trade-off relationship between the observable characteristics of their applications, such as latency and quality of the result; and secondly, exploiting knowledge of these characteristics to allocate work to distributed resources efficiently. A domain-specific approach addresses both of these problems. By considering […]
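One simple way to picture the allocation problem (an illustrative greedy heuristic, not the paper's method; `predict` stands in for whatever latency model the system learns):

```python
def allocate(tasks, resources, predict):
    """Greedy allocation: each task goes to the resource with the lowest
    predicted completion time, accounting for work already queued there."""
    load = {r: 0.0 for r in resources}
    plan = {}
    for t in tasks:
        best = min(resources, key=lambda r: load[r] + predict(t, r))
        plan[t] = best
        load[best] += predict(t, best)
    return plan
```

Even this crude scheme shows why characterising per-resource latency matters: without `predict`, every task would pile onto the nominally fastest device.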
May, 19

CHO: Towards a Benchmark Suite for OpenCL FPGA Accelerators

Programming FPGAs with OpenCL-based high-level synthesis frameworks is gaining attention, with a number of commercial and research frameworks announced. However, there are no benchmarks for evaluating these frameworks. To this end, we present CHO, a benchmark suite for OpenCL that extends CHStone, a commonly used C-based high-level synthesis benchmark suite. We characterise CHO at various […]
May, 19

Optimizing Full Correlation Matrix Analysis of fMRI Data on Intel Xeon Phi Coprocessors

Full correlation matrix analysis (FCMA) is an unbiased approach for exhaustively studying interactions among brain regions in functional magnetic resonance imaging (fMRI) data from human participants. In order to answer neuroscientific questions efficiently, we are developing a closed-loop analysis system with FCMA on a cluster of nodes with Intel Xeon Phi coprocessors. We have proposed […]
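The core kernel FCMA evaluates exhaustively is the pairwise Pearson correlation between voxel time series; a minimal sketch (toy input sizes; real analyses involve tens of thousands of voxels, hence the coprocessor cluster):

```python
import math

def corr_matrix(voxels):
    """Pairwise Pearson correlations between voxel time series."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        sa = math.sqrt(sum((x - ma) ** 2 for x in a))
        sb = math.sqrt(sum((y - mb) ** 2 for y in b))
        return cov / (sa * sb)
    return [[corr(a, b) for b in voxels] for a in voxels]
```

With v voxels the matrix has v*(v-1)/2 distinct entries per time window, which is what makes the "full" analysis computationally expensive enough to motivate accelerators.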
May, 19

Use of modern GPUs in Design Optimization

Graphics Processing Units (GPUs) are a promising hardware alternative to Central Processing Units (CPUs) for accelerating applications with high computational power demands. In many fields, researchers are taking advantage of the high computational power of GPUs to speed up their applications. These applications span from data mining to machine learning and the life sciences. […]
May, 19

A GPU-accelerated Navier-Stokes Solver for Steady Turbomachinery Simulations

Even tiny improvements to modern turbomachinery components nowadays require a large number of design evaluations, and every evaluation runs time-consuming simulations. Reducing the computational cost of the simulations allows more evaluations to be run, and thus a greater design improvement to be reached. In this work, an Nvidia Graphics Processing Unit (GPU) of the Kepler generation is used to accelerate […]
May, 19

An Interrupt-Driven Work-Sharing For-Loop Scheduler

In this paper we present a parallel for-loop scheduler that is based on work-stealing principles but runs under a completely cooperative scheme. POSIX signals are used by idle threads to interrupt left-behind workers, which in turn decide what portion of their workload can be given to the requester. We call this scheme Interrupt-Driven Work-Sharing (IDWS). […]
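The protocol can be pictured with a sequential simulation (an illustrative sketch only: the paper uses real threads interrupted by POSIX signals, whereas here worker B's "interrupt" is simulated at fixed intervals and A donates the upper half of its remaining range):

```python
def idws_simulate(n, check_every=4):
    """Sequential simulation of interrupt-driven work sharing: worker A
    owns iterations [0, n); worker B starts idle, 'interrupts' A every
    check_every iterations, and A donates half of its remaining range."""
    a_lo, a_hi = 0, n          # A's remaining range
    b_range = None             # B's range, once granted
    done = []                  # order in which iterations complete
    steps = 0
    while a_lo < a_hi or b_range:
        if a_lo < a_hi:                       # A processes one iteration
            done.append(a_lo)
            a_lo += 1
            steps += 1
        if b_range:                           # B processes one iteration
            lo, hi = b_range
            done.append(lo)
            b_range = (lo + 1, hi) if lo + 1 < hi else None
        elif steps % check_every == 0 and a_hi - a_lo > 1:
            mid = (a_lo + a_hi) // 2          # A donates its upper half
            b_range, a_hi = (mid, a_hi), mid
    return done
```

The key property, as in the paper's scheme, is that the owner (not the requester) decides what to give away, so no iteration is ever lost or duplicated.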
May, 18

A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems

As the number of cores on a chip increases and key applications become even more data-intensive, memory systems in modern processors have to deal with increasingly large amounts of data. In the face of such challenges, data compression presents a promising approach to increase effective memory system capacity while also providing performance and energy advantages. […]
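One representative hardware scheme surveyed in this area is base-delta compression, which exploits the low dynamic range of values within a cache block; a software sketch of the idea (illustrative only, not any specific proposal from the survey):

```python
def bdi_compress(block, delta_bits=8):
    """Base+delta compression of one 'cache block' of integers: store the
    first value as the base and the rest as narrow deltas, if they fit."""
    base = block[0]
    limit = 1 << (delta_bits - 1)            # signed delta range
    deltas = [v - base for v in block]
    if all(-limit <= d < limit for d in deltas):
        return ("compressed", base, deltas)  # base word + narrow deltas
    return ("raw", block)                    # incompressible: keep as-is

def bdi_decompress(packed):
    if packed[0] == "raw":
        return packed[1]
    _, base, deltas = packed
    return [base + d for d in deltas]
```

A block of 64-bit values whose deltas fit in 8 bits shrinks by nearly 8x, while decompression is a single parallel addition, which is why such schemes are attractive for latency-critical caches.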
May, 18

Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA2015), 2015

======================================================================
CALL FOR PAPERS
4th International Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA 2015)
http://www.hucaa-workshop.org/hucaa2015
Sept. 8-11, 2015 – Chicago, IL, US
In conjunction with IEEE CLUSTER 2015, the IEEE International Conference on Cluster Computing
======================================================================
ABOUT THE WORKSHOP
The Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications aims to gather recent […]


* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of computer time per run on two nodes, equipped with AMD and nVidia graphics processing units. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and an example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
