May, 19

Use of modern GPUs in Design Optimization

Graphics Processing Units (GPUs) are a promising hardware alternative to Central Processing Units (CPUs) for accelerating applications with high computational demands. In many fields, researchers are taking advantage of the high computational power of GPUs to speed up their applications. These applications range from data mining to machine learning and the life sciences. […]
May, 19

A GPU-accelerated Navier-Stokes Solver for Steady Turbomachinery Simulations

Even small improvements to modern turbomachinery components nowadays require a large number of design evaluations, each of which runs time-consuming simulations. Reducing the computational cost of these simulations allows more evaluations to be run, and thus a greater design improvement to be reached. In this work, an Nvidia Graphics Processing Unit (GPU) of the Kepler generation is used to accelerate […]
May, 19

An Interrupt-Driven Work-Sharing For-Loop Scheduler

In this paper, we present a parallel for-loop scheduler that is based on work-stealing principles but runs under a completely cooperative scheme. POSIX signals are used by idle threads to interrupt left-behind workers, which in turn decide what portion of their workload can be given to the requester. We call this scheme Interrupt-Driven Work-Sharing (IDWS). […]
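The work-sharing idea in the abstract above can be illustrated in miniature. The paper's actual scheme interrupts real worker threads with POSIX signals; the sketch below is a simplified single-threaded model in which a simulated "request" stands in for the interrupt, and all names (`run_worker`, `request_at`) are illustrative, not the paper's API.

```python
def run_worker(start, stop, request_at):
    """Process loop iterations [start, stop). When a request arrives
    (here: when the loop counter reaches request_at), the worker itself
    decides what to donate -- the upper half of its remaining range --
    mirroring the cooperative decision described in the abstract."""
    done = []
    donated = None
    i = start
    while i < stop:
        done.append(i)                  # body of one for-loop iteration
        i += 1
        if donated is None and i == request_at:
            # An idle thread has "interrupted" us: give away half of
            # what is left and keep only the lower half for ourselves.
            mid = i + (stop - i) // 2
            donated = (mid, stop)       # range handed to the requester
            stop = mid
    return done, donated

# A worker on [0, 100) interrupted at iteration 30 keeps [30, 65)
# and donates [65, 100) to the requester.
done, donated = run_worker(0, 100, 30)
```
The key point the sketch preserves is that the interrupted worker, not the requester, chooses the donated portion, which is what makes the scheme cooperative rather than work-stealing in the classic sense.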
May, 18

A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems

As the number of cores on a chip increases and key applications become even more data-intensive, memory systems in modern processors have to deal with increasingly large amounts of data. In the face of such challenges, data compression is a promising approach to increase effective memory-system capacity while also providing performance and energy advantages. […]
May, 18

Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA2015), 2015

======================================================================
CALL FOR PAPERS

4th International Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications (HUCAA 2015)

http://www.hucaa-workshop.org/hucaa2015

Sept. 8-11, 2015 – Chicago, IL, US

In conjunction with IEEE CLUSTER 2015, the IEEE International Conference on Cluster Computing
======================================================================

ABOUT THE WORKSHOP

The Workshop on Heterogeneous and Unconventional Cluster Architectures and Applications aims to gather recent […]
May, 16

A Fast and Rigorously Parallel Surface Voxelization Technique for GPU-Accelerated CFD Simulations

This paper presents a fast surface voxelization technique for the mapping of tessellated triangular surface meshes to uniform and structured grids that provide a basis for CFD simulations with the lattice Boltzmann method (LBM). The core algorithm is optimized for massively parallel execution on graphics processing units (GPUs) and is based on a unique dissection […]
May, 16

Multi-GPU Support on Single Node Using Directive-Based Programming Model

Existing studies show that using a single GPU can lead to significant performance gains. Further speedup should be achievable by using more than one GPU. Heterogeneous processors consisting of multiple CPUs and GPUs offer immense potential and are often considered a leading candidate for porting complex scientific applications. Unfortunately […]
May, 16

Efficient Resource Scheduling for Big Data Processing on Accelerator-based Heterogeneous Systems

Accelerators are becoming widespread in heterogeneous processing, performing computational tasks across a wide range of applications. In this paper, we examine the heterogeneity of modern computing systems, in particular how to achieve a good level of resource utilization and fairness when multiple tasks with different loads and computation ratios are […]
May, 16

Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs

We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core – MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices and data access patterns, […]
May, 16

Using Butterfly-Patterned Partial Sums to Optimize GPU Memory Accesses for Drawing from Discrete Distributions

We describe a technique for drawing values from discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses. From this table, complete […]
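For context, the conventional approach that the butterfly-patterned table avoids is to build the complete table of partial sums and then locate a uniform draw within it by binary search. A minimal sketch of that baseline (function name `draw` is ours, not the paper's):

```python
from bisect import bisect
from itertools import accumulate
import random

def draw(weights, rng):
    """Draw index i with probability weights[i] / sum(weights) using
    the complete partial-sum table -- the structure whose construction
    the butterfly-patterned alternative is designed to speed up."""
    sums = list(accumulate(weights))   # complete table of partial sums
    u = rng.random() * sums[-1]        # uniform draw in [0, total)
    return bisect(sums, u)             # first index with sums[i] > u

# Example: weights (1, 2, 7) give indices 0, 1, 2 with
# probabilities 0.1, 0.2, 0.7 respectively.
rng = random.Random(42)
samples = [draw([1, 2, 7], rng) for _ in range(1000)]
```
On a GPU, building `sums` for every row of a large weight matrix is exactly the memory-access bottleneck the abstract refers to, since naive per-row prefix sums do not coalesce well.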
May, 15

Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves

Deep neural networks (DNNs) show very strong performance on many machine learning problems, but they are very sensitive to the setting of their hyperparameters. Automated hyperparameter optimization methods have recently been shown to yield settings competitive with those found by human experts, but their widespread adoption is hampered by the fact that they require more […]
May, 15

MRCUDA: MapReduce Acceleration Framework Based on GPU

The GPU programming model for general-purpose computing is complex and difficult to maintain. A MapReduce acceleration framework named MRCUDA is designed and implemented in this paper. MRCUDA consists of four loosely coupled stages (Pre-Processing, Map, Group and Reduce), which support flexible configurations for different applications. In order to take full advantage […]
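The four stages named in the abstract follow the standard MapReduce data flow. A toy CPU-side model of that flow (all names here are illustrative, not MRCUDA's actual API) might look like:

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Toy model of the four-stage pipeline: Pre-Processing, Map,
    Group, Reduce. Each stage is loosely coupled to the next through
    plain data, mirroring the structure described in the abstract."""
    # Pre-Processing: materialize and normalize the input once.
    records = list(records)
    # Map: each record emits zero or more (key, value) pairs.
    pairs = [kv for rec in records for kv in map_fn(rec)]
    # Group: collect all values under their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: fold each key's values into a single result.
    return {key: reduce_fn(values) for key, values in groups.items()}

# Classic word count as a usage example.
counts = map_reduce(
    ["a b a", "b c"],
    lambda line: [(w, 1) for w in line.split()],
    sum,
)
```
In MRCUDA itself the Map and Reduce stages run as GPU kernels; the loose coupling between stages is what allows each to be configured or replaced per application.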


Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes, one equipped with Nvidia and AMD graphics processing units and one with two AMD graphics processing units. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL projects should be uploaded via the User dashboard (see instructions and an example there); compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors

Contact us: