Lukasz Laniewski-Wollk, Jacek Rokicki
In this paper we present a topology optimization technique applicable to a broad range of flow design problems. We also propose a discrete adjoint formulation effective for a wide class of Lattice Boltzmann Methods (LBM). This adjoint formulation is used to calculate the sensitivity of the LBM solution to several types of parameters, both global and […]
Benjamin Hernandez, Hugo Perez, Isaac Rudomin, Sergio Ruiz, Oriam DeGyves, Leonel Toledo
We present a set of algorithms for simulating and visualizing real-time crowds on GPU (Graphics Processing Unit) clusters. First, we present crowd simulation and rendering techniques that take advantage of single-GPU machines; then, using a wandering crowd behavior simulation algorithm as an example, we explain how this kind of algorithm can be extended […]
Wei Wu, Aurelien Bouteiller, George Bosilca, Mathieu Faverge, Jack Dongarra
Accelerator-enhanced computing platforms have drawn a lot of attention due to their massive peak computational capacity. Despite significant advances in the programming interfaces to such hybrid architectures, traditional programming paradigms struggle to map the resulting multi-dimensional heterogeneity and the expression of algorithm parallelism, resulting in sub-optimal effective performance. Task-based programming paradigms have the capability to alleviate […]
Tianyi David Han, Tarek S. Abdelrahman
The use of local memory is important to improve the performance of OpenCL programs. However, its use may not always benefit performance, depending on various application characteristics, and there is no simple heuristic for deciding when to use it. We develop a machine learning model to decide if the optimization is beneficial or not. We […]
Michael Edward Bauer
This thesis covers the design and implementation of Legion, a new programming model and runtime system for targeting distributed heterogeneous machine architectures. Legion introduces logical regions as a new abstraction for describing the structure and usage of program data. We describe how logical regions provide a mechanism for applications to express important properties of program […]
Guray Ozen
OpenMP, a well-known shared-memory programming API, aims to make shared-memory multiprocessor programming easy through pragma directives. Until now, its interface consisted of task- and iteration-level parallelism for general-purpose CPUs. However, the latest OpenMP specification, 4.0, includes the accelerator model. OmpSs is an OpenMP extended […]
Steven Gurfinkel
Many computer systems now include both CPUs and programmable GPUs. OpenCL, a new programming framework, can program individual CPUs or GPUs; however, distributing a problem across multiple devices is more difficult. This thesis contributes three OpenCL runtimes that automatically distribute a problem across multiple devices: DualCL and m2sOpenCL, which distribute tasks across a single system’s […]
Matthew Thomas Calef, John Greaton Wohlbier
We describe the problem of iterating over mesh zones and iterating over material data within a zone, in the context of relatively new compute architectures. We present an example of how this can be done in a way that is portable across parallel programming environments and can be made to perform well. We offer a […]
Satoshi Tanaka, Kohji Yoshikawa, Takashi Okamoto, Kenji Hasegawa
We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core architectures. The scheme is based on the ray-tracing method and the computational cost is proportional to N_m^{5/3} where […]
Sreeram Potluri
Accelerators (such as NVIDIA GPUs) and coprocessors (such as Intel MIC/Xeon Phi) are fueling the growth of next-generation ultra-scale systems that have high compute density and high performance per watt. However, these many-core architectures cause systems to be heterogeneous by introducing multiple levels of parallelism and varying computation/communication costs at each level. Application developers also […]
Mikhail A. Farkov
The vast majority of problems faced by bioinformatics are very complex and time consuming. They require the use of modern high-performance computational systems and the development of algorithms for such systems. Heterogeneous computing systems that include graphics processing units (GPUs) occupy a separate niche. Such systems can significantly accelerate the solution of some tasks. The […]
Jinwoong Kim, Beomseok Nam
General-purpose computing on graphics processing units (GP-GPU) has emerged as a new cost-effective parallel computing paradigm in high-performance computing research that enables large amounts of data to be processed in parallel. Large-scale, data-intensive scientific applications have been playing an important role in modern high-performance computing research. A common […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There are no restrictions on the number of runs.

The platforms are

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
