Wenhao Jia
In response to the ever growing demand for computing power, heterogeneous parallelism has emerged as a widespread computing paradigm in the past decade or so. In particular, massively parallel processors such as graphics processing units (GPUs) have become the prevalent throughput computing elements in heterogeneous systems, offering high performance and power efficiency for general-purpose workloads. […]
Owe Philipsen, Christopher Pinke, Alessandro Sciarra, Matthias Bach
We present the Lattice QCD application CL2QCD, which is based on OpenCL and can be utilized to run on Graphics Processing Units as well as on common CPUs. We focus on implementation details as well as performance results of selected features. CL2QCD has been successfully applied in LQCD studies at finite temperature and density and […]
Zachary Langbert, Mark C. Lewis
Physically accurate hard sphere collisions are inherently sequential as the order in which collisions occur can have a significant impact on the resulting system. This makes processing hard sphere collisions on parallel hardware challenging. We present an approach to solving this problem that can be implemented using OpenCL that runs on current hardware. This approach […]
Simon Naude
The graphics processing unit (GPU) has seen a significant increase in performance over the past few years. Hence the interest in using GPUs for more general purposes has increased. The higher number of cores on a GPU allows it to outperform central processing units (CPUs). However, since in certain aspects instructions executed on the GPU must […]
Ru Zhu
A micromagnetic simulator running on a graphics processing unit (GPU) is presented. It achieves a significant performance boost compared to previous central processing unit (CPU) simulators, up to two orders of magnitude for large input problems. Unlike GPU implementations by other research groups, this simulator is developed with C++ Accelerated Massive Parallelism (C++ AMP) and […]
Kazuya Matsumoto, Naohito Nakasato, Stanislav Sedukhin
This paper presents an implementation of different matrix-matrix multiplication routines in OpenCL. We utilize the high-performance GEMM (GEneral Matrix-Matrix Multiply) implementation from our previous work for the present implementation of other matrix-matrix multiply routines in Level-3 BLAS (Basic Linear Algebra Subprograms). The other routines include SYMM (Symmetric Matrix-Matrix Multiply), SYRK (Symmetric Rank-K Update), SYR2K (Symmetric […]
Lokendra Singh Panwar
Today, heterogeneous computing has truly reshaped the way scientists think and approach high-performance computing (HPC). Hardware accelerators such as general-purpose graphics processing units (GPUs) and Intel Many Integrated Core (MIC) architecture continue to make in-roads in accelerating large-scale scientific applications. These advancements, however, introduce new sets of challenges to the scientific community such as: selection […]
Matthaus Wander, Lorenz Schwittmann, Christopher Boelmann, Torben Weis
When a client queries for a non-existent name in the Domain Name System (DNS), the server responds with a negative answer. With the DNS Security Extensions (DNSSEC), the server can either use NSEC or NSEC3 for authenticated negative answers. NSEC3 claims to protect DNSSEC servers against domain enumeration, but incurs significant CPU and bandwidth overhead. […]
Yuan Wen, Zheng Wang, Michael F.P. O'Boyle
Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high performance computing. Such platforms are usually programmed using OpenCL which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application dedicated devices to platforms […]
Dale Tristram, Karen Bradshaw
General-purpose computation on graphics processing units (GPGPU) has great potential to accelerate many scientific models and algorithms. However, some problems are considerably more difficult to accelerate than others, and it may be challenging for those new to GPGPU to ascertain the difficulty of accelerating a particular problem. Through what was learned in the acceleration of […]
Thomas R. W. Scogland, Wu-chun Feng
As core counts increase and as heterogeneity becomes more common in parallel computing, we face the prospect of programming hundreds or even thousands of concurrent threads in a single shared-memory system. At these scales, even highly-efficient concurrent algorithms and data structures can become bottlenecks, unless they are designed from the ground up with throughput as […]
Olav Aanes Fagerlund, Takeshi Kitayama, Gaku Hashimoto, Hiroshi Okuda
In the finite element method simulation we often deal with large sparse matrices. Sparse matrix-vector multiplication (SpMV) is of high importance for iterative solvers. During the solver stage, most of the time is in fact spent in the SpMV routine. The SpMV routine is highly memory-bound; the processor spends much time waiting for the needed […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. Each run gets 1 minute of compute time on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Upload the completed OpenCL project via the User dashboard (see the instructions and example there). The compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
