Dec, 16

An Optimized GPU Memory Hierarchy Design for an OpenCL Kernel

With the advent of multi- and many-core processors, communication has replaced computation as the performance bottleneck. Most current approaches to the problem try to tolerate memory access latency through a high degree of Thread-Level Parallelism. However, not all applications benefit from such techniques and there is a need to address the weakness of the underlying […]
Dec, 16

Scaling behavior of topologically constrained polymer rings in a melt

Large scale molecular dynamics simulations on graphic processing units (GPUs) are employed to study the scaling behavior of ring polymers with various topological constraints in melts. Typical sizes of rings containing $3_1$, $5_1$ knots and catenanes made up of two unknotted rings scale like $N^{1/3}$ in the limit of large ring sizes $N$. This is […]
Dec, 16

MatConvNet – Convolutional Neural Networks for MATLAB

MatConvNet is an implementation of Convolutional Neural Networks (CNNs) for MATLAB. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing linear convolutions with filter banks, feature pooling, and many more. In this manner, MatConvNet allows fast prototyping of […]
Dec, 15

Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning

The tooling landscape of deep learning is fragmented by a growing gap between the generic, productivity-oriented tools that optimize for algorithm development and the task-specific ones that optimize for speed and scale. This creates an artificial barrier to bringing new innovations into real-world applications. Minerva addresses this issue with a layered design that provides […]
Dec, 15

Bayesian neural networks for detecting epistasis in genetic association studies

BACKGROUND: Discovering causal genetic variants from large genetic association studies poses many difficult challenges. Assessing which genetic markers are involved in determining trait status is a computationally demanding task, especially in the presence of gene-gene interactions. RESULTS: A non-parametric Bayesian approach in the form of a Bayesian neural network is proposed for use in analyzing […]
Dec, 15

Analysis and Optimization Techniques for Massively Parallel Processors

In response to the ever growing demand for computing power, heterogeneous parallelism has emerged as a widespread computing paradigm in the past decade or so. In particular, massively parallel processors such as graphics processing units (GPUs) have become the prevalent throughput computing elements in heterogeneous systems, offering high performance and power efficiency for general-purpose workloads. […]
Dec, 15

Easy-to-Use On-the-Fly Binary Program Acceleration on Many-Cores

This paper introduces Binary Acceleration At Runtime (BAAR), an easy-to-use on-the-fly binary acceleration mechanism which aims to tackle the problem of enabling existent software to automatically utilize accelerators at runtime. BAAR is based on the LLVM Compiler Infrastructure and has a client-server architecture. The client runs the program to be accelerated in an environment which […]
Dec, 15

Performance Comparison of GPUs with a Genetic Algorithm based on CUDA

Genetic algorithms (GAs) generally have the disadvantage of long computation times, so it is worth reducing the execution time while preserving the quality of the result. Comparative experiments are conducted with one CPU and four GPUs using CUDA (Compute Unified Device Architecture) and a generational GA. We implement the fitness functions of the GA which […]
Dec, 15

Bamboo: Automatic Translation of MPI Source into a Latency-Tolerant Form

Communication remains a significant barrier to scalability on distributed-memory systems. At present, the trend in architectural system design, which focuses on enhancing node performance, exacerbates the communication problem, since the relative cost of communication grows as the computation rate increases. This problem will be more pronounced at the exascale, where computational rates will be orders […]
Dec, 14

Heuristics for Conversion Process of GPU’s Kernels for Multiples Kernels with Concurrent Optimization Divergence

Graphics Processing Units were created to accelerate the construction and processing of graphic images. Over the course of their evolution, owing to their large inherent computational capacity, these devices came to be used for general purposes. However, the design of GPUs does not work well with divergent algorithms, mainly conditionals and loops. […]
Dec, 14

Locality-Aware Automatic Parallelization for GPGPU with OpenHMPP Directives

The use of GPUs for general purpose computation has increased dramatically in the past years due to the rising demand for computing power and their tremendous computing capacity at low cost. Hence, new programming models have been developed to integrate these accelerators with high-level programming languages, giving rise to heterogeneous computing systems. Unfortunately, this heterogeneity […]
Dec, 14

Acceleration of Hessenberg Reduction for Nonsymmetric Matrix

The value of a general solution for nonsymmetric eigenvalue problems is evident in many areas of engineering and scientific computation, such as reducing noise for a quieter ride in automotive engineering or calculating the natural frequency of a bridge in civil engineering. The main objective of this thesis is to design a […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no restriction on the number of runs.

The platforms are:

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there). Compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors
