Feb, 10

Patterns and Rewrite Rules for Systematic Code Generation (From High-Level Functional Patterns to High-Performance OpenCL Code)

Computing systems have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort. This results in a tension between achieving performance and code portability. Code is either tuned using device-specific optimizations to achieve maximum performance or is […]
Feb, 10

CAVE-CL: An OpenCL version of the package for detection and quantitative analysis of internal cavities in a system of overlapping balls: application to proteins

Here we present the revised and newly rewritten version of our earlier published CAVE package [J. Busa et al., Comput. Phys. Commun. 181 (2010) 2116], which was originally written in FORTRAN. The package has been rewritten in C, and the algorithm has been parallelized and implemented using OpenCL. This makes the program convenient to run […]
Feb, 9

A Survey of Architectural Techniques For DRAM Power Management

Recent trends of CMOS technology scaling and widespread use of multicore processors have dramatically increased the power consumption of main memory. It has been estimated that modern data centers spend more than 30% of their total power consumption on main memory alone. This excessive power dissipation has created the problem of the “memory power wall”, which has […]
Feb, 9

FIR filtering and AES encryption with OpenCL 2.0

OpenCL has become a popular standard to leverage the unique power/performance opportunities found on heterogeneous systems. In this short contribution, we evaluate the latest parallel programming features supported in the OpenCL 2.0 standard. We explore using shared virtual memory and dynamic parallelism to accelerate two example applications.
Feb, 9

Speech Recognition on Modern Graphic Processing Units

Speech recognition run on Graphics Processing Units (GPUs) has shown promising performance improvements, with speedups ranging from 2x to 10x when compared to execution on CPUs. GPUs have continued to introduce new programming features, such as Dynamic Parallelism and Hyper-Q, that could further benefit speech recognition processing. In this paper we describe a framework developed at Northeastern describing […]
Feb, 9

Fast Subgraph Matching on Large Graphs using Graphics Processors

Subgraph matching is the task of finding all matches of a query graph in a large data graph, which is known to be an NP-complete problem. Many algorithms have been proposed to solve this problem using CPUs. In recent years, Graphics Processing Units (GPUs) have been adopted to accelerate fundamental graph operations such as breadth-first search and […]
Feb, 9

Sparse Matrix-Vector Multiplication on GPU

Sparse Matrix-Vector multiplication (SpMV) is one of the key operations in linear algebra. Overcoming thread divergence, load imbalance and un-coalesced and indirect memory access due to sparsity and irregularity are challenges to optimizing SpMV on GPUs. This dissertation develops solutions that address these challenges effectively. The first part of this dissertation focuses on a new […]
Feb, 9

Fine-Tuning Vectorization and Memory Traffic on Intel Xeon Phi Coprocessors: LU Decomposition of Small Matrices

Common techniques for fine-tuning the performance of automatically vectorized loops in applications for Intel Xeon Phi coprocessors are discussed. These techniques include strength reduction, regularizing the vectorization pattern, data alignment and aligned data hint, and pointer disambiguation. In addition, the loop tiling technique of memory traffic tuning is shown. The optimization methods are illustrated on […]
Feb, 8

Power Management Techniques for Data Centers: A Survey

With the growing use of the internet and exponential growth in the amount of data to be stored and processed (known as ‘big data’), the size of data centers has greatly increased. This, however, has resulted in a significant increase in the power consumption of data centers. For this reason, managing the power consumption of data centers has become […]
Feb, 6

Extending the Gotran framework: LaTeX and GPU acceleration

Gotran provides a framework for working with systems of ordinary differential equations (ODEs): Its primary goal is to increase the workflow efficiency of computational modelling in biomedical research. The ODEs, given by the time derivative of state variables, are described in a Gotran form file and can be automatically translated into different outputs depending on […]
Feb, 6

Unlocking Bandwidth for GPUs in CC-NUMA Systems

Historically, GPU-based HPC applications have had a substantial memory bandwidth advantage over CPU-based workloads due to using GDDR rather than DDR memory. However, past GPUs required a restricted programming model where application data was allocated up front and explicitly copied into GPU memory by the programmer before launching a GPU kernel. Recently, GPUs have eased […]
Feb, 6

Nucleation Studies on Graphics Processing Units

A system in a metastable state needs to overcome a certain free energy barrier to form a droplet of the stable phase. Standard treatments assume spherical droplets, but this is not appropriate in the presence of an anisotropy, such as for crystals. The anisotropy of the system has a strong effect on their surface free […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: AMD APP SDK 2.9

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); the compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
