13061

Posts

Nov, 12

Code Optimization on Kepler GPUs and Xeon Phi

Kepler GTX Titan Black and Kepler Tesla K40 are still the best GPUs for high performance computing, although Maxwell GPUs such as GTX 980 are available in the market. Hence, we measure the performance of our lattice QCD codes using the Kepler GPUs. We also upgrade our code to use the latest CPS (Columbia Physics […]
Nov, 9

An Execution Model for OpenCL 2.0

A popular approach to programming manycore GPUs is the Single Instruction Multiple Thread (SIMT) abstraction. SIMT has the benefit of presenting a "single thread" view, alleviating the complexity of explicitly vectorizing the source code. However, due to the SIMD nature of the underlying hardware it is often difficult to fully hide all aspects from the […]
Nov, 9

Real-time 3D Reconstruction for FPGAs: A Case Study for Evaluating the Performance, Area, and Programmability Trade-offs of the Altera OpenCL SDK

Embedding real-time 3D reconstruction of a scene from a low-cost depth sensor can improve the development of technologies in the domains of augmented reality, mobile robotics, and more. However, current implementations require a computer with a powerful GPU, which limits its prospective applications with low-power requirements. To implement low-power 3D reconstruction we embedded two prominent […]
Nov, 9

Relax-Miracle: GPU Parallelization of Semi-Analytic Fourier-Domain solvers for Earthquake Modeling

Effective utilization of GPU processing capacity for scientific workloads is often limited by memory throughput and PCIe communication transfer times. This is particularly true for semi-analytic Fourier-domain computations in earthquake modeling (Relax) where operations on large-scale 3D data structures can require moving large volumes of data from storage to the compute in predictable but orthogonal […]
Nov, 9

Parallel FIM Approach on GPU using OpenCL

In this paper, we describe GPU-Eclat algorithm, a GPU (General Purpose Graphics Processing Unit) enhanced implementation of Frequent Item set Mining (FIM). The frequent itemsets are extracted from a transactional database as it is a essential assignment in data mining field because of its broad applications in mining association rules, time series, correlations etc. The […]
Nov, 9

Dogwild! – Distributed Hogwild for CPU & GPU

Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time consuming, even on GPU hardware. We describe a set of extensions to the state of the art Caffe library [3], allowing training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture, implementing asynchronous […]
Nov, 5

Graphics Processing Unit-Based Computer-Aided Design Algorithms for Electronic Design Automation

This dissertation presents research focusing on reshaping the design paradigm of electronic design automation (EDA) applications to embrace the computational throughput of a massively parallel computing architecture. The EDA industry has gone through major evolution in algorithm designs over the past several decades, delivering improved and more sophisticated design tools. Today, these tools provide a […]
Nov, 5

GPU Acceleration of k-Nearest Neighbor Search in Face Classifier based on Eigenfaces

Face recognition is a specialized case of object recognition, and has broad applications in security, surveillance, identity management, law enforcement, human-computer interaction, and automatic photo and video indexing. Because human faces occupy a narrow portion of the total image space, specialized methods are required to identify faces based on subtle differences. One such method is […]
Nov, 5

Parallelization techniques of the x264 video encoder

Higher video quality is demanded by the users of any kind of video stream service, including web applications, High Definition broadcast terrestrial services, etc. All of those video streams are encoded first using a compression format, one of them is H.264/MPEG-4 AVC. The main issue is that the better the quality of the video the […]
Nov, 5

Highly optimized simulations on single- and multi-GPU systems of 3D Ising spin glass

We present a highly optimized implementation of a Monte Carlo (MC) simulator for the three-dimensional Ising spin-glass model with bimodal disorder, i.e., the 3D Edwards-Anderson model running on CUDA enabled GPUs. Multi-GPU systems exchange data by means of the Message Passing Interface (MPI). The chosen MC dynamics is the classic Metropolis one, which is purely […]
Nov, 5

A GPU-Based Wide-Band Radio Spectrometer

The Graphics Processing Unit (GPU) has become an integral part of astronomical instrumentation, enabling high-performance online data reduction and accelerated online signal processing. In this paper, we describe a wide-band reconfigurable spectrometer built using an off-the-shelf GPU card. This spectrometer, when configured as a polyphase filter bank (PFB), supports a dual-polarization bandwidth of up to […]
Nov, 3

Profiling of Data-Parallel Processors

Profiling data can help to improve an application with respect to various objectives like execution time, energy consumption or even thermal sensor placement for an upcoming device. This survey reviews state-of-the-art profiling tools for dataparallel processors like Nsight, PAPI and TAU as well as Lynx. Additionally, the attained knowledge is utilized to detect the bottleneck […]
Page 4 of 768« First...23456...102030...Last »

* * *

* * *

Like us on Facebook

HGPU group

184 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1311 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: