Sencer Nuri Yeralan, Timothy A. Davis, Sanjay Ranka
Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge, and is up to eleven times faster than a […]
Reza Nakhjavani
The ever-increasing complexity of scientific applications has led to the adoption of new HPC paradigms such as Graphics Processing Units (GPUs). However, modifying applications to run on GPUs is challenging. Furthermore, the speedup achieved by using GPUs has added considerable heterogeneity to HPC clusters. In this dissertation, we enabled NPAIRS, a neuro-imaging application, to […]
Lena Oden
Today, GPUs and other parallel accelerators are widely used in high performance computing, due to their high computational power and high performance per watt. Still, one of the main bottlenecks of GPU-accelerated cluster computing is the data transfer between distributed GPUs. This not only affects performance, but also power consumption. Often, a data transfer between […]
Mark Stephenson, Siva Kumar Sastry Hari, Yunsup Lee, Eiman Ebrahimi, Daniel R. Johnson, David Nellans, Mike O'Connor, Stephen W. Keckler
To aid application characterization and architecture design space exploration, researchers and engineers have developed a wide range of tools for CPUs, including simulators, profilers, and binary instrumentation tools. With the advent of GPU computing, GPU manufacturers have developed similar tools leveraging hardware profiling and debugging hooks. To date, these tools are largely limited by the […]
Weifeng Liu, Brian Vinter
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as the algebraic multigrid method (AMG), breadth-first search, and the shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse […]
Yuechao Pan, Yangzihao Wang, Yuduo Wu, Carl Yang, John D. Owens
We present a multi-GPU graph processing library that allows programmers to easily extend single-GPU graph algorithms to achieve scalable performance on large graph datasets with billions of edges. Our design only requires users to specify a few algorithm-dependent blocks, hiding most multi-GPU related implementation details. Our design effectively overlaps computation and data transfer and implements […]
Yi Hou, Hong Zhang, Shilin Zhou
Deep convolutional neural networks (CNNs) have recently been shown in many computer vision and pattern recognition applications to outperform, by a significant margin, state-of-the-art solutions that use traditional hand-crafted features. However, this impressive performance is yet to be fully exploited in robotics. In this paper, we focus on one specific problem that can benefit from the […]
Firas Abuzaid, Stefan Hadjis, Ce Zhang, Christopher Re
We present Caffe con Troll (CcT), a fully compatible end-to-end version of the popular framework Caffe with rebuilt internals. We built CcT to examine the performance characteristics of training and deploying general-purpose convolutional neural networks across different hardware architectures. We find that, by employing standard batching optimizations for CPU training, we achieve up to one […]
Manuel Marin, David Defour, Federico Milano
This paper proposes a novel representation for symmetric fuzzy numbers that uses the midpoint-radius approach instead of the conventional lower-upper representation. A theoretical analysis based on the alpha-cut concept shows that the proposed format requires half as many operations and half as much memory as the traditional one. Also, a novel technique involving radius increments is introduced, […]
Haoxiang Li, Zhe Lin, Xiaohui Shen, Jonathan Brandt, Gang Hua
In real-world face detection, large visual variations, such as those due to pose, expression, and lighting, demand an advanced discriminative model to accurately differentiate faces from the backgrounds. Consequently, effective models for the problem tend to be computationally prohibitive. To address these two conflicting challenges, we propose a cascade architecture built on convolutional neural networks […]
Rahul Sharma, Michael Bauer, Alex Aiken
Previous efforts to formally verify code written for GPUs have focused solely on kernels written within the traditional data-parallel GPU programming model. No previous work has considered the higher performance, but more complex, warp-specialized kernels based on producer-consumer named barriers available on current hardware. In this work we present the first formal operational semantics for […]
Leonardo Carvalho, Maria Andrade, Luiz Velho
In recent years, many researchers have used the Navier-Stokes equations and Reaction-Diffusion systems for fluid simulation and for the creation of textures on surfaces, respectively. For this purpose it is necessary to obtain information about operators defined on surfaces. We obtained the metric information of the distortion caused by the parametrization of Catmull-Clark subdivision surfaces. […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes: one with an nVidia and an AMD graphics processing unit, the other with two AMD graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there). Compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors