Johann Hauswald, Yiping Kang, Michael A. Laurenzano, Quan Chen, Cheng Li, Trevor Mudge, Ronald G. Dreslinski, Jason Mars, Lingjia Tang
As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web service companies are adopting large deep neural networks (DNNs) for machine learning challenges such as image processing, speech recognition, and natural language processing, among others. A number of open questions arise as to the design of a server platform […]
Freddie Astrom, Michael Felsberg
Many image processing methods such as corner detection, optical flow and iterative enhancement make use of image tensors. Generally, these tensors are estimated using the structure tensor. In this work we show that the gradient energy tensor can be used as an alternative to the structure tensor in several cases. We apply the gradient energy tensor to […]
Yushan Wang
In this PhD thesis, we present our research in the domain of high performance software for computational fluid dynamics (CFD). With the increasing demand for high-resolution simulations, there is a need for numerical solvers that can take full advantage of current manycore accelerated parallel architectures. In this thesis we focus more specifically on developing an […]
Prasann Choudhari, Eikshith Baikampadi, Paresh Patil, Sanket Gadekar
The internet is a huge collection of websites on the order of 10^8 bytes. Around 90% of the world’s population uses search engines to get relevant information. According to Wikipedia, more than 200 million Indians use the Internet every day. Thus, retrieving the correct data in the least time is the most important task. Hence need […]
Wei Dai, Yarkin Doroz, Berk Sunar
In this work we focus on tailoring and optimizing the computational Private Information Retrieval (cPIR) scheme proposed in WAHC 2014 for efficient execution on graphics processing units (GPUs). Exploiting the mass parallelism in GPUs is a commonly used approach in speeding up cPIRs. Our goal is to eliminate the efficiency bottleneck of the Doröz et […]
Steffen Christgau, Johannes Spazier, Bettina Schnor
In this paper, the performance and scalability of different multi-core systems are experimentally evaluated for the tsunami simulation EasyWave. The target platforms include a standard Ivy Bridge Xeon processor, an Intel Xeon Phi accelerator card, and a GPU. OpenMP, MPI and CUDA were used to parallelize the program for these platforms. The absolute performance […]
Olaf Ronneberger, Philipp Fischer, Thomas Brox
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a […]
C. F. Janssen, N. Koliha, T. Rung
This paper presents a fast surface voxelization technique for the mapping of tessellated triangular surface meshes to uniform and structured grids that provide a basis for CFD simulations with the lattice Boltzmann method (LBM). The core algorithm is optimized for massively parallel execution on graphics processing units (GPUs) and is based on a unique dissection […]
George Teodoro, Tahsin Kurc, Guilherme Andrade, Jun Kong, Renato Ferreira, Joel Saltz
We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core – MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices and data access patterns, […]
Guy L. Steele Jr. (Oracle Labs), Jean-Baptiste Tristan
We describe a technique for drawing values from discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses. From this table, complete […]
Tobias Domhan, Jost Tobias Springenberg, Frank Hutter
Deep neural networks (DNNs) show very strong performance on many machine learning problems, but they are very sensitive to the setting of their hyperparameters. Automated hyperparameter optimization methods have recently been shown to yield settings competitive with those found by human experts, but their widespread adoption is hampered by the fact that they require more […]
Jie Wang, Yanshuo Yu, Hang Cui, Shenglai Yang
The GPU programming model for general-purpose computing is complex and difficult to maintain. A MapReduce acceleration framework named MRCUDA is designed and implemented in this paper. There are four loosely coupled stages in MRCUDA, including Pre-Processing, Map, Group and Reduce, which can support flexible configurations for different applications. In order to take full advantage […]

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There are no restrictions on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there); the terminal output logs from compilation and execution will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors