Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros
We present a new library for parallel distributed Fast Fourier Transforms (FFT). Despite the large amount of work on FFTs, we show that significant speedups can be achieved for distributed transforms. The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements. AccFFT extends existing FFT libraries for […]
Lucas Benedicic
The complexity of radio network design has grown with the adoption of modern standards. As a result, computers have become increasingly important for delivering accurate results faster. In this thesis, novel methods for the planning and automatic optimization of radio networks are developed and discussed. The state-of-the-art metaheuristic algorithms, […]
Nicholas P. Bailey, Trond S. Ingebrigtsen, Jesper Schmidt Hansen, Arno A. Veldhorst, Lasse Bohling, Claire A. Lemarchand, Andreas E. Olsen, Andreas K. Bacher, Heine Larsen, Jeppe C. Dyre, Thomas B. Schroder
RUMD is a general-purpose, high-performance molecular dynamics (MD) simulation package running on graphics processing units (GPUs). RUMD addresses the challenge of utilizing the many-core nature of modern GPU hardware when simulating small to medium system sizes (roughly from a few thousand up to a hundred thousand particles). It has a performance that is comparable to […]
Trevor L. McDonell, Manuel M. T. Chakravarty, Vinod Grover, Ryan R. Newton
Embedded languages are often compiled at application runtime; thus, embedded compile-time errors become application runtime errors. We argue that advanced type system features, such as GADTs and type families, play a crucial role in minimising such runtime errors. Specifically, a rigorous type discipline reduces runtime errors due to bugs in both embedded language applications and […]
Christopher D. Cooper, Lorena A. Barba
Interactions between surfaces and proteins occur in many vital processes and are crucial in biotechnology: the ability to control specific interactions is essential in fields like biomaterials, biomedical implants and biosensors. In the latter case, biosensor sensitivity hinges on ligand proteins adsorbing on bioactive surfaces with a favorable orientation, exposing reaction sites to target molecules. […]
Petr Pilar
We provide an efficient implementation of existing parameter synthesis techniques for stochastic systems modelled as continuous-time Markov chains (CTMCs). These techniques iteratively decompose the parameter space into subspaces and approximate the satisfaction function, which, for any parameter values from the parameter space, returns the probability of the formula being satisfied in the CTMC given […]
Peng Li
Graphics processing units (GPUs) are highly parallel processors that are now commonly used in the acceleration of a wide range of computationally intensive tasks. GPU programs often suffer from data races and deadlocks, necessitating systematic testing. Conventional GPU debuggers are ineffective at finding and root-causing races since they detect errors with respect to the specific […]
Moises Vinas, Basilio B. Fraguela, Zeki Bozkus, Diego Andrade
The use of heterogeneous devices is becoming increasingly widespread. Their main drawback is their low programmability, due to the large number of details that must be handled. Another important problem is reduced code portability, as most of the tools to program them are vendor- or device-specific. The exception to this observation is OpenCL, which […]
Vilem Otte
Computer graphics renderers for creating photo-realistic images mainly use unidirectional path tracing, which gives good results for scenes without caustics or other hard cases. There are also a few renderers that implement bidirectional path tracing; however, due to the complexity of the algorithm, they almost exclusively target sequential CPUs. The thesis proposes a way of implementing […]
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus […]
Bart van Merrienboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio
We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA support. It facilitates the training of complex neural network models by providing parametrized Theano operations, attaching metadata to Theano’s symbolic computational graph, and providing an extensive set of utilities to […]
W. B. Langdon, Brian Yee Hong Lam
BarraCUDA is a C program which uses the BWA algorithm in parallel with nVidia CUDA to align short next generation DNA sequences against a reference genome. The genetically improved (GI) code is up to three times faster on short paired end reads from The 1000 Genomes Project and 60 percent more accurate on a short […]

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes, which together host three AMD GPUs and one nVidia GPU. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

Completed OpenCL projects should be uploaded via the User dashboard (see the instructions and example there); the terminal output logs from compilation and execution will be provided to the user.

Information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
