Weibin Sun
As the base of the software stack, system-level software is expected to provide efficient and scalable storage, communication, security and resource management functionalities. However, there are many computationally expensive functionalities at the system level, such as encryption, packet inspection, and error correction. All of these require substantial computing power. What’s more, today’s application workloads have […]
Dinghua Li, Chi-Man Liu, Ruibang Luo, Kunihiko Sadakane, Tak-Wah Lam
MEGAHIT is a NGS de novo assembler for assembling large and complex metagenomics data in a time- and cost-efficient manner. It finished assembling a soil metagenomics dataset with 252Gbps in 44.1 hours and 99.6 hours on a single computing node with and without a GPU, respectively. MEGAHIT assembles the data as a whole, i.e., it […]
Romain Dolbeau
This paper describes & evaluates a fast, hybrid implementation of the Advanced Encryption Standard with 256 bit keys (AES-256) block encryption in Galois/Counter Mode (GCM). The implementation is bit-compatible with the implemented standard in both the OpenSSL and Crypto++ libraries, while significantly (up to three times) faster for large amount of data. In this implementation, […]
Jukka Saarelma, Lauri Savioja
Wave based simulation methods have been utilized to numerically estimate wave propagation in domains where low-frequency wave effects dominate the response. Finite-difference time-domain (FDTD) methods are increasingly useful for such problems, but they require massive spatial oversampling to increase the bandwidth of the simulation, which leads to significant computational expense. The advantage of explicit time-stepping […]
Matthaus Wander, Lorenz Schwittmann, Christopher Boelmann, Torben Weis
When a client queries for a non-existent name in the Domain Name System (DNS), the server responds with a negative answer. With the DNS Security Extensions (DNSSEC), the server can either use NSEC or NSEC3 for authenticated negative answers. NSEC3 claims to protect DNSSEC servers against domain enumeration, but incurs significant CPU and bandwidth overhead. […]
Alastair F. Donaldson
I present a tutorial overview demonstrating the key technique used by GPUVerify, a static verification tool for graphics processing unit (GPU) kernels. The technique is a method for translating a massively parallel GPU kernel into a sequential program such that correctness of the sequential program implies data race-freedom of the parallel kernel.
Gary K. Chen, Eric Chi, John Ranola, Kenneth Lange
The primary goal in cluster analysis is to discover natural groupings of objects. The field of cluster analysis is crowded with diverse methods that make special assumptions about data and address different scientific aims. Despite its shortcomings in accuracy, hierarchical clustering is the dominant clustering method in bioinformatics. Biologists find the trees constructed by hierarchical […]
Havard Heitlo Holm
As parallel and heterogeneous computing becomes more and more a necessity for implementing high performance simulators, it becomes increasingly harder for scientists and engineers without experience in high performance computing to achieve good performance. Even for those who knows how to write efficient code the process for doing so is time consuming and error prone, […]
F. D. Witherden, B. C. Vermeire, P. E. Vincent
PyFR is an open-source high-order accurate computational fluid dynamics solver for mixed unstructured grids that can target a range of hardware platforms from a single codebase. In this paper we demonstrate the ability of PyFR to perform high-order accurate unsteady simulations of flow on mixed unstructured grids using heterogeneous multi-node hardware. Specifically, after benchmarking single-node […]
Sergiy Gogolenko, Zhaojun Bai, Richard Scalettar
We present a block structured orthogonal factorization (BSOF) algorithm and its parallelization for computing the inversion of block p-cyclic matrices.We aim at the high performance on multicores with GPU accelerators. We provide a quantitative performance model for optimal host-device load balance, and validate the model through numerical tests. Benchmarking results show that the parallel BSOF […]
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and […]
Gordon Inggs, David Thomas, Wayne Luk
We advocate a domain specific software development methodology for heterogeneous computing platforms such as Multicore CPUs, GPUs and FPGAs. We argue that three specific benefits are realised from adopting such an approach: portable, efficient implementations across heterogeneous platforms; domain specific metrics of quality that characterise platforms in a form software developers will understand; automatic, optimal […]
Page 1 of 7612345...102030...Last »

* * *

* * *

Like us on Facebook

HGPU group

152 people like HGPU on Facebook

Follow us on Twitter

HGPU group

1252 peoples are following HGPU @twitter

* * *

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL application at hgpu.org. We provide 1 minute of computer time per each run on two nodes with two AMD and one nVidia graphics processing units, correspondingly. There are no restrictions on the number of starts.

The platforms are

Node 1
  • GPU device 0: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 @ 2.8GHz 1055T
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: AMD APP SDK 2.9
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.2
  • SDK: nVidia CUDA Toolkit 6.0.1, AMD APP SDK 2.9

Completed OpenCL project should be uploaded via User dashboard (see instructions and example there), compilation and execution terminal output logs will be provided to the user.

The information send to hgpu.org will be treated according to our Privacy Policy

HGPU group © 2010-2014 hgpu.org

All rights belong to the respective authors

Contact us: