
Posts

Jun, 8

CGO: G: Intelligent Heuristic Construction with Active Learning

Building effective optimization heuristics is a challenging task which often takes developers several months if not years to complete. Predictive modelling has recently emerged as a promising solution, automatically constructing heuristics from training data; however, obtaining this data can take months per platform. This is becoming an ever more critical problem as the pace of […]
Jun, 8

Exploring CPU-GPU Coherence

AMD, ARM and other members of the Heterogeneous Systems Architecture Foundation are focusing on integrated CPU-GPU systems with shared memory, to improve the programmability of heterogeneous systems. Such integration is also necessary to eliminate the energy and latency costs associated with conventional heterogeneous computation. This work investigates the relevance of CPU-GPU coherence for current heterogeneous […]
Jun, 8

Cryptanalysis of the McEliece Cryptosystem on GPGPUs

The linear code based McEliece cryptosystem is potentially promising as a so-called "post-quantum" public key cryptosystem because thus far it has resisted quantum cryptanalysis, but to be considered secure, the cryptosystem must resist other attacks as well. In 2011, Bernstein et al. introduced the "Ball Collision Decoding" (BCD) attack on McEliece which is a significant […]
Jun, 8

Bi-directional Path Tracing on GPU

Computer graphics renderers for creating photo-realistic images mainly use unidirectional path tracing, which gives good results for scenes without caustics or hard cases. There are also a few renderers that implement bi-directional path tracing; however, due to the complexity of the algorithm, they almost exclusively target sequential CPUs. The thesis proposes a way of implementation of […]
Jun, 7

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus […]
Jun, 7

Implementation of K-shortest Path Algorithm in GPU Using CUDA

The K-shortest path algorithm is a generalization of the shortest path algorithm. It is used in various fields, such as sequence alignment in molecular bioinformatics, robot motion planning, and path finding in gene networks, where the speed of path calculation plays a vital role. Parallel implementation is one of the best ways to fulfill the requirement of these […]
Jun, 7

Meta-Programming and Auto-Tuning in the Search for High Performance GPU Code

Writing high performance GPGPU code is often difficult and time-consuming, potentially requiring laborious manual tuning of low-level details. Despite these challenges, the cost of ignoring GPUs in high performance computing is increasingly large. Auto-tuning is a potential solution to the problem of tedious manual tuning. We present a framework for auto-tuning GPU kernels which are […]
Jun, 7

The implementation and optimization of Bitonic sort algorithm based on CUDA

This paper describes the bitonic sort algorithm in detail and implements it on the CUDA architecture. We also apply two effective optimizations to implementation details according to the characteristics of the GPU, which greatly improve efficiency. Finally, we survey the optimized bitonic sort algorithm on the GPU with the speedup of […]
Jun, 7

A Parallel Implementation of the Galerkin Method for Solving Partial Differential Equations on a Triangular Mesh

Finite Element Methods are techniques for estimating solutions to boundary value problems for partial differential equations from an approximating subspace. These methods are based on weak or variational forms of the BVP that require less of the problem functions than what the original PDE would suggest in terms of order of differentiability and continuity. In […]
Jun, 5

Machine Learning Based Auto-tuning for Enhanced OpenCL Performance Portability

Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems, and offers functional portability. It does, however, suffer from poor performance portability: code tuned for one device must be re-tuned to achieve good […]
Jun, 5

Accelerated Nodal Discontinuous Galerkin Simulations for Reverse Time Migration with Large Clusters

Improving both accuracy and computational performance of numerical tools is a major challenge for seismic imaging and generally requires specialized implementations to make full use of modern parallel architectures. We present a computational strategy for reverse-time migration (RTM) with accelerator-aided clusters. A new imaging condition computed from the pressure and velocity fields is introduced. The […]
Jun, 5

Blocks and Fuel: Frameworks for deep learning

We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA support. It facilitates the training of complex neural network models by providing parametrized Theano operations, attaching metadata to Theano’s symbolic computational graph, and providing an extensive set of utilities to […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
