
Posts

Oct, 16

Parallel Programming and Compressed Material Data for an Eulerian Code

We describe the problem of iterating over mesh zones and iterating over material data within a zone, in the context of relatively new compute architectures. We present an example of how this can be done in a way that is portable across parallel programming environments and can be made to perform well. We offer a […]
Oct, 16

Multi-GPU Based Lattice Boltzmann Method for Hemodynamic Simulation in Patient-Specific Cerebral Aneurysm

Running the lattice Boltzmann method on GPUs has proved to be an effective way to gain a significant performance benefit, so the GPU or multi-GPU based lattice Boltzmann method is considered a promising and competent candidate for the study of large-scale complex fluid flows. In this work, a multi-GPU based lattice Boltzmann algorithm coupled […]
Oct, 16

Pipelined Iterative Solvers with Kernel Fusion for Graphics Processing Units

We revisit the implementation of iterative solvers on discrete graphics processing units and demonstrate the benefit of implementations using extensive kernel fusion for pipelined formulations over conventional implementations of classical formulations. The proposed implementations with both CUDA and OpenCL are freely available in ViennaCL and achieve up to three-fold performance gains when compared to other […]
Oct, 16

5th International Conference on Bioscience, Biochemistry and Bioinformatics, ICBBB 2015

Last Round Deadline: 2014-11-15 Publication: Submitted conference papers will be reviewed by technical committees of the Conference. ICBBB 2015 papers will be published in: *WIT Transactions on Biomedicine and Health (ISSN: 1743-3525), all the papers published by WIT Press which will be indexed by EI Compendex and SCOPUS. *International Journal of Bioscience, Biochemistry and Bioinformatics (IJBBB, […]
Oct, 14

A Case Study of OpenCL on an Android Mobile GPU

An observation in supercomputing in the past decade illustrates the transition of pervasive commodity products being integrated with the world’s fastest system. Given today’s exploding popularity of mobile devices, we investigate the possibilities for high performance mobile computing. Because parallel processing on mobile devices will be the key element in developing a mobile and computationally […]
Oct, 14

Synthetic Aperture Radar imaging on a CUDA-enabled mobile platform

This paper presents the details of Synthetic Aperture Radar (SAR) imaging on the smallest CUDA-capable platform available, the Jetson TK1. The results indicate that GPU accelerated embedded platforms have considerable potential for this type of workload and, in conjunction with low power consumption, light weight and standard programming tools, could open new horizons in […]
Oct, 14

A Complete and Efficient CUDA-Sharing Solution for HPC Clusters

In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than […]
Oct, 14

Random Address Permute-Shift Technique for the Shared Memory on GPUs

The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of memory access to the shared memory of a streaming multiprocessor on CUDA-enabled GPUs. The DMM has w memory banks that constitute a shared memory, and w threads in a warp try to access them at the same time. However, […]
Oct, 14

Parallel Algorithms for the Summed Area Table on the Asynchronous Hierarchical Memory Machine, with GPU implementations

The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computing on CUDA-enabled GPUs. The summed area table (SAT) of a matrix is a data structure frequently used in the area of computer vision which can be obtained by computing the column-wise prefix-sums and then the row-wise prefix-sums. The […]
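
The two-pass construction mentioned in this abstract can be sketched sequentially as follows. This is only a minimal illustration of the column-wise then row-wise prefix-sum definition, not the paper's parallel HMM or GPU algorithm; the function name `summed_area_table` and the list-of-lists matrix representation are assumptions made for the example.

```python
from itertools import accumulate


def summed_area_table(matrix):
    """Return the summed area table of a rectangular list-of-lists matrix.

    Hypothetical helper for illustration: a plain sequential version of
    the definition, not a parallel or GPU implementation.
    """
    if not matrix:
        return []
    # Pass 1: column-wise prefix sums (running total down each column).
    columns = [list(accumulate(col)) for col in zip(*matrix)]
    col_sums = [list(row) for row in zip(*columns)]
    # Pass 2: row-wise prefix sums over the result of pass 1.
    return [list(accumulate(row)) for row in col_sums]


sat = summed_area_table([[1, 2, 3],
                         [4, 5, 6]])
print(sat)  # the bottom-right entry is the sum of the whole matrix
```

Each entry of the result holds the sum of all matrix elements above and to the left of it, inclusive, which is what lets a rectangular region sum be read off from four table lookups.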
Oct, 13

Scalable approximate k-NN in multidimensional big data

This thesis studies the scalability of the similarity search problem in large-scale multidimensional data. Similarity search, translating into the neighbour search problem, has many applications in information retrieval, visualization, machine learning and data mining. The current exponential growth of data motivates the need for approximate and scalable algorithms. In most existing algorithms and data-structures, […]
Oct, 13

A Parallel Algorithm for Enumerating Joint Weight of a Binary Linear Code in Network Coding

In this paper, we present a parallel algorithm for enumerating the joint weight of a binary linear (n, k) code, aiming to accelerate the assessment of its decoding error probability for network coding. To reduce the number of pairs of codewords to be investigated, our parallel algorithm reduces the dimension k by focusing on the all-one vector included […]
Oct, 13

GAIN: GPU-based Constraint Checking for Context Consistency

Applications in pervasive computing are often context-aware. However, due to uncontrollable environmental noise, contexts collected by applications can be distorted or even conflict with each other. This is known as the context inconsistency problem. To provide reliable services, applications need to validate contexts before using them. One promising approach is to check contexts against consistency […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
