15427
Gang Mei, Hong Tian
This paper focuses on evaluating the impact of different data layouts on the computational efficiency of GPU-accelerated Inverse Distance Weighting (IDW) interpolation algorithm. First we redesign and improve our previous GPU implementation that was performed by exploiting the feature of CUDA dynamic parallelism (CDP). Then we implement three versions of GPU implementations, i.e., the naive […]
Yifei Li
Measuring the similarity between two streamlines is fundamental to many important flow data analysis and visualization tasks such as feature detection, pattern querying and streamline clustering. This dissertation presents a novel streamline similarity measure inspired by the bag-of-features concept from computer vision. Different from other streamline similarity measures, the proposed one considers both the distribution […]
Benoit Liquet, Leonardo Bottolo, Gianluca Campanella, Sylvia Richardson, Marc Chadeau-Hyam
Technological advances in molecular biology over the past decade have given rise to high dimensional and complex datasets offering the possibility to investigate biological associations between a range of genomic features and complex phenotypes. The analysis of this novel type of data generated unprecedented computational challenges which ultimately led to the definition and implementation of […]
Zdenek Buk
The paper presents application of OpenCLLink in Wolfram Mathematica to accelerate fully recurrent neural networks using GPU. We also show the idea of automatically generated parts of source code using SymbolicC.
Esin Yavuz, James Turner, Thomas Nowotny
Large-scale numerical simulations of detailed brain circuit models are important for identifying hypotheses on brain functions and testing their consistency and plausibility. An ongoing challenge for simulating realistic models is, however, computational speed. In this paper, we present the GeNN (GPU-enhanced Neuronal Networks) framework, which aims to facilitate the use of graphics accelerators for computational […]
Francisco Javier Ordonez, Daniel Roggen
Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing this temporal dynamics is fundamental for successful HAR. […]
Kyungjoo Kim, Sivasankaran Rajamanickam, George Stelle, H. Carter Edwards, Stephen L. Olivier
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are inter-related to each other through their data dependences in […]
Alina Sirbu, Ozalp Babaoglu
Power consumption is a major obstacle for High Performance Computing (HPC) systems in their quest towards the holy grail of ExaFLOP performance. Significant advances in power efficiency have to be made before this goal can be attained and accurate modeling is an essential step towards power efficiency by optimizing system operating parameters to match dynamic […]
Hung-Yi Pu, Kiyun Yun, Ziri Younsi, Suk-Jin Yoon
General-relativistic radiative transfer (GRRT) calculations coupled with the calculation of geodesics in the Kerr spacetime are an essential tool for determining the images, spectra and light curves from matter in the vicinity of black holes. Such studies are especially important for ongoing and upcoming millimeter/submillimeter (mm/sub-mm) Very Long Baseline Interferometry (VLBI) observations of the supermassive […]
Alessio Sclocco, Joeri van Leeuwen, Henri E. Bal, Rob V. van Nieuwpoort
Dedispersion, the removal of deleterious smearing of impulsive signals by the interstellar matter, is one of the most intensive processing steps in any radio survey for pulsars and fast transients. We here present a study of the parallelization of this algorithm on many-core accelerators, including GPUs from AMD and NVIDIA, and the Intel Xeon Phi. […]
Phillipe A. Pereira, Higo F. Albuquerque, Hendrio M. Marques, Isabela S. Silva, Celso B. Carvalho, Lucas C. Cordeiro
We present ESBMC-GPU, an extension to the ESBMC model checker that is aimed at verifying GPU programs written for the CUDA framework. ESBMC-GPU uses an operational model for the verification, i.e., an abstract representation of the standard CUDA libraries that conservatively approximates their semantics. ESBMC-GPU verifies CUDA programs, by explicitly exploring the possible interleavings (up […]
Amund Tvei, Torbjorn Morland, Thomas Brox Rost
In this paper we present DeepLearningKit – an open source framework that supports using pre- trained deep learning models (convolutional neural networks) for iOS, OS X and tvOS. DeepLearningKit is developed in Metal in order to utilize the GPU efficiently and Swift for integration with applications, e.g. iOS-based mobile apps on iPhone/iPad, tvOS-based apps for […]
Page 1 of 9112345...102030...Last »

* * *

* * *

Follow us on Twitter

HGPU group

1735 peoples are following HGPU @twitter

Like us on Facebook

HGPU group

368 people like HGPU on Facebook

HGPU group © 2010-2016 hgpu.org

All rights belong to the respective authors

Contact us: