Milan Ceska, Petr Pilar, Nicola Paoletti, Lubos Brim, Marta Kwiatkowska
In this paper we present PRISM-PSY, a novel tool that performs precise GPU-accelerated parameter synthesis for continuous-time Markov chains and time-bounded temporal logic specifications. We redesign, in terms of matrix-vector operations, the recently formulated algorithms for precise parameter synthesis in order to enable effective data-parallel processing, which results in significant acceleration on many-core architectures. High […]
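The redesign described above casts the core computations as matrix-vector products over the (sparse) rate matrix, which is what makes them amenable to data-parallel execution. A minimal CUDA sketch of such a sparse matrix-vector multiply (CSR storage, one thread per row) is shown below; the kernel and all names in it are illustrative assumptions, not code from PRISM-PSY.

```cuda
// Illustrative CUDA kernel: one thread per matrix row computes y = A*x for a
// CSR-stored rate matrix. All names (csr_row_ptr, csr_col_idx, ...) are
// hypothetical; PRISM-PSY's actual kernels are not reproduced here.
__global__ void spmv_csr(int n_rows,
                         const int    *csr_row_ptr,
                         const int    *csr_col_idx,
                         const double *csr_val,
                         const double *x,
                         double       *y)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int j = csr_row_ptr[row]; j < csr_row_ptr[row + 1]; ++j)
            sum += csr_val[j] * x[csr_col_idx[j]];
        y[row] = sum;
    }
}
```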
Zeke Wang, Bingsheng He, Wei Zhang, Shunning Jiang
Recently, FPGA vendors such as Altera and Xilinx have released OpenCL SDKs for programming FPGAs. However, the FPGA architecture is significantly different from that of the CPU/GPU, for which OpenCL was originally designed. Tuning OpenCL code for good performance on FPGAs is still an open problem, since the existing OpenCL tools and models designed […]
Tobias Klein
The utilization of GPUs and the massively parallel computing paradigm have become increasingly prominent in many research domains. Recent developments of platforms such as OpenCL and CUDA enable the use of heterogeneous parallel computing across a wide range of fields. However, the efficient utilization of parallel hardware requires profound knowledge of parallel programming and of the hardware itself. […]
Zdenek Buk
The paper presents the application of OpenCLLink in Wolfram Mathematica to accelerate fully recurrent neural networks on the GPU. We also show how parts of the source code can be generated automatically using SymbolicC.
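The accelerated operation here is the fully recurrent update step, in which every neuron's next activation is a weighted sum over all current activations passed through a nonlinearity. The sketch below illustrates that step as a CUDA kernel purely for exposition; the cited work uses OpenCLLink from within Wolfram Mathematica (with parts of the kernel source generated via SymbolicC), and all names below are hypothetical.

```cuda
// Illustrative only: one thread per neuron computes the next state of a fully
// recurrent network, next_state[i] = tanh( sum_j W[i][j] * state[j] ).
// The cited work uses OpenCLLink from Wolfram Mathematica, not CUDA.
__global__ void frnn_step(int n,
                          const float *W,          // n x n recurrent weights, row-major
                          const float *state,      // current neuron activations
                          float       *next_state) // activations after one step
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float sum = 0.0f;
        for (int j = 0; j < n; ++j)
            sum += W[i * n + j] * state[j];
        next_state[i] = tanhf(sum);
    }
}
```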
Adam Harries, Michel Steuwer, Murray Cole, Alan Gray, Christophe Dubach
While contemporary GPU architectures are heavily biased towards the execution of predictably regular data parallelism, many real application domains are based around data structures which are naturally sparse and irregular. In this paper we demonstrate that high level programming and high performance GPU execution for sparse, irregular problems are not mutually exclusive. Our insight is […]
Alexander Matz, Mark Hummel, Holger Froning
GPUs have established themselves in the computing landscape, convincing users and designers by their excellent performance and energy efficiency. They differ in many aspects from general-purpose CPUs, for instance their highly parallel architecture, their thread-collective bulk-synchronous execution model, and their programming model. In particular, languages like CUDA or OpenCL require users to express parallelism very […]
Seyed Parsa Banihashemi
The Explicit Finite Element Method is a powerful tool in nonlinear dynamic finite element analysis. Recent major developments in computational devices, in particular General-Purpose Graphics Processing Units (GPGPUs), now make it possible to increase the performance of the explicit FEM. This dissertation investigates existing explicit finite element method algorithms, which are then redesigned for […]
Issam Said
In an exploration context, Oil and Gas (O&G) companies rely on HPC to accelerate depth imaging algorithms. Solutions based on CPU clusters and hardware accelerators are widely embraced by the industry. Graphics Processing Units (GPUs), with their huge compute power and high memory bandwidth, have attracted significant interest. However, deploying heavy imaging workflows, […]
Nadesh Ramanathan, John Wickerson, Felix Winterstein, George A. Constantinides
We provide a case study of work-stealing, a popular method for run-time load balancing, on FPGAs. Following the Cederman-Tsigas implementation for GPUs, we synchronize work-items not with locks, mutexes or critical sections, but instead with the atomic operations provided by Altera’s OpenCL SDK. We evaluate work-stealing for FPGAs by synthesizing a K-means clustering algorithm on […]
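The essence of the Cederman-Tsigas approach is that a thief claims a task from the shared deque with an atomic compare-and-swap rather than by taking a lock. A minimal sketch of that steal operation is given below in CUDA for illustration; the cited work expresses it with the atomics of Altera's OpenCL SDK, and the data layout and names here are assumptions.

```cuda
// Illustrative CUDA sketch of the lock-free "steal" end of a work-stealing
// deque, in the spirit of Cederman-Tsigas: the thief claims a task index with
// an atomic compare-and-swap instead of a lock. Structure and names are
// hypothetical; the cited work targets Altera's OpenCL SDK, not CUDA.
struct Deque {
    int *tasks;   // task identifiers
    int *top;     // index stolen from (advanced atomically by thieves)
    int *bottom;  // index pushed/popped by the owning work-group
};

__device__ bool steal(Deque q, int *out_task)
{
    int t = *q.top;                        // snapshot the steal end
    int b = *q.bottom;
    if (t >= b)
        return false;                      // deque looks empty
    int task = q.tasks[t];
    // Claim slot t only if no other thief got there first.
    if (atomicCAS(q.top, t, t + 1) == t) {
        *out_task = task;
        return true;
    }
    return false;                          // lost the race; caller may retry
}
```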
Alessio Sclocco, Joeri van Leeuwen, Henri E. Bal, Rob V. van Nieuwpoort
Dedispersion, the removal of deleterious smearing of impulsive signals by the interstellar matter, is one of the most intensive processing steps in any radio survey for pulsars and fast transients. We here present a study of the parallelization of this algorithm on many-core accelerators, including GPUs from AMD and NVIDIA, and the Intel Xeon Phi. […]
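Brute-force (incoherent) dedispersion is, at its core, a delayed sum over frequency channels for every dispersion-measure trial, which is what makes it a natural fit for many-core accelerators. A minimal CUDA sketch of such a kernel follows; the memory layout, the precomputed shift table and all names are assumptions for illustration, not the kernels evaluated in the study.

```cuda
// Illustrative brute-force dedispersion kernel: for each dispersion-measure
// (DM) trial and output sample, sum the input across frequency channels,
// shifted by the per-channel delay. Launch with a 2D grid, e.g.
// dim3 grid((n_samples + 255) / 256, n_dms) and 256 threads per block.
__global__ void dedisperse(int n_channels, int n_samples, int n_dms,
                           const float *input,   // [channel][sample]
                           const int   *shift,   // [dm][channel] delay in samples
                           float       *output)  // [dm][sample]
{
    int sample = blockIdx.x * blockDim.x + threadIdx.x;
    int dm     = blockIdx.y;
    if (sample >= n_samples || dm >= n_dms)
        return;

    float sum = 0.0f;
    for (int c = 0; c < n_channels; ++c) {
        int s = sample + shift[dm * n_channels + c];
        if (s < n_samples)
            sum += input[c * n_samples + s];
    }
    output[dm * n_samples + sample] = sum;
}
```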
Rana Nandi, Stefan Schramm
We study the effect of isospin-dependent nuclear forces on the pasta phase in the inner crust of neutron stars. To this end, we model the crust within the framework of quantum molecular dynamics (QMD). To maximize numerical performance, the newly developed code has been implemented on GPUs. As a first application of the […]
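The GPU-friendly core of a molecular-dynamics code of this kind is the all-pairs evaluation of particle interactions, with one thread accumulating the contributions acting on one particle. A minimal CUDA sketch of that pattern follows, with a placeholder pair law; the actual isospin-dependent QMD interactions of the cited work are not reproduced.

```cuda
// Illustrative all-pairs kernel of the kind at the heart of molecular-dynamics
// codes on GPUs: one thread per particle accumulates pairwise contributions
// from every other particle. The pair law here is a placeholder, not the
// isospin-dependent QMD interaction of the cited work.
__global__ void accumulate_forces(int n,
                                  const float3 *pos,
                                  float3       *force)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 f = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        float dz = pos[i].z - pos[j].z;
        float r2 = dx * dx + dy * dy + dz * dz + 1e-6f;  // softened distance^2
        float w  = rsqrtf(r2) / r2;                      // placeholder ~1/r^2 law
        f.x += w * dx;  f.y += w * dy;  f.z += w * dz;
    }
    force[i] = f;
}
```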
Amund Tveit, Torbjorn Morland, Thomas Brox Rost
In this paper we present DeepLearningKit – an open source framework that supports using pre-trained deep learning models (convolutional neural networks) for iOS, OS X and tvOS. DeepLearningKit is developed in Metal, in order to utilize the GPU efficiently, and in Swift, for integration with applications, e.g. iOS-based mobile apps on iPhone/iPad, tvOS-based apps for […]