18053

Posts

Feb, 17

The performances of R GPU implementations of the GMRES method

Although the performance of commodity computers has improved drastically with the introduction of multicore processors and GPU computing, the standard R distribution is still based on single-threaded model of computation, using only a small fraction of the computational power available now for most desktops and laptops. Modern statistical software packages rely on high performance implementations […]
Feb, 17

Evaluating High-Level Synthesis Techniques for Scalable Hardware-Accelerated Computing

Hardware acceleration is considered a powerful tool in parallel-computing, able to overcome the limitations imposed by sequential execution of software applications and, at the same time, provide energy-efficient alternatives to other parallel computing platforms such as GPUs. However, the increasing application complexity makes it unaffordable to map algorithms directly into HDL. Hence, High-Level Synthesis tools […]
Feb, 15

Accelerating Interpreted Programming Languages on GPUs with Just-In-Time Compilation and Runtime Optimisations

Nowadays, most computer systems are equipped with powerful parallel devices such as Graphics Processing Units (GPUs). They are present in almost every computer system including mobile devices, tablets, desktop computers and servers. These parallel systems have unlocked the possibility for many scientists and companies to process significant amounts of data in shorter time. But the […]
Feb, 10

Accelerating Deep Neural Networks on Low Power Heterogeneous Architectures

Deep learning applications are able to recognise images and speech with great accuracy, and their use is now everywhere in our daily lives. However, developing deep learning architectures such as deep neural networks in embedded systems is a challenging task because of the demanding computational resources and power consumption. Hence, sophisticated algorithms and methods that […]
Jan, 13

ImageCL: Language and source-to-source compiler for performance portability, load balancing, and scalability prediction on heterogeneous systems

Applications written for heterogeneous CPU-GPU systems often suffer from poor performance portability. Finding good work partitions can also be challenging as different devices are suited for different applications. This article describes ImageCL, a high-level domain-specific language and source-to-source compiler, targeting single system as well as distributed heterogeneous hardware. Initially targeting image processing algorithms, our framework […]
Jan, 13

pyPaSWAS: Python-based multi-core CPU and GPU sequence alignment

BACKGROUND: Our previously published CUDA-only application PaSWAS for Smith-Waterman (SW) sequence alignment of any type of sequence on NVIDIA-based GPUs is platform-specific and therefore adopted less than could be. The OpenCL language is supported more widely and allows use on a variety of hardware platforms. Moreover, there is a need to promote the adoption of […]
Jan, 6

Rubus: A compiler for seamless and extensible parallelism

Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single […]
Dec, 28

Design and Implementation of the Futhark Programming Language

In this thesis we describe the design and implementation of Futhark, a small data-parallel purely functional array language that offers a machine-neutral programming model, and an optimising compiler that generates efficient OpenCL code for GPUs. The overall philosophy is based on seeking a middle ground between functional and imperative approaches. The specific contributions are as […]
Dec, 19

Improving 3D Lattice Boltzmann Method stencil with asynchronous transfers on many-core processors

CPU-based many-core processors present an alternative to multicore CPU and GPU processors. In particular, the 93-Petaflops Sunway supercomputer, built from clustered many-core processors, has opened a new era for high performance computing that does not rely on GPU acceleration. However, memory bandwidth remains the main challenge for these architectures. This motivates our endeavor for optimizing […]
Nov, 30

2D Image Convolution using Three Parallel Programming Models on the Xeon Phi

Image convolution is widely used for sharpening, blurring and edge detection. In this paper, we review two common algorithms for convolving a 2D image by a separable kernel (filter). After optimising the naive codes using loop unrolling and SIMD vectorisation, we choose the algorithm with better performance as the baseline for parallelisation. We then compare […]
Nov, 21

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Deep learning and complex machine learning has quickly become one of the most important computationally intensive applications for a wide variety of fields. The combination of large data sets, high-performance computational capabilities, and evolving and improving algorithms has enabled many successful applications which were previously difficult or impossible to consider. This paper explores the challenges […]
Nov, 16

Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source to source compilation […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: