Posts
Sep, 29
Re-Introduction of Communication-Avoiding FMM-Accelerated FFTs with GPU Acceleration
As distributed memory systems grow larger, communication demands have increased. Unfortunately, while the costs of arithmetic operations continue to decrease rapidly, communication costs have not. As a result, there has been a growing interest in communication-avoiding algorithms for some of the classic problems in numerical computing, including communication-avoiding Fast Fourier Transforms (FFTs). A previously-developed low-communication […]
Sep, 28
Toward a GPU-Accelerated Immersed Boundary Method for Wind Forecasting Over Complex Terrain
A short-term wind power forecasting capability can be a valuable tool in the renewable energy industry to address load-balancing issues that arise from intermittent wind fields. Although numerical weather prediction models have been used to forecast winds, their applicability to micro-scale atmospheric boundary layer flows and ability to predict wind speeds at turbine hub height […]
Sep, 28
APOGEE: Adaptive Prefetching on GPUs for Energy Efficiency
Modern graphics processing units (GPUs) combine large amounts of parallel hardware with fast context switching among thousands of active threads to achieve high performance. However, such designs do not translate well to mobile environments where power constraints often limit the amount of hardware. In this work, we investigate the use of prefetching as a means […]
Sep, 28
A GPU Implementation of a Jacobi Method for Lattice Basis Reduction
This paper describes a parallel Jacobi method for lattice basis reduction and a GPU implementation using CUDA. Our experiments have shown that the parallel implementation is more than fifty times as fast as the serial counterpart, which is about twice as fast as the well-known LLL lattice reduction algorithm.
Sep, 28
gEMfitter: A Highly Parallel FFT-Based 3D Density Fitting Tool With GPU Texture Memory Acceleration
Fitting high resolution protein structures into low resolution cryo-electron microscopy (cryo-EM) density maps is an important technique for modeling the atomic structures of very large macromolecular assemblies. This article presents "gEMfitter", a highly parallel fast Fourier transform (FFT) EM density fitting program which can exploit the special hardware properties of modern graphics processor units (GPUs) […]
Sep, 28
Fast, Parallel Implementation of Particle Filtering on the GPU Architecture
In this paper, we introduce a modified cellular particle filter (CPF) which we mapped on a graphics processing unit (GPU) architecture. We developed this filter adaptation using a state-of-the-art CPF technique. Mapping this filter realization on a highly parallel architecture entailed a shift in the logical representation of the particles. In this process, the […]
Sep, 28
From OpenCL to Gates: the FFT
The FFT plays a fundamental role in OFDM programmable digital baseband communication systems under the SDR context. The core nature of this algorithm marks it as a primary target for acceleration. Since long frame lengths of the FFT are desirable in order to achieve higher bitrates, the computational complexity becomes even more significant. In this […]
Sep, 28
Performance Improvement of Optical Algorithms on Multicore Platforms
ASML is one of the world’s largest suppliers of lithography systems for the semiconductor industry. ASML designs and develops machines that are used to print circuits on silicon wafers, to produce IC chips. These circuits have to be printed with an accuracy of up to 2 nm. For this purpose, the machines incorporate several measurement systems. The […]
Sep, 28
Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters
This paper presents an implementation of the brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to […]
Sep, 27
Clustering on GPU – A Brief Survey
Clustering, as a process of partitioning data elements with similar properties, is an essential task in many application areas. Due to technological advances, the amount as well as the dimensionality of data sets in general is steadily growing. Graphics Processing Units in today’s desktops can be thought of as high-performance parallel processors. As […]
Sep, 27
CUD@ASP: Experimenting with GPUs in ASP solving
This paper illustrates the design and implementation of a prototype ASP solver that is capable of exploiting the parallelism offered by general-purpose graphics processing units (GPGPUs). The solver is based on a basic conflict-driven search algorithm. The core of the solving process develops on the CPU, while most of the activities, such as literal […]
Sep, 27
OpenCL Parallel Programming Development Cookbook
Welcome to the OpenCL Parallel Programming Development Cookbook! Whew, that was more than a mouthful. This book was written by a developer, that’s me, and for a developer, hopefully that’s you. This book will look familiar to some and distinct to others. It is a result of my experience with OpenCL, but more importantly in […]