Posts
Sep, 28
gEMfitter: A Highly Parallel FFT-Based 3D Density Fitting Tool With GPU Texture Memory Acceleration
Fitting high resolution protein structures into low resolution cryo-electron microscopy (cryo-EM) density maps is an important technique for modeling the atomic structures of very large macromolecular assemblies. This article presents "gEMfitter", a highly parallel fast Fourier transform (FFT) EM density fitting program which can exploit the special hardware properties of modern graphics processor units (GPUs) […]
Sep, 28
Fast, parallel implementation of particle filtering on the GPU architecture
In this paper, we introduce a modified cellular particle filter (CPF) which we mapped on a graphics processing unit (GPU) architecture. We developed this filter adaptation using a state-of-the-art CPF technique. Mapping this filter realization on a highly parallel architecture entailed a shift in the logical representation of the particles. In this process, the […]
Sep, 28
From OpenCL to Gates: the FFT
The FFT plays a fundamental role in OFDM programmable digital baseband communication systems under the SDR context. The core nature of this algorithm marks it as a primary target for acceleration. Since long frame lengths of the FFT are desirable in order to achieve higher bitrates, the computational complexity becomes even more significant. In this […]
Sep, 28
Performance Improvement of Optical Algorithms on Multicore Platforms
ASML is one of the world’s largest suppliers of lithography systems for the semiconductor industry. ASML designs and develops machines that are used to print circuits on silicon wafers, to produce IC chips. These circuits have to be printed with an accuracy of up to 2 nm. For this purpose, the machines incorporate several measurement systems. The […]
Sep, 28
Efficient Computation of k-Nearest Neighbour Graphs for Large High-Dimensional Data Sets on GPU Clusters
This paper presents an implementation of brute-force exact k-Nearest Neighbor Graph (k-NNG) construction for ultra-large high-dimensional data clouds. The proposed method uses Graphics Processing Units (GPUs) and is scalable with multiple levels of parallelism (between nodes of a cluster, between different GPUs on a single node, and within a GPU). The method is applicable to […]
Sep, 27
Clustering on GPU – A Brief Survey
Clustering, as a process of partitioning data elements with similar properties, is an essential task in many application areas. Due to technological advances, the amount as well as the dimensionality of data sets in general is steadily growing. Graphics Processing Units in today’s desktops can be thought of as high-performance parallel processors. As […]
Sep, 27
CUD@ASP: Experimenting with GPUs in ASP solving
This paper illustrates the design and implementation of a prototype ASP solver that is capable of exploiting the parallelism offered by general purpose graphical processing units (GPGPUs). The solver is based on a basic conflict-driven search algorithm. The core of the solving process develops on the CPU, while most of the activities, such as literal […]
Sep, 27
OpenCL Parallel Programming Development Cookbook
Welcome to the OpenCL Parallel Programming Development Cookbook! Whew, that was more than a mouthful. This book was written by a developer, that’s me, and for a developer, hopefully that’s you. This book will look familiar to some and distinct to others. It is a result of my experience with OpenCL, but more importantly in […]
Sep, 27
GPU-TLS: An Efficient Runtime for Speculative Loop Parallelization on GPUs
Recently, GPUs have emerged as an important parallel platform for general purpose applications, both in HPC and cloud environments. Due to the special execution model, developing programs for GPUs is difficult even with the recent introduction of high-level languages like CUDA and OpenCL. To ease the programming effort, some research has proposed automatically generating parallel […]
Sep, 27
Java with Auto-Parallelization on Graphics Coprocessing Architecture
GPU-based many-core accelerators have gained a footing in supercomputing. Their widespread adoption, however, hinges on better parallelization and load scheduling techniques to utilize the hybrid system of CPU and GPU cores easily and efficiently. This paper introduces a new user-friendly compiler framework and runtime system, dubbed Japonica, to help Java applications harness the full power […]
Sep, 27
Multi-Scale, Multi-Level, Heterogeneous Features Extraction and Classification of Volumetric Medical Images
This paper articulates a novel method for the heterogeneous feature extraction and classification directly on volumetric images, which covers multi-scale point feature, multi-scale surface feature, multi-level curve feature, and blob feature. To tackle the challenge of complex volumetric inner structure and diverse feature forms, our technical solution hinges upon the integrated approach of locally-defined diffusion […]
Sep, 27
Trellis: Portability Across Architectures with a High-level Framework
The increasing computational needs of parallel applications inevitably require portability across parallel architectures, which now include heterogeneous processing resources, such as CPUs and GPUs, and multiple SIMD/SIMT widths. However, the lack of a common parallel programming paradigm that provides predictable, near-optimal performance on each resource leads to the use of low-level frameworks with architecture-specific optimizations, […]