Posts
Apr, 4
An Effective Model of CPU/GPU Collaborative Computing in GPU Clusters
Remote procedure call (RPC) is a simple, transparent and useful paradigm for providing communication between two processes across a network. The compute unified device architecture (CUDA) programming toolkit and runtime enhance the programmability of the graphics processing unit (GPU) and make the GPU more versatile in high-performance computing. Current research mainly focuses on the […]
Apr, 4
The Design and Implementation of a Verification Technique for GPU Kernels
We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed synchronous, delayed visibility (SDV) semantics, which captures the execution of a GPU kernel by multiple groups of threads. The […]
Apr, 4
Using OpenCL to Implement Median Filtering and RSA Algorithms: Two GPGPU Application Case Studies
Graphics Processing Units (GPUs) and their development tools have advanced recently, and industry has become more interested in using them. Among the several development frameworks for GPUs, OpenCL provides a programming environment for writing portable code that can run in parallel. This report describes two case studies of algorithm implementations in OpenCL. The first algorithm is […]
Apr, 1
Distributed wideband software-defined radio receiver for heterogeneous systems
Recent years have seen an increasing need for computationally efficient implementation of software-defined radio (SDR) systems. Given the limitations of a typical SDR application running on a single machine, we present a distributed SDR system using high-performance techniques. To split a digital signal into multiple channels, we use an efficient digital signal processing technique: a […]
Apr, 1
Generating Null Models for Large-Scale Networks on GPU
A network generated by randomly rewiring the edges of an original network under some constraints is called a null model of the original network. It’s a useful tool for revealing mechanisms that affect network topology. As networks grow larger, the time needed to generate null models increases. How […]
Apr, 1
Microbranching in mode-I fracture using large scale simulations of amorphous and perturbed lattice models
We study the high-velocity regime mode-I fracture instability using large scale simulations. At large driving displacements, the pattern of a single, steady-state crack that propagates in the midline of the sample breaks down, and small microbranches start to appear near the main crack. Some of the features of those microbranches have been reproduced qualitatively in […]
Apr, 1
Separable projection integrals for higher-order correlators of the cosmic microwave sky: Acceleration by factors exceeding 100
We study the optimisation and porting of the "Modal" code on Intel(R) Xeon(R) processors and/or Intel(R) Xeon Phi(TM) coprocessors using methods which should be applicable to more general compute-bound codes. "Modal" is used by the Planck satellite experiment for constraining general non-Gaussian models of the early universe via the bispectrum of the cosmic microwave […]
Apr, 1
Parameter Selection and Pre-Conditioning for a Graph Form Solver
In a recent paper, Parikh and Boyd describe a method for solving a convex optimization problem, where each iteration involves evaluating a proximal operator and projection onto a subspace. In this paper we address the critical practical issues of how to select the proximal parameter in each iteration, and how to scale the original problem […]
Mar, 30
Massively Parallel Analysis of Similarity Matrices on Heterogeneous Hardware
We conduct a study that investigates the performance characteristics of a set of parallel implementations of the recurrence quantification analysis (RQA) using OpenCL. Being an important tool in climate impact and medical research, a central aspect of RQA is the construction of a binary matrix that captures the similarities of multi-dimensional vectors. Based on this […]
Mar, 30
Face Retriever: Pre-filtering the Gallery via Deep Neural Net
Face retrieval is an enabling technology for many applications, including automatic face annotation, deduplication, and surveillance. In this paper, we propose a face retrieval system which combines a k-NN search procedure with a COTS matcher (PittPatt) in a cascaded manner. In particular, given a query face, we first pre-filter the gallery set and find the […]
Mar, 30
Accelerating complex brain-model simulations on GPU platforms
The Inferior Olive (IO) in the brain, in conjunction with the cerebellum, is responsible for crucial sensorimotor-integration functions in humans. In this paper, we simulate a computationally challenging IO neuron model consisting of three compartments per neuron in a network arrangement on GPU platforms. Several GPU platforms of the two latest NVIDIA GPU architectures (Fermi, […]
Mar, 30
High Performance Computing for solving large sparse systems. Optical Diffraction Tomography as a case of study
This thesis, entitled "High Performance Computing for solving large sparse systems. Optical Diffraction Tomography as a case of study", investigates the computational issues related to solving linear systems of equations that arise from the discretization of physical models described by Partial Differential Equations (PDEs). These physical models are conceived for the […]