Posts
Oct, 22
A linguistic approach to concurrent, distributed, and adaptive programming across heterogeneous platforms
Two major trends in computing hardware during the last decade have been an increase in the number of processing cores found in individual computer hardware platforms and an ubiquity of distributed, heterogeneous systems. Together, these changes can improve not only the performance of a range of applications, but the types of applications that can be […]
Oct, 22
Stadium Hashing: Scalable and Flexible Hashing on GPUs
Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only […]
Oct, 22
BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing
Basic Linear Algebra Subprograms (BLAS) are a set of low level linear algebra kernels widely adopted by applications involved with the deep learning and scientific computing. The massive and economic computing power brought forth by the emerging GPU architectures drives interest in implementation of compute-intensive level 3 BLAS on multi-GPU systems. In this paper, we […]
Oct, 22
Neurokernel: An Open Source Platform for Emulating the Fruit Fly Brain
We have developed an open software platform called Neurokernel for collaborative development of comprehensive models of the brain of the fruit fly Drosophila melanogaster and their execution and testing on multiple Graphics Processing Units (GPUs). Neurokernel provides a programming model that capitalizes upon the structural organization of the fly brain into a fixed number of […]
Oct, 18
Self-Adapting Parallel Framework for Long-Term Object Tracking
Object tracking is a crucial field in computer vision that has many uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations are introduced in practice, and yet recent methods emphasize on tracking objects adaptively by learning the object’s perspectives and rediscovering it when it becomes untraceable, […]
Oct, 18
Implementation of a Power Efficient Synthetic Aperture Radar Back Projection Algorithm on FPGAs Using OpenCL
In this thesis, an implementation of a Synthetic Aperture Radar (SAR) back projection algorithm onto a Field-Programmable Gate Array (FPGA) device using Open Computing Language (OpenCL) is developed. SAR back projection is a method to form a high-resolution terrain image from radar data. SAR is used in many applications such as Geographic Information Systems (GIS), […]
Oct, 18
A Network Intrusion Detection System Framework based on Hadoop and GPGPU
In IT industry the business data grows exponentially, which results in concern to enhance the security system by implementing effective NIDS (Network Intrusion Detection System).The quick response to detecting intrusion an essential feature of any NIDS system, but due to the huge amount of data obtained from organizations which impacts the performance of NIDS. The […]
Oct, 18
Performance analysis and optimization of a CFD application
This thesis documents the analysis and optimization of a high-order finite difference computational fluid dynamics (CFD) application (PlasComCM). Performance bottlenecks were identified using performance tools and hardware counters. The performance analysis of PlasComCM showed that the quantity of memory accesses and the lack of vectorization inhibited optimal serial performance on a x86-based CPU. Optimizing techniques […]
Oct, 18
MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators and Its Application to the Generation of Parametric CUDA Kernels
In this paper, we present the accelerator model of MetaFork together with the software framework that allows automatic generation of CUDA code from annotated MetaFork programs. One of the key features of this CUDA code generator is that it supports the generation of CUDA kernel code where program parameters (like number of threads per block) […]
Oct, 16
Multi-dimensional Functional Principal Component Analysis
Functional principal component analysis is one the most commonly employed approaches in functional/longitudinal data analysis and we extend it to conduct $d$-dimensional functional/longitudinal data analysis. The computational issues emerging in the extension are fully addressed with our proposed solutions. The local linear smoothing technique is employed to perform estimation because of its capabilities of performing […]
Oct, 16
Density-based parallel skin lesion border detection with webCL
BACKGROUND: Dermoscopy is a highly effective and noninvasive imaging technique used in diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate […]
Oct, 16
Comparison of Thread Execution Methods for GPU-oriented OpenCL Programs on Multicore Processors
With the broad deployment of multicore processors, there are increasing demands to port OpenCL programs written for GPUs onto the multicore processors. However, OpenCL programs written for GPUs cannot run efficiently on multicore processors since GPU-oriented OpenCL programs generally consist of a huge number of threads. This paper presents experimental comparisons of three thread execution […]