Posts
Oct, 25
Modern Gyrokinetic Particle-In-Cell Simulation of Fusion Plasmas on Top Supercomputers
The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size […]
Oct, 25
Join Algorithms on GPUs: A Revisit After Seven Years
Implementing database operations on parallel platforms has gain a lot of momentum in the past decade. A number of studies have shown the potential of using GPUs to speed up database operations. In this paper, we present empirical evaluations of a state-of-the-art work published in SIGMOD’08 on GPU-based join processing. In particular, such work provides […]
Oct, 22
Sequential Code Parallelization for Multi-core Embedded Systems: A Survey of Models, Algorithms and Tools
In recent years the industry experienced a shift in the design and manufacture of processors. Multiple-core processors in one single chip started replacing the common used single-core processors. This design trend reached the develop of System-on-Chip, widely used in embedded systems, and turned them into powerful Multiprocessor System-on-Chip. These multi-core systems have presented not only […]
Oct, 22
A linguistic approach to concurrent, distributed, and adaptive programming across heterogeneous platforms
Two major trends in computing hardware during the last decade have been an increase in the number of processing cores found in individual computer hardware platforms and an ubiquity of distributed, heterogeneous systems. Together, these changes can improve not only the performance of a range of applications, but the types of applications that can be […]
Oct, 22
Stadium Hashing: Scalable and Flexible Hashing on GPUs
Hashing is one of the most fundamental operations that provides a means for a program to obtain fast access to large amounts of data. Despite the emergence of GPUs as many-threaded general purpose processors, high performance parallel data hashing solutions for GPUs are yet to receive adequate attention. Existing hashing solutions for GPUs not only […]
Oct, 22
BLASX: A High Performance Level-3 BLAS Library for Heterogeneous Multi-GPU Computing
Basic Linear Algebra Subprograms (BLAS) are a set of low level linear algebra kernels widely adopted by applications involved with the deep learning and scientific computing. The massive and economic computing power brought forth by the emerging GPU architectures drives interest in implementation of compute-intensive level 3 BLAS on multi-GPU systems. In this paper, we […]
Oct, 22
Neurokernel: An Open Source Platform for Emulating the Fruit Fly Brain
We have developed an open software platform called Neurokernel for collaborative development of comprehensive models of the brain of the fruit fly Drosophila melanogaster and their execution and testing on multiple Graphics Processing Units (GPUs). Neurokernel provides a programming model that capitalizes upon the structural organization of the fly brain into a fixed number of […]
Oct, 18
MetaFork: A Compilation Framework for Concurrency Models Targeting Hardware Accelerators and Its Application to the Generation of Parametric CUDA Kernels
In this paper, we present the accelerator model of MetaFork together with the software framework that allows automatic generation of CUDA code from annotated MetaFork programs. One of the key features of this CUDA code generator is that it supports the generation of CUDA kernel code where program parameters (like number of threads per block) […]
Oct, 18
Self-Adapting Parallel Framework for Long-Term Object Tracking
Object tracking is a crucial field in computer vision that has many uses in human-computer interaction, security and surveillance, video communication and compression, augmented reality, traffic control, etc. Many implementations are introduced in practice, and yet recent methods emphasize on tracking objects adaptively by learning the object’s perspectives and rediscovering it when it becomes untraceable, […]
Oct, 18
Implementation of a Power Efficient Synthetic Aperture Radar Back Projection Algorithm on FPGAs Using OpenCL
In this thesis, an implementation of a Synthetic Aperture Radar (SAR) back projection algorithm onto a Field-Programmable Gate Array (FPGA) device using Open Computing Language (OpenCL) is developed. SAR back projection is a method to form a high-resolution terrain image from radar data. SAR is used in many applications such as Geographic Information Systems (GIS), […]
Oct, 18
A Network Intrusion Detection System Framework based on Hadoop and GPGPU
In IT industry the business data grows exponentially, which results in concern to enhance the security system by implementing effective NIDS (Network Intrusion Detection System).The quick response to detecting intrusion an essential feature of any NIDS system, but due to the huge amount of data obtained from organizations which impacts the performance of NIDS. The […]
Oct, 18
Performance analysis and optimization of a CFD application
This thesis documents the analysis and optimization of a high-order finite difference computational fluid dynamics (CFD) application (PlasComCM). Performance bottlenecks were identified using performance tools and hardware counters. The performance analysis of PlasComCM showed that the quantity of memory accesses and the lack of vectorization inhibited optimal serial performance on a x86-based CPU. Optimizing techniques […]