Posts
Jan, 19
GPU Computing for Meshfree Particle Method
Graphics Processing Units (GPUs), originally developed for computer games, now provide computational power for scientific applications. We present a study comparing the computational speed-up and efficiency of a GPU against a CPU for the Finite Pointset Method (FPM), a numerical tool in Computational Fluid Dynamics (CFD). As FPM is based on […]
Jan, 18
High-performance and Embedded Systems for Cryptography
This thesis addresses the design of cryptographic accelerators, ranging from the embedded system to the high-performance computing device. New techniques are proposed to allow several cryptographic algorithms to be computed by the same target. Therefore, flexibility (to support several algorithms) and scalability (to extend the features of a designed accelerator) are two keywords in all […]
Jan, 18
Supporting x86-64 Address Translation for 100s of GPU Lanes
Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible address translation support for GPU memory references. However, even a modest GPU might need 100s of translations per cycle (6 CUs * 64 lanes/CU) with memory access […]
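The parenthetical figure follows from multiplying out the example GPU. Assuming, as a worst case of our own choosing, that every lane issues one memory reference per cycle and each reference needs its own translation:

```latex
6\ \text{CUs} \times 64\ \frac{\text{lanes}}{\text{CU}} = 384\ \text{lanes}
\;\Rightarrow\; \text{up to } 384 \text{ translations per cycle}
```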
Jan, 18
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
The Generalized Minimum Residual (GMRES) method is one of the most widely used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because, relative to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming […]
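As a point of reference, here is a minimal, single-cycle (unrestarted) sketch of standard GMRES in Python/NumPy. It is not the communication-avoiding CA-GMRES variant the paper studies. Each Arnoldi step orthogonalizes the new vector against all previous basis vectors using inner products; in a distributed or CPU/GPU setting each of those is a global reduction, which is the kind of communication CA-GMRES aims to reduce. The function name `gmres` and the parameters `m` (maximum Krylov dimension) and `tol` are our own illustrative choices.

```python
import numpy as np


def gmres(A, b, x0=None, m=None, tol=1e-10):
    """One cycle of standard GMRES (Arnoldi + least squares), no restarts."""
    n = b.shape[0]
    m = n if m is None else m
    x0 = np.zeros(n) if x0 is None else x0
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta < tol:
        return x0  # initial guess already solves the system
    Q = np.zeros((n, m + 1))  # orthonormal Krylov basis
    H = np.zeros((m + 1, m))  # upper Hessenberg projection of A
    Q[:, 0] = r0 / beta
    k = m
    for j in range(m):
        w = A @ Q[:, j]
        # Modified Gram-Schmidt: one inner product (a global reduction
        # in a distributed setting) per previous basis vector.
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:  # "happy breakdown": Krylov space is invariant
            k = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    # Minimize || beta * e1 - H y || over the first k basis vectors.
    e1 = np.zeros(k + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H[:k + 1, :k], e1, rcond=None)
    return x0 + Q[:, :k] @ y
```

With `m` equal to the system size, one cycle solves a small nonsingular system to rounding accuracy; practical solvers restart after a fixed `m` to bound memory.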
Jan, 18
A GPU-based Multi-level Subspace Decomposition Scheme for Hierarchical Tensor Product Bases
The aim of this thesis is to implement a multi-level splitting of full grids on the GPU, which could be used in the incremental visualization of scientific data sets. The splitting is motivated by the approximation properties of the sparse grid technique. Looking towards large amounts of data, ideas of parallelization and data slicing are […]
Jan, 18
Computing Spatial Distance Histograms for Large Scientific Datasets On-the-Fly
This paper focuses on an important query in scientific simulation data analysis: the Spatial Distance Histogram (SDH). Computing an SDH query with the brute-force method takes time quadratic in the number of points. Such queries are often executed continuously over certain time periods, which further increases the computation time. We propose a highly efficient approximate algorithm to compute SDH over consecutive […]
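To make the quadratic cost concrete, here is a brute-force SDH in Python: each of the N(N-1)/2 point pairs is visited once and its distance is dropped into a fixed-width bucket. This is the baseline the paper improves on, not its approximate algorithm; the function name and the choice to clamp out-of-range distances into the last bucket are our own assumptions.

```python
import math
from itertools import combinations


def sdh_brute_force(points, bucket_width, num_buckets):
    """Spatial distance histogram over all point pairs: O(N^2) distance computations."""
    hist = [0] * num_buckets
    for p, q in combinations(points, 2):  # every unordered pair exactly once
        d = math.dist(p, q)
        b = int(d // bucket_width)
        if b >= num_buckets:
            b = num_buckets - 1  # assumption: clamp distances past the last bucket
        hist[b] += 1
    return hist


# Distances are 3, 4 and 5, so with width 2 they land in buckets 1, 2, 2.
print(sdh_brute_force([(0, 0), (3, 0), (0, 4)], 2.0, 3))  # [0, 1, 2]
```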
Jan, 17
Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator
We examine the Xeon Phi, which is based on Intel’s Many Integrated Core (MIC) architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and means to enable sensible data sharing between threads despite […]
Jan, 17
Power Profiling of GeMTC Many Task Computing
GeMTC allows Many Task Computing (MTC) workloads to run on hardware accelerators, taking advantage of their many-core architecture. However, GeMTC is currently written to target only NVIDIA GPUs. Another such hardware accelerator, the Intel Xeon Phi, is also an excellent candidate for MTC workloads. Therefore, the first goal of […]
Jan, 17
GPU Accelerated Vessel Segmentation Using Laplacian Eigenmaps
The Laplacian eigenmap is one of the most widely used techniques for improving cluster-based segmentation of multivariate images. However, one problem with this approach is its excessive computational requirements, especially when processing large image datasets. In this paper, we aim to employ emerging commodity graphics hardware to accelerate eigenmap-based segmentation. In particular, we present a highly […]
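For readers unfamiliar with the technique, a minimal CPU sketch of a Laplacian eigenmap in Python/NumPy follows; it is not the paper's GPU implementation. It builds a dense heat-kernel affinity matrix W, forms the graph Laplacian L = D - W, and takes the smallest nontrivial solutions of L v = λ D v, computed through the symmetric normalized Laplacian. The dense N × N affinity matrix is what makes the method expensive on large images. The kernel width `sigma`, the fully connected graph, and the function name are illustrative assumptions.

```python
import numpy as np


def laplacian_eigenmap(X, n_components=2, sigma=1.0):
    """Embed rows of X using the smallest nontrivial generalized eigenvectors of L v = lambda D v."""
    # Dense, fully connected heat-kernel graph (assumption; large images would need a sparse kNN graph).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # L_sym = I - D^{-1/2} W D^{-1/2} shares eigenvalues with the generalized problem.
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Map back via v = D^{-1/2} u and skip the trivial eigenvector (eigenvalue 0).
    return d_inv_sqrt[:, None] * vecs[:, 1:n_components + 1]
```

Feeding the embedding into a clustering step (for example k-means) would complete an eigenmap-based segmentation pipeline; with two well-separated groups, the first embedding coordinate alone already separates them by sign.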
Jan, 17
Prefiltered Single Scattering
Volumetric light scattering is a complex phenomenon that is difficult to simulate in real time, as light can be scattered towards the camera from everywhere in space. By assuming a single-scattering model, we can transform the usually employed ray-marching into an efficient ray-independent texture filtering process. Our algorithm builds upon a rectified shadow map as input […]
Jan, 17
Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation
By reorganizing the execution order and optimizing the data structures, we propose an efficient parallel framework for the H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework in CUDA on NVIDIA GPUs. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components are also realized effectively, […]
Jan, 16
MRPB: Memory Request Prioritization for Massively Parallel Processors
Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches, hoping to provide smoother reductions in memory access traffic and latency. […]