
Posts

Jan, 18

Supporting x86-64 Address Translation for 100s of GPU Lanes

Efficient memory sharing between CPU and GPU threads can greatly expand the effective set of GPGPU workloads. For increased programmability, this memory should be uniformly virtualized, necessitating compatible address translation support for GPU memory references. However, even a modest GPU might need 100s of translations per cycle (6 CUs * 64 lanes/CU) with memory access […]
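The "100s of translations per cycle" figure follows from the abstract's own numbers: 6 CUs × 64 lanes/CU = 384 lane addresses per cycle in the worst case. The minimal Python sketch below makes that arithmetic explicit. Assuming 4 KiB x86-64 base pages, it also counts how many distinct virtual pages, and hence distinct translations, a single 64-lane access touches. The helper names are hypothetical, and this is not the paper's translation mechanism. It only shows why the access pattern determines the real translation demand.

```python
# Sketch (hypothetical helper names): translation demand of a GPU memory
# instruction, assuming 4 KiB x86-64 base pages.
PAGE_SHIFT = 12      # assumption: 4 KiB pages (2**12 bytes)
LANES_PER_CU = 64    # from the abstract: 64 lanes per CU
NUM_CUS = 6          # from the abstract: 6 CUs


def worst_case_translations_per_cycle(num_cus=NUM_CUS, lanes=LANES_PER_CU):
    """Upper bound: every lane of every CU presents one address per cycle."""
    return num_cus * lanes


def distinct_pages(addresses, page_shift=PAGE_SHIFT):
    """Set of virtual page numbers touched by a group of lane addresses.

    Each distinct page needs its own virtual-to-physical translation.
    """
    return {addr >> page_shift for addr in addresses}


if __name__ == "__main__":
    print(worst_case_translations_per_cycle())  # 384
    base = 0x7F00_0000_0000  # page-aligned example virtual address
    # 64 lanes reading consecutive 4-byte words: 256 bytes, one page.
    unit_stride = [base + 4 * lane for lane in range(LANES_PER_CU)]
    # 64 lanes each reading from a different page.
    page_stride = [base + 4096 * lane for lane in range(LANES_PER_CU)]
    print(len(distinct_pages(unit_stride)))  # 1
    print(len(distinct_pages(page_stride)))  # 64
```

A unit-stride access from all 64 lanes stays within one page and needs a single translation. A page-sized stride needs 64 separate translations for one instruction.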
Jan, 18

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs

The Generalized Minimum Residual (GMRES) method is one of the most widely-used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming […]
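Communication-avoiding GMRES restructures the standard "one matrix-vector product, then orthogonalize" step into two phases. First it generates s Krylov basis vectors at once (the matrix powers kernel). Then it orthogonalizes the whole block together. The pure-Python sketch below illustrates only that restructuring on a small dense matrix, and several of its choices are simplifications. It uses the monomial basis, which becomes ill-conditioned as s grows. It also uses modified Gram-Schmidt as a stand-in for the communication-avoiding QR factorizations (such as TSQR) that CA-GMRES variants typically use. It is not the paper's multicore/multi-GPU implementation.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def monomial_basis(A, v, s):
    """Matrix-powers phase: return [v, Av, A^2 v, ..., A^s v].

    No orthogonalization happens between the products, so in a distributed
    setting the s products can be computed with far fewer synchronizations.
    """
    basis = [list(v)]
    for _ in range(s):
        basis.append(matvec(A, basis[-1]))
    return basis


def orthonormalize(vectors):
    """Orthonormalize the whole block afterwards (assumes full rank).

    Modified Gram-Schmidt is a simple stand-in for a communication-avoiding
    QR such as TSQR.
    """
    q = []
    for w in vectors:
        w = list(w)
        for u in q:
            h = sum(a * b for a, b in zip(u, w))
            w = [a - h * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        q.append([a / norm for a in w])
    return q


if __name__ == "__main__":
    A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
    V = monomial_basis(A, [1.0, 0.0, 0.0], s=2)  # one basis-generation phase
    Q = orthonormalize(V)                        # one block orthogonalization
    print(V)
    print(Q)
```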
Jan, 18

A GPU-based Multi-level Subspace Decomposition Scheme for Hierarchical Tensor Product Bases

The aim of this thesis is to implement a multi-level splitting of full grids on the GPU, which could be used in the incremental visualization of scientific data sets. The splitting is motivated by the approximation properties of the sparse grid technique. Looking towards large amounts of data, ideas of parallelization and data slicing are […]
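The sparse grid technique that motivates this splitting rests on representing grid values in a hierarchical basis. The sketch below is a hedged illustration of that idea and not the thesis's GPU scheme. It performs standard 1D hierarchization with piecewise-linear hat functions and zero boundary values, converting nodal values on a level-n grid into hierarchical surpluses. Each surplus is the point's value minus the average of its two hierarchical parents. Points are processed from the finest level to the coarsest, which lets the update run in place. For a tensor product basis in d dimensions, the same 1D pass is typically applied along each dimension in turn. The function name is hypothetical.

```python
def hierarchize_1d(values, n):
    """Nodal values -> hierarchical surpluses on a level-n grid in (0, 1).

    values[i - 1] = u(i / 2**n) for i = 1 .. 2**n - 1; the boundary values
    u(0) and u(1) are assumed to be zero.
    """
    N = 2 ** n
    if len(values) != N - 1:
        raise ValueError("expected 2**n - 1 interior values")
    u = [0.0] + list(values) + [0.0]  # pad with the zero boundary values
    for level in range(n, 0, -1):  # finest level first, coarsest last
        step = 2 ** (n - level)  # index distance to the two parents
        # Points of this level are odd multiples of `step`; their parents
        # (at i - step and i + step) belong to coarser levels and are still
        # nodal values, so the in-place update is valid.
        for i in range(step, N, 2 * step):
            u[i] -= 0.5 * (u[i - step] + u[i + step])
    return u[1:-1]


if __name__ == "__main__":
    # u(x) = x(1 - x) sampled at x = 1/4, 1/2, 3/4 (level n = 2).
    print(hierarchize_1d([3 / 16, 1 / 4, 3 / 16], 2))
```

For u(x) = x(1 - x) on level 2, the surpluses are 1/4 at the level-1 point and 1/16 at the two level-2 points. This decay of surpluses with level is what sparse grids exploit.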
Jan, 18

Computing Spatial Distance Histograms for Large Scientific Datasets On-the-Fly

This paper focuses on an important query in scientific simulation data analysis: the Spatial Distance Histogram (SDH). The computation time of an SDH query using the brute-force method is quadratic in the number of points. Often, such queries are executed continuously over certain time periods, increasing the computation time. We propose a highly efficient approximate algorithm to compute SDH over consecutive […]
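For reference, the brute-force SDH that the abstract calls quadratic computes the distance of every one of the N(N - 1)/2 point pairs and bins it into a bucket of fixed width. This is a minimal sketch of that baseline only, not the paper's approximate algorithm. The function name and bucket parameters are illustrative. Clamping distances beyond the last bucket into it is a choice made here for simplicity.

```python
import math
from itertools import combinations


def sdh_brute_force(points, bucket_width, num_buckets):
    """Brute-force Spatial Distance Histogram: O(N^2) pairwise distances.

    points: sequence of coordinate tuples (any dimension).
    Bucket b counts pairs with distance in [b*w, (b+1)*w); distances past
    the last bucket are clamped into it (an illustrative choice).
    """
    hist = [0] * num_buckets
    for p, q in combinations(points, 2):  # all N*(N-1)/2 unordered pairs
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        b = min(int(d // bucket_width), num_buckets - 1)
        hist[b] += 1
    return hist


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    print(sdh_brute_force(pts, bucket_width=1.0, num_buckets=3))  # [0, 1, 2]
```

Doubling N roughly quadruples the number of pairs, which is why repeating this query over many consecutive simulation frames quickly becomes expensive.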
Jan, 17

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

We examine the Xeon Phi, which is based on Intel’s Many Integrated Cores architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and means to enable sensible data sharing between threads despite […]
Jan, 17

Power Profiling of GeMTC Many Task Computing

GeMTC allows Many Task Computing (MTC) workloads to run on hardware accelerators, benefiting from their many-core architecture. However, GeMTC is currently written to take advantage only of NVIDIA GPUs. Another such hardware accelerator, the Intel Xeon Phi, is also an excellent candidate for MTC workloads. Therefore, the first goal of […]
Jan, 17

GPU Accelerated Vessel Segmentation Using Laplacian Eigenmaps

The Laplacian eigenmap is one of the most widely used techniques to improve cluster-based segmentation of multivariate images. However, one problem with this approach is its excessive computational requirements, especially when processing large image datasets. In this paper, we aim to employ emerging commodity graphics hardware to accelerate eigenmap-based segmentation. In particular, we present a highly […]
Jan, 17

Prefiltered Single Scattering

Volumetric light scattering is a complex phenomenon that is difficult to simulate in real time, as light can be scattered towards the camera from everywhere in space. By assuming a single-scattering model, we can transform the usually employed ray-marching into an efficient ray-independent texture filtering process. Our algorithm builds upon a rectified shadow map as input […]
Jan, 17

Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

By reorganizing the execution order and optimizing the data structure, we propose an efficient parallel framework for the H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework in CUDA on NVIDIA GPUs. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components are also realized effectively, […]
Jan, 16

MRPB: Memory Request Prioritization for Massively Parallel Processors

Massively parallel, throughput-oriented systems such as graphics processing units (GPUs) offer high performance for a broad range of programs. They are, however, complex to program, especially because of their intricate memory hierarchies with multiple address spaces. In response, modern GPUs have widely adopted caches, hoping to provide smoother reductions in memory access traffic and latency. […]
Jan, 16

VertexAPI2 – A Vertex-Program API for Large Graph Computations on the GPU

VertexAPI2 uses state-of-the-art GPU algorithms to implement the Gather-Apply-Scatter (GAS) abstraction for graph computations. VertexAPI2 provides up to an order of magnitude higher performance than the previous implementation, and in some cases performance comparable to speed-of-light hand-coded algorithms, while retaining the simplicity of development of the GAS model. The current code also has a preliminary […]
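To make the Gather-Apply-Scatter abstraction concrete, here is a single-threaded Python sketch of breadth-first search (hop distances) written in GAS form. In each round, active vertices gather a value from their in-neighbours, apply an update to their own state, and scatter activation to their out-neighbours. Rounds are bulk-synchronous: all gathers read the previous round's values before any update is applied. The function and its edge-list input are hypothetical and do not reflect VertexAPI2's actual programming interface or its GPU algorithms.

```python
import math


def gas_bfs(num_vertices, edges, source):
    """Hop distances from `source` using gather-apply-scatter rounds.

    edges: list of directed (src, dst) pairs. Unreachable vertices keep
    distance math.inf.
    """
    in_nbrs = [[] for _ in range(num_vertices)]
    out_nbrs = [[] for _ in range(num_vertices)]
    for s, d in edges:
        in_nbrs[d].append(s)
        out_nbrs[s].append(d)

    dist = [math.inf] * num_vertices
    dist[source] = 0
    active = set(out_nbrs[source])  # vertices to process in the next round
    while active:
        updates = {}
        for v in active:
            # Gather: reduce over in-neighbours (min of neighbour dist + 1).
            gathered = min((dist[u] + 1 for u in in_nbrs[v]), default=math.inf)
            # Apply (decided here, committed below): keep only improvements.
            if gathered < dist[v]:
                updates[v] = gathered
        for v, value in updates.items():
            dist[v] = value
        # Scatter: activate out-neighbours of every vertex that changed.
        active = {w for v in updates for w in out_nbrs[v]}
    return dist


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    print(gas_bfs(5, edges, source=0))
```

Swapping the gather reduction and apply rule (for example, summing weighted neighbour ranks) turns the same loop into other vertex programs such as PageRank, which is the reuse the GAS model is meant to provide.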
Jan, 16

Improving Student Learning in Computer Science Courses by Using Virtual OpenCL Laboratory

Laboratory experience is an essential factor in engineering and science education. Virtual laboratories are widely used by universities and research institutions across many academic sectors. However, general-purpose virtual laboratories have weaknesses for computer graphics, whose experiments need to be run on high-performance computers. In the assessment of a graduate […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hgpu.org