high performance computing on graphics processing units: hgpu.org

Posts

May, 22

An Introduction to OpenCL C++

Today, servers, desktops, mobile devices, and embedded systems contain many processors in addition to the CPU that runs programs. These extra processors are generally called accelerators and could be a GPU, FPGA, Xeon Phi, or other programmable device. There are many types of accelerators available, from many vendors, for many different environments. Khronos developed the […]

OpenCL

May, 22

Key derivation functions and their GPU implementation

Key derivation functions are a key element of many cryptographic applications. Password-based key derivation functions are designed specifically to derive cryptographic keys from low-entropy sources (such as passwords or passphrases) and to counter brute-force and dictionary attacks. However, the most widely adopted standard for password-based key derivation, PBKDF2, as implemented in most applications, is highly […]

OpenCL

May, 22

Parallel and Improved PageRank Algorithm for GPU-CPU Collaborative Environment

The internet is a huge collection of websites, on the order of 10^8 bytes. Around 90% of the world’s population uses search engines for getting relevant information. According to Wikipedia, more than 200 million Indians use the Internet every day. Thus, retrieving the correct data in the least time is the most important task. Hence need […]

CUDA

May, 22

Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries

There have been a number of research proposals to use discrete graphics processing units (GPUs) to accelerate database operations. Although many of these works show up to an order of magnitude performance improvement, discrete GPUs are not commonly used in modern database systems. However, there is now a proliferation of integrated GPUs which are on […]

OpenCL

May, 22

Accelerating SWHE based PIRs using GPUs

In this work we focus on tailoring and optimizing the computational Private Information Retrieval (cPIR) scheme proposed in WAHC 2014 for efficient execution on graphics processing units (GPUs). Exploiting the mass parallelism in GPUs is a commonly used approach in speeding up cPIRs. Our goal is to eliminate the efficiency bottleneck of the Doröz et […]

CUDA

May, 20

A Performance and Scalability Analysis of the Tsunami Simulation EasyWave for Different Multi-Core Architectures and Programming Models

In this paper, the performance and scalability of different multi-core systems are experimentally evaluated for the tsunami simulation EasyWave. The target platforms include a standard Ivy Bridge Xeon processor, an Intel Xeon Phi accelerator card, and also a GPU. OpenMP, MPI and CUDA were used to parallelize the program for these platforms. The absolute performance […]

CUDA

May, 20

Physically Based Rendering: Implementation of Path Tracer

The main topic of this thesis was to implement a computer program that can render photorealistic images by simulating the laws of physics. In practice, the program builds an image by finding every possible path that a light ray can travel. The technique presented in this thesis naturally simulates many physical phenomena such as reflections, […]

OpenCL, OpenGL

May, 20

Kalman Filter Tracking on Parallel Architectures

Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore’s Law performance/price gains, it will be necessary to parallelize algorithms to […]

May, 20

U-Net: Convolutional Networks for Biomedical Image Segmentation

There is broad consensus that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a […]

CUDA

May, 20

An Efficient, Automatic Approach to High Performance Heterogeneous Computing

Users of heterogeneous computing systems face two problems: firstly, understanding the trade-off relationship between the observable characteristics of their applications, such as latency and quality of the result, and secondly, how to exploit knowledge of these characteristics to allocate work to distributed resources efficiently. A domain specific approach addresses both of these problems. By considering […]

OpenCL

May, 19

CHO: Towards a Benchmark Suite for OpenCL FPGA Accelerators

Programming FPGAs with OpenCL-based high-level synthesis frameworks is gaining attention, with a number of commercial and research frameworks announced. However, there are no benchmarks for evaluating these frameworks. To this end, we present the CHO benchmark suite, an extension of CHStone, a commonly used C-based high-level synthesis benchmark suite, for OpenCL. We characterise CHO at various […]

OpenCL

May, 19

Optimizing Full Correlation Matrix Analysis of fMRI Data on Intel Xeon Phi Coprocessors

Full correlation matrix analysis (FCMA) is an unbiased approach for exhaustively studying interactions among brain regions in functional magnetic resonance imaging (fMRI) data from human participants. In order to answer neuroscientific questions efficiently, we are developing a closed-loop analysis system with FCMA on a cluster of nodes with Intel Xeon Phi coprocessors. We have proposed […]

Recent source codes

OpScanner

Atlas CLI: Machine Learning (ML) Lifecycle & Transparency Manager

transformers_tvm: Implementation of Encoder Decoder transformer on TVM

INT v.s. FP: A framework to compare low-bit integer and float-point formats

AutoDock-GPU: AutoDock for GPUs and other accelerators

NCCLX: collective communication framework

Tutoring LLM into a Better CUDA Optimizer

Adaptivity in AdaptiveCpp: Optimizing Performance by Leveraging Runtime Information During JIT-Compilation

Kernel Library for LLM Serving

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

Most viewed papers (last 30 days)