high performance computing on graphics processing units: hgpu.org

Posts

Sep, 10

The 2nd International Conference on Cryptography, Security and Privacy (ICCSP), 2018

The 2nd International Conference on Cryptography, Security and Privacy (ICCSP 2018) will be held during March 16-19, 2018, in Guiyang, China. ICCSP 2018 aims to bring together researchers, scientists, engineers, and scholar students to exchange and share their experiences, new ideas, and research results about all aspects of Cryptography, Security and Privacy, and discuss the […]

Sep, 10

The 3rd International Conference on Multimedia and Image Processing (ICMIP), 2018

The 3rd International Conference on Multimedia and Image Processing (ICMIP 2018) will be held in Guiyang, China during March 16-19, 2018. ICMIP has been held successfully in Bandar Seri Begawan, Brunei Darussalam, and Wuhan, China, in the last two years, respectively. The objective of ICMIP is to present the latest research and results of scientists related […]

Sep, 9

Optimization of Spatial Convolution in ConvNets on Intel KNL

Most of the experts admit that the true behavior of the neural network is hard to predict. It is quite impossible to deterministically prove the working of the neural network as the architecture gets bigger, yet, it is observed that it is possible to apply a well engineered network to solve one of the most […]

Sep, 9

Beyond 16GB: Out-of-Core Stencil Computations

Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately, such architectures come with a limited amount of fast memory, which is limiting the size of the problems that can be efficiently solved. […]

CUDA

Sep, 9

Progressive Clustering of Big Data with GPU Acceleration and Visualization

Clustering has become an unavoidable step in big data analysis. It may be used to arrange data into a compact format, making operations on big data manageable. However, clustering of big data requires not only the capability of handling data with large volume and high dimensionality, but also the ability to process streaming data, all […]

CUDA

Sep, 9

Analysis of GPU accelerated OpenCL applications on the Intel HD 4600 GPU

GPU acceleration is the concept of accelerating the execution speed of an application by running it on the GPU. Researchers and developers have always wanted to achieve greater speed for their applications, and GPU acceleration is a very common way of doing so. This has been done for a long time for highly graphical applications using […]

OpenCL

Sep, 9

Data Layout Oriented Compilation Techniques in Vectorization for Multi-/Many-cores

Single instruction, multiple data (SIMD) architectures are widely adopted in both general-purpose processors and graphics processing units for exploiting data-level parallelism. It is tedious and error-prone for programmers to write high performance code to utilize SIMD execution units on both platforms. Therefore, users often rely on automatic code generation techniques in compilers. However, it is […]

CUDA

Sep, 7

Generating Custom Code for Efficient Query Execution on Heterogeneous Processors

Processor manufacturers build increasingly specialized processors to mitigate the effects of the power wall to deliver improved performance. Currently, database engines are manually optimized for each processor: a costly and error-prone process. In this paper, we propose concepts to enable the database engine to perform per-processor optimization automatically. Our core idea is to create […]

OpenCL

Sep, 7

FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC (High-Performance Computing)), a similarity search system for ultra-high dimensional datasets on a single machine, which does not require similarity computation. Our system is an auspicious illustration of the power of randomized algorithms carefully tailored for high-performance computing platforms. We leverage LSH style randomized […]

OpenCL

Sep, 7

From MPI to MPI+OpenACC: Conversion of a legacy FORTRAN PCG solver for the spherical Laplace equation

A real-world example of adding OpenACC to a legacy MPI FORTRAN Preconditioned Conjugate Gradient code is described, and timing results for multi-node multi-GPU runs are shown. The code is used to obtain three-dimensional spherical solutions to the Laplace equation. Its application is finding potential field solutions of the solar corona, a useful tool in space […]

CUDA

Sep, 7

Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization

An efficient algorithm for recurrent neural network training is presented. The approach increases the training speed for tasks where the length of the input sequence may vary significantly. The proposed approach is based on optimal batch bucketing by input sequence length and data parallelization on multiple graphics processing units. The baseline training performance without […]

Sep, 7

Multi-Tasking Scheduling for Heterogeneous Systems

Heterogeneous platforms play an increasingly important role in modern computer systems. They combine high performance with low power consumption. From mobiles to supercomputers, we see an increasing number of computer systems that are heterogeneous. The most well-known heterogeneous system, CPU+GPU platforms, have been widely used in recent years. As they become more mainstream, serving multiple […]

OpenCL

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
