
Posts

Nov, 19

Creating a Dataset for High-Performance Computing Code Translation using LLMs: A Bridge Between OpenMP Fortran and C++

In this study, we present a novel dataset for training machine learning models that translate between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is created from a range of representative open-source OpenMP benchmarks. It is also refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using […]
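
By way of illustration, a translation pair of the kind such a dataset might contain could look like the sketch below; the loop and variable names are invented here and are not drawn from the paper's data.

```cpp
// Illustrative OpenMP Fortran -> C++ translation pair (not from the paper's dataset).
//
// Fortran source, shown for reference:
//   !$omp parallel do reduction(+:s)
//   do i = 1, n
//      s = s + a(i) * b(i)
//   end do
//   !$omp end parallel do
//
// Equivalent C++ with OpenMP:
#include <vector>

double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (std::size_t i = 0; i < a.size(); ++i)
        s += a[i] * b[i];
    return s;
}
```
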
Nov, 19

GPU Auto-tuning Framework for Optimal Performance and Power Consumption

An auto-tuning framework for GPU devices is presented for tuning OpenCL application kernels. The GPU tuner employs a multi-objective optimization methodology to improve the performance and power consumption of applications. It efficiently explores a user-defined solution space, comprising possible tunable algorithmic and hardware-counter variations, through code transformations. The methodology targets GPU code […]
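
One common way to rank configurations under two objectives such as runtime and power is Pareto-front filtering; the minimal C++ sketch below illustrates that idea only, and its structure and names are assumptions rather than the framework's actual API.

```cpp
#include <iostream>
#include <vector>

// One measured tuning configuration; runtime and power are the two objectives.
struct Config {
    int    workGroupSize;   // example tunable parameter (illustrative)
    double runtimeMs;
    double powerW;
};

// True if a dominates b: no worse in both objectives, strictly better in one.
bool dominates(const Config& a, const Config& b) {
    return a.runtimeMs <= b.runtimeMs && a.powerW <= b.powerW &&
           (a.runtimeMs < b.runtimeMs || a.powerW < b.powerW);
}

// Keep only the Pareto-optimal configurations from a set of measurements.
std::vector<Config> paretoFront(const std::vector<Config>& all) {
    std::vector<Config> front;
    for (const auto& c : all) {
        bool dominated = false;
        for (const auto& other : all)
            if (dominates(other, c)) { dominated = true; break; }
        if (!dominated) front.push_back(c);
    }
    return front;
}

int main() {
    std::vector<Config> measured = {
        {64, 12.0, 95.0}, {128, 10.5, 110.0}, {256, 10.9, 120.0}};
    for (const auto& c : paretoFront(measured))
        std::cout << c.workGroupSize << ": " << c.runtimeMs << " ms, "
                  << c.powerW << " W\n";
}
```
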
Nov, 19

CHARM-SYCL: New Unified Programming Environment for Multiple Accelerator Types

Addressing performance portability across diverse accelerator architectures has emerged as a major challenge in the development of application and programming systems for high-performance computing environments. Although recent programming systems that focus on performance portability have significantly improved productivity in an effort to meet this challenge, the problem becomes notably more complex when compute nodes are […]
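
For context only, a minimal standard SYCL 2020 kernel submission is shown below; CHARM-SYCL's own interfaces are not described in this excerpt, so the snippet reflects plain SYCL, not that system's API.

```cpp
#include <sycl/sycl.hpp>

int main() {
    constexpr std::size_t n = 1024;
    sycl::queue q;                                // selects a default device
    float* x = sycl::malloc_shared<float>(n, q);  // USM, visible to host and device
    for (std::size_t i = 0; i < n; ++i) x[i] = 1.0f;

    // The same source can target CPUs, GPUs, or FPGAs, depending on the backend.
    q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
        x[i] *= 2.0f;
    }).wait();

    sycl::free(x, q);
}
```
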
Nov, 19

ExaNBody: a HPC framework for N-Body applications

Increasing heterogeneity among HPC platforms requires applications to be frequently ported and tuned, adding burden to developers. Fast evolution of hardware mandates adaptation of algorithms and data structures to get higher performance, while application complexity constantly grows accordingly. Ensuring portability while preserving high performance at large scale along with minimal changes to an already existing […]
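
As a schematic of the kind of kernel an N-body framework parallelizes and ports, a naive O(N^2) pairwise force loop is sketched below; this is illustrative only and is not ExaNBody code.

```cpp
#include <cmath>
#include <vector>

struct Particle { double x, y, z, mass; };
struct Force    { double fx = 0, fy = 0, fz = 0; };

// Naive O(N^2) gravitational force accumulation over all particle pairs.
// f must have the same length as p and be zero-initialized by the caller.
void computeForces(const std::vector<Particle>& p, std::vector<Force>& f,
                   double G, double eps) {
    #pragma omp parallel for
    for (std::size_t i = 0; i < p.size(); ++i) {
        for (std::size_t j = 0; j < p.size(); ++j) {
            if (i == j) continue;
            double dx = p[j].x - p[i].x;
            double dy = p[j].y - p[i].y;
            double dz = p[j].z - p[i].z;
            double r2 = dx * dx + dy * dy + dz * dz + eps * eps;  // softened
            double s  = G * p[i].mass * p[j].mass / (r2 * std::sqrt(r2));
            f[i].fx += s * dx; f[i].fy += s * dy; f[i].fz += s * dz;
        }
    }
}
```
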
Nov, 19

AFOCL: Portable OpenCL Programming of FPGAs via Automated Built-in Kernel Management

OpenCL provides a consistent programming model across CPUs, GPUs, and FPGAs. However, to get reasonable performance out of FPGAs, OpenCL programs created for other platforms need to be modified. These modifications are often vendor-specific, limiting the portability of OpenCL programs between devices from different vendors. In this paper, we propose AFOCL: a cross-vendor portable programming […]
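
Standard OpenCL 1.2 already exposes a built-in kernel mechanism on the host side; the sketch below shows how a program might query and instantiate one. The kernel name "vendor.fft1d" is purely illustrative, and this is not AFOCL's code.

```cpp
#include <CL/cl.h>
#include <cstdio>

// Discover the built-in kernels a device offers and create one of them.
// Error handling is omitted for brevity.
cl_kernel getBuiltInKernel(cl_context ctx, cl_device_id dev) {
    char names[4096];
    clGetDeviceInfo(dev, CL_DEVICE_BUILT_IN_KERNELS, sizeof(names), names, nullptr);
    std::printf("built-in kernels: %s\n", names);

    cl_int err;
    // kernel_names is a semicolon-separated list of built-in kernel names.
    cl_program prog = clCreateProgramWithBuiltInKernels(
        ctx, 1, &dev, "vendor.fft1d", &err);
    return clCreateKernel(prog, "vendor.fft1d", &err);
}
```
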
Nov, 12

On the Three P’s of Parallel Programming for Heterogeneous Computing: Performance, Productivity, and Portability

As FPGAs and GPUs continue to make inroads into high-performance computing (HPC), the need for languages and frameworks that offer performance, productivity, and portability across heterogeneous platforms, such as FPGAs and GPUs, continues to grow. OpenCL and SYCL have emerged as frameworks that offer cross-platform functional portability between FPGAs and GPUs. While functional portability across […]
Nov, 12

Solving MaxSAT with Matrix Multiplication

We propose an incomplete algorithm for Maximum Satisfiability (MaxSAT) specifically designed to run on neural network accelerators such as GPUs and TPUs. Given a MaxSAT problem instance in conjunctive normal form, our procedure constructs a Restricted Boltzmann Machine (RBM) with an equilibrium distribution wherein the probability of a Boolean assignment is exponential in the number […]
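
Hedging, since the abstract is truncated at this point: the intended equilibrium distribution presumably weights an assignment by the clauses it satisfies, roughly of the form

$$P(x) \;\propto\; \exp\big(\beta \, m(x)\big),$$

where $m(x)$ would denote the number of satisfied clauses and $\beta > 0$ how sharply the distribution concentrates on better assignments; sampling from an RBM with such a distribution is dominated by dense matrix products, which is what makes GPU and TPU accelerators attractive here.
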
Nov, 12

An approach to performance portability through generic programming

The expanding hardware diversity in high performance computing adds enormous complexity to scientific software development. Developers who aim to write maintainable software have two options: 1) use a so-called data locality abstraction that handles portability internally, thereby accepting a performance-productivity trade-off. Such abstractions usually come in the form of libraries, domain-specific languages, and […]
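
A minimal sketch of the generic-programming style the title refers to is shown below: one kernel written once against a backend parameter, instantiated per target at compile time. The backend and function names are assumptions for illustration, not the paper's API.

```cpp
#include <cstddef>
#include <vector>

// Illustrative backends: the same generic kernel can be instantiated for
// serial or OpenMP execution (real abstractions add GPU backends as well).
struct Serial {
    template <class F>
    static void forEach(std::size_t n, F f) {
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
};

struct OpenMP {
    template <class F>
    static void forEach(std::size_t n, F f) {
        #pragma omp parallel for
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
};

// One generic kernel, written once, specialized per backend at compile time.
template <class Backend>
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    Backend::forEach(x.size(), [&](std::size_t i) { y[i] += a * x[i]; });
}
```
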
Nov, 12

An Evaluation of Directive-Based Parallelization on the GPU Using a Parboil Benchmark

Heterogeneous architectures consisting of both central processing units and graphics processing units are common in contemporary computer systems. For that reason, several programming models have been developed to exploit available parallelism, such as low-level CUDA and OpenCL, and directive-based OpenMP and OpenACC. In this paper we explore and evaluate the applicability of OpenACC, which is […]
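
As an illustration of the directive-based style being evaluated, offloading a C/C++ loop with OpenACC can look like the following minimal sketch; it is not taken from the Parboil benchmarks.

```cpp
// SAXPY offloaded to an accelerator with OpenACC directives; the annotated
// region tells the compiler which arrays to copy in and back out.
void saxpy(int n, float a, const float* x, float* y) {
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```
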
Nov, 12

GDlog: A GPU-Accelerated Deductive Engine

Modern deductive database engines (e.g., LogicBlox and Soufflé) enable their users to write declarative queries which compute recursive deductions over extensional data, leaving their high-performance operationalization (query planning, semi-naïve evaluation, and parallelization) to the engine. Such engines form the backbone of modern high-throughput applications in static analysis, security auditing, social-media mining, and business analytics. State-of-the-art […]
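
The semi-naïve evaluation mentioned above can be illustrated on the classic transitive-closure query; the sequential C++ sketch below only conveys the idea of joining the previous round's new facts ("delta") and says nothing about GDlog's GPU data structures.

```cpp
#include <set>
#include <utility>

using Edge     = std::pair<int, int>;
using Relation = std::set<Edge>;

// Semi-naive evaluation of:
//   path(x, y) :- edge(x, y).
//   path(x, z) :- path(x, y), edge(y, z).
// Only facts derived in the previous round are joined in each iteration.
Relation transitiveClosure(const Relation& edge) {
    Relation path  = edge;
    Relation delta = edge;
    while (!delta.empty()) {
        Relation next;
        for (const auto& [x, y] : delta)
            for (const auto& [y2, z] : edge)
                if (y == y2 && !path.count({x, z}))
                    next.insert({x, z});
        for (const auto& e : next) path.insert(e);
        delta = std::move(next);
    }
    return path;
}
```
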
Nov, 5

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU

In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition of the SDDMM with SPMM, also termed FusedMM. We develop optimized implementations for SPMM, SDDMM, and FusedMM operations utilizing Intel oneAPI’s Explicit […]
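
To fix notation, SPMM multiplies a sparse matrix A (here assumed to be stored in CSR form) by a dense matrix B; a plain reference loop nest is sketched below for orientation, and it is not the paper's optimized oneAPI implementation.

```cpp
#include <vector>

// Reference SPMM: C (m x k, dense, row-major) = A (m x n, CSR) * B (n x k, dense).
// C must be zero-initialized by the caller.
void spmm_csr(int m, int k,
              const std::vector<int>&   rowPtr,  // size m + 1
              const std::vector<int>&   colIdx,  // size nnz
              const std::vector<float>& val,     // size nnz
              const std::vector<float>& B,       // n * k
              std::vector<float>&       C) {     // m * k
    for (int i = 0; i < m; ++i)
        for (int p = rowPtr[i]; p < rowPtr[i + 1]; ++p) {
            const float a = val[p];
            const int   j = colIdx[p];
            for (int c = 0; c < k; ++c)
                C[i * k + c] += a * B[j * k + c];
        }
}
```
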
Nov, 5

Applying the Midas Touch of Reproducibility to High-Performance Computing

With the serial performance of CPUs improving exponentially through the 1980s and 1990s and then plateauing by the mid-2000s, the high-performance computing community has seen parallel computing become ubiquitous, which, in turn, has led to a proliferation of parallel programming models. This diversity in hardware platforms and programming models has forced programmers to port their […]


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
