high performance computing on graphics processing units: hgpu.org

Posts

Jun, 28

Performance and Efficiency Analysis of Modern Accelerators: Fine-Grained Parallelism on the Intel Xeon Phi

Supercomputers define the pinnacle of computational power and are an essential tool for solving vast scientific computational problems. They employ increasingly parallel architectures to raise their nominal peak performance and to solve larger problems. Exploiting this vast computational power is difficult, however, and optimising for many-core architectures has become […]

Jun, 28

Evaluation of DGEMM Implementation on Intel Xeon Phi Coprocessor

This paper presents a detailed study of implementing double-precision matrix-matrix multiplication (DGEMM) on the Intel Xeon Phi Coprocessor. We discuss a DGEMM implementation running "natively" on the coprocessor, minimizing communication with the host CPU. We run DGEMM across a range of matrix sizes natively as well as using Intel Math Kernel […]
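For readers unfamiliar with the operation, DGEMM is the BLAS routine that computes C := alpha*A*B + beta*C in double precision. The following is a minimal reference sketch of that definition in plain Python, not the paper's Xeon Phi implementation; the function name `dgemm_reference` and the list-of-rows matrix layout are assumptions made for illustration.

```python
def dgemm_reference(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows.

    Hypothetical helper: a is m x k, b is k x n, c is m x n, all non-empty.
    Python floats are IEEE 754 doubles, matching the "D" in DGEMM.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("columns of a must match rows of b")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            # Scale the product and add the scaled original entry of c.
            row.append(alpha * acc + beta * c[i][j])
        result.append(row)
    return result
```

A tuned implementation would block the loops for cache and vector units; this triple loop only pins down what result such an implementation must reproduce.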

Jun, 28

Modified Bloom filter for high performance hybrid NoSQL systems

This article addresses the problems of implementing a modified Bloom filter as an additional module for mass data storage systems in supercomputers with hybrid CPU/GPU architecture. It proposes a modified filter with counters, which makes it possible to track not only data addition but also data removal. A comparative analysis has been […]

CUDA
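The idea of a Bloom filter with counters can be sketched as follows. This is a generic counting Bloom filter in CPU-side Python, not the article's GPU module; the class name, the default sizes, and the salted SHA-256 hashing scheme are all assumptions chosen for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose slots are counters instead of single bits.

    Hypothetical sketch: counters allow removal, which a plain bit-array
    Bloom filter cannot support.
    """

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, item):
        # Derive num_hashes slot indexes by salting a SHA-256 digest.
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for i in self._indexes(item):
            self.counters[i] += 1

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all(self.counters[i] > 0 for i in self._indexes(item))

    def remove(self, item):
        # Refuse to remove items that look absent, so counters stay >= 0.
        if not self.might_contain(item):
            return False
        for i in self._indexes(item):
            self.counters[i] -= 1
        return True
```

As with any Bloom filter, membership answers can be false positives, so removing an item that was never added can still corrupt counters when it happens to collide with stored items; callers are assumed to remove only what they added.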

Jun, 28

Implementation of the genetic algorithm by means of CUDA technology involved in travelling salesman problem

This research aimed to solve the travelling salesman problem by means of genetic algorithms, implemented using CUDA technology. It focused on how much the system can improve when, instead of conventional CPUs, one uses GPUs capable of performing the operations in parallel. […]

CUDA

Jun, 28

Various String Matching Algorithms for DNA Sequences to Detect Breast Cancer using CUDA Processors

The main aim of a string matching algorithm is to locate occurrences of a specific pattern in a larger body of text. String matching algorithms have been used in many applications, such as DNA analysis. This report introduces a new string matching approach to detect the occurrence of several gene patterns in […]

CUDA
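The basic task described above, locating a pattern inside a larger text, can be illustrated with a naive scan. This is a baseline sketch only and not the new approach the report introduces (the abstract is truncated before describing it); the function names and the sample sequences are hypothetical.

```python
def find_occurrences(text, pattern):
    """Return every start index at which pattern occurs in text.

    Naive O(n * m) scan; overlapping matches are included.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def scan_patterns(text, patterns):
    """Map each pattern to the list of positions where it occurs in text."""
    return {p: find_occurrences(text, p) for p in patterns}


if __name__ == "__main__":
    # Made-up sequence and patterns, for illustration only.
    print(scan_patterns("ACGTACGT", ["ACG", "TTT"]))
```

Each start position is checked independently of the others, which is what makes this kind of scan a natural candidate for one-thread-per-position parallelisation on a GPU.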

Jun, 27

7th International Conference on Computer and Electrical Engineering, ICCEE 2014

Submission Deadline: 2014-07-20. Publication: All accepted papers will be published in the International Journal of Electrical Energy (IJOEE), which will be indexed by Ulrich’s Periodicals Directory, Google Scholar, EBSCO, Engineering & Technology Digital Library and Electronic Journals Digital Library. Call for Papers: Computer Engineering Algorithm Computer Vision, Graphics and Intelligence Computational and Artificial Intelligence Image Processing […]

Jun, 27

Performance Evaluation of Parallel AES Implementations over CUDA GPU Framework

Given the high computational complexity of the AES encryption algorithm, especially for large volumes of real-time data, the GPU has recently offered an alternative computational system to the traditional CPU (thread), significantly speeding up computationally intensive parallel data encryption in several respects: a tremendous number of processing cores and a non-generic computational processing architecture […]

CUDA

Jun, 27

Industrial Robot Collision Handling in Harsh Environments

The focus in this thesis is on robot collision handling systems, mainly collision detection and collision avoidance for industrial robots operating in harsh environments (e.g. potentially explosive atmospheres found in the oil and gas sector). Collision detection should prevent the robot from colliding and therefore avoid a potential accident. Collision avoidance builds on the concept […]

CUDA

Jun, 27

Computation on GPU of Eigenvalues and Eigenvectors of a Large Number of Small Hermitian Matrices

This paper presents an implementation on Graphics Processing Units of the QR-Householder algorithm, used to find all the eigenvalues and eigenvectors of many small Hermitian matrices (double precision) in a very short time, to meet time constraints in radar applications.

CUDA

Jun, 27

Image Noise Removal on Heterogeneous CPU-GPU Configurations

A parallel algorithm to remove impulsive noise in digital images using heterogeneous CPU/GPU computing is proposed. The parallel denoising algorithm is based on the peer group concept and uses a Euclidean metric. In order to identify the number of pixels to be allocated to the multi-core CPU and the GPUs, a performance analysis using large images is presented. […]

CUDA

Jun, 27

Speeding up a Video Summarization Approach Using GPUs and Multicore CPUs

The recent progress of digital media has stimulated the creation, storage and distribution of data, such as digital videos, generating a large volume of data and requiring efficient technologies to increase the usability of these data. Video summarization methods generate concise summaries of video contents and enable faster browsing, indexing and accessing of large video […]

CUDA

Jun, 26

On the Characterization of OpenCL Dwarfs on Fixed and Reconfigurable Platforms

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of applications, both present and […]

OpenCL
