high performance computing on graphics processing units: hgpu.org

Posts

Oct, 8

A new ray-tracing scheme for 3D diffuse radiation transfer on highly parallel architectures

We present a new numerical scheme to solve the transfer of diffuse radiation on three-dimensional mesh grids which is efficient on processors with highly parallel architecture such as recently popular GPUs and CPUs with multi- and many-core architectures. The scheme is based on the ray-tracing method and the computational cost is proportional to $N_m^{5/3}$ where […]

CUDA

Oct, 8

Redução de Complexidade de Tempo em GPUs (Time Complexity Reduction on GPUs)

This article addresses the construction of parallel algorithms and the evaluation of results based on the complexity reduction obtained through the massive use of parallelism, as opposed to treating speedups as the guide for constructing parallel algorithms. It shows that, for a simple problem of searching a vector, it is more advantageous.

Oct, 6

International Conference on Computer and Information Technology, ICCIT 2015

Submission Deadline: 2015-02-10 Publications: Accepted papers will be published in one of the following journals with ISSN. *International Journal of Computer Theory and Engineering (IJCTE) (ISSN: 1793-8201) Abstracting/Indexing: Index Copernicus, Electronic Journals Library, EBSCO, Engineering & Technology Digital Library, Google Scholar, Ulrich’s Periodicals Directory, Crossref, ProQuest, WorldCat, and EI (INSPEC, IET), Cabell’s Directories. *International […]

Oct, 6

Using Graphics Processing Unit to Accelerate Database Query Execution

One of the major problems in database management systems is handling large amounts of data while providing short response times. The problem is not only the proper manner of storing records but also an efficient way of processing them. In the meantime, GPUs have developed computational power many times greater than that offered by comparable CPUs. In our research […]

OpenCL

Oct, 6

Real-time Multi-view Depth Generation Using CUDA Multi-GPU

In this paper, we propose a real-time multi-view depth generation method using compute unified device architecture (CUDA) multi-graphics processing units (GPU). The objective is to generate multi-view depth maps in real-time. We employ eight color cameras and three depth cameras. After capturing multi-view color and depth data, we warp the depth information to color camera […]

CUDA

Oct, 6

Accelerating NTRU Encryption with Graphics Processing Units

Lattice-based cryptography is attractive for its quantum computing resistance and efficient encryption/decryption process. However, the Big Data issue has perplexed most lattice-based cryptographic systems since the overall processing is slowed down too much. This paper intends to analyze one of the major lattice-based cryptographic systems, Nth-degree truncated polynomial ring (NTRU), and accelerate […]

CUDA

Oct, 6

Load Balancing in Data Warehouse – Evolution and Perspectives

The problem of load balancing is one of the crucial issues in distributed data warehouse systems. In this article, original load balancing algorithms are presented: the Adaptive Load Balancing Algorithms for Queries (ALBQ) and the algorithms that use grammars and learning machines in managing the ETL process. These two algorithms base the load balancing on […]

CUDA

Oct, 6

Embedding GPU Computations in Hadoop

As the size of high-performance applications increases, four major challenges, namely heterogeneity, programmability, fault resilience, and energy efficiency, have arisen in the underlying distributed systems. To tackle all of them without sacrificing performance, traditional approaches to resource utilization, task scheduling and programming paradigms should be reconsidered. While Hadoop has handled data-intensive applications well […]

CUDA

Oct, 4

Deep Dynamic Neural Networks for Gesture Segmentation and Recognition

The purpose of this paper is to describe a novel method called Deep Dynamic Neural Networks (DDNN) for Track 3 of the Chalearn Looking at People 2014 challenge [1]. A generalised semi-supervised hierarchical dynamic framework is proposed for simultaneous gesture segmentation and recognition taking both skeleton and depth images as input modules. First, Deep Belief […]

CUDA

Oct, 4

GPU Accelerated Radio Wave Propagation Modeling Using Ray Tracing

Radar producers, which are mostly in the defense industry, need a radar environment simulator to test their products during development. Such a simulator helps them avoid costly field tests. To develop a radar environment simulator, radio wave propagation must be modeled. However, this is a computationally expensive and time-consuming […]

CUDA

Oct, 4

Parallel Shortest Path Algorithm for Voronoi Diagrams with Generalized Distance Functions

Voronoi diagrams are fundamental data structures in computational geometry with applications in different areas. Recent soft object simulation algorithms for real-time physics engines require the computation of Voronoi diagrams over 3D images with non-Euclidean distances. In this case, the computation must be performed over a graph, where the edges encode the required distance information. […]

CUDA

Oct, 4

Teaching Parallel Programming Using Java

This paper presents an overview of the "Applied Parallel Computing" course taught to final-year Software Engineering undergraduate students in Spring 2014 at NUST, Pakistan. The main objective of the course was to introduce practical parallel programming tools and techniques for shared- and distributed-memory concurrent systems. A unique aspect of the course was that […]

CUDA

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
