high performance computing on graphics processing units: hgpu.org

Posts

Oct, 22

CuMF_SGD: Fast and Scalable Matrix Factorization

Matrix factorization (MF) has been widely used in, e.g., recommender systems, topic modeling and word embedding. Stochastic gradient descent (SGD) is popular for solving MF problems because it can handle large data sets and makes incremental learning easy. We observed that SGD for MF is memory bound. Meanwhile, single-node CPU systems with […]
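For readers unfamiliar with SGD for MF, the sketch below shows the textbook per-rating update on R ≈ P Qᵀ: for each observed rating, compute the prediction error and nudge the two k-length factor vectors it touches. It is a minimal serial CPU sketch, not CuMF_SGD's GPU implementation; the function names and hyperparameters (`k`, `lr`, `reg`, `epochs`) are illustrative assumptions. Because each update reads and writes two short vectors while doing only a few flops per value loaded, the loop is dominated by memory traffic, which is consistent with the memory-bound observation above.

```python
import random


def sgd_mf(ratings, n_users, n_items, k=8, lr=0.02, reg=0.02, epochs=100, seed=0):
    """Hypothetical sketch: factorize sparse R ~ P Q^T with plain serial SGD.

    ratings: list of (user, item, value) triples (the observed entries).
    Returns P (n_users x k) and Q (n_items x k) as lists of lists.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit observed entries in random order
        for u, i, r in order:
            pu, qi = P[u], Q[i]
            # prediction error for this single rating
            err = r - sum(a * b for a, b in zip(pu, qi))
            # each update touches only two k-length vectors (few flops per value
            # loaded), which is why SGD for MF tends to be memory bound
            for f in range(k):
                p_old = pu[f]
                pu[f] += lr * (err * qi[f] - reg * p_old)
                qi[f] += lr * (err * p_old - reg * qi[f])
    return P, Q


def rmse(ratings, P, Q):
    """Root-mean-square error of P Q^T on the observed ratings."""
    se = sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Calling `sgd_mf` with `epochs=0` returns the random initial factors, so comparing `rmse` before and after training is a quick sanity check.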

CUDA

Oct, 15

Efficient molecular dynamics simulations with many-body potentials on graphics processing units

Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there is much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within […]

CUDA

Oct, 15

Reordering strategy for blocking optimization in sparse linear solvers

Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For this reason, optimizing their performance on modern architectures is critical. The preprocessing steps of sparse direct solvers, […]

CUDA

Oct, 15

Machine Learning Based Intrusion Detection in Controller Area Networks

This project examines the feasibility of machine learning based fingerprinting of CAN transceivers for the purpose of uniquely identifying signal sources during intrusion detection. A working multi-node CAN bus development environment was constructed, and an OpenCL Deep Learning Python Wrapper was ported to the platform. Multiple machine learning algorithms were compared systematically, and two models […]

OpenCL

Oct, 15

Embedded real-time stereo estimation via Semi-Global Matching on the GPU

Dense, robust and real-time computation of depth information from stereo-camera systems is a computationally demanding requirement for robotics, advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global Matching (SGM) is a widely used algorithm that propagates consistency constraints along several paths across the image. This work presents a real-time system producing reliable disparity estimation results […]
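To make "propagates consistency constraints along several paths" concrete, the sketch below implements the standard SGM path-cost recurrence (as introduced by Hirschmüller) for a single direction, left to right along one image row: L(x, d) = C(x, d) + min(L(x-1, d), L(x-1, d±1) + P1, min_k L(x-1, k) + P2) - min_k L(x-1, k). It is an illustrative serial sketch, not the GPU system described in this work; the penalty values and function names are assumptions. Full SGM sums such path costs over several directions before the winner-take-all step.

```python
def aggregate_left_to_right(cost, p1=10, p2=120):
    """Hypothetical sketch: SGM path aggregation along one direction (left to right).

    cost[x][d] is the matching cost of pixel x (one image row) at disparity d.
    Returns L[x][d], the aggregated path cost for this single direction.
    """
    width, ndisp = len(cost), len(cost[0])
    L = [list(cost[0])]  # first pixel on the path has no predecessor
    for x in range(1, width):
        prev = L[x - 1]
        prev_min = min(prev)
        row = []
        for d in range(ndisp):
            best = prev[d]  # same disparity as the neighbor: no penalty
            if d > 0:
                best = min(best, prev[d - 1] + p1)  # disparity change of 1: penalty P1
            if d < ndisp - 1:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)  # larger jump: penalty P2
            # subtracting prev_min keeps L bounded along long paths
            row.append(cost[x][d] + best - prev_min)
        L.append(row)
    return L


def winner_take_all(costs):
    """Pick the lowest-cost disparity per pixel (ties go to the smaller d)."""
    return [min(range(len(c)), key=c.__getitem__) for c in costs]
```

With `cost = [[0, 100, 100], [20, 100, 15], [0, 100, 100]]`, winner-take-all on the raw costs picks disparities `[0, 2, 0]`, while the aggregated costs yield the smoother `[0, 0, 0]`, because the jump to d = 2 at the middle pixel is penalized.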

CUDA

Oct, 15

GPU-accelerated real-time stixel computation

The Stixel World is a medium-level, compact representation of road scenes that abstracts millions of disparity pixels into hundreds or thousands of stixels. The goal of this work is to implement and evaluate a complete multi-stixel estimation pipeline on an embedded, energy-efficient, GPU-accelerated device. This work presents a full GPU-accelerated implementation of stixel estimation that […]

CUDA

Oct, 14

International Conference on Network and Cyber Security (ICNCS), 2017

ICNCS 2017, the International Conference on Network and Cyber Security, will take place in Lakeland, Florida, United States, from May 19-23, 2017. ICNCS 2017 is a not-to-be-missed opportunity that distills the most current knowledge on a rapidly advancing discipline in one conference. Join key researchers and established professionals in the field of Network and Cyber Security as […]

Oct, 14

The 2nd International Conference on Electronics Engineering and Informatics (ICEEI), 2017

The 2nd International Conference on Electronics Engineering and Informatics (ICEEI 2017) will be held June 25-27, 2017, in Beijing, China, as a workshop of WCSE 2017. ICEEI builds on the success of the WCSE conferences, whose published papers have been indexed by EI and Scopus every year, as the workshop of […]

Oct, 14

The 5th International Conference on Information Technology and Science (ICITS), 2017

The unique idea behind the 5th International Conference on Information Technology and Science (ICITS 2017) is to provide an opportunity for leading academicians, scientists, researchers and industry professionals from around the world to network and hold scientific discussions on the latest advancements in the interlinked domains of science, business and engineering and its research benefits […]

Oct, 14

International Conference on Robotics and Automation Sciences (ICRAS), 2017

Paper publication: papers accepted by ICRAS 2017 will be published by IEEE in the conference proceedings, which will be submitted to IEEE Xplore for review and indexed by *Ei Compendex* and *Scopus* after the conference. Submission methods: 1. Full paper (presentation and publication); 2. Abstract (presentation only). Please submit your paper in the Electronic Submission […]

Oct, 12

Overtaking CPU DBMSes with a GPU in Whole-Query Analytic Processing with Parallelism-Friendly Execution Plan Optimization

Existing work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This paper presents a successful effort to better meet this challenge, in the form of a proof-of-concept query […]

CUDA

Oct, 12

Understanding Latency Hiding on GPUs

Modern commodity processors such as GPUs may execute up to about a thousand physical threads per chip to better utilize their numerous execution units and hide execution latencies. Understanding this novel capability, however, is hindered by the complexity of both the hardware and typical workloads. In this dissertation, we suggest a better […]

CUDA

* * *

Recent source codes

SYCL Container

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

CFAL-bench

Efficient Graph Embedding at Scale: Optimizing CPU-GPU-SSD Integration

Can Large Language Models Predict Parallel Code Performance?

PELSI: Power-Efficient Layer-Switched Inference

Ouroboros: Virtualized Queues for dynamic memory management

MSCCL++: A GPU-driven communication stack for scalable AI applications

Benchmark compute shader of Unity against InteropUnityCUDA
