high performance computing on graphics processing units: hgpu.org

Posts

Jul, 29

Reverberant speech recognition combining deep neural networks and deep autoencoders augmented with a phone-class feature

We propose an approach to reverberant speech recognition adopting deep learning in the front-end as well as back-end of a reverberant speech recognition system, and a novel method to improve the dereverberation performance of the front-end network using phone-class information. At the front-end, we adopt a deep autoencoder (DAE) for enhancing the speech feature parameters, […]

CUDA

Jul, 29

OKL: A Unified Language for Parallel Architectures

Rapid evolution of computer processor architectures has spawned multiple programming languages and standards. This thesis strives to address the challenges caused by fast and cyclical changes in programming models. The novel contribution of this thesis is the introduction of an abstract unified framework which addresses portability and performance for programming manycore devices. To test this […]

CUDA • OpenCL

Jul, 28

Experiences in Speeding Up Computer Vision Applications on Mobile Computing Platforms

Computer vision (CV) is widely expected to be the next big thing in mobile computing. The availability of a camera and a large number of sensors in mobile devices will enable CV applications that understand the environment and enhance people’s lives through augmented reality. One of the problems yet to solve is how to transfer […]

OpenCL

Jul, 28

Optimization of a finite element code implemented in MATLAB: On the use of GPUs for High Performance Computing

The Department of Mechanical and Materials Engineering has developed a 2D Finite Element code based on geometry-independent Cartesian grids (cgFEM) capable of solving shape optimization problems as well as making patient-specific analyses using medical images. A similar code in 3D (FEAVox) is currently under development. Both codes are implemented in MATLAB, a simple and […]

OpenCL

Jul, 27

Processing Large-scale XML Files on GPGPU Cluster

XML has been used as a textual data format for transporting and storing information in many areas. However, the cost of processing large-scale XML files becomes a serious issue for general processing methods. In this paper, we propose a design and implementation of a large-scale XML processing system on a GPU cluster to address […]

CUDA

Jul, 27

Real-time Ray tracing and Editing of Large Voxel Scenes

A novel approach is presented to render large voxel scenes in real time. The approach differs from existing solutions in that a large emphasis is put on allowing the user to edit and stream large datasets. Previous solutions often use compression schemes involving hierarchical data layouts such as sparse voxel octrees that require some form of […]

OpenCL

Jul, 27

Fast-Coding Robust Motion Estimation Model in a GPU

Nowadays, vision systems are used for countless purposes. Motion estimation is a discipline that allows relevant information to be extracted, such as pattern segmentation, 3D structure, or object tracking. However, the real-time requirements of most applications have limited its consolidation, given the need to adopt high-performance systems to meet response times. With the emergence of […]

Jul, 27

An efficient KNN algorithm implemented on FPGA based heterogeneous computing system using OpenCL

Accurate and efficient data classification techniques are of vital importance to many problems and have developed rapidly in recent decades. The K-Nearest Neighbor (KNN) algorithm, one of the most important algorithms, is widely used in text categorization, predictive analysis, data mining, image recognition, and other areas. To accelerate the algorithm and to optimize the parallel implementation […]

OpenCL

Jul, 27

Irregular algorithms on the Xeon Phi

The Xeon Phi is a coprocessor first released in 2012 by Intel. With x86 instruction set support, 60 cores and up to 2 teraflops of single-precision performance, the Xeon Phi seems promising and has gained wide interest. The world’s fastest supercomputer to date, the Tianhe-2, features the Xeon Phi, as does the recently announced 180 […]

Jul, 26

The 5th International Conference on Information Computer Application (ICICA), 2016

Publication: Submissions will be peer reviewed and evaluated based on originality, relevance to the conference, contributions, and presentation. Selected papers of ICICA 2016 will be published in one of the journals below:
* Journal of Computers (ISSN: 1796-203X). Indexing/Abstracting: DBLP, EBSCO, DOAJ, ProQuest, INSPEC, ULRICH’s Periodicals Directory, WorldCat, CNKI, etc.
* Journal of Advances in Information Technology […]

Jul, 26

International conference on VLSI, Communication and Instrumentation (ICVCI), 2015

Topics: • VLSI Design & Testing • Low Power VLSI • Image & Signal Processing • Grid & Cloud Computing • Routing & Optimization Techniques • Wired & Wireless Communication • Sensor Networks & Network Security • Mobile Communication & Computing • High Speed Communication Networks • Embedded System • Intelligent Controllers • Modeling & Simulation • System Identification • Advanced Control Systems • Expert Systems • Biomedical Instrumentation • Process Control […]

Jul, 26

International Conference on Remote Sensing and Development (ICRSD), 2015

Topics: • Remote Sensing • Innovative Algorithms • Automated feature extraction • Object extraction • Point cloud processing • 3D scene reconstruction • Spatio-temporal data fusion • Range image processing • Photogrammetric computer vision • Multi-scale segmentation • Support Vector Machine classifiers • Random forest classifiers • Marked point process extractors • Sparse and redundant […]

* * *

Recent source codes

WiLLM: An Open Wireless LLM Communication System

Vcc: the Vulkan Clang Compiler

hpcbench: A set of benchmarking utilities for biomolecular simulation tools

HPCTransCompile: An AI Compiler Generated Dataset for High-Performance CUDA Transpilation and LLM Preliminary Exploration

chemtrain: Training Molecular Dynamics Potentials in JAX

microSYCL: SYCL micro-benchmarks repository

XaaS containers

CASS: Cuda-Amd aSSembly

Cluster of smartphones for edge computing application using TensorFlow

SYCL Container
