high performance computing on graphics processing units: hgpu.org

Posts

Jul, 8

Using the pyMIC Offload Module in PyFR

PyFR is an open-source high-order accurate computational fluid dynamics solver for unstructured grids. It is designed to efficiently solve the compressible Navier-Stokes equations on a range of hardware platforms, including GPUs and CPUs. In this paper we will describe how the Python Offload Infrastructure for the Intel Many Integrated Core Architecture (pyMIC) was used to […]

CUDA, OpenCL

Jul, 8

TTC: A Tensor Transposition Compiler for Multiple Architectures

We consider the problem of transposing tensors of arbitrary dimension and describe TTC, an open source domain-specific parallel compiler. TTC generates optimized parallel C++/CUDA C code that achieves a significant fraction of the system's peak memory bandwidth. TTC exhibits high performance across multiple architectures, including modern AVX-based systems (e.g., Intel Haswell, AMD Steamroller), Intel's Knights Corner […]

CUDA

Jul, 8

GPU Based Detection of Topological Changes in Voronoi Diagrams

Voronoi diagrams are an important tool with theoretical and practical applications in a large number of fields. We present a new procedure, implemented as a set of CUDA kernels, that detects, in a general and efficient way, topological changes in dynamic Voronoi diagrams whose generating points move over time. The solution that […]

CUDA

Jul, 8

Matrix Multiplication Beyond Auto-Tuning: Rewrite-based GPU Code Generation

Graphics Processing Units (GPUs) are used as general purpose parallel accelerators in a wide range of applications. They are found in most computing systems, and mobile devices are no exception. The recent availability of programming APIs such as OpenCL for mobile GPUs promises to open up new types of applications on these devices. However, producing […]

OpenCL

Jul, 8

A Survey of Techniques for Designing and Managing CPU Register File

The processor register file (RF) is an important microarchitectural component used for storing the operands and results of instructions. The design and operation of the RF have a crucial impact on the performance, energy efficiency and reliability of the processor, and hence several techniques have recently been proposed to manage the RF in modern processors. In this paper, we present […]

Jul, 5

1st International Workshop on Theoretical Approaches to Performance Evaluation, Modeling and Simulation (TAPEMS), 2016

Performance, and energy efficiency as one aspect of it, have become key issues in both high performance and embedded computing. The objective of the 1st TAPEMS International Workshop on Theoretical Approaches to Performance Evaluation, Modeling and Simulation is to bring together researchers and practitioners from academia and industry to discuss current advances and trends in […]

Jul, 5

International Conference on Intelligent Computing and Applications (ICICA), 2017

Publication: Submissions will be peer-reviewed and evaluated based on originality, relevance to the conference, contributions, and presentation. Accepted papers of ICICA 2017 will be published in one of the following publications. A: Conference Proceedings (Indexing/Abstracting: DBLP, ProQuest, INSPEC, CNKI, EI Compendex, Scopus, etc.). B: Journal of Computers (ISSN: 1796-203X) (Indexing/Abstracting: DBLP, EBSCO, DOAJ, ProQuest, INSPEC, […]

Jul, 5

3rd IEEE International Conference on Control, Automation and Robotics (ICCAR), 2017

Publication: ICCAR 2017 conference proceedings will be published by IEEE Conference Publications and indexed by IEEE Xplore and Ei Compendex. Keynote & Plenary Speakers: Prof. Wei-Hsin Liao, The Chinese University of Hong Kong, Hong Kong; Dr. Ferial El-Hawary, Dalhousie University, Canada; Prof. Mo El-Hawary, Dalhousie University, Canada, past president of IEEE Canada. Contact: […]

Jul, 5

3rd International Conference on Robotics and Computer Vision (ICRCV), 2017

Publication and Indexing: Accepted and registered papers will be published in the conference proceedings, which will be available online and included in major databases such as Scopus and Ei. Selected papers will be recommended to IJMERR, indexed by Index Copernicus, ProQuest, UDL, Google Scholar, Open J-Gate, Scopus (since 2016), etc. Keynote Speakers: Prof. Wei-Hsin Liao, […]

Jul, 5

High Performance Computing and Cluster Technologies Conference (HPCCT), 2016

Publication: All accepted papers will be published in the HPCCT 2016 conference proceedings, which will be indexed by Ei Compendex. Conference Schedule: December 17th, 2016: Registration, collecting conference materials; December 18th, 2016: Opening Remarks & Keynote Speeches, Oral […]

Jul, 5

International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence (ISMSI), 2017

Publication: All accepted papers will be published in the ISMSI 2017 conference proceedings and reviewed by the IEEE Conference Publication Program for IEEE Xplore and Ei Compendex. Contact: Ms. Yvonne Miller, Email: sub@ismsi.org

Jul, 5

4th International Conference on Mechatronics, Electronics and Automation Engineering (ICMEAE), 2017

Publication: All accepted papers of ICMEAE 2017 will be published in the ICMEAE 2017 Conference Proceedings, which will be indexed by Ei Compendex, Inspec, DOAJ, CPCI (Web of Science) and Scopus. Conference Venue: Chulalongkorn University, Thailand. Conference Schedule: April 1st, 2017: Registration and collecting conference materials; April 2nd, 2017: Keynote speeches and oral presentations; April 3rd, 2017: […]
