high performance computing on graphics processing units: hgpu.org

Posts

Jul, 30

An Autotuning Framework for Intel Xeon Phi Platforms

In this thesis, we develop an auto-tuning framework for Intel Xeon Phi co-processors based on analytical methods. Its purpose is to relieve the application developer from configuring the compiler and the execution environment by efficiently and optimally finding the solution that delivers the best outcome in respect of performance and power. Shortly, the Autotuner has […]

Jul, 30

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

Computing k-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based O(log n) algorithms have been proposed for computing KNN, due to its inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of […]

Jul, 29

International Conference on Machine Learning and Soft Computing (ICMLSC), 2017

Proceedings ICMLSC 2017 conference will be published by: conference proceedings, which will be included in the major data base, Ei, Scopus, etc. Speakers Prof. Pham The Bao, University of Science, Vietnam Prof. Hieu Trung Huynh, Dean of IT faculty of Industrial University of Ho Chi Minh City Prof. Genci Capi, University of Toyama, Japan Submission […]

Jul, 29

9th International Conference on Computer Research and Development (ICCRD), 2017

Proceedings The submitted full papers will be peer-reviewed, accepted and registered full papers will be published into selected Journals. The journals information is as below: JEST- Journal of Electronic Science and Technology, ISSN: 1674-862X, Indexed by SCOPUS, Ei in INSPEC, etc. JSW- Journal of Software, ISSN 1796-217X, Indexed by: DBLP, EBSCO, DOAJ, ProQuest, INSPEC, etc. […]

Jul, 29

3rd International Conference on Mechatronics and Robotics Engineering (ICMRE), 2017

Keynote Speakers Prof. Dan Zhang York University, Canada | Full Professor; Prof. Said Hanafi University of Valenciennes, France | Full Professor; Prof. Farouk Yalaoui Université de Technologie de Troyes, France | Full Professor Sponsor University Of Valencienne, France; York University, Canada; Beijing Jiaotong University, China; etc. Publication Submitted papers will be peer reviewed by the […]

Jul, 29

8th International Conference on Computer Modeling and Simulation (ICCMS), 2017

Keynote Speakers Prof. Girija Chetty, University of Canberra, Australia; Prof. William Guo, Central Queensland University, Australia; Prof. Ghassan Beydoun, Univeristy of Wollongong, Australia; Prof. Jun Shen, University of Wollongong, Australia. Publication *All accepted papers of ICCMS 2017 (Registered & Presented) will be collected in the conference proceedings, which will be indexed by EI and Scopus,etc. […]

Jul, 29

4th International Conference on Robotics, Mechanics and Mechatronics (ICRMM), 2017

Keynote Speaker Alexander M. Korsunsky Professor of Engineering Science and Fellow, Trinity College, Oxford Paper publication All accepted papers of ICRMM 2017 will be published in the conference proceedings, which is indexed by Ei Compendex, Inspec, DOAJ, CPCI (Web of Science) and Scopus. Visit & Tour will be organized on March 13, 2017. Detailed information […]

Jul, 29

7th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB), 2017

Famous professor as Keynote Speaker Prof. Orawan Siriratpiriya from Aquatic Resources Research Institute, Chulalongkorn University (ARRIC)； Prof. Manoj R. Tarambale from Marathwada Mitra Mandal’s College of Engineering, Pune, India Publication ICBBB 2017 accepted and registered papers will be published in one of the following journals： International Journal of Bioscience, Biochemistry and Bioinformatics (IJBBB, ISSN: 2010-3638), […]

Jul, 29

International Conference on Virtual and Augmented Reality Simulations (ICVARS), 2017

Publication After a careful reviewing process, all accepted papers after proper registration and presentation, will be published in the conference Proceedings,which will be indexed by Ei & Scoups. Submission Methods Log in Electronic Submission System (.pdf) http://www.easychair.org/conferences/?conf=icvars2017 to sumbit your full paper or send it directly to conference email: icvars_conf@sina.com

Jul, 28

Many-Core Algorithms for Combinatorial Optimization

Combinatorial Optimization is becoming ever more crucial, in these days. From natural sciences to economics, passing through urban centers administration and personnel management, methodologies and algorithms with a strong theoretical background and a consolidated real-word effectiveness is more and more requested, in order to find, quickly, good solutions to complex strategical problems. Resource optimization is, […]

CUDA

Jul, 28

PASSATA – Object oriented numerical simulation software for adaptive optics

We present the last version of the PyrAmid Simulator Software for Adaptive opTics Arcetri (PASSATA), an IDL and CUDA based object oriented software developed in the Adaptive Optics group of the Arcetri observatory for Monte-Carlo end-to-end adaptive optics simulations. The original aim of this software was to evaluate the performance of a single conjugate adaptive […]

CUDA

Jul, 28

An Optimized Multiple Right-Hand Side Dslash Kernel for Intel Xeon Phi

Lattice quantum chromodynamics (LQCD) stands unique as the only computationally tractable, non-perturbative, and model-independent quantum field theory of the strong nuclear force. The computational core of LQCD is the Wilson Dslash operator, a nearest neighbor stencil operator summing matrix-vector multiplications over lattice points, whose performance is bandwidth-bound on most architectures. Reportedly, up to 90% of […]

KernelGYM & Dr. Kernel: A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations

* * *

high performance computing on graphics processing units: hgpu.org

Posts

An Autotuning Framework for Intel Xeon Phi Platforms

PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

International Conference on Machine Learning and Soft Computing (ICMLSC), 2017

9th International Conference on Computer Research and Development (ICCRD), 2017

3rd International Conference on Mechatronics and Robotics Engineering (ICMRE), 2017

8th International Conference on Computer Modeling and Simulation (ICCMS), 2017

4th International Conference on Robotics, Mechanics and Mechatronics (ICRMM), 2017

7th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB), 2017

International Conference on Virtual and Augmented Reality Simulations (ICVARS), 2017

Many-Core Algorithms for Combinatorial Optimization

PASSATA – Object oriented numerical simulation software for adaptive optics

An Optimized Multiple Right-Hand Side Dslash Kernel for Intel Xeon Phi

Recent source codes

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation

CL4SE: A Context Learning Benchmark For Software Engineering Tasks

CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

KernelGYM & Dr. Kernel: A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations

Vortex-Optimized Light-weight Toolchain (VOLT)

SciDef: Automated Definition Extraction from Scientific Literature

bioagent-bench: Benchmark for evaluating LLM agents in bioinformatics

Most viewed papers (last 30 days)