high performance computing on graphics processing units: hgpu.org

Posts

May, 16

Using Butterfly-Patterned Partial Sums to Optimize GPU Memory Accesses for Drawing from Discrete Distributions

We describe a technique for drawing values from discrete distributions, such as sampling from the random variables of a mixture model, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses. From this table, complete […]

CUDA
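For context, the "complete table of partial sums" that the abstract above says the technique avoids can be sketched as follows. This is a minimal CPU-side illustration of the conventional baseline only, not the butterfly-patterned method from the paper; the function names `build_partial_sums` and `draw` are hypothetical and chosen just for this sketch.

```python
import bisect
import itertools
import random


def build_partial_sums(weights):
    """Return the complete table of partial (prefix) sums of the relative probabilities."""
    return list(itertools.accumulate(weights))


def draw(partial_sums, u):
    """Map a uniform value u in [0, 1) to an outcome index via binary search.

    Outcome i is chosen when the scaled value falls in
    [partial_sums[i-1], partial_sums[i]), so zero-weight outcomes are never chosen.
    """
    target = u * partial_sums[-1]
    index = bisect.bisect_right(partial_sums, target)
    # Guard against floating-point rounding pushing target up to the total.
    return min(index, len(partial_sums) - 1)


if __name__ == "__main__":
    table = build_partial_sums([1.0, 2.0, 1.0])  # relative, unnormalized weights
    samples = [draw(table, random.random()) for _ in range(10)]
    print(table, samples)
```

The weights need not sum to one, because the uniform value is scaled by the final partial sum before the search.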

May, 15

Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves

Deep neural networks (DNNs) show very strong performance on many machine learning problems, but they are very sensitive to the setting of their hyperparameters. Automated hyperparameter optimization methods have recently been shown to yield settings competitive with those found by human experts, but their widespread adoption is hampered by the fact that they require more […]

CUDA

May, 15

MRCUDA: MapReduce Acceleration Framework Based on GPU

The GPU programming model for general-purpose computing is complex and difficult to maintain. This paper designs and implements a MapReduce acceleration framework named MRCUDA. MRCUDA has four loosely coupled stages (Pre-Processing, Map, Group, and Reduce) that support flexible configurations for different applications. In order to take full advantage […]

CUDA

May, 15

The 3D Flow Field Around an Embedded Planet

Understanding the 3D flow topology around a planet embedded in its natal disk is crucial to the study of planet formation. 3D modifications to the well-studied 2D flow topology have the potential to resolve longstanding problems in both planet migration and accretion. We present a detailed analysis of the 3D isothermal flow field around a […]

CUDA

May, 15

Adaptive discrete cosine transform-based image compression method on a heterogeneous system platform using Open Computing Language

The discrete cosine transform (DCT) is one of the major operations in image compression standards, and it requires intensive and complex computations. Recent computer systems and handheld devices are equipped with high-capability computing devices, such as a general-purpose graphics processing unit (GPGPU), in addition to the traditional multicore CPU. We develop an optimized parallel implementation […]

OpenCL

May, 15

Power, Energy and Speed of Embedded and Server Multi-Cores applied to Distributed Simulation of Spiking Neural Networks: ARM in NVIDIA Tegra vs Intel Xeon quad-cores

This short note compares the instantaneous power, total energy consumption, execution time, and energetic cost per synaptic event of a spiking neural network simulator (DPSNN-STDP) distributed across MPI processes when executed on either an embedded platform (a dual-socket quad-core ARM platform) or a server platform (an Intel-based dual-socket quad-core platform). […]

May, 15

5th International Conference on Computer and Communication Devices (ICCCD), 2015

Publication: All accepted papers will be published in one of the indexed Journals after being selected.
* International Journal of Future Computer and Communication (IJFCC, ISSN: 2010-3751) Abstracting/Indexing: Google Scholar, Engineering & Technology Digital Library, and Crossref, DOAJ, Electronic Journals Library, EI (INSPEC, IET).
* International Journal of Computer and Communication Engineering (IJCCE, ISSN: […]

May, 15

5th International Conference on Robotics and Automation Sciences (ICRAS, former ICSIA), 2015

Publication: All the papers of ICRAS 2015 will be indexed by Ei Compendex and ISI.

Topics: AREA 1: Intelligent Control Systems and Optimization
• Genetic Algorithms
• Fuzzy Control
• Decision Support Systems
• Machine Learning in Control Applications
• Knowledge-based Systems Applications
• Hybrid Learning Systems
• Distributed Control Systems
• Evolutionary Computation […]

May, 15

6th International Conference on Networking and Information Technology (ICNIT), 2015

Topics: Antennas & Propagation; Bioinformatics and Scientific Computing; Broadband & Intelligent networks; Business Information Systems; Communication Systems and Networks; Complex Systems: Modeling and Simulation; Computational Intelligence Applications; Computer Vision & Pattern Recognition; Data Base Management; Data Mining and Data Fusion; Data Warehousing, Ontologies and Databases; Distributed Sensor Networks; E-Commerce & E-government; E-Health & Biomedical Applications […]

May, 14

OpenMPCon 2015 – Developer Conference

OpenMPCon is the annual, face-to-face developer gathering organized by the OpenMP community, for the community. Enjoy keynotes, inspirational talks, and a friendly atmosphere that helps attendees meet interesting people, learn more about OpenMP from each other, and have a stimulating experience. Multiple diverse technical tracks are being formulated that will appeal to anyone: from the […]

May, 13

A Survey of CPU-GPU Heterogeneous Computing Techniques

As both CPUs and GPUs become employed in a wide range of applications, it has been acknowledged that both of these processing units (PUs) have unique features and strengths; hence, CPU-GPU collaboration is inevitable for achieving high-performance computing. This has motivated a significant amount of research on heterogeneous computing techniques, along with the design […]

CUDA • OpenCL

May, 13

CUDA 7 Performance Overview webinar

The CUDA 7 Toolkit has many new features and performance enhancements. Ujval Kapasi is Director, CUDA Product Management at NVIDIA. Ujval received his Ph.D. in Electrical Engineering from Stanford University and his Bachelor of Science in Engineering from Brown University.

* * *

