
Posts

Oct, 14

OptiML: An implicitly parallel domain-specific language for machine learning

As the size of datasets continues to grow, machine learning applications are becoming increasingly limited by the amount of available computational power. Taking advantage of modern hardware requires using multiple parallel programming models targeted at different devices (e.g. CPUs and GPUs). However, programming these devices to run efficiently and correctly is difficult, error-prone, and results […]
Oct, 14

Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers

Heterogeneous computers with processors and accelerators are becoming widespread in scientific computing. However, it is difficult to program hybrid architectures and there is no commonly accepted programming model. Ideally, applications should be written in a way that is portable to many platforms, but providing this portability for general programs is a hard problem. By restricting […]
Oct, 14

GPU Computing Gems: Jade Edition

This is the second volume of Morgan Kaufmann’s GPU Computing Gems, offering an all-new set of insights, ideas, and practical "hands-on" skills from researchers and developers worldwide. Each chapter gives you a window into the work being performed across a variety of application domains, and the opportunity to witness the impact of parallel GPU computing […]
Oct, 14

Towards scalar synchronization in SIMT architectures

Graphics processing units (GPUs) are an important class of compute accelerators. Popular programming models for non-graphics computation on GPUs, such as CUDA and OpenCL, provide an abstraction of many parallel scalar threads. Contemporary GPU hardware groups 32 to 64 scalar threads into a single warp or wavefront and executes this group of scalar threads in […]
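The abstract is cut off, but the grouping it describes can be illustrated with a simplified model. The sketch below assumes a warp width of 32 (AMD wavefronts are commonly 64), maps a flat thread index to a (warp, lane) pair, and models a divergent if/else inside one warp by stepping through both paths with an active mask. The function names are hypothetical, and this is a teaching model, not how any particular GPU or the paper's proposal works.

```python
WARP_SIZE = 32  # assumed warp width; AMD wavefronts are often 64


def warp_and_lane(thread_id, warp_size=WARP_SIZE):
    """Return (warp index, lane index) for a flat thread id."""
    return divmod(thread_id, warp_size)


def run_warp_branch(values, predicate, then_fn, else_fn):
    """Simulate a divergent if/else executed by one warp.

    The lanes share one instruction stream, so the warp walks through
    both paths; each lane commits results only while its mask bit is set.
    """
    mask = [predicate(v) for v in values]
    out = list(values)
    # "then" path: only lanes whose predicate is true are active
    for lane, active in enumerate(mask):
        if active:
            out[lane] = then_fn(values[lane])
    # "else" path: the complementary lanes are active
    for lane, active in enumerate(mask):
        if not active:
            out[lane] = else_fn(values[lane])
    return out


if __name__ == "__main__":
    print(warp_and_lane(70))  # (2, 6)
    warp = list(range(WARP_SIZE))
    result = run_warp_branch(warp, lambda v: v % 2 == 0,
                             lambda v: v * 10, lambda v: -v)
    print(result[:4])  # [0, -1, 20, -3]
```

In this model a divergent branch costs the warp the time of both paths, since the inactive lanes simply idle while the other side runs.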
Oct, 14

A Heterogeneous Parallel Framework for Domain-Specific Languages

Computing systems are becoming increasingly parallel and heterogeneous, and therefore new applications must be capable of exploiting parallelism in order to continue achieving high performance. However, targeting these emerging devices often requires using multiple disparate programming models and making decisions that can limit forward scalability. In previous work we proposed the use of domain-specific languages […]
Oct, 14

Fast Multipole Method vs. Spectral Method for the Simulation of Isotropic Turbulence on GPUs

This paper presents calculations of homogeneous isotropic turbulence at Re_λ = 100 using both a pseudo-spectral method and a fast multipole vortex method on a 256³ grid. For the vortex method, both algorithmic and hardware acceleration are applied using a highly parallel fast multipole method (FMM) on GPUs. The spectral method uses the FFTW library […]
Oct, 13

Benchmarking Across Platforms: European Option Pricing

Using a popular Monte Carlo workload that implements European option pricing, we tested a variety of architectures, including NVIDIA and AMD GPUs, ClearSpeed accelerators and multi-core processors, as well as different programming approaches. We conclude that this particular workload seems best suited to GPU-type architectures compared to other alternatives such as CPU or […]
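For readers unfamiliar with the workload, here is a minimal sketch of Monte Carlo pricing of a European call under the standard Black-Scholes assumptions (geometric Brownian motion, constant rate and volatility), checked against the closed-form Black-Scholes price. The parameter values are illustrative and are not taken from the benchmark. Each simulated path is independent of the others, which is one plausible reason the workload suits massively parallel hardware.

```python
import math
import random


def mc_european_call(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo estimate of a European call price under GBM."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        # Terminal price from one standard normal draw; paths are independent
        st = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(st - k, 0.0)
    # Discount the average payoff back to today
    return math.exp(-r * t) * payoff_sum / n_paths


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_european_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes call price, used as a reference value."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


if __name__ == "__main__":
    args = (100.0, 100.0, 0.05, 0.2, 1.0)  # S0, K, r, sigma, T
    print(mc_european_call(*args, n_paths=200_000))
    print(bs_european_call(*args))  # about 10.45
```

The Monte Carlo estimate converges to the closed-form value at a rate proportional to 1/sqrt(n_paths), so accuracy is bought with more independent paths.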
Oct, 13

Firepile: Run-time Compilation for GPUs in Scala

Recent advances have enabled GPUs to be used as general-purpose parallel processors on commodity hardware for little cost. However, the ability to program these devices has not kept up with their performance. The programming model for GPUs has a number of restrictions that make it difficult to program. For example, software running on the GPU […]
Oct, 13

A rendering method for simulated emission nebulae

Emission nebulae are some of the most beautiful stellar phenomena. The newly formed hot stars inside the nebulae ionize the surrounding gas, making it glow in a variety of colors. The focus of this work is to find a method for interactive rendering of simulated emission nebulae. A rendering program has been developed to render and […]
Oct, 13

Introduction to GPU Radix Sort

Radix sort is one of the fastest sorting algorithms, especially for large problem sizes. It is not a comparison sort: it processes the keys digit by digit, using a counting sort for each digit. When sorting by an n-bit digit, 2^n counters are prepared, one for each possible digit value.
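A minimal sequential sketch of least-significant-digit (LSD) radix sort on non-negative integer keys is shown below. Each pass examines one `bits`-wide digit, fills 2^bits counters, turns them into starting offsets with an exclusive prefix sum, and scatters the keys stably. The 4-bit digit width is an arbitrary choice for illustration; a GPU implementation would parallelize the counting and scatter steps and use a parallel scan, which this version does not attempt.

```python
def radix_sort(keys, key_bits=32, bits=4):
    """LSD radix sort for non-negative integers below 2**key_bits."""
    radix = 1 << bits  # 2**bits counters per pass
    mask = radix - 1
    out = list(keys)
    for shift in range(0, key_bits, bits):
        # 1. Histogram: count how many keys have each digit value
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1
        # 2. Exclusive prefix sum: first output slot for each digit value
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]
        # 3. Stable scatter: equal digits keep their relative order
        tmp = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            tmp[offsets[d]] = k
            offsets[d] += 1
        out = tmp
    return out


if __name__ == "__main__":
    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
    # [2, 24, 45, 66, 75, 90, 170, 802]
```

Stability of each pass is what makes the digit-by-digit approach correct: after the pass on the most significant digit, keys that share that digit remain ordered by the lower digits.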
Oct, 13

Input Sensitivity of GPU Program Optimizations

Graphics Processing Units (GPUs) are increasingly adopted to boost computing throughput. However, developing a high-quality GPU application is challenging because of the large optimization space and the complex, unpredictable effects of optimizations on GPU program performance. Many recent efforts have employed empirical search-based auto-tuners to tackle the problem, but few […]
Oct, 13

gpustats: GPU Library for Statistical Computing in Python

In this talk we will discuss gpustats, a new Python library for assisting in "big data" statistical computing applications, particularly Monte Carlo-based inference algorithms. The library provides a general code generation / metaprogramming framework for easily implementing discrete and continuous probability density functions and random variable samplers. These functions can be utilized to achieve more […]
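The abstract does not show gpustats' actual interface, so the following is only a hypothetical illustration of the code-generation idea: a template for a vectorized density kernel is filled in with a per-element log-density expression, compiled at run time with `exec`, and applied to many points. The names `KERNEL_TEMPLATE` and `make_logpdf_kernel` are invented for this sketch, and it emits plain Python, whereas a GPU library would emit device code.

```python
import math

# Hypothetical template: the loop is shared, only the density expression varies.
KERNEL_TEMPLATE = """
def {name}(xs, {params}):
    out = []
    for x in xs:
        out.append({expr})
    return out
"""


def make_logpdf_kernel(name, params, expr):
    """Generate, compile and return a log-density function from the template."""
    source = KERNEL_TEMPLATE.format(name=name, params=", ".join(params), expr=expr)
    namespace = {"math": math}
    exec(source, namespace)  # compile the generated source at run time
    return namespace[name]


# Normal log-density: -0.5*log(2*pi*sigma^2) - (x - mu)^2 / (2*sigma^2)
normal_logpdf = make_logpdf_kernel(
    "normal_logpdf", ["mu", "sigma"],
    "-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)",
)

# Exponential log-density for x >= 0: log(lam) - lam * x
exp_logpdf = make_logpdf_kernel(
    "exp_logpdf", ["lam"],
    "(math.log(lam) - lam * x) if x >= 0 else -math.inf",
)

if __name__ == "__main__":
    print(normal_logpdf([0.0, 1.0], 0.0, 1.0))
    print(exp_logpdf([0.0, 2.0], 0.5))
```

Adding a new distribution then only requires a new expression string; the surrounding loop, which is where a GPU version would put its parallel kernel launch, is written once.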

* * *


HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
