Posts

Aug, 21

Reducing data access latency in SDSM systems using runtime optimizations

Software Distributed Shared Memory (SDSM) systems offer a convenient way to run applications developed for shared memory systems on distributed systems with no changes to them. However, since SDSM systems add an extra layer of abstraction to the memory hierarchy, applications may suffer performance problems when running on top of them. Our main research interest […]
Aug, 21

A new method for GPU based irregular reductions and its application to k-means clustering

A frequently used method of clustering is a technique called k-means clustering. The k-means algorithm consists of two steps: a map step, which is simple to execute on a GPU, and a reduce step, which is more problematic. Previous researchers have used a hybrid approach in which the map step is computed on the GPU […]
Aug, 21

Multi- and many-core data mining with adaptive sparse grids

Extracting knowledge from vast datasets is a major challenge in today's data-driven applications. Sparse grids provide a numerical method for both classification and regression in data mining which scales only linearly in the number of data points and is thus well-suited for huge amounts of data. Due to the recursive nature of sparse grid […]
Aug, 21

Sponge: portable stream programming on graphics engines

Graphics processing units (GPUs) provide a low cost platform for accelerating high performance computations. The introduction of new programming languages, such as CUDA and OpenCL, makes GPU programming attractive to a wide variety of programmers. However, programming GPUs is still a cumbersome task for two primary reasons: tedious performance optimizations and lack of portability. First, […]
Aug, 21

Breaking the GPU programming barrier with the auto-parallelising SAC compiler

Over recent years, the use of Graphics Processing Units (GPUs) for general-purpose computing has become increasingly popular. The main reasons for this development are the attractive performance/price and performance/power ratios of these architectures. However, substantial performance gains from GPUs come at a price: they require extensive programming expertise and, typically, a substantial re-coding effort. Although […]
Aug, 21

The openIP open source image processing library

The openIP open source image processing library is a set of C++ libraries providing tools for education, research and industrial purposes. The aim of the development is to fill the gap between the academic and commercial utilization of image processing. The openIP libraries are interoperable, open source and easy to install. To provide fast […]
Aug, 20

GRace: a low-overhead mechanism for detecting data races in GPU programs

In recent years, GPUs have emerged as an extremely cost-effective means for achieving high performance. Many application developers, including those with no prior parallel programming experience, are now trying to scale their applications using GPUs. While languages like CUDA and OpenCL have eased GPU programming for non-graphical applications, they are still explicitly parallel languages. All […]
Aug, 20

Parallel 3D multigrid methods on the STI Cell BE architecture

The STI Cell Broadband Engine (BE) is a highly capable heterogeneous multicore processor with large bandwidth and computing power perfectly suited for numerical simulation. However, all performance benefits come at the price of productivity, since more responsibility is placed on the programmer. In particular, programming with the IBM Cell SDK is hampered by not only […]
Aug, 19

A framework for dynamically instrumenting GPU compute applications within GPU Ocelot

In this paper we present the design and implementation of a dynamic instrumentation infrastructure for PTX programs that procedurally transforms kernels and manages related data structures. We show how performing instrumentation within the GPU Ocelot dynamic compiler infrastructure provides unique capabilities not available to other profiling and instrumentation toolchains for GPU computing. We demonstrate the […]
Aug, 19

A balanced programming model for emerging heterogeneous multicore systems

Computer systems are moving towards a heterogeneous architecture with a combination of one or more CPUs and one or more accelerator processors. Such heterogeneous systems pose a new challenge to the parallel programming community. Languages such as OpenCL and CUDA provide a programming environment for such systems. However, they focus on data parallel programming where […]
Aug, 19

Extending abstract GPU APIs to shared memory

Parallel programming is used extensively for general-purpose computations. However, the performance of parallel APIs varies for a given problem and a given architecture. This creates a need for an abstract way to express parallel problems. This poster presents a new approach through which programmers can access these APIs without having to focus […]
Aug, 19

A framework for lab-based real-time video analysis on distributed camera networks

In the field of video analytics for surveillance, the trend towards the use of multi-camera and high definition video is increasing. This poses significant technical challenges in terms of video transmission and real-time processing for surveillance analytics, such as people recognition and tracking. Currently, available solutions are typically proprietary commercial systems which are costly to […]

HGPU group © 2010-2026 hgpu.org

All rights belong to the respective authors
