5678

Posts

Sep, 16

NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on mapreduce

In the last decade, advances in data collection and storage technologies have led to an increased interest in designing and implementing large-scale parallel algorithms for machine learning and data mining (ML-DM). Existing programming paradigms for expressing large-scale parallelism such as MapReduce (MR) and the Message Passing Interface (MPI) have been the de facto choices for […]
Sep, 16

Programmable and Scalable Architecture for Graphics Processing Units

Graphics processing is an application area with high level of parallelism at the data level and at the task level. Therefore, graphics processing units (GPU) are often implemented as multiprocessing systems with high performance floating point processing and application specific hardware stages for maximizing the graphics throughput. In this paper we evaluate the suitability of […]
Sep, 16

Searching for Concurrent Design Patterns in Video Games

The transition to multicore architectures has dramatically underscored the necessity for parallelism in software. In particular, while new gaming consoles are by and large multicore, most existing video game engines are essentially sequential and thus cannot easily take advantage of this hardware. In this paper we describe techniques derived from our experience parallelizing an open-source […]
Sep, 16

A Light-Weight Approach to Dynamical Runtime Linking Supporting Heterogenous, Parallel, and Reconfigurable Architectures

When targeting hardware accelerators and reconfigurable processing units, the question of programmability arises, i.e. how different implementations of individual, configuration-specific functions are provided. Conventionally, this is resolved either at compilation time with a specific hardware environment being targeted, by initialization routines at program start, or decision trees at run-time. Such technique are, however, hardly applicable […]
Sep, 16

Optimizations and Performance of a Robotics Grasping Algorithm Described in Geometric Algebra

The usage of Conformal Geometric Algebra leads to algorithms that can be formulated in a very clear and easy to grasp way. But it can also increase the performance of an implementation because of its capabilities to be computed in parallel. In this paper we show how a grasping algorithm for a robotic arm is […]
Sep, 16

Parallel Medical Image Reconstruction: From Graphics Processors to Grids

We present a variety of possible parallelization approaches for a real-world case study using several modern parallel and distributed computer architectures. Our case study is a production-quality, time-intensive algorithm for medical image reconstruction used in computer tomography. We describe how this algorithm can be parallelized for the main kinds of contemporary parallel architectures: shared-memory multiprocessors, […]
Sep, 16

A Generic Approach to Topic Models

This article contributes a generic model of topic models. To define the problem space, general characteristics for this class of models are derived, which give rise to a representation of topic models as "mixture networks", a domain-specific compact alternative to Bayesian networks. Besides illustrating the interconnection of mixtures in topic models, the benefit of this […]
Sep, 16

Implicit and dynamic trees for high performance rendering

Recent advances in GPU architecture and programmability have enabled the computation of ray casted or ray traced images at interactive frame rates. However, the rapid performance gains of the hardware cannot by themselves address the challenge posed by the steady growth in the geometric and temporal complexity of computer graphics datasets. In this paper we […]
Sep, 16

Fast Monte Carlo Simulation for Patient-specific CT/CBCT Imaging Dose Calculation

Recently, X-ray imaging dose from computed tomography (CT) or cone beam CT (CBCT) scans has become a serious concern. Patient-specific imaging dose calculation has been proposed for the purpose of dose management. While Monte Carlo (MC) dose calculation can be quite accurate for this purpose, it suffers from low computational efficiency. In response to this […]
Sep, 15

Analytical motion blur rasterization with compression

We present a rasterizer, based on time-dependent edge equations, that computes analytical visibility in order to render accurate motion blur. The theory for doing the computations in a rasterization framework is derived in detail, and then implemented. To keep the frame buffer requirements low, we also present a new oracle-based compression algorithm for the time […]
Sep, 15

Processing data streams with hard real-time constraints on heterogeneous systems

Data stream processing applications such as stock exchange data analysis, VoIP streaming, and sensor data processing pose two conflicting challenges: short per-stream latency — to satisfy the milliseconds-long, hard real-time constraints of each stream, and high throughput — to enable efficient processing of as many streams as possible. High-throughput programmable accelerators such as modern GPUs […]
Sep, 15

Strategies for preparing computer science students for the multicore world

Multicore computers have become standard, and the number of cores per computer is rising rapidly. How does the new demand for understanding of parallel computing impact computer science education? In this paper, we examine several aspects of this question: (i) What parallelism body of knowledge do todaya’s students need to learn? (ii) How might these […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: