Posts
Nov, 1
Architecture Comparisons between Nvidia and ATI GPUs: Computation Parallelism and Data Communications
In recent years, modern graphics processing units have been widely adopted in high-performance computing to solve large-scale computation problems. The leading GPU manufacturers, Nvidia and ATI, have introduced a series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects of processor cores […]
Nov, 1
Extremely large scale simulation of a Kardar-Parisi-Zhang model using graphics cards
The recently introduced octahedron model has been implemented on graphics cards, which permits extremely large-scale simulations via binary lattice gases and bit-coded algorithms. We confirm scaling behavior belonging to the 2d Kardar-Parisi-Zhang universality class and find a surface growth exponent of beta = 0.2415(15) on 2^17 x 2^17 systems, ruling out the value beta = 1/4 suggested by field theory. […]
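The growth exponent beta is defined through the scaling of the interface width, W(t) ~ t^beta, during the growth regime, so it is usually estimated as the slope of log W against log t. The sketch below shows that fit by ordinary least squares, with W computed as the standard deviation of the surface heights. The function names and the synthetic widths are illustrative assumptions, not the authors' code or data.

```python
import math

def interface_width(heights):
    """Surface width W: standard deviation of the heights around their mean."""
    n = len(heights)
    mean = sum(heights) / n
    return math.sqrt(sum((h - mean) ** 2 for h in heights) / n)

def growth_exponent(times, widths):
    """Least-squares slope of log W versus log t, i.e. beta in W(t) ~ t**beta."""
    xs = [math.log(t) for t in times]
    ys = [math.log(w) for w in widths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical widths generated with a known exponent (not simulation data).
times = [2 ** k for k in range(4, 14)]
widths = [0.5 * t ** 0.2415 for t in times]
print(round(growth_exponent(times, widths), 4))  # prints 0.2415
```

In a real analysis, the fit window has to exclude early-time transients and the saturation regime, which is why large systems such as 2^17 x 2^17 help.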
Oct, 31
Parallel Smoothers for Matrix-based Multigrid Methods on Unstructured Meshes Using Multicore CPUs and GPUs
Multigrid methods are efficient and fast solvers for problems typically modeled by partial differential equations of elliptic type. For problems with complex geometries and local singularities, stencil-type discrete operators on equidistant Cartesian grids need to be replaced by more flexible concepts for unstructured meshes in order to properly resolve all problem-inherent specifics and to maintain […]
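The abstract is cut off before naming its smoothers, so the following is only an illustration of why some smoothers parallelize well on matrix-based (unstructured) multigrid. In a damped Jacobi sweep on a matrix stored in CSR format, every row update reads only the previous iterate, so all rows can be processed independently, for example one GPU thread per row. The damping factor omega = 2/3 and the 1D Poisson test matrix are assumptions chosen for the example; this is not claimed to be one of the paper's smoothers.

```python
import math

def damped_jacobi(indptr, indices, data, b, x, omega=2.0 / 3.0, sweeps=1):
    """Damped Jacobi smoothing on a CSR matrix; each row update is independent."""
    n = len(b)
    for _ in range(sweeps):
        x_new = [0.0] * n
        for i in range(n):  # rows do not depend on each other within a sweep
            diag, off = 0.0, 0.0
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                else:
                    off += data[k] * x[j]
            x_new[i] = (1.0 - omega) * x[i] + omega * (b[i] - off) / diag
        x = x_new
    return x

def poisson_1d_csr(n):
    """CSR form of the tridiagonal [-1, 2, -1] matrix (a stand-in test problem)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data

def residual_norm(indptr, indices, data, b, x):
    """Euclidean norm of b - A x."""
    total = 0.0
    for i in range(len(b)):
        ax = sum(data[k] * x[indices[k]] for k in range(indptr[i], indptr[i + 1]))
        total += (b[i] - ax) ** 2
    return math.sqrt(total)

A = poisson_1d_csr(5)
b = [1.0] * 5
x = damped_jacobi(*A, b, [0.0] * 5, sweeps=10)
```

A smoother only needs to damp high-frequency error; the coarse-grid correction handles the rest, which is why a cheap, fully parallel sweep like this is attractive on GPUs.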
Oct, 31
Optimal Control of the Process Systems Using Graphic Processing Unit
In this paper the Graphic Processing Unit (GPU) is applied in order to improve the computational performance of process systems optimal control calculations. To apply GPU massive parallel architecture, a simplified version of interior point optimisation algorithm was selected and modified to fulfil special hardware requirements of GPU architecture. In this algorithm, a damped nonlinear […]
Oct, 31
SIMD Re-Convergence At Thread Frontiers
Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA, OpenCL, and DirectX Compute. The impact of branch divergence can be quite different depending upon whether the program’s control flow is structured or unstructured. In this paper, […]
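To make the cost of branch divergence concrete, the sketch below simulates a single SIMD warp running a structured if/else with a per-lane active mask: when lanes disagree, both paths are issued and lanes re-converge afterwards. The branch bodies, the per-path cost parameters, and the function name are hypothetical, and the sketch covers only the simple structured case, not the thread-frontier technique the paper proposes for unstructured control flow.

```python
def run_warp_if_else(values, then_cost, else_cost):
    """Simulate one warp executing `if v % 2 == 0: v // 2 else: 3 * v + 1`.

    Returns (results, issue_slots). A path is issued if any lane takes it,
    so a divergent warp pays for both paths.
    """
    mask_then = [v % 2 == 0 for v in values]
    mask_else = [not m for m in mask_then]
    results = [None] * len(values)
    slots = 0
    if any(mask_then):
        slots += then_cost
        for i, active in enumerate(mask_then):
            if active:
                results[i] = values[i] // 2
    if any(mask_else):
        slots += else_cost
        for i, active in enumerate(mask_else):
            if active:
                results[i] = 3 * values[i] + 1
    # Re-convergence point: all lanes are active again from here on.
    return results, slots

print(run_warp_if_else([2, 4, 6, 8], 5, 7))  # uniform branch: only one path issued
print(run_warp_if_else([1, 2, 3, 4], 5, 7))  # divergent: both paths issued
```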
Oct, 31
Rapid Performance of a Generalized Distance Calculation
The ever-increasing size of data sets and the need for real-time processing drive the need for high-speed analysis. Since traditional CPUs are designed to execute a small number of sequential processes, they are ill-suited to keep pace with this growth and to exploit the massive parallelism inherent in these problem spaces. In the last several […]
Oct, 31
Low Latency Complex Event Processing on Parallel Hardware
Several application domains involve observing events, processing them, and reacting. This calls for a Complex Event Processing (CEP) engine in charge of interpreting, filtering, and combining primitive events that occur in the external environment to identify higher-level composite events, according to a set of rules written in an ad hoc rule definition language. A key […]
Oct, 31
Fast Speaker Diarization Using a High-Level Scripting Language
Most current speaker diarization systems use agglomerative clustering of Gaussian Mixture Models (GMMs) to determine "who spoke when" in an audio recording. While state-of-the-art in accuracy, this method is computationally costly, mostly due to the GMM training, and thus limits current approaches to roughly real-time performance. The increasing size of current datasets requires […]
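Much of the GMM cost mentioned above comes from repeatedly scoring every audio feature frame against every mixture component. The sketch below shows that scoring step for diagonal-covariance GMMs, combining components with log-sum-exp for numerical stability; because frames are scored independently, the per-frame loop is the natural target for parallel hardware. Diagonal covariances and the function names are assumptions for illustration, not details taken from the paper.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """log p(x) for a GMM, using log-sum-exp to avoid underflow."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def total_log_likelihood(frames, weights, means, variances):
    """Sum over frames; each frame is scored independently."""
    return sum(gmm_log_likelihood(f, weights, means, variances) for f in frames)
```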
Oct, 31
Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation
Multi-core parallelism and accelerators are becoming common features of today’s computer systems, as they allow for computational power without sacrificing energy efficiency. Due to heterogeneity, tuning for each type of compute unit and adequate load balancing are essential. This paper proposes static and dynamic solutions for load balancing in the context of an application for […]
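One simple form of static load balancing splits the work once, up front, in proportion to each device's measured throughput, so that all devices finish at roughly the same time. The sketch below does this with integer arithmetic to avoid rounding drift; the device names and throughput figures are hypothetical, and the abstract does not say whether the paper's static scheme works this way.

```python
def static_split(total_items, throughputs):
    """Split work across devices in proportion to integer throughputs (items/s).

    Items lost to rounding down are handed to the fastest devices first.
    """
    total_rate = sum(throughputs.values())
    shares = {d: total_items * r // total_rate for d, r in throughputs.items()}
    leftover = total_items - sum(shares.values())
    for d in sorted(throughputs, key=throughputs.get, reverse=True)[:leftover]:
        shares[d] += 1
    return shares

# Hypothetical measurement: the GPU processes three times as many items per second.
print(static_split(10, {"cpu": 1, "gpu": 3}))  # {'cpu': 2, 'gpu': 8}
```

A static split is only as good as the throughput estimate; when per-item cost varies, a dynamic scheme that hands out work in chunks at run time is the usual alternative.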
Oct, 31
Environment Segmentation in Service Robotics
In the field of robotics, a common problem is understanding the world or environment in which the robot is operating. This problem arises because robots do not have an "intuitive" sense of their environment. Environment segmentation is a technique used to isolate different parts of […]
Oct, 31
High-performance software rasterization on GPUs
In this paper, we implement an efficient, completely software-based graphics pipeline on a GPU. Unlike previous approaches, we obey ordering constraints imposed by current graphics APIs, guarantee hole-free rasterization, and support multisample antialiasing. Our goal is to examine the performance implications of not exploiting the fixed-function graphics pipeline, and to discern which additional hardware support […]
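A standard way to get hole-free rasterization in software is to test each pixel centre against the triangle's three edge functions and to resolve centres lying exactly on an edge with a tie-breaking rule (as in the "top-left" fill rule), so that a pixel on an edge shared by two triangles is drawn by exactly one of them. The sketch below shows that idea on a single CPU thread; the particular ownership convention in `owns_edge` is an assumption, and the abstract does not say which rule or traversal the paper's GPU pipeline uses.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of (a, b, p); its sign says which side of a->b p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def owns_edge(ax, ay, bx, by):
    """Tie-breaking rule: for an edge and its reverse, exactly one returns True."""
    dx, dy = bx - ax, by - ay
    return dy > 0 or (dy == 0 and dx > 0)

def rasterize(tri, width, height):
    """Return the pixels whose centres the triangle covers."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    if edge(x0, y0, x1, y1, x2, y2) < 0:  # normalize so the interior is positive
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
    edges = [(x0, y0, x1, y1), (x1, y1, x2, y2), (x2, y2, x0, y0)]
    covered = []
    for py in range(height):
        for px in range(width):
            cx, cy = px + 0.5, py + 0.5  # sample at the pixel centre
            inside = True
            for ax, ay, bx, by in edges:
                e = edge(ax, ay, bx, by, cx, cy)
                if e < 0 or (e == 0 and not owns_edge(ax, ay, bx, by)):
                    inside = False
                    break
            if inside:
                covered.append((px, py))
    return covered

# Two triangles sharing the diagonal of a 4x4 square; diagonal pixel centres lie on it.
a = rasterize(((0, 0), (4, 0), (4, 4)), 6, 6)
b = rasterize(((0, 0), (4, 4), (0, 4)), 6, 6)
```

With integer vertices and half-integer sample points, the edge functions are exact in floating point, so the `e == 0` test is reliable here; production rasterizers typically use fixed-point coordinates for the same reason.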
Oct, 31
Parallel implementation of flow and matching algorithms
In our work, we present two parallel algorithms and their lock-free implementations using the popular GPU environment Nvidia CUDA. The first algorithm is the push-relabel method for the flow problem in grid graphs. The second is the cost scaling algorithm for the assignment problem in complete bipartite graphs.
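For readers unfamiliar with push-relabel, the sketch below is a plain sequential version on a dense capacity matrix: each active vertex with excess flow pushes along admissible residual edges (to a neighbour exactly one level lower) and is relabeled when none remain. It only illustrates the basic method; the lock-free GPU variant for grid graphs described above is not reproduced, and the FIFO processing order and matrix representation are assumptions made for brevity.

```python
from collections import deque

def max_flow_push_relabel(capacity, s, t):
    """Maximum s-t flow value via FIFO push-relabel on an n x n capacity matrix."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    height, excess = [0] * n, [0] * n
    height[s] = n
    for v in range(n):  # saturate every edge leaving the source
        if capacity[s][v] > 0:
            flow[s][v], flow[v][s] = capacity[s][v], -capacity[s][v]
            excess[v] += capacity[s][v]
            excess[s] -= capacity[s][v]
    active = deque(v for v in range(n) if v not in (s, t) and excess[v] > 0)
    while active:
        u = active.popleft()
        while excess[u] > 0:  # discharge u completely
            pushed = False
            for v in range(n):
                residual = capacity[u][v] - flow[u][v]
                if residual > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], residual)
                    flow[u][v] += delta
                    flow[v][u] -= delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v not in (s, t) and excess[v] == delta:  # v just became active
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:  # relabel: lift u just above its lowest residual neighbour
                height[u] = 1 + min(height[v] for v in range(n)
                                    if capacity[u][v] - flow[u][v] > 0)
    return excess[t]

# Textbook example network (source 0, sink 5) with maximum flow 23.
cap = [[0] * 6 for _ in range(6)]
for u, v, c in [(0, 1, 16), (0, 2, 13), (1, 3, 12), (2, 1, 4), (2, 4, 14),
                (3, 2, 9), (3, 5, 20), (4, 3, 7), (4, 5, 4)]:
    cap[u][v] = c
print(max_flow_push_relabel(cap, 0, 5))  # 23
```

Push and relabel operations only touch a vertex and its neighbours, which is what makes the method attractive for parallel execution on grid graphs, where each vertex has a small, fixed neighbourhood.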