SMARAD is the Smart and Novel Radios research unit at Aalto University in Helsinki. In the context of their smart radio research, the area of influence of existing television transmitters is important data for the placement of experimental transmitters. Currently, these areas are calculated with a regular Voronoi tessellation, ignoring variation in transmitter characteristics. This […]
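The ordinary (unweighted) Voronoi rule the abstract refers to simply assigns every map point to its nearest transmitter, regardless of transmitter power. A minimal sketch of that rule, with made-up transmitter names and coordinates:

```python
import math

# Ordinary Voronoi assignment: each map point belongs to the nearest
# transmitter, ignoring power differences -- exactly the simplification
# the abstract mentions. Site names and coordinates are invented.
sites = {"A": (0.0, 0.0), "B": (10.0, 0.0), "C": (5.0, 8.0)}

def nearest_transmitter(point):
    """Return the name of the transmitter whose Voronoi cell contains `point`."""
    return min(sites, key=lambda name: math.dist(point, sites[name]))

print(nearest_transmitter((2.0, 1.0)))  # → A
```

A weighted tessellation would replace the plain Euclidean distance with one scaled by per-transmitter characteristics.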

October 31, 2014 by hgpu

Dataflow models are widely used for expressing the functionality of digital signal processing (DSP) applications due to their useful features, such as providing formal mechanisms for description of application functionality, imposing minimal data-dependency constraints in specifications, and exposing task and data level parallelism effectively. Due to the increased complexity of dynamics in modern DSP applications, […]
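The token-based firing discipline that makes dataflow specifications analyzable can be illustrated with a toy synchronous-dataflow pipeline; the actors and rates below are invented for illustration and come from no particular toolkit:

```python
from collections import deque

# Toy synchronous dataflow: each actor consumes and produces a fixed number
# of tokens per firing, so the schedule is decided purely by token counts.
def run_sdf(source_tokens):
    fifo_ab = deque(source_tokens)  # channel: source -> averaging actor
    fifo_bc = deque()               # channel: averaging actor -> sink
    out = []
    # Actor B fires whenever 2 input tokens are available:
    # consumes 2, produces 1 (their average).
    while len(fifo_ab) >= 2:
        x, y = fifo_ab.popleft(), fifo_ab.popleft()
        fifo_bc.append((x + y) / 2)
    # Sink actor: consumes 1 token per firing.
    while fifo_bc:
        out.append(fifo_bc.popleft())
    return out

print(run_sdf([1, 3, 5, 7]))  # → [2.0, 6.0]
```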

October 31, 2014 by hgpu

In the present article we describe the implementation of the finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor extends the idea of a many-core unit specialized for computation, and its performance is, by design, meant to be competitive with the current families of GPUs. Its main advantage is the […]
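The per-element kernel at the heart of finite element numerical integration is a quadrature rule applied on a reference element. A one-dimensional sketch using the standard two-point Gauss-Legendre rule (illustrative only, not the paper's code):

```python
import math

# Two-point Gauss-Legendre quadrature on the reference interval [-1, 1],
# the kind of kernel evaluated once per element in FEM integration.
GAUSS_PTS = (-1 / math.sqrt(3), 1 / math.sqrt(3))
GAUSS_WTS = (1.0, 1.0)

def integrate_element(f, a, b):
    """Map the reference rule to element [a, b] and sum weighted samples."""
    jac = (b - a) / 2           # Jacobian of the affine reference-to-element map
    mid = (a + b) / 2
    return jac * sum(w * f(mid + jac * x) for x, w in zip(GAUSS_PTS, GAUSS_WTS))

# The two-point rule is exact for cubics: the integral of x^3 over [0, 2] is 4.
print(integrate_element(lambda x: x**3, 0.0, 2.0))
```

On an accelerator, many such element integrals are evaluated concurrently, which is where the data layout questions the paper addresses arise.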

October 31, 2014 by hgpu

This paper presents a novel extended dynamic programming (EDP) approach for energy minimization to solve the correspondence problem for stereo and motion. A significant speedup is achieved using a recursive minimum search (RMS) strategy. This speedup is particularly important for 2D as well as 3D disparity spaces. The proposed RMS can also […]
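The paper's RMS itself is not reproduced here, but the baseline it accelerates, dynamic-programming energy minimization along a scanline, can be sketched with toy costs:

```python
# Plain 1-D dynamic programming over one scanline: minimize the energy
# E = sum_i data_cost[i][d_i] + lam * |d_i - d_{i-1}| over disparity labels.
# The inner min over previous labels is the search that RMS-style methods speed up.
def scanline_dp(data_cost, lam):
    """data_cost[i][d]: matching cost of disparity d at pixel i."""
    n, D = len(data_cost), len(data_cost[0])
    cost = list(data_cost[0])
    for i in range(1, n):
        cost = [data_cost[i][d] + min(cost[p] + lam * abs(d - p) for p in range(D))
                for d in range(D)]
    return min(range(D), key=lambda d: cost[d])  # best disparity at the last pixel

costs = [[0, 5, 5], [4, 0, 4], [5, 5, 0]]  # toy 3-pixel, 3-disparity problem
print(scanline_dp(costs, lam=1))  # → 2
```

The naive inner minimum is O(D) per label; exploiting the structure of the smoothness term is what makes larger (2D/3D) disparity spaces tractable.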

October 31, 2014 by hgpu

Field Programmable Gate Arrays (FPGAs) are an ideal platform for building systems with custom hardware accelerators; however, managing these systems remains a major challenge. The OpenCL standard has become accepted as a good programming model for managing heterogeneous platforms due to its rich constructs. Although commercial OpenCL frameworks are now emerging, there is a […]

October 31, 2014 by hgpu

This paper presents an implementation of different matrix-matrix multiplication routines in OpenCL. We utilize the high-performance GEMM (GEneral Matrix-Matrix Multiply) implementation from our previous work for the present implementation of other matrix-matrix multiply routines in Level-3 BLAS (Basic Linear Algebra Subprograms). The other routines include SYMM (Symmetric Matrix-Matrix Multiply), SYRK (Symmetric Rank-K Update), SYR2K (Symmetric […]
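For reference, SYRK computes C := alpha * A * A^T + beta * C while referencing only one triangle of C. A naive Python statement of that arithmetic (the paper's contribution is mapping such routines onto a tuned GEMM kernel, which this sketch does not attempt):

```python
# Unoptimized reference SYRK, lower-triangular variant:
# C := alpha * A @ A.T + beta * C, touching only entries with j <= i.
def syrk_lower(alpha, A, beta, C):
    n, k = len(A), len(A[0])
    for i in range(n):
        for j in range(i + 1):          # only the lower triangle is updated
            acc = sum(A[i][l] * A[j][l] for l in range(k))
            C[i][j] = alpha * acc + beta * C[i][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
C = [[0.0, 0.0], [0.0, 0.0]]
print(syrk_lower(1.0, A, 0.0, C))  # → [[5.0, 0.0], [11.0, 25.0]]
```

Note that the strictly upper entry C[0][1] is left untouched, matching the BLAS convention that only the selected triangle is referenced.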

October 29, 2014 by hgpu

Solving linear inverse problems where the solution is known to be sparse is of interest to both signal processing and machine learning research. The standard algorithms for solving such problems are sequential in nature and tend to be slow for large-scale problems. In the past, researchers have used Graphics Processing Units to accelerate […]
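A representative sequential solver of this class, assumed here since the abstract does not name one, is ISTA (iterative shrinkage-thresholding) for min ||Ax - b||^2 + lam * ||x||_1:

```python
# ISTA sketch in pure Python: gradient step on the least-squares term,
# then elementwise soft-thresholding, which produces the sparsity.
def soft(v, t):
    """Soft-thresholding operator: shrink v toward zero by t."""
    return max(abs(v) - t, 0.0) * (1 if v > 0 else -1)

def ista(A, b, lam, step, iters=200):
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]  # A^T r
        x = [soft(x[j] - step * g[j], step * lam) for j in range(n)]
    return x

# With A = I the solution is just a soft-thresholded b: small entries vanish.
A = [[1.0, 0.0], [0.0, 1.0]]
print([round(v, 3) for v in ista(A, [3.0, 0.05], lam=0.1, step=1.0)])  # → [2.9, 0.0]
```

Each iteration depends on the previous iterate, which is the sequential bottleneck the abstract alludes to; the matrix-vector products inside an iteration are what GPUs accelerate.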

October 29, 2014 by hgpu

The particle-mesh spreading operation maps a value at an arbitrary particle position to contributions at regular positions on a mesh. This operation is often used when a calculation involving irregular positions is to be performed in Fourier space. We study several approaches for particle-mesh spreading on GPUs. A central concern is the use of […]
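In one dimension, cloud-in-cell (CIC) spreading, a common instance of this operation, splits each particle's value between its two nearest mesh points by linear weights. A sketch, assuming unit mesh spacing and periodic wrap-around:

```python
# 1-D cloud-in-cell spreading: scatter each particle's value onto the mesh.
# The "+=" scatter is the data hazard that makes GPU versions choose between
# atomic updates and gather-style reformulations.
def spread_cic(positions, values, nmesh):
    mesh = [0.0] * nmesh
    for pos, val in zip(positions, values):
        i = int(pos)                 # left mesh point (unit spacing assumed)
        frac = pos - i
        mesh[i % nmesh] += val * (1.0 - frac)
        mesh[(i + 1) % nmesh] += val * frac   # periodic wrap
    return mesh

print(spread_cic([1.25], [4.0], 4))  # → [0.0, 3.0, 1.0, 0.0]
```

Two particles near the same mesh point contend for the same `mesh[...]` entry, which is harmless here but requires synchronization when the loop over particles is parallelized.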

October 29, 2014 by hgpu

Today, heterogeneous computing has truly reshaped the way scientists think about and approach high-performance computing (HPC). Hardware accelerators such as general-purpose graphics processing units (GPUs) and the Intel Many Integrated Core (MIC) architecture continue to make inroads in accelerating large-scale scientific applications. These advancements, however, introduce new sets of challenges to the scientific community, such as: selection […]

October 29, 2014 by hgpu

We describe the neural-network training framework used in the Kaldi speech recognition toolkit, which is geared towards training DNNs with large amounts of training data using multiple GPU-equipped or multi-core machines. In order to be as hardware-agnostic as possible, we needed a way to use multiple machines without generating excessive network traffic. Our method is […]
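A low-traffic strategy consistent with this description, assumed here as an illustration rather than taken from the toolkit itself, is to train independently on each machine and periodically average the parameters:

```python
# Toy periodic parameter averaging: each worker trains on its own data shard,
# and every so often the parameter vectors are averaged elementwise. Only the
# averaged model crosses the network, not per-minibatch gradients.
def average_models(models):
    """Elementwise average of per-worker parameter vectors."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]

worker_params = [[0.25, 1.0], [0.75, 3.0]]  # two workers' parameters after an epoch
print(average_models(worker_params))  # → [0.5, 2.0]
```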

October 29, 2014 by hgpu

Graphics Processing Units (GPUs) are highly parallel shared-memory microprocessors, and as such, they are prone to the same concurrency considerations as their traditional multicore CPU counterparts. In this thesis, we consider shared memory consistency, i.e., what values a read may return when it is issued concurrently with writes on current GPU hardware. While memory consistency has been […]

October 27, 2014 by hgpu

The capabilities of Graphics Processing Units (GPUs) as a computational tool in CFD are investigated here. Several solvers for linear matrix equations have been benchmarked on the GPU, and it is shown that Gauss-Seidel gives the best performance for the GPU architecture. Compared to the CPU on a lid-driven cavity flow case, speedups of up […]
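A plain Gauss-Seidel sweep for A x = b updates each unknown in place using the newest neighbor values; a scalar sketch follows (GPU implementations typically recolor the unknowns to expose parallelism, which is not shown here):

```python
# Plain (sequential) Gauss-Seidel iteration for A x = b. Each unknown is
# overwritten immediately, so later updates in the same sweep see new values --
# the in-place dependency that GPU variants break with red-black ordering.
def gauss_seidel(A, b, iters=50):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Diagonally dominant toy system: 4x + y = 9, x + 3y = 7 -> x = 20/11, y = 19/11.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [9.0, 7.0]
print([round(v, 6) for v in gauss_seidel(A, b)])  # → [1.818182, 1.727273]
```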

October 27, 2014 by hgpu