Posts
Dec, 16
Affine Vector Cache for memory bandwidth savings
Preserving memory locality is a major issue in highly-multithreaded architectures such as GPUs. These architectures hide latency by maintaining a large number of threads in flight. As each thread needs to maintain a private working set, all threads collectively put tremendous pressure on on-chip memory arrays, at significant cost in area and power. We show […]
Dec, 15
Simultaneous Branch and Warp Interweaving for Sustained GPU Performance
Single-Instruction Multiple-Thread (SIMT) micro-architectures implemented in Graphics Processing Units (GPUs) run fine-grained threads in lockstep by grouping them into so-called warps to amortize the cost of instruction fetch, decode and control logic over multiple execution units. As individual threads take divergent execution paths, their processing takes place sequentially, defeating part of the efficiency advantage of […]
Dec, 15
Memory-level and Thread-level Parallelism Aware GPU Architecture Performance Analytical Model
GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architectures to improve application performance is even more difficult. Current approaches rely on programmers […]
Dec, 15
A GPU-based Approximate SVD Algorithm
Approximation of matrices using the Singular Value Decomposition (SVD) plays a central role in many science and engineering applications. However, the computation cost of an exact SVD is prohibitively high for very large matrices. In this paper, we describe a GPU-based approximate SVD algorithm for large matrices. Our method is based on the QUIC-SVD introduced […]
Dec, 15
GPU Algorithms for the Estimation of Environmental Models Based on Large Datasets
Statistical environmental models are computationally intensive due to the high dimension of the data, both in space and time, and due to the inferential techniques required for parameter estimation and spatial prediction. In particular, the complexity of these procedures is related to matrix operations (inversion, solution of linear systems, factorization) involving large matrices. Recently, much […]
Dec, 15
GPU Collision Detection in Conformal Geometric Space
We derive a conformal algebra treatment unifying all types of collisions among points, vectors, areas (defined by bivectors and trivectors) and 3D solid objects (defined by trivectors and quadvectors), based on a reformulation of collision queries from R^3 to conformal R^4,1 space. The algebraic formulation in this 5D space is then implemented on the GPU to […]
Dec, 15
Performance in GPU Architectures: Potentials and Distances
GPUs can execute up to one TFLOPS at peak performance. This peak, however, is rarely reached because of resource underutilization. Three parameters contribute to this inefficiency: branch divergence, memory access delays and limited workload parallelism. To this end we suggest machine models to estimate performance gain potentials obtainable by eliminating each […]
Dec, 15
Minimising Testing in Genetic Programming
The cost of optimisation can be reduced by evaluating candidate designs on only a fraction of all possible use cases. We show how genetic programming (GP) can avoid overfitting and evolve general solutions from fitness test suites as small as just one dynamic training case. Search effort can be greatly reduced.
Dec, 15
Free surface flow simulations on GPGPUs using the LBM
In this paper, we present the implementation of a volume-of-fluid-(VOF)-based algorithm for the simulation of free-surface flow problems on general purpose graphical processing units (GPGPUs). For the solution of the flow field and the additional advection equation for the VOF fill level, the lattice Boltzmann method on the basis of an MRT collision operator is […]
Dec, 15
Speed sign detection and recognition by convolutional neural networks
From the desire to update the maximum road speed data for navigation devices, a speed sign recognition and detection system is proposed. This system should prevent accidental speeding on roads where the map data is incorrect, for example due to construction work. Multiple examples of road sign classification systems already exist but none uses a […]
Dec, 15
On the numerical solution of chaotic dynamical systems using extend precision floating point arithmetic and very high order numerical methods
Multiple results in the literature indicate that all computed solutions to chaotic dynamical systems are time-step dependent. That is, solutions with small but different time steps will decouple from each other after a certain (small) finite amount of simulation time. When using double precision floating point arithmetic, time-step-independent solutions have been […]
Dec, 14
Graph Generation on GPUs using Dynamic Memory Allocation
Complex networks are often studied using statistical measurements over many independently generated samples. Irregular data structures such as graphs that involve dynamical memory management and "pointer chasing" are an important class of application and have attracted recent interest in the form of the Graph500 benchmark formulation. The generation of simulated sample network graphs and measurement […]