Posts
May, 10
GPU Ray-Traced Collision Detection: Fine Pipeline Reorganization
Ray-tracing algorithms can be used to render a virtual scene and to detect collisions between objects. Numerous ray-tracing algorithms have been proposed which use data structures optimized for specific cases (rigid objects, deformable objects, etc.). Some solutions try to optimize performance by combining several algorithms to use the most efficient algorithm for each ray. This […]
May, 7
SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters
We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core types into mainstream programming use. The framework allows equal treatment of different computing devices under the Spark framework and introduces […]
May, 7
Supporting input dependent access pattern algorithms on GPUs using GPUfs
Accelerating processing of very large datasets on GPUs is challenging, in particular when algorithms exhibit unpredictable data access patterns. In this paper we investigate the utility of GPUfs, a library that provides direct access to files from GPU programs, to implement such algorithms. We analyze the system’s bottlenecks, and suggest several modifications to the GPUfs […]
May, 7
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based on sparse local features and hypergraph matching. We benefit from special properties of the temporal domain in the data to derive a sequential and fast graph matching algorithm for GPUs. Traditionally, graphs and hypergraphs are frequently used to recognize complex and often […]
May, 7
AccFFT: A library for distributed-memory 3-D FFT on CPU and GPU architectures
We present a new library for scalable 3-D Fast Fourier Transforms (FFT). Despite the large amount of work on 3-D FFTs, we show that significant speedups can be achieved for large problem sizes and core counts. The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements in […]
May, 7
Fireflies: New software for interactively exploring dynamical systems using GPU computing
In non-linear systems, where explicit analytic solutions usually can’t be found, visualisation is a powerful approach which can give insights into the dynamical behaviour of models; it is also crucial for teaching this area of mathematics. In this paper we present new software, Fireflies, which exploits the power of graphical processing unit (GPU) computing to […]
May, 5
OMP2HMPP: Compiler Framework for Energy-Performance Trade-off Analysis of Automatically Generated Codes
We present OMP2HMPP, a tool that, in a first step, automatically translates OpenMP code into various possible transformations of HMPP. In a second step OMP2HMPP executes all variants to obtain the performance and power consumption of each transformation. The resulting trade-off can be used to choose the more convenient version. After running the tool on […]
May, 5
Coherent Photon Mapping on the Intel MIC Architecture
Photon mapping is a global illumination algorithm composed of two steps: photon tracing and photon searching. During the photon searching step, each shading point needs to search the photon-tree to find k-neighbouring photons for reflected radiance estimation. As the number of shading points and the size of the photon-tree are both very large, the photon searching […]
May, 5
GPU Accelerated Real-Time Collision Handling in Virtual Disassembly
Previous collision detection methods for virtual disassembly mainly detect collisions at discrete time intervals and use oriented bounding boxes to speed up the process. However, these discrete methods cannot guarantee that no penetration occurs while the components are moving. Meanwhile, because some of the components are embedded into each other, these components cannot be separated in the subsequent […]
May, 5
Workload Aware Algorithms for Heterogeneous Platforms
Algorithms that aim to run simultaneously on a heterogeneous collection of devices on a commodity platform have been a recent research focus. On such platforms, individual devices can have very different architectures, clock rates, and execution models. Hence, one of the fundamental challenges in designing and implementing such algorithms is to identify load balancing mechanisms […]
May, 5
PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions
This paper proposes a novel approach to reduce the computational cost of evaluation of convolutional neural networks, a factor that has hindered their deployment in low-power devices such as mobile phones. Our method is inspired by the loop perforation technique from source code optimization and accelerates the evaluation of bottleneck convolutional layers by exploiting the […]
May, 3
IPMACC: Translating OpenACC API to OpenCL
In this paper, we introduce IPMACC, a framework for executing OpenACC for C applications over the OpenCL runtime. We use our framework to compare the performance of OpenACC and OpenCL. OpenACC API abstractions remove low-level control from programmers’ hands. To understand the low-level OpenCL optimizations that are not applicable in OpenACC, we compare highly-optimized OpenCL and […]