Posts
May, 10
Age and Gender Classification using Convolutional Neural Networks
Automatic age and gender classification has become relevant to an increasing number of applications, particularly since the rise of social platforms and social media. Nevertheless, performance of existing methods on real-world images is still significantly lacking, especially when compared to the tremendous leaps in performance recently reported for the related task of face recognition. In […]
May, 10
Numerical Simulation of Melting with Natural Convection Based on Lattice Boltzmann Method and Performed with CUDA Enabled GPU
A new solver is developed to numerically simulate the melting phase change with natural convection. This solver was implemented on a single Nvidia GPU based on the CUDA technology in order to simulate the melting phase change in a 2D rectangular enclosure. The Rayleigh number is of the order of magnitude of 10^8 and Prandtl […]
May, 10
Tracking Many Solution Paths of a Polynomial Homotopy on a Graphics Processing Unit
Polynomial systems occur in many areas of science and engineering. Unlike general nonlinear systems, the algebraic structure makes it possible to compute all solutions of a polynomial system. We describe our massively parallel predictor-corrector algorithms to track many solution paths of a polynomial homotopy. The data parallelism that provides the speedups stems from the evaluation and differentiation […]
May, 10
GPU-accelerated micromagnetic simulations using cloud computing
Highly-parallel graphics processing units (GPUs) can improve the speed of micromagnetic simulations significantly as compared to conventional computing using central processing units (CPUs). We present a strategy for performing GPU-accelerated micromagnetic simulations by utilizing cost-effective GPU access offered by cloud computing services with an open-source Python-based program for running the MuMax3 micromagnetics code remotely. We […]
May, 10
GPU Ray-Traced Collision Detection: Fine Pipeline Reorganization
Ray-tracing algorithms can be used to render a virtual scene and to detect collisions between objects. Numerous ray-tracing algorithms have been proposed which use data structures optimized for specific cases (rigid objects, deformable objects, etc.). Some solutions try to optimize performance by combining several algorithms to use the most efficient algorithm for each ray. This […]
May, 7
SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters
We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core types into mainstream programming use. The framework allows equal treatment of different computing devices under the Spark framework and introduces […]
May, 7
Supporting input dependent access pattern algorithms on GPUs using GPUfs
Accelerating processing of very large datasets on GPUs is challenging, in particular when algorithms exhibit unpredictable data access patterns. In this paper we investigate the utility of GPUfs, a library that provides direct access to files from GPU programs, to implement such algorithms. We analyze the system’s bottlenecks, and suggest several modifications to the GPUfs […]
May, 7
Activity recognition from videos with parallel hypergraph matching on GPUs
In this paper, we propose a method for activity recognition from videos based on sparse local features and hypergraph matching. We benefit from special properties of the temporal domain in the data to derive a sequential and fast graph matching algorithm for GPUs. Traditionally, graphs and hypergraphs are frequently used to recognize complex and often […]
May, 7
AccFFT: A library for distributed-memory 3-D FFT on CPU and GPU architectures
We present a new library for scalable 3-D Fast Fourier Transforms (FFT). Despite the large amount of work on 3-D FFTs, we show that significant speedups can be achieved for large problem sizes and core counts. The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements in […]
May, 7
Fireflies: New software for interactively exploring dynamical systems using GPU computing
In non-linear systems, where explicit analytic solutions usually can’t be found, visualisation is a powerful approach which can give insights into the dynamical behaviour of models; it is also crucial for teaching this area of mathematics. In this paper we present new software, Fireflies, which exploits the power of graphical processing unit (GPU) computing to […]
May, 5
OMP2HMPP: Compiler Framework for Energy-Performance Trade-off Analysis of Automatically Generated Codes
We present OMP2HMPP, a tool that, in a first step, automatically translates OpenMP code into various possible transformations of HMPP. In a second step, OMP2HMPP executes all variants to obtain the performance and power consumption of each transformation. The resulting trade-off can be used to choose the most suitable version. After running the tool on […]
May, 5
Coherent Photon Mapping on the Intel MIC Architecture
Photon mapping is a global illumination algorithm composed of two steps: photon tracing and photon searching. During the photon searching step, each shading point needs to search the photon-tree to find the k neighbouring photons for reflected radiance estimation. As the number of shading points and the size of the photon-tree are very large, the photon searching […]