Posts
Jan, 28
maxDNN: An Efficient Convolution Kernel for Deep Learning with Maxwell GPUs
This paper describes maxDNN, a computationally efficient convolution kernel for deep learning with the NVIDIA Maxwell GPU. maxDNN reaches 96.3% computational efficiency on typical deep learning network architectures using a single kernel. The design combines ideas from cuda-convnet2 with the Maxas SGEMM assembly code. We only address the forward propagation (FPROP) operation of the network, but […]
Jan, 28
Accelerating Polynomial Homotopy Continuation on a Graphics Processing Unit with Double Double and Quad Double Arithmetic
Numerical continuation methods apply predictor-corrector algorithms to track a solution path defined by a family of systems, the so-called homotopy. The systems we consider are defined by polynomials in several variables with complex coefficients. For larger dimensions and degrees, the numerical conditioning worsens and hardware double precision often becomes insufficient to reach the end of […]
Jan, 28
GPU Programming – Speeding Up the 3D Surface Generator VESTA
The novel "Volume-Enclosing Surface exTraction Algorithm" (VESTA) generates triangular isosurfaces from computed tomography volumetric images and/or three-dimensional (3D) simulation data. Here, we present various benchmarks for GPU-based code implementations of both VESTA and the current state-of-the-art Marching Cubes Algorithm (MCA). One major result of this study is that VESTA runs significantly faster than the MCA.
Jan, 28
On Longest Repeat Queries Using GPU
Repeat finding in strings has important applications in subfields such as computational biology. The challenge of finding the longest repeats covering particular string positions was recently proposed and solved by Ileri et al., using optimal O(n) total time and space, where n is the string size. However, their solution can only find […]
Jan, 28
The GPU vs Phi Debate: Risk Analytics Using Many-Core Computing
The risk of reinsurance portfolios covering globally occurring natural catastrophes, such as earthquakes and hurricanes, is quantified by employing simulations. These simulations are computationally intensive and require large amounts of data to be processed. The use of many-core hardware accelerators, such as the Intel Xeon Phi and the NVIDIA Graphics Processing Unit (GPU), is desirable […]
Jan, 26
Adjoint Lattice Boltzmann for Topology Optimization on multi-GPU architecture
In this paper we present a topology optimization technique applicable to a broad range of flow design problems. We also propose a discrete adjoint formulation effective for a wide class of Lattice Boltzmann Methods (LBM). This adjoint formulation is used to calculate the sensitivity of the LBM solution to several types of parameters, both global and […]
Jan, 26
A High Performance Framework for Coupled Urban Microclimate Models
Urban form modifies the microclimate and may trap heat and pollutants. This raises the energy demand for heating and cooling building interiors. Mitigating these effects is a growing concern due to the increasing urbanization of major cities. Researchers, urban planners, and city architects rely on sophisticated simulations to investigate how to reduce […]
Jan, 26
Tangram: a High-level Language for Performance Portable Code Synthesis
We propose Tangram, a general-purpose high-level language that achieves high performance across architectures. In Tangram, a program is written by synthesizing elemental code snippets, called codelets. A codelet can have multiple semantics-preserving implementations to enable automated algorithm and implementation selection. An implementation of a codelet can be written with tunable knobs to allow […]
Jan, 26
GPU computing architecture for irregular parallelism
Many applications with regular parallelism have been shown to benefit from using Graphics Processing Units (GPUs). However, employing GPUs for applications with irregular parallelism tends to be a risky process, involving significant effort from the programmer and an uncertain amount of performance/efficiency benefit. One known challenge in developing GPU applications with irregular parallelism is the […]
Jan, 26
Performance Analysis of Join Algorithms on GPUs
Implementing database operations on parallel platforms has gained a lot of momentum in the past decade, due to the increasing popularity of many-core processors. A number of studies have shown the potential of using GPUs to speed up database operations. In this paper, we present empirical evaluations of a state-of-the-art work published in SIGMOD’08 on […]
Jan, 23
Real-time physically cloth simulation with CUDA
With the development of simulation techniques, deformable cloth simulation has become highly desired. It can be widely used in fields such as games, animation, and virtual surgery. Achieving real-time performance is the most urgent bottleneck to be solved. This paper introduces a solution to implement deformable simulation of cloth in real […]
Jan, 23
Revisit Long Short-Term Memory: An Optimization Perspective
Long Short-Term Memory (LSTM) is a deep recurrent neural network architecture with high computational complexity. Contrary to the standard practice of training LSTM online with stochastic gradient descent (SGD) methods, we propose a matrix-based batch learning method for LSTM with full Backpropagation Through Time (BPTT). We further solve the state drifting issues as well as […]