Posts
Dec, 5
CUBPT: Lock-free bulk insertions to B+ tree on GPU architecture
The B+-tree is one of the most widely used index structures. To improve the insertion process, several batch algorithms have been proposed, but they all use one thread to complete one node insertion and cannot make full use of the GPU’s parallel throughput. This paper therefore proposes CUBPT, a batch building and insertion method on the GPU. During the […]
Dec, 5
Coulomb and Landau Gauge Fixing in GPUs using CUDA and MILC
In this work, we present the GPU implementation of the overrelaxation method and the steepest descent method with Fourier acceleration for Landau and Coulomb gauge fixing using CUDA for SU(N) with N>2. A multi-GPU implementation of the overrelaxation method is also presented using MPI and CUDA. The GPU performance was measured on BlueWaters and compared against […]
Dec, 5
Software Polarization Spectrometer "PolariS"
We have developed a software-based polarization spectrometer, PolariS, to acquire full-Stokes spectra with a very high spectral resolution of 61 Hz. The primary aim of PolariS is to measure the magnetic fields in dense star-forming cores by detecting the Zeeman splitting of molecular emission lines. The spectrometer consists of a commercially available digital sampler and […]
Dec, 5
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi
Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost-efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. […]
Dec, 5
IPMACC: Open Source OpenACC to CUDA/OpenCL Translator
In this paper we introduce IPMACC, a framework for translating OpenACC applications to CUDA or OpenCL. IPMACC is composed of a set of translators that translate OpenACC for C applications to CUDA or OpenCL. The framework uses the system compiler (e.g. nvcc) to generate the final accelerator binary. The framework can be used for extending the OpenACC API, […]
Dec, 3
OpenCL Based High-Quality HEVC Motion Estimation on GPU
This paper presents a high-quality H.265/HEVC motion estimation implementation in which the CPU and GPU cooperate. The data dependency on the MVP (Motion Vector Predictor) restricts the degree of parallelism on the GPU. To overcome the constraint from the MVP, we propose to use an estimated MVP on the GPU and the accurate MVP to refine the motion […]
Dec, 3
Implementation of k-Means Clustering Algorithm in CUDA
Big Data poses a great computational challenge for programmers as well as machines, because a large amount of number crunching has to be done. Thanks to recent developments in inexpensive shared-memory architectures such as Graphics Processing Units (GPUs), an alternative has emerged. In this paper, we aim to decrease the runtime of k-Means, which is one […]
Dec, 3
Numerical cosmology on the GPU with Enzo and Ramses
A number of scientific numerical codes can currently exploit GPUs with remarkable performance. In astrophysics, Enzo and Ramses are prime examples of such applications. The two codes have been ported to GPUs adopting different strategies and programming models, Enzo adopting CUDA and Ramses using OpenACC. We describe here the different solutions used for the GPU […]
Dec, 3
24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs
We have simulated, for the first time, the long-term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our N-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar […]
Dec, 3
Parallelization of a novel frequent itemset hiding algorithm on a CPU-GPU platform
Data mining is used to extract useful information from large data sets. However, the organizations that mine the data might not be the owners of the data. So, before the owners make their data accessible for data mining, they want to make sure that no sensitive information can be mined from the released data whose […]
Dec, 2
Real-Time Hair Rendering
An approach is presented to render hair in real time by using a small number of guide strands to generate interpolated hairs on the graphics processing unit (GPU). Hair interpolation methods are based on a single guide strand or on multiple guide strands. Each hair strand is composed of segments, which can be further subdivided to […]
Dec, 2
SiftCU: An Accelerated Cuda Based Implementation of SIFT
Scale Invariant Feature Transform (SIFT) is a popular image feature extraction algorithm. SIFT’s features are invariant to many image-related variables, including scale and change in viewpoint. Despite its broad capabilities, it is computationally expensive, which makes it hard for researchers to use SIFT in their work, especially in real-time applications. This is […]