Posts
Sep, 13
Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures
Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU-accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP […]
Sep, 13
Fast computation of computer-generated hologram using Xeon Phi coprocessor
We report fast computation of computer-generated holograms (CGHs) using Xeon Phi coprocessors, recently released by Intel, which integrate many x86-based processor cores on one chip. CGHs can generate arbitrary light wavefronts and are therefore a promising technology for many applications: for example, three-dimensional displays, diffractive optical elements, and the generation of arbitrary beams. CGHs incur enormous computational […]
Sep, 11
Histogram Computations on GPUs Kernel using Global and Shared Memory Atomics
In this paper we implement histogram computations on a Graphics Processing Unit (GPU). Our histogram computation is implemented using the Compute Unified Device Architecture (CUDA), which is a minimal extension to C/C++. In this work, histograms are computed in the GPU’s global memory as well as in shared memory. We also perform histogram computations on the CPU and […]
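To make the global versus shared memory distinction concrete, here is a minimal CPU-side sketch in Python. It is not the paper's CUDA kernel: the function names and the `block_size` parameter are hypothetical, and the inner per-block list only stands in for a thread block's shared-memory histogram. The first function updates one global array for every element; the second accumulates a private histogram per block and merges it once, which is the idea that reduces contention on global atomics.

```python
def histogram_global(data, num_bins):
    """Every element increments the single global bin array directly."""
    bins = [0] * num_bins
    for value in data:
        bins[value] += 1  # on a GPU this would be an atomic add to global memory
    return bins


def histogram_privatized(data, num_bins, block_size):
    """Each block fills a private histogram, then merges it into the global one."""
    bins = [0] * num_bins
    for start in range(0, len(data), block_size):
        local = [0] * num_bins  # stands in for a per-block shared-memory histogram
        for value in data[start:start + block_size]:
            local[value] += 1
        for b, count in enumerate(local):
            bins[b] += count  # one merge per bin per block instead of one per element
    return bins
```

Both functions assume the inputs are already integer bin indices in the range 0 to `num_bins - 1`, and both return the same counts; only the update pattern differs.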
Sep, 11
Evaluating the Viability of Application-Driven Cooperative CPU/GPU Fault Detection
Trends in high performance computing are bringing increased heterogeneity among the computational resources within a single machine. The heterogeneous CPU/GPU platforms, however, exacerbate resilience problems faced by current large-scale systems. How to design efficient resilience strategies is critical for the wider adoption of heterogeneous platforms for future exascale systems. The conventional resilience strategy for GPU […]
Sep, 11
Coherent transport by adiabatic passage on atom chips
Adiabatic techniques offer some of the most promising tools to achieve high-fidelity control of the centre-of-mass degree of freedom of single atoms. As their main requirement is to follow an eigenstate of the system, constraints on timing and field strength stability are usually low, especially for trapped systems. In this paper we present a detailed […]
Sep, 11
D5.5.4 – Characterization of Redundancy and Definition of Work Reuse
This task involves the following work: – Establishing the relation of Quality of Service (QoS) and energy to accuracy. – Design and development of techniques to dynamically decrease accuracy (e.g., ignore low order bits in computations). Deliberately ignoring a few low order bits in calculations where the application allows it (in terms of impact to […]
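As an illustration of what ignoring low order bits can look like, the following Python sketch zeroes the lowest mantissa bits of a double-precision value. The function name `drop_low_bits` and its `bits` parameter are assumptions made for this example, and the deliverable's actual technique may differ.

```python
import struct


def drop_low_bits(x, bits):
    """Return x with the `bits` lowest mantissa bits of its float64 encoding zeroed."""
    if not 0 <= bits <= 52:
        raise ValueError("bits must be between 0 and 52 for a float64 mantissa")
    # Reinterpret the double as a 64-bit unsigned integer.
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    # Clear the lowest `bits` bits; sign and exponent are left untouched.
    mask = ~((1 << bits) - 1) & 0xFFFFFFFFFFFFFFFF
    (result,) = struct.unpack("<d", struct.pack("<Q", raw & mask))
    return result
```

Dropping `bits` mantissa bits bounds the relative error by roughly 2 to the power of (`bits` - 52), so the accuracy loss can be traded against whatever savings the reduced precision allows.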
Sep, 11
Hardware-Oblivious Parallelism for In-Memory Column-Stores
The multi-core architectures of today’s computer systems make parallelism a necessity for performance-critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: current database systems thus rely on labor-intensive and error-prone manual tuning to exploit the full potential of modern parallel hardware architectures like multi-core CPUs and graphics cards. We […]
Sep, 11
Iterative and Predictive Ray-Traced Collision Detection for Multi-GPU Architectures
Collision detection is a complex task that can be described simply: given a set of objects, we want to know which ones collide. In the literature, we can find numerous algorithms that depend on object properties, but no general solution that works for every kind of object. The internship focuses on a recent algorithm […]
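The problem statement itself (given a set of objects, report which ones collide) can be sketched with a brute-force baseline. This is not the ray-traced algorithm the post refers to: it swaps in axis-aligned bounding boxes and an all-pairs test, and the names `boxes_overlap` and `colliding_pairs` are hypothetical.

```python
from itertools import combinations


def boxes_overlap(a, b):
    """Two axis-aligned boxes overlap when their extents intersect on every axis."""
    (a_min, a_max), (b_min, b_max) = a, b
    return all(a_min[i] <= b_max[i] and b_min[i] <= a_max[i] for i in range(3))


def colliding_pairs(boxes):
    """Return index pairs (i, j), i < j, of boxes that overlap (O(n^2) checks)."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if boxes_overlap(a, b)
    ]
```

Each box is given as a `(min_corner, max_corner)` pair of 3D points. The quadratic cost of checking every pair is what more specialised algorithms try to avoid.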
Sep, 11
GPU Implementations of Object Detection using HOG Features and Deformable Models
Vision-based object detection using camera sensors is an essential piece of perception for autonomous vehicles. Various combinations of features and models can be applied to increase the quality and the speed of object detection. A well-known approach uses histograms of oriented gradients (HOG) with deformable models to detect a car in an image [15]. A […]
Sep, 11
OCT on CUDA: Speeding up the image reconstruction algorithm for an Optical Coherence Tomography system using NVIDIA’s CUDA platform
GPGPU, or general-purpose GPU programming, was dramatically changed and demystified when NVIDIA introduced the CUDA architecture for their GPUs in 2006. Since then, problems in engineering, economics, or any other field no longer need to be translated into graphics problems to be processed by the GPU. The GPU programming paradigm was dramatically shifted when CUDA […]
Sep, 11
A Mixed Hierarchical Algorithm for Nearest Neighbor Search
The k nearest neighbor (kNN) search is a computationally intensive application critical to fields such as image processing, statistics, and biology. Recent works have demonstrated the efficacy of k-d tree-based implementations on multi-core CPUs. It is unclear, however, whether such tree-based implementations are amenable to execution on high-density processors typified today by the […]
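For reference, the kNN query itself can be written as a brute-force scan, which is the baseline that tree-based methods aim to beat. This sketch is neither the paper's mixed hierarchical algorithm nor a k-d tree, and the function name `knn` is an assumption made for illustration.

```python
import heapq
import math


def knn(points, query, k):
    """Return the k points closest to `query` by Euclidean distance, nearest first."""
    # Computes one distance per point, so the scan is linear in the number of points.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))
```

The scan touches every point for every query, which is simple and regular but does no pruning; a k-d tree trades that regularity for fewer distance computations.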
Sep, 11
Memory transfer optimization for a lattice Boltzmann solver on Kepler architecture nVidia GPUs
The Lattice Boltzmann method (LBM) for solving fluid flow is naturally well suited to an efficient implementation for massively parallel computing, due to the prevalence of local operations in the algorithm. This paper presents and analyses the performance of a 3D lattice Boltzmann solver, optimized for third generation nVidia GPU hardware, also known as ‘Kepler’. […]