Posts
Mar, 8
HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms
Heterogeneous computing has emerged as one of the major computing platforms in many domains. Although there have been several proposals to aid programming for heterogeneous computing platforms, optimizing applications on heterogeneous computing platforms is not an easy task. Identifying which parallel regions (or tasks) should run on GPUs or CPUs is one of the critical […]
Mar, 8
Generating Performance Portable Code using Rewrite Rules: From High-level Functional Expressions to High-Performance OpenCL Code
Computing systems have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort. This results in a tension between performance and code portability. Typically, code is either tuned in a low-level imperative language using hardware-specific optimizations to […]
Mar, 6
Lyra2: Password Hashing Scheme with improved security against time-memory trade-offs
We present Lyra2, a password hashing scheme (PHS) based on cryptographic sponges. Lyra2 was designed to be strictly sequential (i.e., not easily parallelizable), providing strong security even against attackers that use multiple processing cores (e.g., custom hardware or a powerful GPU). At the same time, it is very simple to implement in software and allows […]
Mar, 6
High-Performance Computation of a Jet in Cross Flow by Lattice Boltzmann Based Parallel Direct Numerical Simulation
Direct numerical simulation (DNS) of a round jet in crossflow based on the lattice-Boltzmann method (LBM) is carried out on a multi-GPU cluster. The data-parallel SIMT (Single-Instruction Multiple-Thread) characteristic of GPUs matches the parallelism of LBM well, which leads to high GPU efficiency on the LBM solver. With the present GPU settings (6 Nvidia Tesla K20M), […]
Mar, 6
PONDER – A Real time software backend for pulsar and IPS observations at the Ooty Radio Telescope
This paper describes a new real-time versatile backend, the Pulsar Ooty Radio Telescope New Digital Efficient Receiver (PONDER), which has been designed to operate along with the legacy analog system of the Ooty Radio Telescope (ORT). PONDER makes use of current state-of-the-art computing hardware, a Graphical Processing Unit (GPU) and sufficiently […]
Mar, 6
Multi-GPU implementation of a VMAT treatment plan optimization algorithm
VMAT optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units have been used to speed up the computations. However, their small memory size cannot handle cases with a large dose-deposition coefficient (DDC) matrix. This paper reports an implementation […]
Mar, 6
An OpenCL-based Monte Carlo dose calculation engine (oclMC) for coupled photon-electron transport
The Monte Carlo (MC) method has been recognized as the most accurate dose calculation method for radiotherapy. However, its extremely long computation time impedes clinical applications. Recently, considerable effort has been made to realize fast MC dose calculation on GPUs. Nonetheless, most of the GPU-based MC dose engines were developed in the NVIDIA CUDA environment. This […]
Mar, 3
An efficient solution for hazardous geophysical flows simulation using GPUs
The movement of poorly sorted material over steep areas constitutes a hazardous environmental problem. Computational tools help in the understanding and prediction of such landslides. The main drawback is the high computational effort required for obtaining accurate numerical solutions due to the high number of cells involved in the calculation. In order to overcome this […]
Mar, 3
Energy- and cost-efficient Lattice-QCD computations using graphics processing units
Quarks and gluons are the building blocks of all hadronic matter, like protons and neutrons. Their interaction is described by Quantum Chromodynamics (QCD), a theory under test by large scale experiments like the Large Hadron Collider (LHC) at CERN and in the future at the Facility for Antiproton and Ion Research (FAIR) at GSI. However, […]
Mar, 3
Adaptive Video Encoding Based on OpenCL Face Recognition
Video chatting is now a popular way of communication. However, a poor network connection ruins the experience as the faces are blurred. To solve this problem, the team offers a solution to preserve the clarity of faces under a limited transmission rate. In this project, the primary goal is to design a video encoder that reduces the size […]
Mar, 3
Adaptive Kinetic-Fluid Solvers for Heterogeneous Computing Architectures
This paper describes recent progress towards porting a Unified Flow Solver (UFS) to heterogeneous parallel computing. UFS is an adaptive kinetic-fluid simulation tool, which combines Adaptive Mesh Refinement (AMR) with automatic cell-by-cell selection of kinetic or fluid solvers based on continuum breakdown criteria. The main challenge of porting UFS to graphics processing units (GPUs) comes […]
Mar, 3
Counting Triangles in Large Graphs on GPU
The clustering coefficient and the transitivity ratio are concepts often used in network analysis, which creates a need for fast practical algorithms for counting triangles in large graphs. Previous research in this area focused on sequential algorithms, MapReduce parallelization, and fast approximations. In this paper we propose a parallel triangle counting algorithm for CUDA GPUs. […]

