Posts

Jan, 25

Performance Evaluation of the Intel Many Integrated Core Architecture for 3D Image Reconstruction in Computed Tomography

The computational effort of 3D image reconstruction in Computed Tomography (CT) has long required special-purpose hardware. Systems such as custom-built FPGA systems and GPUs are still widely used today, particularly in interventional settings, where radiologists need reconstruction to meet a hard time constraint. However, it has recently been shown that today even commodity […]
Jan, 25

Improvement of the fused CUDA kernels performance prediction

This thesis presents a tool for improving the performance prediction of a source-to-source compiler of mapped functions developed at the Faculty of Informatics. The tool combines modifications to the original compiler with static and dynamic data gathering to provide as much data about the fusions as possible for analysis. […]
Jan, 25

Finite differences numerical method for two-dimensional superlattice Boltzmann transport equation and case comparison of CPU(C) and GPGPU(CUDA) implementations

We present a finite-difference numerical algorithm for solving the 2D spatially homogeneous Boltzmann transport equation for semiconductor superlattices (SL) subject to a time-dependent electric field along the SL axis and a constant perpendicular magnetic field. The algorithm is implemented in C for CPUs and in CUDA C for commodity NVIDIA GPUs. We compare performance and […]
Jan, 23

OpenSSL acceleration using Graphics Processing Units

Cryptography: The study of techniques focused on security. Typically, an implementation of cryptography is computationally heavy, leading to performance issues on general purpose systems. Adding the possibility of offloading cryptographic operations to a Graphics Processing Unit (GPU) onto a widespread, open-source cryptographic library such as OpenSSL would be extremely useful in lightening the CPU load […]
Jan, 23

On the Portability of the OpenCL Dwarfs on Fixed and Reconfigurable Parallel Platforms

The proliferation of heterogeneous computing systems presents the parallel computing community with the challenge of porting legacy and emerging applications to multiple processors with diverse programming abstractions. OpenCL is a vendor-agnostic and industry-supported programming model that offers code portability on heterogeneous platforms, allowing applications to be developed once and deployed "anywhere". In this paper, we […]
Jan, 23

clpeak – peak performance of your opencl device

clpeak is a benchmarking tool intended for developers who fine-tune OpenCL kernels for a particular device or class of devices. It calculates bandwidth and compute performance for different vector widths of a datatype, e.g. float and float4. Traditionally, the recommendation has been to write scalar code and expect the OpenCL compiler to auto-vectorize it. But most of the time the compiler […]
Jan, 23

A comparison between parallelization approaches in molecular dynamics simulations on GPUs

We test the performance of two different approaches to the computation of forces for molecular dynamics simulations on Graphics Processing Units. A "vertex-based" approach, where a computing thread is started per particle, is compared to a newly proposed "edge-based" approach, where a thread is started for each potentially non-zero interaction. We find that the former […]
Jan, 23

Multi-GPU parallel memetic algorithm for capacitated vehicle routing problem

The goal of this paper is to propose and test a new memetic algorithm for the capacitated vehicle routing problem in a parallel computing environment. We consider a simple variation of the vehicle routing problem in which the only parameter is the capacity of the vehicle and each client needs only one package. We present […]
Jan, 19

A Lattice Boltzmann Method Simulator for Microfluidics on GPU Cluster

A simulator for microfluidic systems, based on the lattice Boltzmann method (LBM), was developed to run on a Graphics Processing Unit (GPU) cluster. It was written in CUDA C, implements single-component, single-phase fluids, and includes periodic, velocity, bounce-back and pressure boundary conditions. The program was run on a cluster with four nodes, where […]
Jan, 19

GPU based Implementation of Film Flicker Reduction Algorithms

In this work we propose an algorithm for film restoration aimed at reducing the flicker effect while preserving the original overall illumination of the film. We also present a comparative study of the performance of this algorithm implemented following a sequential approach on a CPU and following a parallel approach on a GPU using OpenCL.
Jan, 19

FlowTour: An Automatic Guide for Exploring Internal Flow Features

We present FlowTour, a novel framework that provides an automatic guide for exploring internal flow features. Our algorithm first identifies critical regions and extracts their skeletons for feature characterization and streamline placement. We then create candidate viewpoints based on the construction of a simplified mesh enclosing each critical region and select the best viewpoints based on […]
Jan, 19

Finite-difference time-domain solver for room acoustics using graphics processing units

Several acoustic simulation methods have been introduced during the past decades. Wave-based simulation methods have been one of the alternatives, but their applicability for wideband acoustic simulation has been limited by the computing power of available hardware. During recent years, the processing power and programmability of graphics processing units have improved, and therefore several wave-based […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org