Posts
Jan, 7
Fat versus Thin Threading Approach on GPUs: Application to Stochastic Simulation of Chemical Reactions
We explore two different threading approaches on a graphics processing unit (GPU), each exploiting a different characteristic of the current GPU architecture. The fat thread approach tries to minimize data access time by relying on shared memory and registers, potentially sacrificing parallelism. The thin thread approach maximizes parallelism and tries to hide access latencies. We apply […]
Jan, 7
Efficient Parallel Graph Exploration on Multi-Core CPU and GPU
Graphs are a fundamental data representation that has been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this paper, we present a new method for implementing the parallel […]
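For readers unfamiliar with the term, the following is a minimal sequential sketch of a level-synchronous (frontier-based) BFS, the general form that parallel implementations usually start from. It is not the method proposed in the paper; the function name `bfs_levels` and the adjacency-list dictionary are hypothetical and chosen only for illustration.

```python
def bfs_levels(adj, source):
    """Return a dict mapping each vertex reachable from source to its BFS depth."""
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # Every vertex in the current frontier can be expanded independently,
        # which is the part a parallel implementation would distribute.
        for u in frontier:
            for v in adj.get(u, ()):
                if v not in depth:
                    depth[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return depth


# Hypothetical example graph given as adjacency lists.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
levels = bfs_levels(graph, 0)
print(levels)
```

In a parallel setting the check-and-mark step on `depth` is where threads would contend, so it typically needs an atomic operation or a per-level deduplication pass.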
Jan, 7
Interactive rendering of acquired materials on dynamic geometry using bandwidth prediction
Shading complex materials such as acquired reflectances in multi-light environments is computationally expensive. Estimating the shading integral involves sampling the incident illumination independently at several pixels. The number of samples required for this integration varies across the image, depending on an intricate combination of several factors. Adaptively distributing computational budget across the pixels for shading […]
Jan, 7
On-the-Fly Computing on Many-Core Processors in Nuclear Applications
Many nuclear applications still require more computational power than current computers can provide. Furthermore, some of them require dedicated machines, because they must run continuously or cannot tolerate delays. To satisfy these requirements, we introduce computer accelerators, which can provide higher computational power at lower prices than current commodity processors. However, the […]
Jan, 7
Graduate Operating Systems: Project Report
Due to the high demand for secure Internet usage, SSL performance needs to improve. This paper describes a technique to improve the performance of SSL by placing a CPU/GPU hybrid proxy in front of a web server to handle only the SSL overhead. This will allow the utilization of high […]
Jan, 7
Multiphase Fluid Simulations on a Multiple GPGPU PC Using Unsplit Time Integration VSIAM3
This talk presents the implementation of multiphase fluid dynamics simulations on multi-GPGPU hardware using robust and efficient numerical methods. An unsplit formulation for the advection computation is proposed to replace the original split formulation in the so-called VSIAM3 method. The new formulation improves dimensional symmetry of numerical […]
Jan, 6
GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement
In hardware-aware high performance computing, block-asynchronous iteration and mixed precision iterative refinement are two techniques that are applied to leverage the computing power of SIMD accelerators like GPUs. Although they take very different approaches for this purpose, they share the basic idea of compensating the convergence behaviour of an inferior numerical algorithm by […]
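As background, here is a textbook-style sketch of mixed precision iterative refinement on a tiny 2x2 system. It is not the asynchronous error correction scheme of the paper. Everything in it is an assumption made for illustration: the helper names `f32`, `solve2x2_low`, and `refine`, the use of Cramer's rule as the low-precision solver, and the example matrix. Single precision is emulated by rounding through `struct`.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve2x2_low(a, r):
    """Low-precision solver: Cramer's rule with every intermediate rounded to float32."""
    a11, a12, a21, a22 = (f32(v) for v in (a[0][0], a[0][1], a[1][0], a[1][1]))
    r1, r2 = f32(r[0]), f32(r[1])
    det = f32(f32(a11 * a22) - f32(a12 * a21))
    x1 = f32(f32(f32(r1 * a22) - f32(a12 * r2)) / det)
    x2 = f32(f32(f32(a11 * r2) - f32(r1 * a21)) / det)
    return [x1, x2]


def refine(a, b, iterations=5):
    """Iterative refinement: double-precision residual, single-precision correction."""
    x = [0.0, 0.0]
    for _ in range(iterations):
        # Residual r = b - A x, computed in double precision.
        r = [b[i] - (a[i][0] * x[0] + a[i][1] * x[1]) for i in range(2)]
        # Correction from the cheap low-precision solve of A c = r.
        c = solve2x2_low(a, r)
        # Solution update, again in double precision.
        x = [x[0] + c[0], x[1] + c[1]]
    return x


# Hypothetical well-conditioned system; the exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = refine(A, b)
print(x)
```

Only the residual and the update need the higher precision; the expensive solve stays in the lower one, which is what makes the technique attractive on accelerators.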
Jan, 6
Artifact-Free JPEG Decompression with Total Generalized Variation
We propose a new model for the improved reconstruction of JPEG (Joint Photographic Experts Group) images. Given a JPEG compressed image, our method first determines the set of possible source images and then specifically chooses one of these source images satisfying additional regularity properties. This is realized by employing the recently introduced Total Generalized Variation […]
Jan, 6
Hyper neural network on OpenCL
The goal of this thesis is to design and implement a hyper neural network whose topology limits the number of inputs of individual neurons and which uses genetic programming as the learning algorithm. The neural network is parallelized using the OpenCL standard, which allows it to run on a wide range of devices. From […]
Jan, 6
Lossless Compression of Variable-Precision Floating-Point Buffers on GPUs
In this work, we explore the lossless compression of 32-bit floating-point buffers on graphics hardware. We first adapt a state-of-the-art 16-bit floating-point color and depth buffer compression scheme for operation on 32-bit data and propose two specific enhancements: dynamic bucket selection and a Fibonacci encoder. Next, we describe a unified codec for any type of […]
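The excerpt does not say how the Fibonacci encoder is defined. Assuming it refers to classical Fibonacci coding (a Zeckendorf representation terminated by an extra 1 bit), a minimal sketch looks like this; the function names `fibonacci_encode` and `fibonacci_decode` are hypothetical, and the paper's encoder may differ.

```python
def fibonacci_encode(n):
    """Encode a positive integer as a Fibonacci code word (string of '0'/'1')."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for positive integers")
    # Fibonacci numbers 1, 2, 3, 5, ... up to the largest one not exceeding n.
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()
    # Greedy Zeckendorf decomposition, from the largest term down.
    bits = []
    for f in reversed(fibs):
        if f <= n:
            bits.append("1")
            n -= f
        else:
            bits.append("0")
    # Emit lowest term first, then the terminating '1' that creates the '11' marker.
    return "".join(reversed(bits)) + "1"


def fibonacci_decode(code):
    """Decode a single Fibonacci code word produced by fibonacci_encode."""
    fibs = [1, 2]
    total = 0
    for i, bit in enumerate(code[:-1]):  # skip the terminating '1'
        while len(fibs) <= i:
            fibs.append(fibs[-1] + fibs[-2])
        if bit == "1":
            total += fibs[i]
    return total


for value in (1, 2, 3, 4, 11):
    print(value, fibonacci_encode(value))
```

Because a Zeckendorf representation never contains two adjacent 1 bits, the pair `11` can only appear at the end of a code word, which makes the code self-delimiting.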
Jan, 6
Optimizing a Biomedical Imaging Orientation Score Framework
A branch of biomedical image processing involves analyzing images containing elongated structures. The enhancement of these structures in noisy image data is often required to enable automatic image analysis. A framework for such noise reduction based on Coherence Enhancing Diffusion (CED) using Orientation Scores (OS) has been developed. However, owing to the high computational complexity […]
Jan, 6
CodePy
The C/C++ metaprogramming toolkit for Python [16], CodePy [2], is analysed with respect to its source code generation capabilities and the way it generates extension modules for Python. Combining the two makes it possible to generate C code in a Python script and execute it from within the same script. Insights are given on how this roundtrip […]