Posts
Mar, 8
Enhancing productivity and performance portability of OpenCL applications on heterogeneous systems using runtime optimizations
Initially driven by a strong need for increased computational performance in science and engineering, heterogeneous systems have become ubiquitous and they are getting increasingly complex. The single processor era has been replaced with multi-core processors, which have quickly been surrounded by satellite devices aiming to increase the throughput of the entire system. These auxiliary devices, […]
Mar, 8
Compiler and runtime techniques for bulk-synchronous programming models on CPU architectures
The rising pressure to simultaneously improve performance and reduce power consumption is driving more heterogeneity into all aspects of computing devices. However, wide adoption of specialized computing devices such as GPUs and Xeon Phis comes with a programming challenge. A carefully optimized program that is well matched to the target hardware can run many times […]
Mar, 8
A Novel Mapping of Arbitrary Precision Integer Operations to the GPU
With modern processing hardware approaching the physical limits of transistor size and per-core speed, hardware manufacturers have shifted their focus for improving performance from raw clock speed towards parallelization. Solutions to utilize the computation power of GPUs are published and supported by graphics card manufacturers. While there exist solutions for […]
Mar, 7
Topology optimization design of 3D electrothermomechanical actuators by using GPU as a co-processor
The topology optimization method (TOM) requires high computational resources to be solved, especially in multiphysics problems. These high computational requirements arise because TOM is an iterative technique in which the number of iterations ranges from tens to thousands. Furthermore, at each TOM iteration, it is necessary to execute several routines such as the finite element […]
Mar, 5
Fast LZW compression using a GPU
LZW compression is a well-known patented lossless compression method used in the Unix file compression utility "compress" and in the GIF and TIFF image formats. It converts an input string of characters (or 8-bit unsigned integers) into a string of codes using a code table (or dictionary) that maps strings into codes. Since the code […]
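The dictionary-based conversion described in this excerpt can be sketched as follows. This is a minimal sequential textbook version in Python, not the GPU method of the paper. The function name `lzw_compress` is hypothetical, and the sketch assumes an unbounded code table and returns plain integers, whereas real tools such as "compress" pack codes into bits and limit the table size.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Convert a byte string into a list of integer codes (textbook LZW sketch)."""
    # Assumption: the table starts with one code per 8-bit value (0..255).
    table = {bytes([i]): i for i in range(256)}
    next_code = 256  # first code available for multi-byte strings
    current = b""    # longest prefix seen so far that is in the table
    codes: List[int] = []

    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while it is still in the table.
            current = candidate
        else:
            # Emit the code for the longest known prefix, then register
            # the extended string under a new code.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])

    # Flush the final pending match, if any.
    if current:
        codes.append(table[current])
    return codes


if __name__ == "__main__":
    print(lzw_compress(b"ABABABA"))  # [65, 66, 256, 258]
```

In this run the codes 256 and 258 stand for the strings "AB" and "ABA", which were added to the table while scanning the input.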
Mar, 5
Input Space Splitting for OpenCL
The performance of OpenCL programs suffers from memory and control flow divergence. Therefore, OpenCL compilers employ static analyses to identify non-divergent control flow and memory accesses in order to produce faster code. However, divergence is often input-dependent and hence can be observed for some, but not all, inputs. In these cases, vectorizing compilers have to generate […]
Mar, 5
Performance Analysis of kNN on large datasets using CUDA & Pthreads
Several organizations have large databases that are growing rapidly day by day and need to be regularly maintained. Content-based searches are similarity searches based on certain features obtained from various multimedia data. For various applications like multimedia content retrieval, data mining, pattern recognition, etc., performing the nearest neighbor […]
Mar, 5
Heterogeneous parallel algorithms for Computational Fluid Dynamics on unstructured meshes
Frontiers of computational fluid dynamics (CFD) are constantly expanding and eagerly demanding more computational resources. Currently, we are experiencing a rapid evolution in high-performance computing systems driven by power consumption constraints. New HPC nodes incorporate accelerators that are used as math co-processors for increasing the throughput and the FLOP per watt ratio. On […]
Mar, 5
Metamorphic Testing for (Graphics) Compilers
We present strategies for metamorphic testing of compilers using opaque value injection, and experiences using the method to test compilers for the OpenGL shading language.
Mar, 3
Hierarchical Semantic Parsing for Object Pose Estimation in Densely Cluttered Scenes
Densely cluttered scenes are composed of multiple objects which are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios. This is mainly due to the presence of objects with textureless surfaces, similar appearances and the difficulty of object instance segmentation. […]
Mar, 3
Hadoop Mapreduce OpenCL Plugin
Modern systems generate huge amounts of information in areas such as finance, telematics, healthcare, and IoT devices, to name a few. Modern computing frameworks like Mapreduce need an ever-increasing amount of computing power to sort, arrange, and generate insights from the data. This project is an attempt to harness the power of heterogeneous […]
Mar, 3
Full reconstruction of a 14-qubit state within four hours
Full quantum state tomography (FQST) plays a unique role in the estimation of the state of a quantum system without a priori knowledge or assumptions. Unfortunately, since FQST requires informationally (over)complete measurements, both the number of measurement bases and the computational complexity of data processing suffer an exponential growth with the size of the quantum […]