Posts
Apr, 10
An innovative compilation tool-chain for embedded multi-core architectures
In this paper, we propose a compilation tool-chain supporting the effective exploitation of multi-core architectures offering hundreds of cores. The tool-chain leverages both the application requirements and the platform-specific features to provide developers with a powerful parallel-programming environment able to generate efficient parallel code. The design of parallel applications follows a semi-automatic approach enabling […]
Apr, 9
New Basic Linear Algebra Methods for Simulation on GPUs
We have used Graphics Processing Units (GPUs) to accelerate the solution of the types of equations typically encountered in dynamic system simulators. Compared to commercial matrix solvers that run on a CPU, we realized speedups ranging from 5 (for system size ~700) to 460 (for system size ~5800). While calculation time for the commercial matrix […]
Apr, 9
A Study of Productivity and Performance of Modern Vector Processors
This bachelor's thesis presents a case study of the performance and productivity of modern vector processors such as graphics processing units (GPUs) and central processing units (CPUs), based on three different computational routines arising from a magnetoencephalography application. I apply different programming paradigms to these routines targeting either the CPU or the GPU. Furthermore, […]
Apr, 9
Tiled Shading
In this article we describe and investigate tiled shading. The tiled techniques, though simple, enable substantial improvements to both deferred and forward shading. Tiled shading has been previously discussed only in terms of deferred shading (tiled deferred shading). We contribute a more detailed description of the technique, introduce tiled forward shading (a generalization of […]
Apr, 9
A GPU-Based Accelerator for Chinese Word Segmentation
The task of Chinese word segmentation is to split a sequence of Chinese characters into tokens so that Chinese-language information can be more easily retrieved by web search engines. Due to the dramatic increase in the amount of Chinese literature in recent years, it has become a big challenge for web search engines to analyze massive […]
Apr, 9
Efficient computational noise in GLSL
We present GLSL implementations of Perlin noise and Perlin simplex noise that run fast enough for practical consideration on current generation GPU hardware. The key benefits are that the functions are purely computational, i.e. they use neither textures nor lookup tables, and that they are implemented in GLSL version 1.20, which means they are compatible […]
Apr, 7
A Scalable Framework for Heterogeneous GPU-Based Clusters
GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their high energy efficiency and much improved single-node computational performance. However, there is little parallel software available that can efficiently utilize all CPU cores and all GPUs on a heterogeneous system. On a heterogeneous cluster, the performance of a GPU (or […]
Apr, 7
Bound the Peak Performance of SGEMM on GPU with software-controlled fast memory
In this paper, we studied the characteristics of the NVIDIA GPU architecture that concern the SGEMM routine, and the potential peak performance of SGEMM on the Fermi GPU. Guided by the analysis, our SGEMM routine achieved about 11% (NN), 4.5% (TN), 3% (NT) and 9% (TT) better performance than CUBLAS in the CUDA 4.1 package for large matrices on GTX580 […]
Apr, 7
Robust Computational Tools for Multiple Testing With Genetic Association Studies
Resolving the interplay of the genetic components of a complex disease is a challenging endeavor. Over the past several years, genome-wide association studies (GWAS) have emerged as a popular approach to locating common genetic variation within the human genome associated with disease risk. Assessing genetic-phenotype associations upon hundreds of thousands of genetic markers using the […]
Apr, 7
GPU-based Line Probing Techniques for Mikami Routing Algorithm
The graphics processing unit (GPU), which contains hundreds of processing cores, is becoming a popular device for high-performance computation in the multi-core era. Because the GPU demands strictly regular computation, designing algorithms that fit it is a key challenge for achieving speed-up. In this paper, we propose a parallel CUDA-Mikami routing algorithm on NVIDIA’s GPU. A 32-bit routing grid encoding is proposed […]
Apr, 7
An Efficient Parallel GPU Evaluation of Small Angle X-Ray Scattering Profiles
The inference of protein structure from experimental data is of crucial interest in science, medicine and biotechnology. Unfortunately, high-resolution experimental methods cannot yet provide a detailed analysis of the ensemble of conformations adopted under physiological conditions. Low-resolution techniques are often better suited for this task. Small angle X-ray scattering (SAXS) plays a major […]
Apr, 6
OpenCL framework for a CPU, GPU, and FPGA Platform
With the availability of multi-core processors, high-capacity FPGAs, and GPUs, a heterogeneous platform with tremendous raw computing capacity can be constructed from any number of these computing elements. However, one of the major challenges for constructing such a platform is the lack of a standardized framework under which an application’s computational task and […]