Posts
Oct, 29
Jit4OpenCL: a compiler from Python to OpenCL
Heterogeneous computing platforms that use GPUs and CPUs in tandem have become an important choice for building low-cost, high-performance computing platforms. For certain classes of applications, the computing ability of modern GPUs surpasses what CPUs can offer. GPUs can deliver several teraflops of peak performance. However, programmers must adopt a more complicated […]
Oct, 29
GPU based particle system
GPGPU (general-purpose computing on graphics processing units) is quite common in modern computer games for heavy simulation workloads such as game physics or particle systems. GPU programming is used not only in games but also in scientific research, for heavy calculations on molecular structures, protein folding, etc. The reason why you […]
Oct, 29
Using OpenCL for image analysis
This thesis investigates the suitability of OpenCL for accelerating image analysis operations from a developer's perspective. To this end, four representative problems are evaluated: morphological operations, convolution, watershedding, and Markov random field-based texture segmentation. The selected problems offer different implementation issues in terms of locality of the operations and load versus computation. The thesis […]
Oct, 29
Using GPUs to Accelerate Installed Antenna Performance Simulations
Savant is an asymptotic ray-tracing CEM tool used to predict the performance of antennas installed on electrically large platforms, including far-field antenna patterns, near-field distributions, and antenna-to-antenna coupling. Savant is based on the shooting and bouncing rays (SBR) formulation. While asymptotic solvers like Savant have significantly smaller computational and memory requirements for electrically large problems […]
Oct, 29
An Exploration of OpenCL on Multiple Hardware Platforms for a Numerical Relativity Application
Currently there is considerable interest in making use of many-core processor architectures, such as Nvidia and AMD graphics processing units (GPUs), for scientific computing. In this work we explore the use of the Open Computing Language (OpenCL) for a typical Numerical Relativity application: a time-domain Teukolsky equation solver (a linear, hyperbolic, partial differential equation solver […]
Oct, 28
An Adaptive Framework for Managing Heterogeneous Many-Core Clusters
The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as […]
Oct, 28
Compiling Stream Applications for Heterogeneous Architectures
Heterogeneous processing systems have become the industry standard in almost every segment of the computing market from servers to mobile systems. In addition to employing shared/distributed memory processors, the current trend is to use hardware components such as field programmable gate arrays (FPGAs), single instruction multiple data (SIMD) engines and graphics processing units (GPUs) in […]
Oct, 28
Architectural Exploration and Scheduling Methods for Coarse Grained Reconfigurable Arrays
Coarse Grained Reconfigurable Arrays (CGRAs) have emerged in recent years as promising candidates to realize efficient reconfigurable platforms. CGRAs feature high computational density, flexible routing interconnect and rapid reconfiguration, characteristics that make them well-suited to speed up execution of computational kernels. A number of designs embodying the CGRA concept have been proposed in the literature, most of […]
Oct, 28
Architecture-based Performance Evaluation of Genetic Algorithms on Multi/Many-core Systems
A Genetic Algorithm (GA) is a heuristic to find exact or approximate solutions to optimization and search problems within an acceptable time. We discuss GAs from an architectural perspective, offering a general analysis of GAs on multi-core CPUs and on GPUs, with solution quality considered. We describe widely-used parallel GA schemes based on Master-Slave, Island […]
Oct, 28
Matrix inversion speed up with CUDA
In this project several mathematical algorithms are developed to obtain a matrix inversion method that combines CUDA's parallel architecture with MATLAB and is actually faster than MATLAB's built-in inverse matrix function. This matrix inversion method is intended to be used for image reconstruction as a faster alternative to iterative methods with a comparable […]
Oct, 28
Parallel Random Numbers: As Easy as 1, 2, 3
Most pseudorandom number generators (PRNGs) scale poorly to massively parallel high-performance computation because they are designed as sequentially dependent state transformations. We demonstrate that independent, keyed transformations of counters produce a large alternative class of PRNGs with excellent statistical properties (long period, no discernable structure or correlation). These counter-based PRNGs are ideally suited to modern […]
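The counter-based idea in this abstract can be illustrated with a minimal Python sketch. It is not one of the generators from the paper: the mixing function is a SplitMix64-style finalizer chosen only for illustration, the names `mix64`, `counter_bits` and `counter_uniform` are hypothetical, and the statistical quality of this particular construction has not been vetted. The point it shows is that each output depends only on a (key, counter) pair, so no state is carried from one draw to the next.

```python
MASK64 = (1 << 64) - 1  # keep all arithmetic within 64 bits


def mix64(x: int) -> int:
    """SplitMix64-style finalizer: a bijective 64-bit mixing function."""
    x &= MASK64
    x ^= x >> 30
    x = (x * 0xBF58476D1CE4E5B9) & MASK64
    x ^= x >> 27
    x = (x * 0x94D049BB133111EB) & MASK64
    x ^= x >> 31
    return x


def counter_bits(key: int, counter: int) -> int:
    """Return 64 pseudorandom bits determined only by (key, counter)."""
    # Mix the key first, combine it with the counter, then mix again.
    # No state is kept between calls.
    return mix64(mix64(key) ^ (counter & MASK64))


def counter_uniform(key: int, counter: int) -> float:
    """Map the top 53 bits to a float in [0, 1)."""
    return (counter_bits(key, counter) >> 11) / float(1 << 53)


if __name__ == "__main__":
    # Each worker can use its own key (here 3) and index its stream directly.
    print([counter_uniform(3, i) for i in range(4)])
```

Because any element of a stream can be computed directly from its index, parallel workers can each take their own key (or a disjoint counter range) without sharing or synchronizing generator state.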
Oct, 28
Programming Massively Parallel Processors with CUDA (audio course)
Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number […]