Posts
Oct, 28
An Adaptive Framework for Managing Heterogeneous Many-Core Clusters
The computing needs and the input and result datasets of modern scientific and enterprise applications are growing exponentially. To support such applications, High-Performance Computing (HPC) systems need to employ thousands of cores and innovative data management. At the same time, an emerging trend in designing HPC systems is to leverage specialized asymmetric multicores, such as […]
Oct, 28
Compiling Stream Applications for Heterogeneous Architectures
Heterogeneous processing systems have become the industry standard in almost every segment of the computing market from servers to mobile systems. In addition to employing shared/distributed memory processors, the current trend is to use hardware components such as field programmable gate arrays (FPGAs), single instruction multiple data (SIMD) engines and graphics processing units (GPUs) in […]
Oct, 28
Architectural Exploration and Scheduling Methods for Coarse Grained Reconfigurable Arrays
Coarse Grained Reconfigurable Arrays (CGRAs) have emerged in recent years as promising candidates for realizing efficient reconfigurable platforms. CGRAs feature high computational density, flexible routing interconnect, and rapid reconfiguration, characteristics that make them well-suited to speeding up the execution of computational kernels. A number of designs embodying the CGRA concept have been proposed in the literature, most of […]
Oct, 28
Architecture-based Performance Evaluation of Genetic Algorithms on Multi/Many-core Systems
A Genetic Algorithm (GA) is a heuristic to find exact or approximate solutions to optimization and search problems within an acceptable time. We discuss GAs from an architectural perspective, offering a general analysis of GAs on multi-core CPUs and on GPUs, with solution quality considered. We describe widely-used parallel GA schemes based on Master-Slave, Island […]
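A toy illustration of the Master-Slave scheme the excerpt names (hypothetical code, not from the paper): only the fitness evaluation — the "slave" work — is farmed out to a worker pool, while the master performs selection, crossover, and mutation serially. A real implementation would dispatch evaluation to processes or a GPU rather than threads.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def fitness(individual):
    # One-max toy problem: fitness is the number of 1-bits.
    return sum(individual)

def master_slave_ga(pop_size=40, genome_len=32, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(generations):
            # Slaves: evaluate fitness of all individuals in parallel.
            scores = list(pool.map(fitness, pop))
            # Master: selection (keep the best half), then breed children.
            ranked = [ind for _, ind in
                      sorted(zip(scores, pop), key=lambda t: -t[0])]
            elite = ranked[: pop_size // 2]
            children = []
            while len(children) < pop_size - len(elite):
                a, b = rng.sample(elite, 2)
                cut = rng.randrange(1, genome_len)   # one-point crossover
                child = a[:cut] + b[cut:]
                if rng.random() < 0.1:               # occasional bit-flip mutation
                    i = rng.randrange(genome_len)
                    child[i] ^= 1
                children.append(child)
            pop = elite + children
    return max(sum(ind) for ind in pop)
```

Because elitism preserves the best individual each generation, the best fitness is non-decreasing and climbs toward the 32-bit optimum.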
Oct, 28
Matrix inversion speed up with CUDA
In this project, several mathematical algorithms are developed to obtain a matrix inversion method that combines CUDA’s parallel architecture with MATLAB and is faster than MATLAB’s built-in matrix inverse function. This matrix inversion method is intended for image reconstruction, as a faster alternative to iterative methods with a comparable […]
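For context on what direct matrix inversion involves, here is a minimal serial sketch of Gauss-Jordan elimination with partial pivoting — an illustrative stand-in only, not the project's CUDA/MATLAB method:

```python
def invert(matrix):
    """Invert a square matrix via Gauss-Jordan elimination (serial sketch)."""
    n = len(matrix)
    # Augment each row with the corresponding row of the identity matrix.
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest pivot candidate.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("matrix is singular")
        # Normalize the pivot row, then eliminate the column elsewhere.
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]
```

The row operations on independent rows are exactly the kind of work a CUDA kernel can perform in parallel.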
Oct, 28
Parallel Random Numbers: As Easy as 1, 2, 3
Most pseudorandom number generators (PRNGs) scale poorly to massively parallel high-performance computation because they are designed as sequentially dependent state transformations. We demonstrate that independent, keyed transformations of counters produce a large alternative class of PRNGs with excellent statistical properties (long period, no discernible structure or correlation). These counter-based PRNGs are ideally suited to modern […]
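The counter-based idea can be sketched in a few lines. This hypothetical illustration uses SHA-256 as the keyed transformation; a production counter-based generator would use a much cheaper keyed bijection. The point is that each (key, counter) pair maps independently to an output, so parallel threads need no shared generator state:

```python
import hashlib

def counter_prng(key: int, counter: int) -> float:
    """Return a uniform float in [0, 1) from a keyed counter transformation."""
    # Each (key, counter) pair is hashed independently: thread i can draw
    # sample n by computing counter_prng(key, ...) with no shared state.
    msg = key.to_bytes(8, "big") + counter.to_bytes(8, "big")
    digest = hashlib.sha256(msg).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    n = int.from_bytes(digest[:8], "big")
    return n / 2**64
```

A GPU thread with index `i` can draw its samples as `counter_prng(seed, i * stride + j)`, which is the "as easy as 1, 2, 3" usage pattern: counters replace sequential state.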
Oct, 28
Programming Massively Parallel Processors with CUDA (audio course)
Virtually all semiconductor market domains, including PCs, game consoles, mobile handsets, servers, supercomputers, and networks, are converging to concurrent platforms. There are two important reasons for this trend. First, these concurrent processors can potentially offer more effective use of chip space and power than traditional monolithic microprocessors for many demanding applications. Second, an increasing number […]
Oct, 28
Implementing Domain-Specific Languages for Heterogeneous Parallel Computing
Domain-specific languages offer a solution to the performance and the productivity issues in heterogeneous computing systems. The Delite compiler framework simplifies the process of building embedded parallel DSLs. DSL developers can implement domain-specific operations by extending the DSL framework, which provides static optimizations and code generation for heterogeneous hardware. The Delite runtime automatically schedules and […]
Oct, 28
Dax: Data Analysis at Extreme
Experts agree that the exascale machine will comprise processors that contain many cores, which in turn will necessitate a much higher degree of concurrency. Software will require a minimum of 1,000 times more concurrency. Most parallel analysis and visualization algorithms today work by partitioning data and running mostly serial algorithms concurrently on each data […]
Oct, 28
Parallel Computing the Longest Common Subsequence (LCS) on GPUs: Efficiency and Language Suitability
Sequence alignment is one of the most widely used tools in bioinformatics for finding the resemblance among sequences such as DNA, RNA, and amino acids. The longest common subsequence (LCS) of biological sequences is an essential and effective technique in sequence alignment. To solve the LCS problem, we resort to a dynamic programming approach. Due to the growth […]
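The dynamic programming recurrence for LCS can be sketched as follows — an illustrative serial version. GPU implementations exploit the fact that all cells on the same anti-diagonal of the DP table are independent and can be computed in parallel:

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    # prev[j] holds the LCS length of a[:i-1] and b[:j]; filling row by
    # row keeps memory at O(len(b)) instead of the full table.
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1      # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop one symbol
        prev = curr
    return prev[len(b)]
```

For example, `lcs_length("AGGTAB", "GXTXAYB")` is 4 (the subsequence "GTAB").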
Oct, 27
Techniques to maximize memory bandwidth on the Rigel compute accelerator
The Rigel compute accelerator has been developed to explore alternative architectures for massively parallel processor chips. Currently, GPUs that use wide SIMD are the primary implementations in this space. Many applications targeted to this space are performance-limited by the memory wall, so comparing the memory system performance of Rigel and GPUs is desirable. Memory […]
Oct, 27
Efficient Simulation of Ocean and Land Scenes Based on Digital Earth
Efficient and realistic simulation of ocean and land scenes is one of the hot and difficult problems in computer graphics. Most recent ocean and land scene simulations are plane-based and confined to a limited region. They do not account for the earth's curvature or the boundary between ocean and land, and cannot […]