Posts
Sep, 20
Automatic abstraction and fault tolerance in cortical microarchitectures
Recent advances in the neuroscientific understanding of the brain are bringing about a tantalizing opportunity for building synthetic machines that perform computation in ways that differ radically from traditional Von Neumann machines. These brain-like architectures, which are premised on our understanding of how the human neocortex computes, are highly fault-tolerant, averaging results over large numbers […]
Sep, 20
SRAM-DRAM hybrid memory with applications to efficient register files in fine-grained multi-threading
Large register files are common in highly multi-threaded architectures such as GPUs. This paper presents a hybrid memory design that tightly integrates embedded DRAM into SRAM cells with a main application to reducing area and power consumption of multi-threaded register files. In the hybrid memory, each SRAM cell is augmented with multiple DRAM cells so […]
Sep, 20
Brief announcement: better speedups for parallel max-flow
We present a parallel solution to the Maximum-Flow (Max-Flow) problem, suitable for a modern many-core architecture. We show that by starting from a PRAM algorithm, following an established "programmer’s workflow" and targeting XMT, a PRAM-inspired many-core architecture, we achieve significantly higher speed-ups than previous approaches. Comparison with the fastest known serial max-flow implementation on a […]
Sep, 20
Hermes: an integrated CPU/GPU microarchitecture for IP routing
With constantly increasing Internet traffic and fast-changing network protocols, future routers have to simultaneously satisfy the requirements for throughput, QoS, flexibility, and scalability. In this work, we propose a novel integrated CPU/GPU microarchitecture, Hermes, for QoS-aware high-speed routing. We also develop a new thread scheduling mechanism, which significantly improves all QoS metrics.
Sep, 20
CuMAPz: a tool to analyze memory access patterns in CUDA
The CUDA programming model provides a simple interface for programming GPUs, but tuning GPGPU applications for high performance is still quite challenging. Programmers need to consider several architectural details, and small changes in source code, especially in memory access patterns, affect performance significantly. This paper presents CuMAPz, a tool to compare the memory performance of […]
Sep, 20
EFFEX: an embedded processor for computer vision based feature extraction
The deployment of computer vision algorithms in mobile applications is growing at a rapid pace. A primary component of the computer vision software pipeline is feature extraction, which identifies and encodes relevant image features. We present an embedded heterogeneous multicore design named EFFEX that incorporates novel functional units and memory architecture support, making it capable […]
Sep, 20
Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework
Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still in its initial stages, and the challenge of making the GPU a true shared resource […]
Sep, 20
Acceleration of genetic algorithms for sudoku solution on many-core processors
In this paper, we use the problem of solving Sudoku puzzles to demonstrate the possibility of achieving practical processing time through the use of many-core processors for parallel processing in the application of genetic computation. To increase accuracy, we propose a genetic operation that takes building-block linkage into account. As a parallel processing model for […]
Sep, 19
Debugging CUDA
During six months of intensive NVIDIA CUDA C programming many bugs were created. We pass on the software engineering lessons learnt, particularly those relevant to parallel general-purpose computation on graphics hardware (GPGPU).
Sep, 19
Parallel divide-and-evolve: experiments with OpenMP on a multicore machine
Multicore machines are becoming a standard way to speed up system performance. After having instantiated the evolutionary metaheuristic DAEX with the forward search YAHSP planner, we investigate the global parallelism approach, which exploits the intrinsic parallelism of the individual evaluation. This paper describes a parallel shared-memory version of the DAEYAHSP planning system using […]
Sep, 19
DAMS: distributed adaptive metaheuristic selection
We present a distributed algorithm, Select Best and Mutate (SBM), in the Distributed Adaptive Metaheuristic Selection (DAMS) framework. DAMS is dedicated to adaptive optimization in distributed environments. Given a set of metaheuristics, the goal of DAMS is to coordinate their local execution on distributed nodes in order to optimize the global performance of the distributed […]
Sep, 19
ACO with tabu search on a GPU for solving QAPs using move-cost adjusted thread assignment
This paper proposes a parallel ant colony optimization (ACO) for solving quadratic assignment problems (QAPs) on a graphics processing unit (GPU) by combining tabu search (TS) with ACO in CUDA (compute unified device architecture). In TS for QAP, all neighbor moves are tested. These moves form two groups based on how the move cost is computed. In one […]