Posts

Jan, 20

Efficient fMRI Analysis and Clustering on GPUs

Graphics processing units (GPUs) have traditionally been used to accelerate only parts of the graphics pipeline. The emergence of modern GPUs as highly parallel, multi-threaded, many-core processor systems able to perform general-purpose computations has opened the door to a new form of heterogeneous computing where the GPU and CPU can […]
Jan, 20

Coherent Spatiotemporal Filtering, Upsampling and Rendering of RGBZ Videos

Sophisticated video processing effects require both image and geometry information. We explore the possibility of augmenting a video camera with a recent infrared time-of-flight depth camera to capture high-resolution RGB and low-resolution, noisy depth at video frame rates. To turn such a setup into a practical RGBZ video camera, we develop efficient data filtering techniques […]
Jan, 19

Dynamically Managed Data for CPU-GPU Architectures

GPUs are flexible parallel processors capable of accelerating real applications. To exploit them, programmers must ensure a consistent program state between the CPU and GPU memories by managing data. Manually managing data is tedious and error-prone. In prior work on automatic CPU-GPU data management, alias analysis quality limits performance, and type-inference quality limits applicability. This […]
Jan, 19

Combining approximate inference methods for efficient learning on large computer clusters

An important challenge in machine learning is to develop learning algorithms that can handle large amounts of data at a realistically large scale. This entails not only the development of algorithms that can be efficiently trained to infer parameters of the model in a given dataset, but also demands careful thought about the tools (both […]
Jan, 19

Accelerated Large-Scale Multiple Sequence Alignment

BACKGROUND: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl’s Law. This work is the first known to accelerate the third stage of […]
Jan, 19

A High Performance Parallel FDTD Method Enhanced By Using SSE Instruction Set

In this paper, we introduce a hardware acceleration technique for the parallel Finite Difference Time Domain (FDTD) method using the SSE (Streaming SIMD (Single Instruction Multiple Data) Extensions) instruction set. Applying the SSE instruction set to the parallel FDTD method achieves a significant improvement in simulation performance. The benchmarks of the SSE […]
Jan, 19

Platform Characterization for Domain-Specific Computing

We believe that by adapting architectures to fit the requirements of a given application domain, we can significantly improve the efficiency of computation. To validate this idea for our application domain, we evaluate a wide spectrum of commodity computing platforms to quantify the potential benefits of heterogeneity and customization for domain-specific applications. In particular, […]
Jan, 19

Human Re-identification System On Highly Parallel GPU and CPU Architectures

The paper presents a new approach to the human re-identification problem using covariance features. In many cases a distance operator between signatures based on generalized eigenvalues has to be computed efficiently, especially when a real-time response is expected from the system. This is a challenging problem as many procedures are computationally intensive tasks and must […]
Jan, 19

Power Profiling and Optimization for Heterogeneous Multi-Core Systems

Processing speed and energy efficiency are two of the most critical issues for computer systems. This paper presents a systematic approach for profiling the power and performance characteristics of applications targeting heterogeneous multi-core computing platforms. Our approach enables rapid and automated design space exploration involving optimisation of workload distribution for systems with accelerators such as […]
Jan, 19

Asymptotic Peak Utilisation in Heterogeneous Parallel CPU/GPU Pipelines: A Decentralised Queue Monitoring Strategy

Heterogeneous parallel computing has become an unavoidable consequence of the emergence of General-Purpose computing on graphics processing units (GPGPU). The characteristics of a Graphics Processing Unit (GPU), including significant memory transfer latency and complex performance characteristics, demand new approaches to ensuring that all available computational resources are geared towards optimal utilisation. This paper considers the simple case […]
Jan, 19

A Compile-Time Managed Multi-Level Register File Hierarchy

As processors increasingly become power limited, performance improvements will be achieved by rearchitecting systems with energy efficiency as the primary design constraint. While some of these optimizations will be hardware based, combined hardware and software techniques likely will be the most productive. This work redesigns the register file system of a modern throughput processor with […]
Jan, 19

Heterogeneity-Aware Resource Allocation and Scheduling in the Cloud

Data analytics are key applications running in the cloud computing environment. To improve the performance and cost-effectiveness of a data analytics cluster in the cloud, the data analytics system should account for the heterogeneity of the environment and workloads. It also needs to provide fairness among jobs when multiple jobs share the cluster. In this […]

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors