Posts

Jan, 10

Face Recognition: A Tutorial on Computational Aspects

Face recognition is a sophisticated problem requiring a significant commitment of computer resources. A modern GPU architecture provides a practical platform for performing face recognition in real time. The majority of the calculations of an eigenpicture implementation of face recognition are matrix multiplications. For this type of computation, a conventional computer GPU is capable of […]
Jan, 10

Dynamic Feature-Adaptive Subdivision

Feature-adaptive subdivision (FAS) is one of the state-of-the-art real-time rendering methods for subdivision surfaces on modern GPUs. It enables efficient and accurate rendering of subdivision surfaces in many interactive applications, such as video games or authoring tools. In this paper, we present dynamic feature-adaptive subdivision (DFAS), which improves upon FAS by enabling an independent […]
Jan, 10

Digital Signal Processing using Stream High Performance Computing: A 512-input Broadband Correlator for Radio Astronomy

A "large-N" correlator that makes use of Field Programmable Gate Arrays and Graphics Processing Units has been deployed as the digital signal processing system for the Long Wavelength Array station at Owens Valley Radio Observatory (LWA-OV), to enable the Large Aperture Experiment to Detect the Dark Ages (LEDA). The system samples a ~100MHz baseband and […]
Jan, 10

Exploring GPU Memory Performance Using Digital Image Processing Algorithms

Leveraging the incredible parallel computational power of graphics processing units (GPUs) is a proven method for accelerating general applications. Efficient utilization of the GPU remains one of the greatest challenges facing programmers. The performance of GPU applications is extremely reliant on memory performance, to the point that it can be considered a critical bottleneck. This […]
Jan, 10

Image Super-Resolution Using Deep Convolutional Networks

We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can […]
Jan, 9

International Workshop on OpenCL

The International Workshop on OpenCL (IWOCL – “eye-wok-ul”) is an annual meeting and community of users, researchers, developers and suppliers that share best practice, and promote the evolution and advancement of the OpenCL standard for parallel programming of heterogeneous systems.
Jan, 8

CHO: A Benchmark Suite for OpenCL-based FPGA Accelerators

Programming FPGAs with OpenCL-based high-level synthesis frameworks is gaining attention, with a number of commercial and research frameworks announced. However, there are no benchmarks for evaluating these frameworks. To this end, we present the CHO benchmark suite, an extension of CHStone, a commonly used C-based high-level synthesis benchmark suite, for OpenCL. We characterise CHO at various […]
Jan, 8

Cardiac Dysrhythmia Detection with GPU-Accelerated Neural Networks

Cardiac dysrhythmia is responsible for over half a million deaths in the United States annually. In this work, we evaluate the performance of neural networks on classifying electrocardiogram (ECG) sequences as normal or abnormal (arrhythmia). Using neural networks as our primary learning model, we explain our model’s performance and discuss hyperparameter tuning. Comparing the results […]
Jan, 8

An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns

The SEJITS framework supports creating embedded domain-specific languages (DSELs) and code generators, a pair of which is called a specializer, with much less effort than creating a full DSL compiler, typically just a few hundred lines of code. SEJITS’ main benefit is allowing application writers to stay entirely in high-level languages such as Python by using […]
Jan, 8

Simulating and Visualizing Real-Time Crowds on GPU Clusters

We present a set of algorithms for simulating and visualizing real-time crowds on GPU (Graphics Processing Unit) clusters. We first present crowd simulation and rendering techniques that take advantage of single-GPU machines; then, using a wandering crowd behavior simulation algorithm as an example, we explain how this kind of algorithm can be extended […]
Jan, 8

Performance and Power Comparisons Between Nvidia and ATI GPUs

In recent years, modern graphics processing units have been widely adopted in high-performance computing to solve large-scale computation problems. The leading GPU manufacturers Nvidia and ATI have each introduced a series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects on processor cores […]
Jan, 7

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of mobile devices alone now nearly equal to the population of Earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques […]

* * *


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors
