Posts
Mar, 16
Exploring power efficiency and optimizations targeting heterogeneous applications
Graphics processing units (GPUs) have become widely accepted as the computing platform of choice in many high performance computing domains, due to the potential for approaching or exceeding the performance of a large cluster of CPUs for many parallel applications. The availability of programming standards such as OpenCL makes the use of GPUs even more […]
Mar, 15
iTree: Exploring Time-Varying Data using Indexable Tree
Significant advances have been made in time-varying data analysis and visualization, mainly in improving our ability to identify temporal trends and classify the underlying data. However, the ability to perform cost-effective data querying and indexing is often not incorporated, which poses a serious limitation as the size of time-varying data continues to grow. In this […]
Mar, 15
Real-time Rendering of Melting Objects in Video Games
We present a method for simulating the melting and flowing of material in burning objects fast enough to be of use in video games where most of the graphical and computational resources are needed elsewhere. The standard practice of using particle engines or fluid dynamics for melting is far too costly for use in this […]
Mar, 15
Convergence and Scalarization for Data-Parallel Architectures
Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data parallelism in application kernels expressed as threaded code. One drawback of this approach compared to conventional vector architectures is redundant execution of instructions that are common across multiple threads, resulting in energy inefficiency due to excess instruction dispatch, register file accesses, […]
Mar, 15
Prius: A Runtime for Hybrid Computing
Prius is a framework for seamless execution of OpenCL programs across integrated, heterogeneous systems. Applications interfacing with Prius need not be aware of the characteristics of the hardware; instead the framework will automatically map kernel executions to suitable processors at run-time. The modular nature of the framework allows easy evaluation of new mapping strategies.
Mar, 15
Input-Aware Auto-Tuning for Directive-based GPU Programming
The difficulties posed by GPGPU programming and the need to increase productivity have guided research towards directive-based high-level programs for accelerators. This effort has led to the definition of the OpenACC industry standard. It significantly simplifies writing code for graphics engines, leaving the programmer the opportunity to tune the application for the target hardware and […]
Mar, 14
Simulation of a flowing snow avalanche using molecular dynamics
This paper presents an approach for modelling and simulation of a flowing snow avalanche, which is formed of dry and liquefied snow that slides down a slope, by using molecular dynamics and discrete element method. A particle system is utilized as a base method for the simulation and marching cubes with real-time shaders are employed […]
Mar, 14
Selection of Task Implementations in the Nanos++ Runtime
New heterogeneous systems and hardware accelerators can give higher levels of computational power to high performance computers. However, this does not come for free, since the more heterogeneity the system presents, the more complex becomes the programming task in terms of resource utilization. OmpSs is a task-based programming model and framework focused on the automatic […]
Mar, 14
Automated and interactive approaches for optimal surface finding based segmentation of medical image data
Optimal surface finding (OSF), a graph-based optimization approach to image segmentation, represents a powerful framework for medical image segmentation and analysis. In many applications, a pre-segmentation is required to enable OSF graph construction. Also, the cost function design is critical for the success of OSF. In this thesis, two issues in the context of OSF […]
Mar, 14
Parallel Particle Swarm Optimization for Image Segmentation
One of the problems faced with Particle Swarm Optimization (PSO) is that the method is time consuming, especially when it deals with a problem that needs a lot of particles to represent. This paper compares the speed of PSO run in parallel mode with that of the ordinary one. The testing applies […]
Mar, 14
CPU and/or GPU: Revisiting the GPU Vs. CPU Myth
Parallel computing using accelerators has gained widespread research attention in the past few years. In particular, using GPUs for general purpose computing has brought forth several success stories with respect to time taken, cost, power, and other metrics. However, accelerator based computing has significantly relegated the role of CPUs in computation. As CPUs evolve […]
Mar, 12
GPU implementation of a deep learning network for image recognition tasks
Image recognition and classification is one of the primary challenges of the machine learning community. Recent advances in learning systems, coupled with hardware developments, have enabled general object recognition systems to be learned on home computers with graphics processing units. Presented is a Deep Belief Network engineered using NVIDIA’s CUDA programming language for general object […]