17606

Posts

Sep, 16

Out-of-core Implementation for Accelerator Kernels on Heterogeneous Clouds

Cloud environments today are increasingly featuring hybrid nodes containing multicore CPU processors and a diverse mix of accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi co-processors, and Field-Programmable Gate Arrays (FPGAs) to facilitate easier migration to them of HPC workloads. While virtualization of accelerators in clouds is a leading research challenge, we address […]
Sep, 7

Multi-Tasking Scheduling for Heterogeneous Systems

Heterogeneous platforms play an increasingly important role in modern computer systems. They combine high performance with low power consumption. From mobiles to supercomputers, we see an increasing number of computer systems that are heterogeneous. The most well-known heterogeneous system, CPU+GPU platforms have been widely used in recent years. As they become more mainstream, serving multiple […]
Sep, 3

Real-Time Rendering of Molecular Dynamics Simulation Data: A Tutorial

Achieving real-time molecular dynamics rendering is a challenge, especially when the rendering requires intensive computation involving a large simulation data-set. The task becomes even more challenging when the size of the data is too large to fit into random access memory (RAM) and the final imagery depends on the input and output (I/O) performance. The […]
Aug, 26

eccCL: parallelized GPU implementation of Ensemble Classifier Chains

BACKGROUND: Multi-label classification has recently gained great attention in diverse fields of research, e.g., in biomedical application such as protein function prediction or drug resistance testing in HIV. In this context, the concept of Classifier Chains has been shown to improve prediction accuracy, especially when applied as Ensemble Classifier Chains. However, these techniques lack computational […]
Aug, 14

Accelerating digital forensic searching through GPGPU parallel processing techniques

Background String searching within a large corpus of data is a critical component of digital forensic (DF) analysis techniques such as file carving. The continuing increase in capacity of consumer storage devices requires similar improvements to the performance of string searching techniques employed by DF tools used to analyse forensic data. As string searching is […]
Aug, 1

Directive-Based Partitioning and Pipelining for Graphical Processing Units

The community needs simpler mechanisms to access the performance available in accelerators, such as GPUs, FPGAs, and APUs, due to their increasing use in stateof-the-art supercomputers. Programming models like CUDA, OpenMP, OpenACC and OpenCL can efficiently offload compute-intensive workloads to these devices. By default these models naively offload computation without overlapping it with communication (copying […]
Jul, 25

On Simplifying and Optimizing Programs for Heterogeneous Computing Systems

Today, with the growth of highly parallel and heterogeneous architectures, systems composed of a combination of multicore CPUs, GPUs, and accelerators are becoming more common in HPC. Although heterogeneous architectures bring considerable benefits from a performance and energy perspective, they also make application development very challenging introducing the necessity of different parallel programming paradigms. Recently, […]
Jul, 25

FUX-Sim: Implementation of a fast universal simulation/reconstruction framework for X-ray systems

The availability of digital X-ray detectors, together with advances in reconstruction algorithms, creates an opportunity for bringing 3D capabilities to conventional radiology systems. The downside is that reconstruction algorithms for non-standard acquisition protocols are generally based on iterative approaches that involve a high computational burden. The development of new flexible X-ray systems could benefit from […]
Jul, 22

GPU accelerated computation of Polarized Subsurface BRDF for Flat Particulate Layers

BRDF of most real world materials has two components, the surface BRDF due to the light reflecting at the surface of the material and the subsurface BRDF due to the light entering and going through many scattering events inside the material. Each of these events modifies light’s path, power, polarization state. Computing polarized subsurface BRDF […]
Jul, 14

Cooperative Kernels: GPU Multitasking for Blocking Algorithms

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that […]
Jul, 2

Snowflake: A Lightweight Portable Stencil DSL

Stencil computations are not well optimized by general-purpose production compilers and the increased use of multicore, manycore, and accelerator-based systems makes the optimization problem even more challenging. In this paper we present Snowflake, a Domain Specific Language (DSL) for stencils that uses a "micro-compiler" approach, i.e., small, focused, domain-specific code generators. The approach is similar […]
Jul, 2

Synthesis of Embedded Software using Dataflow Schedule Graphs

In the design and implementation of digital signal processing (DSP) systems, dataflow is recognized as a natural model for specifying applications, and dataflow enables useful model-based methodologies for analysis, synthesis, and optimization of implementations. A wide range of embedded signal processing applications can be designed efficiently using the high level abstractions that are provided by […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: