
Posts

Jul, 26

Strategies for Protecting Intellectual Property when Using CUDA Applications on Graphics Processing Units

Recent advances in the massively parallel computational abilities of graphics processing units (GPUs) have increased their use for general-purpose computation, as companies look to take advantage of big data processing techniques. This has given rise to the potential for malicious software targeting GPUs, which is of interest to forensic investigators examining the operation of […]
Jul, 26

A Data Parallel Algorithm for Seismic Raytracing

Dijkstra’s single-source shortest path algorithm has been applied in seismic tomography to determine paths of minimum travel time from all locations in a 3D earth model to sensors used in seismic experiments. An iterative data parallel algorithm is formulated for seismic tomography based on the Bellman-Ford-Moore (BFM) algorithm. Performance is demonstrated for OpenMP and OpenCL implementations.
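The data parallelism in a Bellman-Ford-Moore formulation comes from relaxing every edge independently within a sweep and repeating sweeps until no travel time improves. A minimal sketch, assuming a small hypothetical graph given as an edge list of `(u, v, travel_time)` triples with non-negative times, usable in both directions; this is a generic BFM illustration, not the authors' OpenMP or OpenCL code.

```python
import math

def bfm_travel_times(num_nodes, edges, source):
    """Bellman-Ford-Moore minimum travel times from `source`.

    edges: iterable of (u, v, t) with non-negative travel time t. Each edge is
    treated as usable in both directions (hypothetical undirected model graph).
    Each sweep reads only the previous sweep's times (Jacobi style), so all
    edge relaxations within one sweep are independent and could run in parallel.
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    for _ in range(num_nodes - 1):  # at most |V| - 1 sweeps are needed
        new = dist[:]
        for u, v, t in edges:
            # Relax both directions using values from the previous sweep.
            if dist[u] + t < new[v]:
                new[v] = dist[u] + t
            if dist[v] + t < new[u]:
                new[u] = dist[v] + t
        if new == dist:  # converged: no travel time improved this sweep
            break
        dist = new
    return dist

# Tiny 4-node path 0-1-2-3 with a slower shortcut edge 0-3
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 2.5)]
print(bfm_travel_times(4, edges, 0))  # [0.0, 1.0, 2.0, 2.5]
```

Unlike Dijkstra's algorithm, no priority queue is needed, which is why this style of iteration maps naturally onto many-core hardware at the cost of possibly redundant sweeps.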
Jul, 26

FPGA-Based Accelerator Design from a Domain-Specific Language

Many image processing applications come with stringent requirements regarding performance, energy efficiency, and power. FPGAs have proven to be among the most suitable architectures for algorithms that can be processed in a streaming pipeline. Yet designing imaging systems for FPGAs remains a very time-consuming task. High-Level Synthesis, which has significantly […]
Jul, 26

Gerbil: A Fast and Memory-Efficient k-mer Counter with GPU-Support

A basic task in bioinformatics is the counting of k-mers in genome strings. The k-mer counting problem is to build a histogram of all substrings of length k in a given genome sequence. We present the open source k-mer counting software Gerbil that has been designed for the efficient counting of k-mers for $k \geq 32$. Given […]
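The k-mer counting problem as defined above can be illustrated with a naive in-memory sketch using `collections.Counter`. This only shows the definition of the histogram; it is not Gerbil's memory-efficient or GPU-supported design, and it does not merge reverse complements (canonical k-mers).

```python
from collections import Counter

def count_kmers(sequence: str, k: int) -> Counter:
    """Histogram of all length-k substrings (k-mers) of `sequence`.

    A sequence of length n contains n - k + 1 overlapping k-mers
    (none if k > n).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

counts = count_kmers("ACGTACGT", 3)
print(sorted(counts.items()))
# [('ACG', 2), ('CGT', 2), ('GTA', 1), ('TAC', 1)]
```

For real genomes with large k, holding every distinct k-mer in one hash map like this quickly exhausts memory, which is the problem dedicated counters are built to address.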
Jul, 26

GPU-acceleration of image rendering and sorting algorithms with the OpenCL framework

Today’s computer systems often contain several different processing units aside from the CPU. Among these, the GPU is a very common processing unit with immense compute power that is available in almost all computer systems. How do we make use of this processing power that lies within our machines? One answer is the OpenCL […]
Jul, 20

Algorithmic Trading: A brief, computational finance case study on data centre FPGAs

Increasingly, FPGAs will be deployed at scale due to the growing need for power-efficient computation and improved high-level synthesis tool flows, creating a new category of device: data centre FPGAs. A method for using these FPGAs is to identify what proportion of a given workload would benefit from being implemented upon […]
Jul, 20

THOR: A New and Flexible Global Circulation Model to Explore Planetary Atmospheres

We have designed and developed, from scratch, a global circulation model named THOR that solves the three-dimensional non-hydrostatic Euler equations. Our general approach lifts the commonly used assumptions of a shallow atmosphere and hydrostatic equilibrium. We solve the "pole problem" (where converging meridians on a sphere lead to ever-smaller time steps near the poles) […]
Jul, 20

Lowering IrGL to CUDA

The IrGL intermediate representation is an explicitly parallel representation for irregular programs that targets GPUs. In this report, we describe IrGL constructs, examples of their use and how IrGL is compiled to CUDA by the Galois GPU compiler.
Jul, 20

Scientific Computing Using Consumer Video-Gaming Hardware Devices

The performance of commodity video-gaming hardware (consoles, graphics cards, tablets, etc.) has been advancing rapidly, owing to strong consumer demand and stiff market competition. Gaming hardware devices are currently amongst the most powerful and cost-effective computational technologies available in quantity. In this article, we evaluate a sample of current generation video-gaming hardware devices for […]
Jul, 20

Runtime Configurable Deep Neural Networks for Energy-Accuracy Trade-off

We present a novel dynamic configuration technique for deep neural networks that permits step-wise energy-accuracy trade-offs during runtime. Our configuration technique adjusts the number of channels in the network dynamically depending on response time, power, and accuracy targets. To enable this dynamic configuration technique, we co-design a new training algorithm, where the network is incrementally […]
Jul, 18

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations

This paper illustrates how GPU computing can be used to accelerate computational fluid dynamics (CFD) simulations. For sparse linear systems arising from finite volume discretization, we evaluate and optimize the performance of Conjugate Gradient (CG) routines designed for manycore accelerators and compare against an industrial CPU-based implementation. We also investigate how the recent advances in […]
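For readers unfamiliar with the method, the unpreconditioned Conjugate Gradient iteration for a symmetric positive-definite system $Ax = b$ can be sketched as below, with $A$ stored in compressed sparse row (CSR) form as is typical for finite volume matrices. This is a plain-Python illustration under those assumptions; the helper names are hypothetical, and it is neither the paper's GPU routines nor the industrial CPU implementation it compares against. Production CFD solvers normally add a preconditioner.

```python
import math

def csr_matvec(indptr, indices, data, x):
    """y = A @ x for A stored in compressed sparse row (CSR) form."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        for j in range(indptr[row], indptr[row + 1]):
            y[row] += data[j] * x[indices[j]]
    return y

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def conjugate_gradient(indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive-definite CSR matrix A."""
    x = [0.0] * len(b)
    r = b[:]             # residual r = b - A x, with initial guess x = 0
    p = r[:]             # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:  # stop once ||r|| is small enough
            break
        Ap = csr_matvec(indptr, indices, data, p)
        alpha = rs_old / dot(p, Ap)  # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old       # keeps search directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = conjugate_gradient(indptr, indices, data, [1.0, 0.0, 1.0])
print([round(v, 6) for v in x])  # [1.0, 1.0, 1.0]
```

Each iteration is dominated by one sparse matrix-vector product plus a few dot products and vector updates, which is why these kernels are the natural targets when porting CG to manycore accelerators.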
Jul, 18

IODA: an Input/Output Deep Architecture for image labeling

In this article, we propose a deep neural network (DNN) architecture called Input Output Deep Architecture (IODA) for solving the problem of image labeling. IODA directly links a whole image to a whole label map, assigning a label to each pixel using a single neural network forward step. Instead of designing a handcrafted a priori […]


HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us:

contact@hpgu.org