16917

Posts

Jan, 19

Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU

In this study, to substantially improve the runtimes of exact and approximate string matching algorithms, we propose a tribrid parallel method for bit-parallel algorithms such as the Shift-Or and Wu-Manber algorithms. Our underlying idea is to interpret bit-parallel algorithms as inclusive-scan operations, which allow these bit-parallel algorithms to run efficiently on a graphics processing unit […]
Jan, 19

Light Loss-Less Data Compression, with GPU Implementation

There is no doubt that data compression is very important in computer engineering. However, most lossless data compression and decompression algorithms are very hard to parallelize, because they use dictionaries updated sequentially. The main contribution of this paper is to present a new lossless data compression method that we call Light Loss-Less (LLL) compression. It […]
Jan, 16

An OpenCL(TM) Deep Learning Accelerator on Arria 10

Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification. FPGAs are well known to be able to perform convolutions efficiently, however, most recent efforts to run CNNs on FPGAs have shown limited advantages over other devices such as GPUs. Previous approaches on FPGAs have […]
Jan, 16

Using efficient parallelization in Graphic Processing Units to parameterize stochastic fire propagation models

Fire propagation is a major concern in the world in general and in Argentinian northwestern Patagonia in particular where every year hundreds of hectares are affected by both natural and anthropogenic forest fires. We developed an efficient cellular automata model in Graphic Processing Units (GPUs) to simulate fire propagation. The graphical advantages of GPUs were […]
Jan, 16

Application of GPU Computing to Some Urban Traffic Problems

The present work studies and proposes GPU-based parallel algorithms and implementations for the problem of macroscopic assignment of urban traffic on large-scale networks, promoting an in-depth investigation on each sub-problem that must be efficiently solved during the traffic assignment process. Among the main contributions of this work, there are: 1) the first GPU-based algorithm for […]
Jan, 16

An N log N Parallel Fast Direct Solver for Kernel Matrices

Kernel matrices appear in machine learning and non-parametric statistics. Given N points in d dimensions and a kernel function that requires $mathcal{O}(d)$ work to evaluate, we present an $mathcal{O}(dNlog N)$-work algorithm for the approximate factorization of a regularized kernel matrix, a common computational bottleneck in the training phase of a learning task. With this factorization, […]
Jan, 16

Decoding with Finite-State Transducers on GPUs

Weighted finite automata and transducers (including hidden Markov models and conditional random fields) are widely used in natural language processing (NLP) to perform tasks such as morphological analysis, part-of-speech tagging, chunking, named entity recognition, speech recognition, and others. Parallelizing finite state algorithms on graphics processing units (GPUs) would benefit many areas of NLP. Although researchers […]
Jan, 12

GPU Hackathons, 2017

Background General-purpose Graphics Processing Units (GPGPUs) potentially offer exceptionally high memory bandwidth and performance for a wide range of applications. The challenge in utilizing such accelerators has been the difficulty in programming them. Any and all GPU programming paradigms are welcome. Hackathon goal The goal of each hackathon is for current or prospective user groups […]
Jan, 10

Parallelization of BVH and BSP on the GPU

Rendering is a central point in computer graphics and visualization. In order to display realistic images reflections, shadows and further realistic light diffusions is needed. To obtain these, ray tracing, view frustum culling as well as transparency sorting among others are commonly used techniques. Given the right acceleration structure, said procedures can be reduced to […]
Jan, 10

Software Prefetching for Indirect Memory Accesses

Many modern data processing and HPC workloads are heavily memory-latency bound. A tempting proposition to solve this is software prefetching, where special non-blocking loads are used to bring data into the cache hierarchy just before being required. However, these are difficult to insert to effectively improve performance, and techniques for automatic insertion are currently limited. […]
Jan, 10

DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning

In recent years, Deep Learning (DL) has found great success in domains such as multimedia understanding. However, the complex nature of multimedia data makes it difficult to develop DL-based software. The state-of-the art tools, such as Caffe, TensorFlow, Torch7, and CNTK, while are successful in their applicable domains, are programming libraries with fixed user interface, […]
Jan, 10

An FPGA Accelerator for Molecular Dynamics Simulation Using OpenCL

Molecular dynamics (MD) simulations are very important to study physical properties of the atoms and molecules. However, a huge amount of processing time is required to simulate a few nano-seconds of an actual experiment. Although the hardware acceleration using FPGAs provides promising results, huge design time and hardware design skills are required to implement an […]

* * *

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors

Contact us: