19196

Posts

Nov, 10

Framework for Parallel Kernels Auto-tuning

The result of this thesis is a framework for auto-tuning of parallel kernels which are written in either OpenCL or CUDA language. The framework includes advanced functionality such as support for composite kernels and online auto-tuning. The thesis describes API and internal structure of the framework and presents several examples of its utilization for kernel […]
Nov, 10

Study of OpenCL Processing Models for FPGA Devices

In our study, we present the results of the implementation of the SHA-512 algorithm in FPGAs. The distinguished element of our work is that we conducted the work using OpenCL for FPGA, which is a relatively new development method for reconfigurable logic. We examine loop unrolling as an OpenCL performance optimization method and compare the […]
Nov, 10

CL-VIS: Visualization Platform for Understanding and Checking the OpenCL Programs

Due to GPU’s improved hardware performance, many researchers have tried to utilize the GPU for computer vision, image processing, cryptography, and artificial intelligence. As results, the GPU could successfully speed up algorithms from tens to hundreds of times in many cases. However, GPU programming is still known to be difficult because of its different characteristics […]
Nov, 10

KLARAPTOR: A Tool for Dynamically Finding Optimal Kernel Launch Parameters Targeting CUDA Programs

In this paper we present KLARAPTOR (Kernel LAunch parameters RAtional Program estimaTOR), a new tool built on top of the LLVM Pass Framework and NVIDIA CUPTI API to dynamically determine the optimal values of kernel launch parameters of a CUDA program P. To be precise, we describe a novel technique to statically build (at the […]
Nov, 10

Accelerating Stochastic Simulations on GPUs Using OpenCL

Since first introduced in 2008 with the 1.0 specification, OpenCL has steadily evolved over the decade to increase its support for heterogeneous parallel systems. In this paper, we accelerate stochastic simulation of biochemical reaction networks on modern GPUs (graphics processing units) by means of the OpenCL programming language. In implementing the OpenCL version of the […]
Nov, 9

8th International Workshop on OpenCL, including SYCLCon, 2019

Join us at the 8th International Workshop on OpenCL, including SYCLcon 2020, for three days of talks, workshops and community networking aimed at furthering the collaboration and knowledge sharing amongst the international community of high-performance computing specialist working with OpenCL, SYCL, SPIR and Vulkan Compute. The event provides a rich mix of hands-on tutorials, technical […]
Nov, 3

JSDoop and TensorFlow.js: Volunteer Distributed Web Browser-Based Neural Network Training

In 2019, around 57% of the population of the world has broadband access to the Internet. Moreover, there are 5.9 billion mobile broadband subscriptions, i.e., 1.3 subscriptions per user. So there is an enormous interconnected computational power held by users all around the world. Also, it is estimated that Internet users spend more than six […]
Nov, 3

Code Optimization on GPUs

Graphic Processing Units (GPUs) have become popular in the last decade due to their high memory bandwidth and powerful computing capacity. Nevertheless, achieving highperformance on GPUs is not trivial. It generally requires significant programming expertise and understanding of details of low-level execution mechanisms in GPUs. This dissertation introduces approaches for optimizing regular and irregular applications. […]
Nov, 3

In-memory database acceleration on FPGAs: a survey

While FPGAs have seen prior use in database systems, in recent years interest in using FPGA to accelerate databases has declined in both industry and academia for the following three reasons. First, specifically for in-memory databases, FPGAs integrated with conventional I/O provide insufficient bandwidth, limiting performance. Second, GPUs, which can also provide high throughput, and […]
Nov, 3

Implementing and evaluating an heterogeneous, scalable, tridiagonal linear system solver with OpenCL to target FPGAs, GPUs, and CPUs

Solving diagonally dominant tridiagonal linear systems is a common problem in scientific high-performance computing (HPC). Furthermore, it is becoming more commonplace for HPC platforms to utilise a heterogeneous combination of computing devices. Whilst it is desirable to design faster implementations of parallel linear system solvers, power consumption concerns are increasing in priority. This work presents […]
Nov, 3

Research on OpenCL optimization for FPGA deep learning application

In recent years, with the development of computer science, deep learning is held as competent enough to solve the problem of inference and learning in high dimensional space. Therefore, it has received unprecedented attention from both the academia and the business community. Compared with CPU/GPU, FPGA has attracted much attention for its high-energy efficiency, short […]
Oct, 27

PyTorchPipe: a framework for rapid prototyping of pipelines combining language and vision

Access to vast amounts of data along with affordable computational power stimulated the reincarnation of neural networks. The progress could not be achieved without adequate software tools, lowering the entry bar for the next generations of researchers and developers. The paper introduces PyTorchPipe (PTP), a framework built on top of PyTorch. Answering the recent needs […]

* * *

* * *

HGPU group © 2010-2019 hgpu.org

All rights belong to the respective authors

Contact us: