Posts
Jan, 17
Explainable Deep Behavioral Sequence Clustering for Transaction Fraud Detection
In the e-commerce industry, user behavior sequence data has been widely used in many business units such as search and merchandising to improve their products. However, it is rarely used in financial services not only due to its 3V characteristics – i.e. Volume, Velocity and Variety – but also due to its unstructured nature. In this […]
Jan, 17
Fast convolutional neural networks on FPGAs with hls4ml
We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with large convolutional layers on FPGAs. By extending the hls4ml library, we demonstrate how to achieve inference latency of 5μs using convolutional architectures, while preserving state-of-the-art model performance. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various […]
Jan, 10
linus: Conveniently explore, share, and present large-scale biological trajectory data from a web browser
In biology, we are often confronted with information-rich, large-scale trajectory data, but exploring and communicating patterns in such data is often a cumbersome task. Ideally, the data should be wrapped with an interactive visualisation in one concise package that makes it straightforward to create and test hypotheses collaboratively. To address these challenges, we have developed […]
Jan, 10
Advances in Electron Microscopy with Deep Learning
This doctoral thesis covers some of my advances in electron microscopy with deep learning. Highlights include a comprehensive review of deep learning in electron microscopy; large new electron microscopy datasets for machine learning, dataset search engines based on variational autoencoders, and automatic data clustering by t-distributed stochastic neighbour embedding; adaptive learning rate clipping to stabilize […]
Jan, 10
Efficient Nearest-Neighbor Data Sharing in GPUs
Stencil codes (a.k.a. nearest-neighbor computations) are widely used in image processing, machine learning, and scientific applications. Stencil codes incur nearest-neighbor data exchange because the value of each point in the structured grid is calculated as a function of its value and the values of a subset of its nearest-neighbor points. When running on Graphics Processing […]
Jan, 10
Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs
To apply neural sequence models such as the Transformers to music generation tasks, one has to represent a piece of music by a sequence of tokens drawn from a finite set of pre-defined vocabulary. Such a vocabulary usually involves tokens of various types. For example, to describe a musical note, one needs separate tokens to […]
Jan, 10
Hardware Acceleration of HPC Computational Flow Dynamics using HBM-enabled FPGAs
Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because simulating increasingly large computational models is of the utmost importance, hardware acceleration is receiving increased attention due to its potential to maximize the performance of scientific computing. A Field-Programmable Gate Array is a reconfigurable hardware accelerator that is fully […]
Jan, 6
9th International Workshop on OpenCL and SYCL, 2021
IWOCL & SYCLcon is the annual gathering of the international community of OpenCL and SYCL developers, researchers, suppliers and Khronos Working Group members to share best practice, and to advance the use and evolution of the Open Computing Language (OpenCL) and the SYCL standard for C++ programming of heterogeneous platforms and their associated ecosystems. This […]
Jan, 3
Design, Implementation and Test of Efficient GPU to GPU Communication Methods
Stencil codes are commonly used to solve many problems. On parallel heterogeneous systems with CPUs and GPUs, the domain is usually split and assigned to GPUs, where it is further divided into GPU blocks. The iterative distributed stencil computation consists of two steps – computation and communication, where the subdomains exchange boundary data, also called […]
Jan, 3
Interactive Parallelization of C Programs in SAPFOR
SAPFOR (System For Automated Parallelization) is a software development suite that is focused on cost reduction of manual program parallelization. SAPFOR produces parallel programs according to the high-level DVMH parallel programming model. SAPFOR relies on an implicitly parallel programming model, so it includes an automatic parallelizing compiler. On the other hand, it allows the user […]
Jan, 3
I/O Lower Bounds for Auto-tuning of Convolutions in CNNs
Convolution is the most time-consuming part of the computation of convolutional neural networks (CNNs), which have achieved great success in numerous applications. Due to the complex data dependency and the increase in the amount of model samples, convolution suffers from high overhead on data movement (i.e., memory access). This work provides comprehensive analysis and […]
Jan, 3
Thermal Safety and Real-Time Predictability on Heterogeneous Embedded SoC Platforms
Recent embedded systems are designed with high-performance System-on-Chips (SoCs) to satisfy the computational needs of complex applications widely used in real life, such as airplane controllers, autonomous driving automobiles, medical devices, drones, and hand-held devices. Modern SoCs integrate multi-core CPUs and various types of accelerators including GPUs and DSPs. Uncontrolled heat dissipation is one of […]