Posts
Jan, 4
Evaluation of Multi-Threading in Vulkan
Processor development today focuses heavily on parallel performance, providing multiple cores that programs can use. The problem with the current version of OpenGL is that it lacks support for issuing rendering commands from multiple CPU threads. Vulkan is a new low-level graphics API that gives more control to the […]
Jan, 4
An initial performance review of software components for a heterogeneous computing platform
The design of embedded systems is a complex activity that involves many decisions. Given the high performance demands of present-day usage scenarios and software, these systems often include energy-hungry, state-of-the-art computing units. When designers focus on the power consumption of computing units, they often ignore the physical properties of software. Recently, there has been a […]
Dec, 31
Android Malware Classification Using Parallelized Machine Learning Methods
Android is the most popular mobile operating system, with a market share of over 80%. Because of its popularity and its open-source nature, Android is now the platform most targeted by malware. This creates an urgent need for effective defense mechanisms to protect Android-enabled devices. In this dissertation, we present a novel characterization and […]
Dec, 31
Synthesizing Benchmarks for Predictive Modeling
Predictive modeling using machine learning is an effective method for building compiler heuristics, but there is a shortage of benchmarks. Typical machine learning experiments outside of the compilation field train over thousands or millions of examples. In machine learning for compilers, however, there are typically only a few dozen common benchmarks available. This limits the […]
Dec, 31
Automatic OpenCL Task Adaptation for Heterogeneous Architectures
OpenCL defines a common parallel programming language for all devices, but adapting tasks to each device, managing communication, and handling load balancing are left to the programmer. In this work, we propose a novel automatic compiler and runtime technique to execute single OpenCL kernels on heterogeneous multi-device architectures. The technique proposed is completely transparent to […]
Dec, 31
Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters
Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics. However, it also introduces extra processing complexity and makes efficient, flexible implementations challenging, especially for mobile cellular transmitters, considering their limited computing […]
Dec, 31
dOpenCL – Evaluation of an API-Forwarding Implementation
Running parallel workloads on compute resources such as GPUs and accelerators is a rapidly developing trend in high performance computing. At the same time, virtualization is a generally accepted solution for sharing compute resources with remote users in a secure and isolated way. However, accessing compute resources from inside virtualized environments still poses […]
Dec, 26
Language Modeling with Gated Convolutional Networks
The predominant approach to language modeling to date is based on recurrent neural networks. In this paper we present a convolutional approach to language modeling. We introduce a novel gating mechanism that eases gradient propagation and performs better than the LSTM-style gating of (Oord et al., 2016) despite being simpler. We achieve a new […]
Dec, 26
Batched Shift Reduce Parsing with Lists of Vectors on CUDA
Shift-reduce parsing is a common algorithm in compilers and natural language processing. It can be used to compose a sequence of fixed-length vectors into a single vector of the same length. Previous implementations use predetermined computational graphs that trade excessive memory and computation to minimize memory transfers from the device to […]
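To make the composition idea concrete, here is a minimal sequential sketch of generic shift-reduce vector composition in Python. It is not the batched CUDA implementation described in the post. The `compose` function (elementwise tanh of the sum) is a hypothetical stand-in for whatever learned composition layer a real model would use. The "SHIFT"/"REDUCE" transition encoding is also an illustrative assumption.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def compose(left: Vector, right: Vector) -> Vector:
    # Hypothetical composition: elementwise tanh of the sum.
    # A real model would use a learned layer here; this only keeps the length fixed.
    return [math.tanh(a + b) for a, b in zip(left, right)]


def shift_reduce(
    tokens: Sequence[Vector],
    transitions: Sequence[str],
    compose_fn: Callable[[Vector, Vector], Vector] = compose,
) -> Vector:
    """Compose fixed-length token vectors into one vector of the same length.

    A valid transition sequence for n tokens has n SHIFTs and n - 1 REDUCEs.
    """
    buffer = list(reversed(tokens))  # pop() now yields tokens left to right
    stack: List[Vector] = []
    for op in transitions:
        if op == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT on empty buffer")
            stack.append(buffer.pop())
        elif op == "REDUCE":
            if len(stack) < 2:
                raise ValueError("REDUCE needs two stack entries")
            right = stack.pop()
            left = stack.pop()
            stack.append(compose_fn(left, right))
        else:
            raise ValueError(f"unknown transition {op!r}")
    if buffer or len(stack) != 1:
        raise ValueError("transitions did not reduce the input to a single vector")
    return stack[0]
```

For example, the transitions `["SHIFT", "SHIFT", "REDUCE", "SHIFT", "REDUCE"]` over three vectors compute `compose(compose(t0, t1), t2)`. Every intermediate stack entry has the same length as the inputs, which is what lets the result stand in for the whole sequence.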
Dec, 26
Function Call Re-Vectorization
Programming languages such as C for CUDA, OpenCL or ISPC have helped increase the programmability of SIMD accelerators and graphics processing units. However, these languages still lack the flexibility offered by low-level SIMD programming on explicit vectors. To close this expressiveness gap while preserving performance, this paper introduces the notion of Call Re-Vectorization (CREV). […]
Dec, 26
Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization
The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This is due both to significant progress in accelerating iterative linear solvers with multigrid algorithms and to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the […]
Dec, 26
Streaming GPU Singular Value and Dynamic Mode Decompositions
This work develops a parallelized algorithm to compute the dynamic mode decomposition (DMD) on a graphics processing unit using the streaming method of snapshots singular value decomposition. This allows the algorithm to operate efficiently on streaming data by avoiding redundant inner-products as new data becomes available. In addition, it is possible to leverage the native […]
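A short derivation may help show why a snapshot-based SVD can avoid redundant inner products on streaming data. What follows is the standard method-of-snapshots formulation, sketched under the assumption that snapshots $x_1, \dots, x_m \in \mathbb{R}^n$ (with $m \ll n$) are stored as the columns of $X$. It is not necessarily the exact GPU algorithm of the post.

```latex
% Method of snapshots: work with the small m x m Gram matrix of inner products
G = X^{\top} X, \qquad G_{ij} = \langle x_i, x_j \rangle,
\qquad G = V \Sigma^{2} V^{\top},
\qquad U = X V \Sigma^{-1} \;\Rightarrow\; X = U \Sigma V^{\top}.

% When a new snapshot x_{m+1} arrives, only one new row/column is needed:
G' = \begin{bmatrix}
       G & X^{\top} x_{m+1} \\
       x_{m+1}^{\top} X & x_{m+1}^{\top} x_{m+1}
     \end{bmatrix}
```

Under this formulation, each new snapshot costs $m + 1$ fresh inner products rather than a full recomputation of all $\mathcal{O}(m^2)$ entries of $G$. This matches the post's description of avoiding redundant inner products as new data becomes available.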