## Posts

Dec, 31

### Parallel Digital Predistortion Design on Mobile GPU and Embedded Multicore CPU for Mobile Transmitters

Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also introduces extra processing complexity and poses challenges on efficient and flexible implementations, especially for mobile cellular transmitters, considering their limited computing […]

Dec, 31

### dOpenCL – Evaluation of an API-Forwarding Implementation

Parallel workloads using compute resources such as GPUs and accelerators is a rapidly developing trend in the field of high performance computing. At the same time, virtualization is a generally accepted solution to share compute resources with remote users in a secure and isolated way. However, accessing compute resources from inside virtualized environments still poses […]

Dec, 26

### Language Modeling with Gated Convolutional Networks

The pre-dominant approach to language modeling to date is based on recurrent neural networks. In this paper we present a convolutional approach to language modeling. We introduce a novel gating mechanism that eases gradient propagation and which performs better than the LSTM-style gating of (Oord et al, 2016) despite being simpler. We achieve a new […]

Dec, 26

### Batched Shift Reduce Parsing with Lists of Vectors on CUDA

Shift Reduce Parsing is a common algorithm used in compilers and natural language processing, and can be used to compose a sequence of fixed-length vectors into a single vector of equal length. Previous versions are implemented using predetermined computational graphs that trade excessive memory and computation to minimize transfers of memory from the device to […]

Dec, 26

### Function Call Re-Vectorization

Programming languages such as C for CUDA, OpenCL or ISPC have contributed to increase the programmability of SIMD accelerators and graphics processing units. However, these languages still lack the flexibility offered by lowlevel SIMD programming on explicit vectors. To close this expressiveness gap while preserving performance, this paper introduces the notion of Call Re-Vectorization (CREV). […]

Dec, 26

### Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained Parallelization

The past decade has witnessed a dramatic acceleration of lattice quantum chromodynamics calculations in nuclear and particle physics. This has been due to both significant progress in accelerating the iterative linear solvers using multi-grid algorithms, and due to the throughput improvements brought by GPUs. Deploying hierarchical algorithms optimally on GPUs is non-trivial owing to the […]

Dec, 26

### Streaming GPU Singular Value and Dynamic Mode Decompositions

This work develops a parallelized algorithm to compute the dynamic mode decomposition (DMD) on a graphics processing unit using the streaming method of snapshots singular value decomposition. This allows the algorithm to operate efficiently on streaming data by avoiding redundant inner-products as new data becomes available. In addition, it is possible to leverage the native […]

Dec, 22

### High productivity multi-device exploitation with the Heterogeneous Programming Library

Heterogeneous devices require much more work from programmers than traditional CPUs, particularly when there are several of them, as each one has its own memory space. Multi-device applications require to distribute kernel executions and, even worse, arrays portions that must be kept coherent among the different device memories and the host memory. In addition, when […]

Dec, 22

### Log File Regular Expression Pattern Matching And Capture With GPUs

The information contained in a system is normally stored into log files. Most of the time, these files store the information in plain text with many not formatted information. It is then necessary to extract parts of this information to be able to understand what is going on such system. Currently, such information can be […]

Dec, 20

### Fluid Simulation: Smoothed Particle Hydrodynamics on the GPU

This report describes the physical concept of fluids as well as a mathematical model for fluids governed by the Navier-Stokes equations. The Smoothed Particle Hydrodynamics method (SPH) for simulating fluids is described, and implementation details of the method are explained. Numerical integration methods such as Euler and Leap-Frog integration are discussed. The presented result is […]

Dec, 20

### An EoS-meter of QCD transition from deep learning

Supervised learning with a deep convolutional neural network is used to identify the QCD equation of state (EoS) employed in relativistic hydrodynamic simulations of heavy-ion collisions. The final-state particle spectra $rho(p_T,Phi)$ provide directly accessible information from experiments. High-level correlations of $rho(p_T,Phi)$ learned by the neural network act as an "EoS-meter", effective in detecting the nature […]

Dec, 20

### Accelerating Spark RDD Operations with Local and Remote GPU Devices

Apache Spark is a distributed processing framework for large-scale data sets, where intermediate data sets are represented as RDDs (Resilient Distributed Datasets) and stored in memory distributed over machines. To accelerate its various computation intensive operations, such as reduction and sort, we focus on GPU devices. We modified Spark framework to invoke CUDA kernels when […]