## Posts

Sep, 16

### ZUCL: A ZYNQ UltraScale+ Framework for OpenCL HLS Applications

In this work, we propose the ZUCL framework for implementing and running OpenCL applications on the latest Xilinx ZYNQ UltraScale+ platform. ZUCL is a holistic framework addressing the FPGA OS infrastructure, high-level synthesis (HLS) module implementation, and runtime management. ZUCL enables partial reconfiguration (PR) on this platform by providing an […]

Sep, 9

### Efficient and Scalable k-Means on GPUs

k-Means is a versatile clustering algorithm widely used in practice. To cluster large data sets, state-of-the-art implementations use GPUs to shorten the data-to-knowledge time. These implementations commonly assign points on a GPU and update centroids on a CPU. We identify two main shortcomings of this approach. First, it requires expensive data exchange between […]

Sep, 9

### Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU

Sparse matrix-vector multiplication (SpMV) can be used to solve linear systems and eigenvalue problems of diverse scales that arise in numerous and varied scientific applications. One such application is Configuration Interaction (CI). CI is a linear method for solving the non-relativistic Schrödinger equation for quantum chemical multi-electron systems, […]

Sep, 9

### Doctor AI: Interpretable Deep Learning for Modeling Electronic Health Records

Deep learning has recently shown superior performance over traditional statistical methods in complex domains such as computer vision, audio processing, and natural language processing. Naturally, deep learning techniques, combined with the large volumes of electronic health record (EHR) data generated by healthcare organizations, have the potential to bring dramatic changes to the healthcare industry. However, typical deep […]

Sep, 9

### Using SIMD and SIMT vectorization to evaluate sparse chemical kinetic Jacobian matrices and thermochemical source terms

Accurately predicting key combustion phenomena in reactive-flow simulations, e.g., lean blow-out, extinction/ignition limits and pollutant formation, necessitates the use of detailed chemical kinetics. The large size and high levels of numerical stiffness typically present in chemical kinetic models relevant to transportation/power-generation applications make the efficient evaluation/factorization of the chemical kinetic Jacobian and thermochemical source-terms critical […]

Sep, 9

### Cracks in the Sky: Abelian-Higgs Cosmic String Evolution with CUDA

Topological defects form at cosmological phase transitions by the Kibble mechanism, with cosmic strings and superstrings having the most interesting phenomenology. A rigorous analysis of their astrophysical consequences is limited by the availability of accurate numerical simulations, and therefore by hardware resources and computation time. Improving the speed and efficiency of existing codes is therefore […]

Sep, 2

### Optimizing Communication for Clusters of GPUs

GPUs are frequently used to accelerate data-parallel workloads across a wide variety of application domains. While GPUs offer a large amount of computational throughput within a single node, the largest problems require a cluster of such devices communicating with different compute nodes across a network. These clusters can range in size from a small handful […]

Sep, 2

### Performance Evaluation and Tuning of An OpenCL based Matrix Multiplier

Matrix multiplication is one of the fundamental building blocks of numerical linear algebra. It requires computer systems with substantial computing capability and consumes considerably more power as the problem size increases. In this research, an OpenCL-based matrix multiplier is presented. When the data are single-precision floating-point values, compared with the software simulations based on the Intel […]

Sep, 2

### Implementing Strassen’s Algorithm with CUTLASS on NVIDIA Volta GPUs

Conventional GPU implementations of Strassen’s algorithm (Strassen) typically rely on the existing high-performance matrix multiplication (GEMM), trading space for time. As a result, such approaches can only achieve practical speedup for relatively large, "squarish" matrices due to the extra memory overhead, and their use is limited by the considerable workspace they require. We present novel Strassen […]

Sep, 2

### Full Speed Ahead: 3D Spatial Database Acceleration with GPUs

Many industries rely on visual insights to support decision-making processes in their businesses. In mining, the analysis of drills and geological shapes, represented as 3D geometries, is an important tool to assist geologists in the search for new ore deposits. The aeronautics industry manipulates high-resolution geometries when designing a new aircraft, aided by the numerical simulation […]

Sep, 2

### A study of integer sorting on multicores

Integer sorting on multicores and GPUs can be realized by a variety of approaches that include variants of distribution-based methods such as radix-sort, comparison-oriented algorithms such as deterministic regular sampling and random sampling parallel sorting, and network-based algorithms such as Batcher’s bitonic sorting algorithm. In this work we present an experimental study of integer sorting […]

Aug, 26

### Deep learning: A guide for practitioners in the physical sciences

Machine learning is finding increasingly broad applications in the physical sciences. This most often involves building a model relationship between a dependent, measurable output, and an associated set of controllable, but complicated, independent inputs. We present a tutorial on current techniques in machine learning – a jumping-off point for interested researchers to advance their work. […]