15639

Posts

Apr, 3

Classiffication-based Financial Markets Prediction using Deep Neural Networks

Deep neural networks (DNNs) are powerful types of artificial neural networks (ANNs) that use several hidden layers. They have recently gained considerable attention in the speech transcription and image recognition community (Krizhevsky et al., 2012) for their superior predictive properties including robustness to overfitting. However their application to algorithmic trading has not been previously researched, […]
Apr, 3

Dynamic Sparse-Matrix Allocation on GPUs

Sparse matrices are a core component in many numerical simulations, and their efficiency is essential to achieving high performance. Dynamic sparse-matrix allocation (insertion) can benefit a number of problems such as sparse-matrix factorization, sparse-matrix-matrix addition, static analysis (e.g. points-to analysis), computing transitive closure, and other graph algorithms. Existing sparse-matrix formats are poorly designed to handle […]
Apr, 3

Evaluating the Performance Impact of Multiple Streams on the MIC-based Heterogeneous Platform

Using multiple streams can improve the overall system performance by mitigating the data transfer overhead on heterogeneous systems. Prior work focuses a lot on GPUs but little is known about the performance impact on (Intel Xeon) Phi. In this work, we apply multiple streams into six real-world applications on Phi. We then systematically evaluate the […]
Mar, 29

A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs

Sparse matrix-vector multiplication (SpMV) is an important operation in scientific computations. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMVs on graphic processing units (GPUs), e.g., CSR-scalar and CSR-vector, usually have poor performance due to irregular memory access patterns. This motivates us to propose a perfect CSR-based […]
Mar, 29

A generalized GPU-based connected component labeling algorithm

We propose a generalized GPU-based connected component labeling (CCL) algorithm that can be applied to both various lattices and to non-lattice environments in a uniform fashion. We extend our recent GPU-based CCL algorithm without the use of conventional iteration to the generalized method. As an application of this algorithm, we deal with the bond percolation […]
Mar, 29

Generic Inverted Index on the GPU

Data variety, as one of the three Vs of the Big Data, is manifested by a growing number of complex data types such as documents, sequences, trees, graphs and high dimensional vectors. To perform similarity search on these data, existing works mainly choose to create customized indexes for different data types. Due to the diversity […]
Mar, 29

A Stencil DSEL for Single Code Accelerated Computing with SYCL

Stencil kernels arise in many scientific codes as the result from dis-cretizing natural, continuous phenomenons. Many research works have designed stencil frameworks to help programmer optimize stencil kernels for performance, and to target CPUs or accelerators. However, existing stencil kernels, either library-based or language-based necessitate to write distinct source codes for accelerated kernels and for […]
Mar, 29

GPU Computing in Bayesian Inference of Realized Stochastic Volatility Model

The realized stochastic volatility (RSV) model that utilizes the realized volatility as additional information has been proposed to infer volatility of financial time series. We consider the Bayesian inference of the RSV model by the Hybrid Monte Carlo (HMC) algorithm. The HMC algorithm can be parallelized and thus performed on the GPU for speedup. The […]
Mar, 25

Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets

An important class of problems involves training deep neural networks with sparse prediction targets of very high dimension D. These occur naturally in e.g. neural language models or the learning of word-embeddings, often posed as predicting the probability of next words among a vocabulary of size D (e.g. 200,000). Computing the equally large, but typically […]
Mar, 25

Wanted: Floating-Point Add Round-off Error instruction

We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient. Performance estimates on Intel Haswell, Intel Skylake, and AMD Steamroller processors, as well as Intel Knights Corner co-processor, demonstrate that such an instruction would […]
Mar, 25

Accelerating Deep Neural Network Training with Inconsistent Stochastic Gradient Descent

SGD is the widely adopted method to train CNN. Conceptually it approximates the population with a randomly sampled batch; then it evenly trains batches by conducting a gradient update on every batch in an epoch. In this paper, we demonstrate Sampling Bias, Intrinsic Image Difference and Fixed Cycle Pseudo Random Sampling differentiate batches in training, […]
Mar, 25

An Efficient Implementation of the Longest Common Subsequence Algorithm with Bit-Parallelism on GPUs

The longest common subsequence (LCS) for two given strings has various applications, such as for the comparison of deoxyribonucleic acid (DNA). In this thesis, we propose a graphics processing unit (GPU) algorithm to accelerate Hirschberg’s LCS algorithm improved with the bit-parallel algorithm by Crochemore et al. The algorithm by Crochemore et al. includes bitwise logical […]

* * *

* * *

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors

Contact us: