Posts

Oct, 22

Efficient Random Sampling – Parallel, Vectorized, Cache-Efficient, and Online

We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and leads to a parallel algorithm running in expected time $\mathcal{O}\left(n/p+\log p\right)$ on $p$ processors. The amount of communication between the processors is […]
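
The abstract above is truncated, so the following is only a minimal sequential sketch of one way a divide-and-conquer sampler can work, not the authors' implementation. It assumes the range is split in half and the number of samples falling in the left half is drawn from a hypergeometric distribution; the function names (`hypergeometric`, `sample_range`) and the `base_size` cutoff are hypothetical.

```python
import random


def hypergeometric(rng, good, bad, draws):
    """Count 'good' items among `draws` items drawn without replacement.

    Simple O(draws) simulation, used here only to keep the sketch
    dependency-free; a faster generator would be needed in practice.
    """
    hits = 0
    for _ in range(draws):
        if rng.random() * (good + bad) < good:
            hits += 1
            good -= 1
        else:
            bad -= 1
    return hits


def sample_range(rng, lo, hi, n, base_size=64):
    """Return n distinct integers from lo..hi (inclusive)."""
    size = hi - lo + 1
    if n == 0:
        return []
    if size <= base_size:
        # Small subrange: fall back to a standard sampler.
        return rng.sample(range(lo, hi + 1), n)
    mid = lo + size // 2 - 1          # left half is lo..mid
    left_size = mid - lo + 1
    # Decide how many of the n samples land in the left half.
    n_left = hypergeometric(rng, left_size, size - left_size, n)
    # The two halves are now independent subproblems.
    return (sample_range(rng, lo, mid, n_left, base_size)
            + sample_range(rng, mid + 1, hi, n - n_left, base_size))


rng = random.Random(0)
picked = sample_range(rng, 1, 10_000, 500)
```

Because the two recursive calls touch disjoint subranges, they could in principle be handled by different processors, and each small base case works on a compact block of the range.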
Oct, 22

Sparse-Matrix support for the SkePU library for portable CPU/GPU programming

In this thesis work we have extended the SkePU framework by designing a new container data structure for the representation of generic two-dimensional sparse matrices. Computation on matrices is an integral part of many scientific and engineering problems. Sometimes it is unnecessary to perform costly operations on zero entries of the matrix. If the […]
Oct, 22

Energy-efficient FPGA Implementation of the k-Nearest Neighbors Algorithm Using OpenCL

Modern SoCs are getting increasingly heterogeneous with a combination of multi-core architectures and hardware accelerators to speed up the execution of compute-intensive tasks at considerably lower power consumption. Modern FPGAs, due to their reasonable execution speed and comparatively lower power consumption, are strong competitors to the traditional GPU based accelerators. High-level Synthesis (HLS) simplifies FPGA […]
Oct, 22

CuMF_SGD: Fast and Scalable Matrix Factorization

Matrix factorization (MF) has been widely used in, e.g., recommender systems, topic modeling, and word embedding. Stochastic gradient descent (SGD) is popular for solving MF problems because it can deal with large data sets and lends itself to incremental learning. We observed that SGD for MF is memory bound. Meanwhile, single-node CPU systems with […]
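
For readers unfamiliar with SGD for MF, here is a minimal CPU sketch of the standard update rule with L2 regularization. It is not the CuMF_SGD implementation; the function names, the toy ratings, and the hyperparameters (`lr`, `reg`) are assumptions made for illustration. Each update reads and writes two short factor vectors and performs only a few arithmetic operations per element, which is consistent with the abstract's observation that the method is memory bound.

```python
import math
import random


def predict(p, q):
    """Dot product of a user factor vector and an item factor vector."""
    return sum(a * b for a, b in zip(p, q))


def sgd_epoch(ratings, P, Q, lr=0.02, reg=0.02):
    """One pass of SGD over (user, item, rating) triples, updating P and Q in place."""
    for u, i, r in ratings:
        p, q = P[u], Q[i]
        err = r - predict(p, q)
        for k in range(len(p)):
            pk, qk = p[k], q[k]          # use the old values for both updates
            p[k] += lr * (err * qk - reg * pk)
            q[k] += lr * (err * pk - reg * qk)


def rmse(ratings, P, Q):
    """Root-mean-square error over the observed ratings."""
    se = sum((r - predict(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


# Toy data (hypothetical): 3 users, 3 items, rank-2 factors.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
rng = random.Random(0)
P = [[rng.uniform(0.0, 0.1) for _ in range(2)] for _ in range(3)]
Q = [[rng.uniform(0.0, 0.1) for _ in range(2)] for _ in range(3)]

before = rmse(ratings, P, Q)
for _ in range(500):
    sgd_epoch(ratings, P, Q)
after = rmse(ratings, P, Q)
```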
Oct, 22

OpenMP, OpenMP/MPI, and CUDA/MPI C programs for solving the time-dependent dipolar Gross-Pitaevskii equation

We present new versions of the previously published C and CUDA programs for solving the dipolar Gross-Pitaevskii equation in one, two, and three spatial dimensions, which calculate stationary and non-stationary solutions by propagation in imaginary or real time. The presented programs are improved and parallelized versions of the previous programs, divided into three packages according to the […]
Oct, 15

Efficient molecular dynamics simulations with many-body potentials on graphics processing units

Graphics processing units have been extensively used to accelerate classical molecular dynamics simulations. However, there is much less progress on the acceleration of force evaluations for many-body potentials compared to pairwise ones. In the conventional force evaluation algorithm for many-body potentials, the force, virial stress, and heat current for a given atom are accumulated within […]
Oct, 15

Reordering strategy for blocking optimization in sparse linear solvers

Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For this reason, optimizing their performance on modern architectures is critical. The preprocessing steps of sparse direct solvers, […]
Oct, 15

Machine Learning Based Intrusion Detection in Controller Area Networks

This project examines the feasibility of machine learning based fingerprinting of CAN transceivers for the purpose of uniquely identifying signal sources during intrusion detection. A working multi-node CAN bus development environment was constructed, and an OpenCL Deep Learning Python Wrapper was ported to the platform. Multiple machine learning algorithms were compared systematically, and two models […]
Oct, 15

Embedded real-time stereo estimation via Semi-Global Matching on the GPU

Dense, robust and real-time computation of depth information from stereo-camera systems is a computationally demanding requirement for robotics, advanced driver assistance systems (ADAS) and autonomous vehicles. Semi-Global Matching (SGM) is a widely used algorithm that propagates consistency constraints along several paths across the image. This work presents a real-time system producing reliable disparity estimation results […]
Oct, 15

GPU-accelerated real-time stixel computation

The Stixel World is a medium-level, compact representation of road scenes that abstracts millions of disparity pixels into hundreds or thousands of stixels. The goal of this work is to implement and evaluate a complete multi-stixel estimation pipeline on an embedded, energy-efficient, GPU-accelerated device. This work presents a full GPU-accelerated implementation of stixel estimation that […]
Oct, 14

International Conference on Network and Cyber Security (ICNCS), 2017

ICNCS 2017, the International Conference on Network and Cyber Security, will take place in Lakeland, Florida, United States, from May 19-23, 2017. ICNCS 2017 is a not-to-be-missed opportunity that distills the most current knowledge on a rapidly advancing discipline in one conference. Join key researchers and established professionals in the field of Network and Cyber Security as […]
Oct, 14

The 2nd International Conference on Electronics Engineering and Informatics (ICEEI), 2017

The 2nd International Conference on Electronics Engineering and Informatics (ICEEI 2017) will be held June 25-27, 2017, in Beijing, China, as a workshop of WCSE 2017. ICEEI was established on the success of the WCSE conferences, whose proceedings have been indexed by EI and Scopus each year; as the workshop of […]

* * *

HGPU group © 2010-2024 hgpu.org

All rights belong to the respective authors
