Posts

Oct, 26

Parallel Verlet neighbor list algorithm for GPU-optimized MD simulations

Understanding protein and RNA biomolecular folding and assembly processes has important applications because misfolding is associated with diseases like Alzheimer’s and Parkinson’s. However, simulating biologically relevant biomolecules on timescales that correspond to biological functions is an extraordinary challenge due to bottlenecks that arise mainly in force calculations. We briefly review the molecular dynamics (MD) […]
Oct, 25

Automatic Generation Of Application-Specific Accelerators for FPGAs from Python Loop Nests

We present Three Fingered Jack, a highly productive approach to mapping vectorizable applications to the FPGA. Our system applies traditional dependence analysis and reordering transformations to a restricted set of Python loop nests. It does this to uncover parallelism and divide computation between multiple parallel processing elements (PEs) that are automatically generated through high-level synthesis […]
Oct, 25

GPU-Based Asynchronous Global Optimization with Particle Swarm

The recent upsurge in research into general-purpose applications for graphics processing units (GPUs) has made low-cost high-performance computing increasingly accessible. Many global optimization algorithms that have previously benefited from parallel computation are now poised to take advantage of general-purpose GPU computing as well. In this paper, a global parallel asynchronous particle swarm optimization […]
Oct, 25

Modular & Scalable Ultrasound Platform with GPU Processing

The objective of our project is to develop a complete ultrasound platform with real-time GPU processing. The platform is designed to be modular and scalable both in the number of ultrasound channels (64-256) and in communication bandwidth and processing power. By standardizing on the PCIe switch fabric, we are planning to integrate all the […]
Oct, 25

A structural analysis of the A5/1 state transition graph

We describe efficient algorithms to analyze the cycle structure of the graph induced by the state transition function of the A5/1 stream cipher used in GSM mobile phones and report on the results of the implementation. The analysis is performed in five steps utilizing HPC clusters, GPGPU and external memory computation. A great reduction of […]
Oct, 25

A Comparison of Sequential and GPU Implementations of Iterative Methods to Compute Reachability Probabilities

We consider the problem of computing reachability probabilities: given a Markov chain, an initial state of the Markov chain, and a set of goal states of the Markov chain, what is the probability of reaching any of the goal states from the initial state? This problem can be reduced to solving a linear equation Ax=b […]
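The reduction to a linear system can be illustrated with a small sequential sketch. It assumes a Jacobi-style fixed-point iteration, which is only one of the possible iterative methods and is not necessarily the one the paper evaluates. The function name, the transition matrix `P`, and the four-state example chain are hypothetical and chosen only for illustration.

```python
def reachability_probabilities(P, goal, tol=1e-12, max_iter=10_000):
    """Approximate, for every state, the probability of reaching a goal state.

    P    : row-stochastic transition matrix as a list of lists (assumed input format)
    goal : set of goal state indices
    """
    n = len(P)
    # Goal states reach the goal with probability 1; start all others at 0.
    x = [1.0 if s in goal else 0.0 for s in range(n)]
    for _ in range(max_iter):
        # Jacobi-style sweep: each new value uses only values from the previous sweep.
        new_x = [
            1.0 if s in goal else sum(P[s][t] * x[t] for t in range(n))
            for s in range(n)
        ]
        delta = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if delta < tol:
            break
    return x


# Hypothetical example: a random walk on states 0..3 where 0 and 3 are absorbing.
P = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
probs = reachability_probabilities(P, goal={3})
print(probs)
```

Because every entry of a sweep depends only on the previous sweep, the per-state updates are independent of each other, which is the property that makes this kind of iteration a natural candidate for a GPU implementation.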
Oct, 24

GPU Implementation of the STA Algorithm on I/Q Data

GPU computing is a new paradigm in high-performance signal and image processing. The massively parallel processing offered by GPUs greatly accelerates computations when they are properly implemented. Ultrasound image reconstruction is one of these highly parallel classes of algorithms. The massive amount of multichannel input data and the deterministic order of execution make US […]
Oct, 24

Research for Chinese Spam Filtering Based on GPU

Spam has become an increasingly serious problem with the widespread use of e-mail. Spam filtering based on mail content is a mainstream technology for combating spam. However, the efficiency of the spam filtering algorithm becomes a bottleneck when it is used in the training of a great amount of mail samples or […]
Oct, 24

Dawn of GPU Era-Potentials of Chaos Theory

Chaos theory currently has tremendous potential in the computer science domain. Its true potential can be realized with the assistance of high-performance computing aids, such as GPUs, that have recently become available. The main purpose is to develop a high-performance experimental laboratory in academic institutions, for […]
Oct, 24

Large-scale Monte Carlo simulation of two-dimensional classical XY model using multiple GPUs

We study the two-dimensional classical XY model by large-scale Monte Carlo simulation with the Swendsen-Wang multi-cluster algorithm, using multiple GPUs on the open science supercomputer TSUBAME 2.0. Simulating systems up to a linear system size of L=65536, we investigate the Kosterlitz-Thouless (KT) transition. Using the generalized version of the probability-changing cluster algorithm based on the […]
Oct, 24

Floating-Point Arithmetic in Transport Triggered Architectures

Many computational applications have high performance and energy-efficiency requirements which "off-the-shelf" general-purpose processors cannot meet. On the other hand, designing special-purpose hardware accelerators can be prohibitively expensive in terms of development time. One approach to the problem is to design an Application-Specific Instruction set Processor (ASIP), which is programmable, but tailor-made for the task at […]
Oct, 23

Task Parallelism and Data Distribution: An Overview of Explicit Parallel Programming Languages

Programming parallel machines as effectively as sequential ones would ideally require a language that provides high-level programming constructs to avoid the programming errors frequent when expressing parallelism. Since task parallelism is considered more error-prone than data parallelism, we survey six popular and efficient parallel language designs that tackle this difficult issue: Cilk, Chapel, X10, Habanero-Java, […]

HGPU group © 2010-2025 hgpu.org

All rights belong to the respective authors