Posts
Jun, 10
CUDA Based Fast Implementation of Very Large Matrix Computation
CUDA (Compute Unified Device Architecture) acceleration of very large scale matrix-vector and matrix-matrix multiplication is presented in this paper. The intrinsic parallelism in the matrix computations is exploited thoroughly. By dividing the entire matrix computation into multiple sub-groups, scalable performance improvement can be achieved using multiple GPUs. The key operations are accelerated by GPU. And […]
Jun, 10
Planetary-Scale Terrain Composition
Many interrelated planetary height map and surface image map data sets exist, and more data are collected each day. Broad communities of scientists require tools to compose these data interactively and explore them via real-time visualization. While related, these data sets are often unregistered with one another, having different projection, resolution, format, and type. We […]
Jun, 10
The Research of Real-Time Shadow Rendering Algorithm of Virtual Scenes
Rendering scene shadows with shadow mapping has long suffered from under-sampling artifacts: insufficient shadow map resolution leads to so-called perspective and projection aliasing. To address this issue, we present a new practical real-time shadow mapping algorithm. First, we sample the scene from the eye point on the GPU to get the needed […]
Jun, 10
Accelerating Multi-layer Perceptron based short term demand forecasting using Graphics Processing Units
Load forecasting plays a vitally important role in the operation and planning of the power system in a deregulated electricity market. A large variety of methods have been proposed for load forecasting. In this paper, we introduce graphics processing unit (GPU) based computing to accelerate short-term load forecasting with a multi-layer perceptron (MLP). […]
Jun, 10
The scoring sequences on profile Hidden Markov Models with delete states elimination by GPUs
A profile Hidden Markov Model (HMM) is well suited for representing profiles of multiple sequence alignments, and it has become the main method of multiple sequence alignment in bioinformatics. The scoring of sequences on profile HMMs is compute-intensive, especially when there are many Markov models and many states in each model. A parallel algorithm […]
Jun, 10
Real-time rain simulation in cartoon style
An efficient method for simulating cartoon-style rain in a 3D environment is proposed here. By taking advantage of the parallelism and programmability of GPUs (graphics processing units), real-time interaction can be achieved. Raindrop splashing is simulated using collision detection, a series of stylized textures, and rotations of point sprites. To simulate a wind-driven rain effect, the […]
Jun, 10
Real-time rendering of large-scale tree scene
High-quality, realistic visualization of vegetation and tree models has been a long-standing goal in rendering complex virtual natural scenes. Rendering a photo-realistic forest scene in real time is important for simulating growing trees. In this paper, we present a method of 3D tree modeling and a hybrid rendering algorithm of large-scale forest scene […]
Jun, 10
Handwritten Digit Recognition with a Committee of Deep Neural Nets on GPUs
The competitive MNIST handwritten digit recognition benchmark has a long history of broken records since 1998. The most recent substantial improvement by others dates back 7 years (error rate 0.4%). Recently we were able to significantly improve this result, using graphics cards to greatly speed up training of simple but deep MLPs, which achieved […]
Jun, 9
Single molecule detection of tuberculosis nucleic acid using dark field Tethered Particle Motion
Current methods for tuberculosis nucleic acid detection require amplification and labeling before detection is possible. We propose here a method for direct detection using Tethered Particle Motion: gold nanoparticles are tethered to a glass substrate by single-stranded DNA molecules consisting of the complementary sequence to the target. Detection takes place by observing a change in […]
Jun, 9
cuGWAM: Genome-wide association multifactor dimensionality reduction using CUDA-enabled high-performance graphics processing unit
The multifactor dimensionality reduction (MDR) method has been widely applied to detect gene-gene interactions, which are well recognized as playing an important role in understanding complex traits such as disease susceptibility. However, because MDR performs an exhaustive analysis, current MDR software has limitations when extended to genome-wide association studies (GWAS) with […]
Jun, 9
Low-Frequency MLFMA on Graphics Processors
A parallelization of the low-frequency multilevel fast multipole algorithm (MLFMA) for graphics processing units (GPUs) is presented. The implementation exhibits speedups between 10 and 30 compared to a serial CPU implementation of the algorithm. The error of the MLFMA on the GPU is controllable down to machine precision. Under the typical method-of-moments (MoM) error requirement […]
Jun, 9
Adaptive Optimization for Petascale Heterogeneous CPU/GPU Computing
In this paper, we describe our experiment in developing an implementation of the Linpack benchmark for TianHe-1, a petascale CPU/GPU supercomputer system and the largest GPU-accelerated system ever attempted. An adaptive optimization framework is presented to balance the workload distribution across the GPUs and CPUs with negligible runtime overhead, resulting in better performance than […]