Posts
Apr, 17
NBODY6++GPU: Ready for the gravitational million-body problem
Accurate direct N-body simulations help to obtain detailed information about the dynamical evolution of star clusters. They also enable comparisons with analytical models and Fokker-Planck or Monte-Carlo methods. NBODY6 is a well-known direct N-body code for star clusters, and NBODY6++ is the extended version designed for simulations with large particle numbers on supercomputers. We present NBODY6++GPU, […]
Apr, 17
2nd International Conference on Communication and Signal Processing (ICCSP), 2015
Topics: Antennas, RF and Microwave Communications; Audio / Speech Processing and Coding; Array Signal Processing; Bio Signal Processing; Cognitive Radio and Cognitive Networks; Digital Signal Processing; Mobile and Cellular Communications; MIMO and Space Time Communications; Optical Communication; OFDM and CDMA Communication Receivers; Satellite Communication; Statistical Signal Processing; Signal Processing for Communications; Signal Processing for Security […]
Apr, 15
Collaborative Diffusion on the GPU for Path-Finding in Games
Exploiting the processing power available on the GPU in many machines, we investigate the performance of parallelised versions of pathfinding algorithms in typical game environments. We describe a parallel implementation of a collaborative diffusion algorithm that is shown to find short paths in real time across a range of graph sizes and provide a comparison […]
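As a rough illustration of the idea behind collaborative diffusion, the sketch below spreads a "scent" value from a goal cell across a small grid and lets an agent climb toward higher values. It is a sequential CPU sketch, not the paper's GPU implementation; the grid encoding (`#` for walls), the function names `diffuse` and `follow_scent`, and the parameter values are all assumptions made for this example. Each cell is updated only from the previous buffer, which is the property that would let one GPU thread handle one cell.

```python
# Hypothetical sketch of collaborative diffusion on a 4-connected grid.
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def diffuse(grid, goal, iterations=500, goal_value=1000.0):
    """Return a scent field; walls and out-of-grid cells count as 0."""
    rows, cols = len(grid), len(grid[0])
    field = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        # Double buffering: every cell reads only the previous field,
        # so all cells could be updated in parallel.
        nxt = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == "#":
                    continue  # walls stay at 0
                if (r, c) == goal:
                    nxt[r][c] = goal_value  # the goal emits a fixed scent
                    continue
                total = 0.0
                for dr, dc in NEIGHBOURS:
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        total += field[nr][nc]
                nxt[r][c] = total / 4.0
        field = nxt
    return field


def follow_scent(grid, field, start, goal):
    """Greedily step to the neighbour with the strongest scent."""
    rows, cols = len(grid), len(grid[0])
    path = [start]
    pos = start
    while pos != goal and len(path) <= rows * cols:
        r, c = pos
        candidates = [
            (field[r + dr][c + dc], (r + dr, c + dc))
            for dr, dc in NEIGHBOURS
            if 0 <= r + dr < rows and 0 <= c + dc < cols
            and grid[r + dr][c + dc] != "#"
        ]
        if not candidates:
            break  # isolated cell, nowhere to go
        best_value, best_pos = max(candidates)
        if best_value <= field[r][c]:
            break  # no uphill neighbour; the scent has not reached us
        pos = best_pos
        path.append(pos)
    return path


grid = [
    ".....",
    ".###.",
    "...#.",
    ".#...",
]
start, goal = (2, 2), (0, 4)
field = diffuse(grid, goal)
path = follow_scent(grid, field, start, goal)
print(path)
```

Greedy ascent on a diffused field tends to give short paths but is not guaranteed to give the shortest one, which is why the sketch makes no claim about path length.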
Apr, 14
GPU Accelerated Randomized Singular Value Decomposition and Its Application in Image Compression
In this paper, we present a GPU-accelerated implementation of a randomized Singular Value Decomposition (SVD) algorithm on a large matrix to rapidly approximate the top-k dominant singular values and corresponding singular vectors. The fundamental idea of randomized SVD is to condense a large matrix into a small dense matrix by random sampling while keeping the important […]
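The condensing step can be sketched with the standard randomized range-finder scheme: multiply the matrix by a random test matrix, orthonormalize the result, and take an exact SVD of the much smaller projected matrix. The code below is a CPU sketch in NumPy under that assumption; it is not the paper's GPU implementation, and the function name `randomized_svd` and the `oversample` and `seed` parameters are choices made for this example.

```python
import numpy as np


def randomized_svd(a, k, oversample=10, seed=0):
    """Approximate the top-k singular triplets of a (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    # Random sampling: project the columns of a onto k + oversample directions.
    omega = rng.standard_normal((n, k + oversample))
    y = a @ omega
    # Orthonormal basis for the sampled range of a.
    q, _ = np.linalg.qr(y)
    # Small dense matrix that keeps the dominant part of a.
    b = q.T @ a
    u_b, s, vt = np.linalg.svd(b, full_matrices=False)
    # Map the left singular vectors back to the original space.
    u = q @ u_b
    return u[:, :k], s[:k], vt[:k]


# Usage on a synthetic matrix of exact rank 5.
rng = np.random.default_rng(42)
a = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
u, s, vt = randomized_svd(a, k=5)
s_exact = np.linalg.svd(a, compute_uv=False)[:5]
print(np.allclose(s, s_exact))
```

For matrices whose singular values decay slowly, a larger `oversample` or a few power iterations are commonly used to improve accuracy; neither is shown here.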
Apr, 14
A Parallel Tree Pattern Query Processing Algorithm for Graph Databases using a GPGPU
Large amounts of data are modeled and stored as graphs in order to express complex data relationships. Consequently, query processing on graph structures is becoming an important component in real-world applications. The most commonly used query format is that of tree pattern queries. We present a new parallel SIMD algorithm, GGQ (GPU Graph data base […]
Apr, 14
Image Classification with Pyramid Representation and Rotated Data Augmentation on Torch 7
This project classifies images in the Tiny ImageNet Challenge, a dataset with 200 classes and 500 training examples for each class. Three network architectures are tested: a traditional architecture with 4 convolutional layers + 2 fully-connected layers; a Tiny GoogleNet with 3 inception layers; and a pyramid representation-based network. Tiny GoogleNet achieved the highest top-1 validation […]
Apr, 14
Massively Parallel Ray Tracing Algorithm Using GPU
Ray tracing is a technique for generating an image by tracing the path of light through pixels in an image plane and simulating the effects of high-quality global illumination at a heavy computational cost. Because of the high computational complexity, it cannot meet the requirements of real-time rendering. The emergence of many-core architectures makes it […]
Apr, 14
OpenCL-Z Android Released on Google Play
Developers have been using utility tools such as CPU-Z, GPU-Z, CUDA-Z, and OpenCL-Z for a long time. These tools provide detailed platform and hardware information and help developers quickly understand the hardware capabilities. Recently, OpenCL has been supported by most of the latest mobile phones and tablets, as mobile GPUs are gaining more compute power. OpenCL-A […]
Apr, 12
GPU-based digital hologram reconstruction and particle detection
Digital holograms, when combined with tracer particles, can be used for examining otherwise-invisible fluid flows. These holograms can be captured with standard digital imaging equipment; however, processing them to extract tracer or particle locations is computationally expensive. Exacerbating the issue is that hundreds or thousands of holograms must be reconstructed to analyze a single flow. Presented […]
Apr, 12
Framework for Batched and GPU-resident Factorization Algorithms Applied to Block Householder Transformations
As modern hardware keeps evolving, an increasingly effective approach to developing energy-efficient and high-performance solvers is to design them to work on many small, independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. […]
Apr, 12
Batched Matrix Computations on Hardware Accelerators
Scientific applications require solvers that work on many small problems that are independent from each other. At the same time, high-end hardware evolves rapidly and becomes ever more throughput-oriented, and thus there is an increasing need for an effective approach to developing energy-efficient, high-performance codes for these small matrix problems that we call […]
Apr, 12
Robust real time face recognition and tracking on GPU using fusion of RGB and depth image
This paper presents a real-time face recognition system using a Kinect sensor. The algorithm is implemented on the GPU using OpenCL, and significant speed improvements are observed. We use the Kinect depth image to increase the robustness and reduce the computational cost of conventional LBP-based face recognition. The main objective of this paper was to perform robust, high […]