Christian Pinto
Over the last few decades, unprecedented technological growth has been at the center of embedded systems design, with Moore’s Law being the leading factor of this trend. Today, in fact, an ever-increasing number of cores can be integrated on the same die, marking the transition from state-of-the-art multi-core chips to the […]
Mark Stephenson, Siva Kumar Sastry Hari, Yunsup Lee, Eiman Ebrahimi, Daniel R. Johnson, David Nellans, Mike O'Connor, Stephen W. Keckler
To aid application characterization and architecture design space exploration, researchers and engineers have developed a wide range of tools for CPUs, including simulators, profilers, and binary instrumentation tools. With the advent of GPU computing, GPU manufacturers have developed similar tools leveraging hardware profiling and debugging hooks. To date, these tools are largely limited by the […]
Edoardo Paone
Parallel programming is a skill which software engineers no longer can do without, since multi- and many-core architectures have been widely adopted for general-purpose computing platforms. In 2006 Intel introduced the first multi-core processor on the consumer market and, at the same time, NVIDIA unveiled CUDA, a programming paradigm to exploit Graphics Processing Units (GPUs) […]
Lai-Huei Wang
Dataflow models are widely used for expressing the functionality of digital signal processing (DSP) applications due to their useful features, such as providing formal mechanisms for description of application functionality, imposing minimal data-dependency constraints in specifications, and exposing task and data level parallelism effectively. Due to the increased complexity of dynamics in modern DSP applications, […]
Hana Park, Jungmin So, Young-Woong Ko, Jeong-Gun Lee
Power and energy consumption are also becoming important design criteria. Consequently, software designers have to consider power and energy consumption together with performance when developing software. In this paper, we perform a design space exploration with a commercial GPU, the nVidia GTX 660, to investigate the best configuration of a kernel grid structure in a […]
Wenhao Jia, Kelly A. Shaw, Margaret Martonosi
Graphics processing units (GPUs) are in increasingly wide use, but significant hurdles lie in selecting the appropriate algorithms, runtime parameter settings, and hardware configurations to achieve power and performance goals with them. Exploring hardware and software choices requires time-consuming simulations or extensive real-system measurements. While some auto-tuning support has been proposed, it is often narrow […]
Junying Chen
This work explored design considerations for real-time medical ultrasound adaptive beamformer implementations on different computing platforms: CPU, GPU and FPGA. Adaptive beamforming is widely regarded as an advanced solution for improving the image quality of medical ultrasound imaging machines. Although it provides promising improvements in lateral resolution, image contrast and imaging penetration […]
Lai-Huei Wang, Chung-Ching Shen, Gunasekaran Seetharaman, Kannappan Palaniappan, Shuvra S. Bhattacharyya
Multidimensional synchronous dataflow (MDSDF) provides an effective model of computation for a variety of multidimensional DSP systems that have static dataflow structures. In this paper, we develop new methods for optimized implementation of MDSDF graphs on embedded platforms that employ multiple levels of parallelism to enhance performance at different levels of granularity. Our approach allows […]
Mahesh Nanjundappa, Anirudh Kaushik, Hiren D. Patel, Sandeep K. Shukla
Recent developments in graphics processing unit (GPU) technology have invigorated interest in using GPUs to accelerate the simulation of SystemC models. SystemC is extensively used for design space exploration and early performance analysis of hardware systems. SystemC’s reference implementation of the simulation kernel is single-threaded. However, modern computing platforms offer substantially […]
David Sheffield, Michael Anderson, Kurt Keutzer
We present Three Fingered Jack, a highly productive approach to mapping vectorizable applications to the FPGA. Our system applies traditional dependence analysis and reordering transformations to a restricted set of Python loop nests. It does this to uncover parallelism and divide computation between multiple parallel processing elements (PEs) that are automatically generated through high-level synthesis […]
Muhammad Shafiq
The basic concept behind the architecture of a general purpose CPU core conforms well to a serial programming model. The integration of more cores on a single chip helped CPUs in running parts of a program in parallel. However, the utilization of huge parallelism available from many high performance applications and the corresponding data is […]
G. Falcao, M. Owaida, D. Novo, M. Purnaprajna, N. Bellas, C.D. Antonopoulos, G. Karakonstantis, A. Burg, P. Ienne
Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs using simulations that can take weeks to months to complete. For example, designers of special purpose chips need to explore parameters such as the optimal bit width and data representation. This is the […]

Free GPU computing nodes at hgpu.org

Registered users can now run their OpenCL applications at hgpu.org. We provide 1 minute of compute time per run on two nodes equipped with AMD and nVidia graphics processing units. There is no limit on the number of runs.

The platforms are:

Node 1
  • GPU device 0: nVidia GeForce GTX 560 Ti 2GB, 822MHz
  • GPU device 1: AMD/ATI Radeon HD 6970 2GB, 880MHz
  • CPU: AMD Phenom II X6 1055T @ 2.8GHz
  • RAM: 12GB
  • OS: OpenSUSE 13.1
  • SDK: nVidia CUDA Toolkit 6.5.14, AMD APP SDK 3.0
Node 2
  • GPU device 0: AMD/ATI Radeon HD 7970 3GB, 1000MHz
  • GPU device 1: AMD/ATI Radeon HD 5870 2GB, 850MHz
  • CPU: Intel Core i7-2600 @ 3.4GHz
  • RAM: 16GB
  • OS: OpenSUSE 12.3
  • SDK: AMD APP SDK 3.0

A completed OpenCL project should be uploaded via the User dashboard (see the instructions and example there). The compilation and execution terminal output logs will be provided to the user.

The information sent to hgpu.org will be treated according to our Privacy Policy.

HGPU group © 2010-2015 hgpu.org

All rights belong to the respective authors
