Electrical Engineering

Efficient orchestration of sub-word parallelism in media processors

John Y. Oliver, University of California - Davis
Venkatesh Akella, University of California - Davis
Frederic T. Chong, University of California - Davis

Recommended Citation

Postprint version. Published in Proceedings of the 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures: Barcelona, Spain, June 27, 2004, pages 225-234.

NOTE: At the time of publication, the author John Oliver was not yet affiliated with Cal Poly.

The definitive version is available at https://doi.org/10.1145/1007912.1007946.

Abstract

Communication and multimedia applications with increased data rates and enhanced functionality continuously raise the bar for the computational requirements of future microprocessors. To meet these computational demands, it is necessary to exploit sub-word parallelism efficiently. We propose to make sub-word data movement a first-class operation in microprocessor architectures by introducing a Sub-word Permutation Unit (SPU) in the execution pipeline. The SPU is evaluated in the context of the MMX media co-processor for the Intel Pentium architectures, but our results can be extended to any processor that supports sub-word parallelism. We find that the SPU allows us to orchestrate sub-word data placement prior to computation, thus allowing the MMX functional units to concentrate on performing calculations. Furthermore, we introduce a decoupled SPU control mechanism at the basic block level, which allows static optimization to eliminate data-movement overhead in tight loops, where most media and signal processing occurs. We demonstrate improvements of 4% to 20% on key media and signal processing kernels with as little as a 1% increase in hardware resources.
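As a rough software illustration of the idea in the abstract (arranging sub-words in a register before the arithmetic unit operates on them), the C sketch below models a 64-bit MMX-style register as four 16-bit lanes. It separates a permutation step from a packed 16-bit wraparound add in the style of MMX `PADDW`. The helper names `lane16`, `permute16`, and `padd16`, the selector-array encoding, and the software model itself are hypothetical and chosen for illustration; they are not the SPU's actual control format or the authors' implementation.

```c
#include <stdint.h>

/* Hypothetical software model (an assumption, not the paper's SPU):
   a 64-bit MMX-style register holding four 16-bit sub-words. */
typedef uint64_t reg64;

/* Extract 16-bit lane i (lane 0 is the least significant). */
static uint16_t lane16(reg64 r, unsigned i) {
    return (uint16_t)(r >> (16u * i));
}

/* Sub-word data movement: output lane i takes input lane sel[i].
   The selector array is a hypothetical control encoding. */
static reg64 permute16(reg64 r, const unsigned sel[4]) {
    reg64 out = 0;
    for (unsigned i = 0; i < 4; i++)
        out |= (reg64)lane16(r, sel[i] & 3u) << (16u * i);
    return out;
}

/* Sub-word computation: packed 16-bit add with wraparound,
   lane by lane, in the style of MMX PADDW. */
static reg64 padd16(reg64 a, reg64 b) {
    reg64 out = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint16_t s = (uint16_t)(lane16(a, i) + lane16(b, i));
        out |= (reg64)s << (16u * i);
    }
    return out;
}
```

For example, reversing the lanes of `0x0040003000200010` with the selector `{3, 2, 1, 0}` gives `0x0010002000300040`. Adding that result to `0x0004000300020001` with `padd16` gives `0x0014002300320041`. Keeping the permutation in its own function mirrors the abstract's point that data placement can be handled separately, so the arithmetic step only computes.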

Disciplines

Electrical and Computer Engineering

Copyright

2004 ACM

Publisher statement

This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures: Barcelona, Spain.

Included in

Electrical and Computer Engineering Commons

URL: https://digitalcommons.calpoly.edu/eeng_fac/125
