This is a patched version of zlib modified to use
Pentium-optimized assembly code in the deflation algorithm. The files
changed/added by this patch are:

README.586
match.S

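For readers unfamiliar with what match.S replaces: the assembly routine
stands in for the longest-match search in the deflate code. The
following is a minimal C sketch of that kind of search, not zlib's
actual implementation. The names longest_match_sketch and NO_POS, and
the layout of prev[], are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_MATCH 3            /* shortest match worth reporting */
#define MAX_MATCH 258          /* longest match deflate can encode */
#define NO_POS ((size_t)-1)    /* hypothetical end-of-chain marker */

/*
 * Illustrative longest-match search (a sketch, not zlib's code).
 *   window      input bytes: history followed by lookahead
 *   window_len  number of valid bytes in window
 *   cur         position of the string we want to match
 *   prev        prev[p] is the previous position sharing p's hash,
 *               or NO_POS at the end of the chain
 *   cand        most recent candidate position (must be < cur)
 *   max_chain   upper bound on the number of candidates examined
 *   match_start receives the start of the best match, if one is found
 * Returns the best match length, or 0 if none reaches MIN_MATCH.
 */
unsigned longest_match_sketch(const unsigned char *window, size_t window_len,
                              size_t cur, const size_t *prev, size_t cand,
                              unsigned max_chain, size_t *match_start)
{
    const unsigned char *scan = window + cur;
    size_t best_len = MIN_MATCH - 1;
    size_t limit;
    unsigned chain;

    assert(cur < window_len);
    limit = window_len - cur;              /* bytes available at cur */
    if (limit > MAX_MATCH)
        limit = MAX_MATCH;

    for (chain = 0; chain < max_chain && cand != NO_POS; chain++) {
        const unsigned char *match = window + cand;

        assert(cand < cur);
        if (best_len >= limit)
            break;                         /* nothing longer is possible */

        /* Cheap rejection: a longer match must agree at offset best_len. */
        if (match[best_len] == scan[best_len]) {
            size_t len = 0;
            while (len < limit && scan[len] == match[len])
                len++;
            if (len > best_len) {
                best_len = len;
                *match_start = cand;
            }
        }
        cand = prev[cand];                 /* follow the hash chain */
    }
    return best_len >= MIN_MATCH ? (unsigned)best_len : 0;
}
```

A larger max_chain lets the search walk further along the chain, which
loosely mirrors why higher compression levels spend more of their time
in this loop.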
The effectiveness of these modifications is a bit marginal, as the
program's bottleneck seems to be mostly L1-cache contention, which
there is no real way to work around without rewriting the basic
algorithm. The average speedup is around 5-10% (which is generally
less than the variance between successive executions).
However, at compression level 9, the cache contention can
drop enough for the assembly version to achieve a 10-20% speedup (and
sometimes more, depending on the amount of overall redundancy in the
files). Even here, though, cache contention can still be the limiting
factor, depending on the nature of the program using the zlib library.
This may also mean that better improvements will be seen on a Pentium
with MMX, which suffers much less from L1-cache contention, but I have
not yet verified this.

Note that this code has been tailored for the Pentium in particular,
and will not perform well on the Pentium Pro (due to the use of a
partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then run:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o
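
The -DASMV flag in the command above tells the C sources to rely on an
externally supplied match routine instead of compiling their own, and
OBJA=match.o adds the assembled object to the build. The sketch below
shows that general pattern in miniature. The function name
match_len_demo is hypothetical, and the code illustrates the technique
rather than quoting zlib.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch only: a compile-time flag swaps a portable C routine for one
 * supplied by a separately assembled object file.
 */
#ifdef ASMV
/* With -DASMV, the definition is expected to come from the assembly
 * object at link time (assumption for this sketch). */
unsigned match_len_demo(const unsigned char *a, const unsigned char *b,
                        unsigned max_len);
#else
/* Portable fallback, compiled only when ASMV is not defined. */
unsigned match_len_demo(const unsigned char *a, const unsigned char *b,
                        unsigned max_len)
{
    unsigned n = 0;

    assert(a != NULL && b != NULL);
    /* Count how many leading bytes the two strings share. */
    while (n < max_len && a[n] == b[n])
        n++;
    return n;
}
#endif
```

Built without -DASMV, the fallback is used and no extra object is
needed; built with it, the link fails unless an object defining the
routine is supplied, which is the role OBJA=match.o plays above.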