This is a patched version of zlib modified to use
Pentium-optimized assembly code in the deflation algorithm. The files
changed/added by this patch are:

README.586
match.S

The effectiveness of these modifications is a bit marginal, as the
program's bottleneck seems to be mostly L1-cache contention, which
there is no real way to work around without rewriting the basic
algorithm. The speedup on average is around 5-10% (which is generally
less than the amount of variance between subsequent executions).
However, when used at level 9 compression, the cache contention can
drop enough for the assembly version to achieve 10-20% speedup (and
sometimes more, depending on the amount of overall redundancy in the
files). Even here, though, cache contention can still be the limiting
factor, depending on the nature of the program using the zlib library.
This may also mean that better improvements will be seen on a Pentium
with MMX, which suffers much less from L1-cache contention, but I have
not yet verified this.

Note that this code has been tailored for the Pentium in particular,
and will not perform well on the Pentium Pro (due to the use of a
partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then run:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o
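
For context on what the ASMV build swaps out: defining ASMV makes
deflate.c leave out its C longest_match routine and use the one
supplied by match.o instead. The sketch below is a much-simplified C
illustration of that kind of match search, not the code from deflate.c
or match.S. The function name, its parameters, and the NIL convention
are assumptions made up for this example. It walks a chain of earlier
candidate positions and keeps the longest run of bytes equal to the
data at the current position.

```c
#include <assert.h>

#define NIL 0u  /* chain terminator, so position 0 is never a candidate */

/* Illustrative stand-in for a deflate-style match search.
 *   window      input bytes
 *   prev        prev[p] is the next older candidate after p, or NIL
 *   cur         position currently being compressed
 *   cand        most recent candidate (must be below cur), or NIL
 *   end         number of valid bytes in window
 *   max_chain   maximum number of candidates to examine
 *   match_start receives the start of the best match, if any
 * Returns the length of the best match found (0 if none).
 */
static unsigned longest_match_sketch(const unsigned char *window,
                                     const unsigned *prev,
                                     unsigned cur, unsigned cand,
                                     unsigned end, unsigned max_chain,
                                     unsigned *match_start)
{
    unsigned best_len = 0;
    unsigned max_len;

    assert(cur <= end);
    max_len = end - cur;

    while (cand != NIL && max_chain-- != 0) {
        unsigned len = 0;

        /* Count how many bytes at the candidate equal those at cur. */
        while (len < max_len && window[cand + len] == window[cur + len])
            len++;

        if (len > best_len) {
            best_len = len;
            *match_start = cand;
            if (len == max_len)
                break;          /* cannot do better than this */
        }
        cand = prev[cand];      /* step to the next older candidate */
    }
    return best_len;
}
```

As a usage example, take the 16-byte window "_abcd_abcx_abcdZ" with
cur = 11 and the candidates chained 6 -> 1 (prev[6] = 1, prev[1] = NIL).
With max_chain = 1 the function returns 3 (match at 6); with max_chain
of 2 or more it returns 4 (match at 1). Each step of the chain jumps to
a different part of the window, and higher compression levels allow
longer chains.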