source: trunk/minix/commands/bzip2-1.0.3/manual.html@ 10

Last change on this file since 10 was 9, checked in by Mattia Monga, 14 years ago

Minix 3.1.2a

<html>
2<head>
3<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
4<title>bzip2 and libbzip2, version 1.0.3</title>
5<meta name="generator" content="DocBook XSL Stylesheets V1.64.1">
6<style type="text/css" media="screen">/* Colours:
7#74240f dark brown h1, h2, h3, h4
8#336699 medium blue links
9#339999 turquoise link hover colour
10#202020 almost black general text
11#761596 purple md5sum text
12#626262 dark gray pre border
13#eeeeee very light gray pre background
14#f2f2f9 very light blue nav table background
15#3366cc medium blue nav table border
16*/
17
18a, a:link, a:visited, a:active { color: #336699; }
19a:hover { color: #339999; }
20
21body { font: 80%/126% sans-serif; }
22h1, h2, h3, h4 { color: #74240f; }
23
24dt { color: #336699; font-weight: bold }
25dd {
26 margin-left: 1.5em;
27 padding-bottom: 0.8em;
28}
29
30/* -- ruler -- */
31div.hr_blue {
32 height: 3px;
33 background:#ffffff url("/images/hr_blue.png") repeat-x; }
34div.hr_blue hr { display:none; }
35
36/* release styles */
37#release p { margin-top: 0.4em; }
38#release .md5sum { color: #761596; }
39
40
41/* ------ styles for docs|manuals|howto ------ */
42/* -- lists -- */
43ul {
44 margin: 0px 4px 16px 16px;
45 padding: 0px;
46 list-style: url("/images/li-blue.png");
47}
48ul li {
49 margin-bottom: 10px;
50}
51ul ul {
52 list-style-type: none;
53 list-style-image: none;
54 margin-left: 0px;
55}
56
57/* header / footer nav tables */
58table.nav {
59 border: solid 1px #3366cc;
60 background: #f2f2f9;
61 background-color: #f2f2f9;
62 margin-bottom: 0.5em;
63}
64/* don't have underlined links in chunked nav menus */
65table.nav a { text-decoration: none; }
66table.nav a:hover { text-decoration: underline; }
67table.nav td { font-size: 85%; }
68
69code, tt, pre { font-size: 120%; }
70code, tt { color: #761596; }
71
72div.literallayout, pre.programlisting, pre.screen {
73 color: #000000;
74 padding: 0.5em;
75 background: #eeeeee;
76 border: 1px solid #626262;
77 background-color: #eeeeee;
78 margin: 4px 0px 4px 0px;
79}
80</style>
81</head>
82<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="book" lang="en">
83<div class="titlepage">
84<div>
85<div><h1 class="title">
86<a name="userman"></a>bzip2 and libbzip2, version 1.0.3</h1></div>
87<div><h2 class="subtitle">A program and library for data compression</h2></div>
88<div><div class="authorgroup"><div class="author">
89<h3 class="author">
90<span class="firstname">Julian</span> <span class="surname">Seward</span>
91</h3>
92<div class="affiliation"><span class="orgname">http://www.bzip.org<br></span></div>
93</div></div></div>
94<div><p class="releaseinfo">Version 1.0.3 of 15 February 2005</p></div>
95<div><p class="copyright">Copyright © 1996-2005 Julian Seward</p></div>
96<div><div class="legalnotice">
97<p>This program, <tt class="computeroutput">bzip2</tt>, the
98 associated library <tt class="computeroutput">libbzip2</tt>, and
99 all documentation, are copyright © 1996-2005 Julian Seward.
100 All rights reserved.</p>
101<p>Redistribution and use in source and binary forms, with
102 or without modification, are permitted provided that the
103 following conditions are met:</p>
104<div class="itemizedlist"><ul type="bullet">
105<li style="list-style-type: disc"><p>Redistributions of source code must retain the
106 above copyright notice, this list of conditions and the
107 following disclaimer.</p></li>
108<li style="list-style-type: disc"><p>The origin of this software must not be
109 misrepresented; you must not claim that you wrote the original
110 software. If you use this software in a product, an
111 acknowledgment in the product documentation would be
112 appreciated but is not required.</p></li>
113<li style="list-style-type: disc"><p>Altered source versions must be plainly marked
114 as such, and must not be misrepresented as being the original
115 software.</p></li>
116<li style="list-style-type: disc"><p>The name of the author may not be used to
117 endorse or promote products derived from this software without
118 specific prior written permission.</p></li>
119</ul></div>
120<p>THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY
121 EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
122 THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
123 PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
124 AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
125 EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
126 TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
127 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
128 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
129 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
130 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
131 THE POSSIBILITY OF SUCH DAMAGE.</p>
132<p>PATENTS: To the best of my knowledge,
133 <tt class="computeroutput">bzip2</tt> and
134 <tt class="computeroutput">libbzip2</tt> do not use any patented
135 algorithms. However, I do not have the resources to carry
136 out a patent search. Therefore I cannot give any guarantee of
137 the above statement.
138 </p>
139</div></div>
140</div>
141<div></div>
142<hr>
143</div>
144<div class="toc">
145<p><b>Table of Contents</b></p>
146<dl>
147<dt><span class="chapter"><a href="#intro">1. Introduction</a></span></dt>
148<dt><span class="chapter"><a href="#using">2. How to use bzip2</a></span></dt>
149<dd><dl>
150<dt><span class="sect1"><a href="#name">2.1. NAME</a></span></dt>
151<dt><span class="sect1"><a href="#synopsis">2.2. SYNOPSIS</a></span></dt>
152<dt><span class="sect1"><a href="#description">2.3. DESCRIPTION</a></span></dt>
153<dt><span class="sect1"><a href="#options">2.4. OPTIONS</a></span></dt>
154<dt><span class="sect1"><a href="#memory-management">2.5. MEMORY MANAGEMENT</a></span></dt>
155<dt><span class="sect1"><a href="#recovering">2.6. RECOVERING DATA FROM DAMAGED FILES</a></span></dt>
156<dt><span class="sect1"><a href="#performance">2.7. PERFORMANCE NOTES</a></span></dt>
157<dt><span class="sect1"><a href="#caveats">2.8. CAVEATS</a></span></dt>
158<dt><span class="sect1"><a href="#author">2.9. AUTHOR</a></span></dt>
159</dl></dd>
160<dt><span class="chapter"><a href="#libprog">3.
161Programming with libbzip2
162</a></span></dt>
163<dd><dl>
164<dt><span class="sect1"><a href="#top-level">3.1. Top-level structure</a></span></dt>
165<dd><dl>
166<dt><span class="sect2"><a href="#ll-summary">3.1.1. Low-level summary</a></span></dt>
167<dt><span class="sect2"><a href="#hl-summary">3.1.2. High-level summary</a></span></dt>
168<dt><span class="sect2"><a href="#util-fns-summary">3.1.3. Utility functions summary</a></span></dt>
169</dl></dd>
170<dt><span class="sect1"><a href="#err-handling">3.2. Error handling</a></span></dt>
171<dt><span class="sect1"><a href="#low-level">3.3. Low-level interface</a></span></dt>
172<dd><dl>
173<dt><span class="sect2"><a href="#bzcompress-init">3.3.1. BZ2_bzCompressInit</a></span></dt>
174<dt><span class="sect2"><a href="#bzCompress">3.3.2. BZ2_bzCompress</a></span></dt>
175<dt><span class="sect2"><a href="#bzCompress-end">3.3.3. BZ2_bzCompressEnd</a></span></dt>
176<dt><span class="sect2"><a href="#bzDecompress-init">3.3.4. BZ2_bzDecompressInit</a></span></dt>
177<dt><span class="sect2"><a href="#bzDecompress">3.3.5. BZ2_bzDecompress</a></span></dt>
178<dt><span class="sect2"><a href="#bzDecompress-end">3.3.6. BZ2_bzDecompressEnd</a></span></dt>
179</dl></dd>
180<dt><span class="sect1"><a href="#hl-interface">3.4. High-level interface</a></span></dt>
181<dd><dl>
182<dt><span class="sect2"><a href="#bzreadopen">3.4.1. BZ2_bzReadOpen</a></span></dt>
183<dt><span class="sect2"><a href="#bzread">3.4.2. BZ2_bzRead</a></span></dt>
184<dt><span class="sect2"><a href="#bzreadgetunused">3.4.3. BZ2_bzReadGetUnused</a></span></dt>
185<dt><span class="sect2"><a href="#bzreadclose">3.4.4. BZ2_bzReadClose</a></span></dt>
186<dt><span class="sect2"><a href="#bzwriteopen">3.4.5. BZ2_bzWriteOpen</a></span></dt>
187<dt><span class="sect2"><a href="#bzwrite">3.4.6. BZ2_bzWrite</a></span></dt>
188<dt><span class="sect2"><a href="#bzwriteclose">3.4.7. BZ2_bzWriteClose</a></span></dt>
189<dt><span class="sect2"><a href="#embed">3.4.8. Handling embedded compressed data streams</a></span></dt>
190<dt><span class="sect2"><a href="#std-rdwr">3.4.9. Standard file-reading/writing code</a></span></dt>
191</dl></dd>
192<dt><span class="sect1"><a href="#util-fns">3.5. Utility functions</a></span></dt>
193<dd><dl>
194<dt><span class="sect2"><a href="#bzbufftobuffcompress">3.5.1. BZ2_bzBuffToBuffCompress</a></span></dt>
195<dt><span class="sect2"><a href="#bzbufftobuffdecompress">3.5.2. BZ2_bzBuffToBuffDecompress</a></span></dt>
196</dl></dd>
197<dt><span class="sect1"><a href="#zlib-compat">3.6. zlib compatibility functions</a></span></dt>
198<dt><span class="sect1"><a href="#stdio-free">3.7. Using the library in a stdio-free environment</a></span></dt>
199<dd><dl>
200<dt><span class="sect2"><a href="#stdio-bye">3.7.1. Getting rid of stdio</a></span></dt>
201<dt><span class="sect2"><a href="#critical-error">3.7.2. Critical error handling</a></span></dt>
202</dl></dd>
203<dt><span class="sect1"><a href="#win-dll">3.8. Making a Windows DLL</a></span></dt>
204</dl></dd>
205<dt><span class="chapter"><a href="#misc">4. Miscellanea</a></span></dt>
206<dd><dl>
207<dt><span class="sect1"><a href="#limits">4.1. Limitations of the compressed file format</a></span></dt>
208<dt><span class="sect1"><a href="#port-issues">4.2. Portability issues</a></span></dt>
209<dt><span class="sect1"><a href="#bugs">4.3. Reporting bugs</a></span></dt>
210<dt><span class="sect1"><a href="#package">4.4. Did you get the right package?</a></span></dt>
211<dt><span class="sect1"><a href="#reading">4.5. Further Reading</a></span></dt>
212</dl></dd>
213</dl>
214</div>
215<div class="chapter" lang="en">
216<div class="titlepage">
217<div><div><h2 class="title">
218<a name="intro"></a>1. Introduction</h2></div></div>
219<div></div>
220</div>
221<p><tt class="computeroutput">bzip2</tt> compresses files
222using the Burrows-Wheeler block-sorting text compression
223algorithm, and Huffman coding. Compression is generally
224considerably better than that achieved by more conventional
225LZ77/LZ78-based compressors, and approaches the performance of
226the PPM family of statistical compressors.</p>
227<p><tt class="computeroutput">bzip2</tt> is built on top of
228<tt class="computeroutput">libbzip2</tt>, a flexible library for
229handling compressed data in the
230<tt class="computeroutput">bzip2</tt> format. This manual
231describes both how to use the program and how to work with the
232library interface. Most of the manual is devoted to this
233library, not the program, which is good news if your interest is
234only in the program.</p>
235<div class="itemizedlist"><ul type="bullet">
236<li style="list-style-type: disc"><p><a href="#using">How to use bzip2</a> describes how to use
237 <tt class="computeroutput">bzip2</tt>; this is the only part
238 you need to read if you just want to know how to operate the
239 program.</p></li>
240<li style="list-style-type: disc"><p><a href="#libprog">Programming with libbzip2</a> describes the
241 programming interfaces in detail, and</p></li>
242<li style="list-style-type: disc"><p><a href="#misc">Miscellanea</a> records some
243 miscellaneous notes which I thought ought to be recorded
244 somewhere.</p></li>
245</ul></div>
246</div>
247<div class="chapter" lang="en">
248<div class="titlepage">
249<div><div><h2 class="title">
250<a name="using"></a>2. How to use bzip2</h2></div></div>
251<div></div>
252</div>
253<div class="toc">
254<p><b>Table of Contents</b></p>
255<dl>
256<dt><span class="sect1"><a href="#name">2.1. NAME</a></span></dt>
257<dt><span class="sect1"><a href="#synopsis">2.2. SYNOPSIS</a></span></dt>
258<dt><span class="sect1"><a href="#description">2.3. DESCRIPTION</a></span></dt>
259<dt><span class="sect1"><a href="#options">2.4. OPTIONS</a></span></dt>
260<dt><span class="sect1"><a href="#memory-management">2.5. MEMORY MANAGEMENT</a></span></dt>
261<dt><span class="sect1"><a href="#recovering">2.6. RECOVERING DATA FROM DAMAGED FILES</a></span></dt>
262<dt><span class="sect1"><a href="#performance">2.7. PERFORMANCE NOTES</a></span></dt>
263<dt><span class="sect1"><a href="#caveats">2.8. CAVEATS</a></span></dt>
264<dt><span class="sect1"><a href="#author">2.9. AUTHOR</a></span></dt>
265</dl>
266</div>
267<p>This chapter contains a copy of the
268<tt class="computeroutput">bzip2</tt> man page, and nothing
269else.</p>
270<div class="sect1" lang="en">
271<div class="titlepage">
272<div><div><h2 class="title" style="clear: both">
273<a name="name"></a>2.1. NAME</h2></div></div>
274<div></div>
275</div>
276<div class="itemizedlist"><ul type="bullet">
277<li style="list-style-type: disc"><p><tt class="computeroutput">bzip2</tt>,
278 <tt class="computeroutput">bunzip2</tt> - a block-sorting file
279 compressor, v1.0.3</p></li>
280<li style="list-style-type: disc"><p><tt class="computeroutput">bzcat</tt> -
281 decompresses files to stdout</p></li>
282<li style="list-style-type: disc"><p><tt class="computeroutput">bzip2recover</tt> -
283 recovers data from damaged bzip2 files</p></li>
284</ul></div>
285</div>
286<div class="sect1" lang="en">
287<div class="titlepage">
288<div><div><h2 class="title" style="clear: both">
289<a name="synopsis"></a>2.2. SYNOPSIS</h2></div></div>
290<div></div>
291</div>
292<div class="itemizedlist"><ul type="bullet">
293<li style="list-style-type: disc"><p><tt class="computeroutput">bzip2</tt> [
294 -cdfkqstvzVL123456789 ] [ filenames ... ]</p></li>
295<li style="list-style-type: disc"><p><tt class="computeroutput">bunzip2</tt> [
296 -fkvsVL ] [ filenames ... ]</p></li>
297<li style="list-style-type: disc"><p><tt class="computeroutput">bzcat</tt> [ -s ] [
298 filenames ... ]</p></li>
299<li style="list-style-type: disc"><p><tt class="computeroutput">bzip2recover</tt>
300 filename</p></li>
301</ul></div>
302</div>
303<div class="sect1" lang="en">
304<div class="titlepage">
305<div><div><h2 class="title" style="clear: both">
306<a name="description"></a>2.3. DESCRIPTION</h2></div></div>
307<div></div>
308</div>
309<p><tt class="computeroutput">bzip2</tt> compresses files
using the Burrows-Wheeler block-sorting text compression
311algorithm, and Huffman coding. Compression is generally
312considerably better than that achieved by more conventional
313LZ77/LZ78-based compressors, and approaches the performance of
314the PPM family of statistical compressors.</p>
315<p>The command-line options are deliberately very similar to
316those of GNU <tt class="computeroutput">gzip</tt>, but they are
317not identical.</p>
318<p><tt class="computeroutput">bzip2</tt> expects a list of
319file names to accompany the command-line flags. Each file is
320replaced by a compressed version of itself, with the name
321<tt class="computeroutput">original_name.bz2</tt>. Each
322compressed file has the same modification date, permissions, and,
323when possible, ownership as the corresponding original, so that
324these properties can be correctly restored at decompression time.
325File name handling is naive in the sense that there is no
326mechanism for preserving original file names, permissions,
327ownerships or dates in filesystems which lack these concepts, or
328have serious file name length restrictions, such as
329MS-DOS.</p>
330<p><tt class="computeroutput">bzip2</tt> and
331<tt class="computeroutput">bunzip2</tt> will by default not
332overwrite existing files. If you want this to happen, specify
333the <tt class="computeroutput">-f</tt> flag.</p>
334<p>If no file names are specified,
335<tt class="computeroutput">bzip2</tt> compresses from standard
336input to standard output. In this case,
337<tt class="computeroutput">bzip2</tt> will decline to write
338compressed output to a terminal, as this would be entirely
339incomprehensible and therefore pointless.</p>
340<p><tt class="computeroutput">bunzip2</tt> (or
341<tt class="computeroutput">bzip2 -d</tt>) decompresses all
342specified files. Files which were not created by
343<tt class="computeroutput">bzip2</tt> will be detected and
344ignored, and a warning issued.
345<tt class="computeroutput">bzip2</tt> attempts to guess the
346filename for the decompressed file from that of the compressed
347file as follows:</p>
348<div class="itemizedlist"><ul type="bullet">
349<li style="list-style-type: disc"><p><tt class="computeroutput">filename.bz2 </tt>
350 becomes
351 <tt class="computeroutput">filename</tt></p></li>
352<li style="list-style-type: disc"><p><tt class="computeroutput">filename.bz </tt>
353 becomes
354 <tt class="computeroutput">filename</tt></p></li>
355<li style="list-style-type: disc"><p><tt class="computeroutput">filename.tbz2</tt>
356 becomes
357 <tt class="computeroutput">filename.tar</tt></p></li>
358<li style="list-style-type: disc"><p><tt class="computeroutput">filename.tbz </tt>
359 becomes
360 <tt class="computeroutput">filename.tar</tt></p></li>
361<li style="list-style-type: disc"><p><tt class="computeroutput">anyothername </tt>
362 becomes
363 <tt class="computeroutput">anyothername.out</tt></p></li>
364</ul></div>
365<p>If the file does not end in one of the recognised endings,
366<tt class="computeroutput">.bz2</tt>,
367<tt class="computeroutput">.bz</tt>,
368<tt class="computeroutput">.tbz2</tt> or
369<tt class="computeroutput">.tbz</tt>,
370<tt class="computeroutput">bzip2</tt> complains that it cannot
371guess the name of the original file, and uses the original name
372with <tt class="computeroutput">.out</tt> appended.</p>
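<p>The name-guessing rule above can be sketched in a few lines of Python. This is only an illustration of the mapping as described here, not the code <tt class="computeroutput">bzip2</tt> itself uses; the names <tt class="computeroutput">SUFFIX_MAP</tt> and <tt class="computeroutput">guess_output_name</tt> are hypothetical.</p>

```python
# Illustrative sketch of the output-name rule; names here are hypothetical.
SUFFIX_MAP = [
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
    (".bz2", ""),
    (".bz", ""),
]


def guess_output_name(name: str) -> str:
    """Return the decompressed file name guessed from a compressed one."""
    for suffix, replacement in SUFFIX_MAP:
        if name.endswith(suffix):
            # Drop the recognised ending and append its replacement, if any.
            return name[: -len(suffix)] + replacement
    # Unrecognised ending: keep the whole name and append ".out".
    return name + ".out"
```

<p>For example, <tt class="computeroutput">guess_output_name("backup.tbz2")</tt> gives <tt class="computeroutput">backup.tar</tt>, while a name with no recognised ending simply gains <tt class="computeroutput">.out</tt>.</p>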
373<p>As with compression, supplying no filenames causes
374decompression from standard input to standard output.</p>
375<p><tt class="computeroutput">bunzip2</tt> will correctly
376decompress a file which is the concatenation of two or more
377compressed files. The result is the concatenation of the
378corresponding uncompressed files. Integrity testing
379(<tt class="computeroutput">-t</tt>) of concatenated compressed
380files is also supported.</p>
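<p>The concatenation property is easy to check from a script. The sketch below uses Python's standard <tt class="computeroutput">bz2</tt> module rather than the <tt class="computeroutput">bunzip2</tt> program, on the assumption that a Python 3.3 or later interpreter is available (earlier versions did not accept multi-stream input).</p>

```python
import bz2

# Compress two inputs separately, as if they were two .bz2 files.
part_a = bz2.compress(b"first file\n")
part_b = bz2.compress(b"second file\n")

# Joining the compressed streams is plain byte concatenation.
combined = part_a + part_b

# Decompressing the joined stream yields the joined originals.
restored = bz2.decompress(combined)
```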
381<p>You can also compress or decompress files to the standard
382output by giving the <tt class="computeroutput">-c</tt> flag.
383Multiple files may be compressed and decompressed like this. The
384resulting outputs are fed sequentially to stdout. Compression of
385multiple files in this manner generates a stream containing
386multiple compressed file representations. Such a stream can be
387decompressed correctly only by
388<tt class="computeroutput">bzip2</tt> version 0.9.0 or later.
389Earlier versions of <tt class="computeroutput">bzip2</tt> will
390stop after decompressing the first file in the stream.</p>
391<p><tt class="computeroutput">bzcat</tt> (or
392<tt class="computeroutput">bzip2 -dc</tt>) decompresses all
393specified files to the standard output.</p>
394<p><tt class="computeroutput">bzip2</tt> will read arguments
395from the environment variables
396<tt class="computeroutput">BZIP2</tt> and
397<tt class="computeroutput">BZIP</tt>, in that order, and will
398process them before any arguments read from the command line.
399This gives a convenient way to supply default arguments.</p>
400<p>Compression is always performed, even if the compressed
401file is slightly larger than the original. Files of less than
402about one hundred bytes tend to get larger, since the compression
403mechanism has a constant overhead in the region of 50 bytes.
404Random data (including the output of most file compressors) is
coded at about 8.05 bits per byte, giving an expansion of around
0.5%.</p>
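<p>A rough way to see these effects is to compare sizes before and after compression. The following sketch uses Python's standard <tt class="computeroutput">bz2</tt> module; the helper <tt class="computeroutput">size_change</tt> and the sample inputs are made up for illustration, and the exact byte counts will depend on the library version.</p>

```python
import bz2
import random


def size_change(data: bytes) -> int:
    """Bytes gained (positive) or saved (negative) by compressing data."""
    return len(bz2.compress(data)) - len(data)


# A very small input: the fixed overhead dominates, so it grows.
tiny = b"hello"

# Pseudo-random bytes (seeded for repeatability): essentially incompressible.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(100_000))

# Highly repetitive text: compresses well.
text = b"the quick brown fox jumps over the lazy dog\n" * 2000

tiny_change = size_change(tiny)
noise_change = size_change(noise)
text_change = size_change(text)
```

<p>Here <tt class="computeroutput">tiny_change</tt> and <tt class="computeroutput">noise_change</tt> should come out positive, and <tt class="computeroutput">text_change</tt> strongly negative.</p>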
407<p>As a self-check for your protection,
408<tt class="computeroutput">bzip2</tt> uses 32-bit CRCs to make
409sure that the decompressed version of a file is identical to the
410original. This guards against corruption of the compressed data,
411and against undetected bugs in
412<tt class="computeroutput">bzip2</tt> (hopefully very unlikely).
The chance of data corruption going undetected is microscopic,
about one in four billion for each file processed. Be
415aware, though, that the check occurs upon decompression, so it
416can only tell you that something is wrong. It can't help you
417recover the original uncompressed data. You can use
418<tt class="computeroutput">bzip2recover</tt> to try to recover
419data from damaged files.</p>
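<p>The self-check can be observed by damaging a compressed stream on purpose. This sketch uses Python's standard <tt class="computeroutput">bz2</tt> module; <tt class="computeroutput">is_intact</tt> is a hypothetical helper, and the choice of which byte to damage is arbitrary. Depending on where the damage lands, the failure may be reported as a decoding error rather than a CRC mismatch, so the helper treats either as "not intact".</p>

```python
import bz2


def is_intact(packed: bytes) -> bool:
    """Trial-decompress packed data and report whether it succeeded."""
    try:
        bz2.decompress(packed)
    except (OSError, ValueError):
        # OSError: invalid or corrupt stream; ValueError: truncated stream.
        return False
    return True


original = b"some data worth protecting\n" * 500
packed = bz2.compress(original)

# Simulate a media error by inverting one byte in the middle of the stream.
damaged = bytearray(packed)
damaged[len(damaged) // 2] ^= 0xFF
```

<p>As the text says, a failed check only reports that something is wrong; it does not give the original data back.</p>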
<p>Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, etc.), 2
to indicate a corrupt compressed file, 3 for an internal
consistency error (e.g., a bug) which caused
<tt class="computeroutput">bzip2</tt> to panic.</p>
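<p>A script that runs <tt class="computeroutput">bzip2</tt> may want to turn these return values into messages. The table below simply restates the list above; <tt class="computeroutput">EXIT_MEANINGS</tt> and <tt class="computeroutput">describe_exit</tt> are hypothetical names.</p>

```python
# Exit statuses as listed in the manual text above.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error, ...)",
    2: "corrupt compressed file",
    3: "internal consistency error (bzip2 panicked)",
}


def describe_exit(status: int) -> str:
    """Map a bzip2 exit status to a short human-readable description."""
    return EXIT_MEANINGS.get(status, f"unexpected exit status {status}")
```

<p>One possible use, assuming a <tt class="computeroutput">bzip2</tt> binary is on the search path, is to pass the <tt class="computeroutput">returncode</tt> of <tt class="computeroutput">subprocess.run(["bzip2", "-t", path])</tt> to <tt class="computeroutput">describe_exit</tt>.</p>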
425</div>
426<div class="sect1" lang="en">
427<div class="titlepage">
428<div><div><h2 class="title" style="clear: both">
429<a name="options"></a>2.4. OPTIONS</h2></div></div>
430<div></div>
431</div>
432<div class="variablelist"><dl>
433<dt><span class="term"><tt class="computeroutput">-c --stdout</tt></span></dt>
434<dd><p>Compress or decompress to standard
435 output.</p></dd>
436<dt><span class="term"><tt class="computeroutput">-d --decompress</tt></span></dt>
437<dd><p>Force decompression.
438 <tt class="computeroutput">bzip2</tt>,
439 <tt class="computeroutput">bunzip2</tt> and
440 <tt class="computeroutput">bzcat</tt> are really the same
441 program, and the decision about what actions to take is done on
442 the basis of which name is used. This flag overrides that
443 mechanism, and forces bzip2 to decompress.</p></dd>
444<dt><span class="term"><tt class="computeroutput">-z --compress</tt></span></dt>
<dd><p>The complement to
 <tt class="computeroutput">-d</tt>: forces compression,
 regardless of the invocation name.</p></dd>
448<dt><span class="term"><tt class="computeroutput">-t --test</tt></span></dt>
449<dd><p>Check integrity of the specified file(s), but
450 don't decompress them. This really performs a trial
451 decompression and throws away the result.</p></dd>
452<dt><span class="term"><tt class="computeroutput">-f --force</tt></span></dt>
453<dd>
454<p>Force overwrite of output files. Normally,
455 <tt class="computeroutput">bzip2</tt> will not overwrite
456 existing output files. Also forces
457 <tt class="computeroutput">bzip2</tt> to break hard links to
458 files, which it otherwise wouldn't do.</p>
459<p><tt class="computeroutput">bzip2</tt> normally declines
460 to decompress files which don't have the correct magic header
461 bytes. If forced (<tt class="computeroutput">-f</tt>),
462 however, it will pass such files through unmodified. This is
463 how GNU <tt class="computeroutput">gzip</tt> behaves.</p>
464</dd>
465<dt><span class="term"><tt class="computeroutput">-k --keep</tt></span></dt>
466<dd><p>Keep (don't delete) input files during
467 compression or decompression.</p></dd>
468<dt><span class="term"><tt class="computeroutput">-s --small</tt></span></dt>
469<dd>
470<p>Reduce memory usage, for compression,
471 decompression and testing. Files are decompressed and tested
472 using a modified algorithm which only requires 2.5 bytes per
473 block byte. This means any file can be decompressed in 2300k
474 of memory, albeit at about half the normal speed.</p>
475<p>During compression, <tt class="computeroutput">-s</tt>
476 selects a block size of 200k, which limits memory use to around
477 the same figure, at the expense of your compression ratio. In
478 short, if your machine is low on memory (8 megabytes or less),
479 use <tt class="computeroutput">-s</tt> for everything. See
480 <a href="#memory-management">MEMORY MANAGEMENT</a> below.</p>
481</dd>
482<dt><span class="term"><tt class="computeroutput">-q --quiet</tt></span></dt>
483<dd><p>Suppress non-essential warning messages.
484 Messages pertaining to I/O errors and other critical events
485 will not be suppressed.</p></dd>
486<dt><span class="term"><tt class="computeroutput">-v --verbose</tt></span></dt>
487<dd><p>Verbose mode -- show the compression ratio for
488 each file processed. Further
489 <tt class="computeroutput">-v</tt>'s increase the verbosity
490 level, spewing out lots of information which is primarily of
491 interest for diagnostic purposes.</p></dd>
492<dt><span class="term"><tt class="computeroutput">-L --license -V --version</tt></span></dt>
493<dd><p>Display the software version, license terms and
494 conditions.</p></dd>
<dt><span class="term"><tt class="computeroutput">-1</tt> (or
 <tt class="computeroutput">--fast</tt>) to
 <tt class="computeroutput">-9</tt> (or
 <tt class="computeroutput">--best</tt>)</span></dt>
499<dd><p>Set the block size to 100 k, 200 k ... 900 k
500 when compressing. Has no effect when decompressing. See <a href="#memory-management">MEMORY MANAGEMENT</a> below. The
501 <tt class="computeroutput">--fast</tt> and
502 <tt class="computeroutput">--best</tt> aliases are primarily
503 for GNU <tt class="computeroutput">gzip</tt> compatibility.
504 In particular, <tt class="computeroutput">--fast</tt> doesn't
505 make things significantly faster. And
506 <tt class="computeroutput">--best</tt> merely selects the
507 default behaviour.</p></dd>
508<dt><span class="term"><tt class="computeroutput">--</tt></span></dt>
509<dd><p>Treats all subsequent arguments as file names,
510 even if they start with a dash. This is so you can handle
511 files with names beginning with a dash, for example:
512 <tt class="computeroutput">bzip2 --
513 -myfilename</tt>.</p></dd>
514<dt>
515<span class="term"><tt class="computeroutput">--repetitive-fast</tt>, </span><span class="term"><tt class="computeroutput">--repetitive-best</tt>, </span>
516</dt>
517<dd><p>These flags are redundant in versions 0.9.5 and
518 above. They provided some coarse control over the behaviour of
519 the sorting algorithm in earlier versions, which was sometimes
520 useful. 0.9.5 and above have an improved algorithm which
521 renders these flags irrelevant.</p></dd>
522</dl></div>
523</div>
524<div class="sect1" lang="en">
525<div class="titlepage">
526<div><div><h2 class="title" style="clear: both">
527<a name="memory-management"></a>2.5. MEMORY MANAGEMENT</h2></div></div>
528<div></div>
529</div>
530<p><tt class="computeroutput">bzip2</tt> compresses large
531files in blocks. The block size affects both the compression
532ratio achieved, and the amount of memory needed for compression
533and decompression. The flags <tt class="computeroutput">-1</tt>
534through <tt class="computeroutput">-9</tt> specify the block
535size to be 100,000 bytes through 900,000 bytes (the default)
536respectively. At decompression time, the block size used for
537compression is read from the header of the compressed file, and
538<tt class="computeroutput">bunzip2</tt> then allocates itself
539just enough memory to decompress the file. Since block sizes are
540stored in compressed files, it follows that the flags
541<tt class="computeroutput">-1</tt> to
542<tt class="computeroutput">-9</tt> are irrelevant to and so
543ignored during decompression.</p>
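<p>The stored block size can be read back without decompressing anything. The sketch below assumes the usual stream layout, in which the file starts with the three bytes <tt class="computeroutput">BZh</tt> followed by an ASCII digit from 1 to 9; that layout is not spelled out in this section, so treat it as an assumption. The function name is hypothetical.</p>

```python
import bz2


def block_size_from_header(packed: bytes) -> int:
    """Return the block size in bytes recorded in a bzip2 stream header."""
    # Assumed layout: b"BZh" followed by one ASCII digit '1'..'9'.
    if len(packed) < 4 or packed[:3] != b"BZh" or not 0x31 <= packed[3] <= 0x39:
        raise ValueError("not a bzip2 stream header")
    level = packed[3] - ord("0")
    return level * 100_000


# Compress the same data with two different block-size settings.
small_blocks = bz2.compress(b"example data", 3)
default_blocks = bz2.compress(b"example data")  # compresslevel defaults to 9
```

<p>With these inputs, <tt class="computeroutput">block_size_from_header(small_blocks)</tt> reports 300,000 bytes and <tt class="computeroutput">block_size_from_header(default_blocks)</tt> reports 900,000 bytes, regardless of how little data was actually compressed.</p>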
544<p>Compression and decompression requirements, in bytes, can be
545estimated as:</p>
<pre class="programlisting">Compression:   400k + ( 8 x block size )

Decompression: 100k + ( 4 x block size ), or
               100k + ( 2.5 x block size )</pre>
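<p>These estimates are simple enough to tabulate in a script. The following sketch evaluates the two formulas for a given flag from 1 to 9, taking the block size as 100k times the flag; <tt class="computeroutput">memory_estimate_kb</tt> is a hypothetical name, and the <tt class="computeroutput">small</tt> argument models only the reduced-memory decompression formula, not the effect of <tt class="computeroutput">-s</tt> on compression.</p>

```python
def memory_estimate_kb(level: int, small: bool = False) -> tuple[int, int]:
    """Return (compression, decompression) memory estimates in kbytes.

    level is the -1 .. -9 flag; small selects the 2.5x decompression formula.
    """
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9")
    block_kb = 100 * level                      # block size in kbytes
    compress_kb = 400 + 8 * block_kb            # 400k + 8 x block size
    factor = 2.5 if small else 4
    decompress_kb = 100 + int(factor * block_kb)
    return compress_kb, decompress_kb
```

<p>For <tt class="computeroutput">-9</tt> this gives 7600k for compression and 3700k (or 2350k in reduced-memory mode) for decompression. Note that the formula yields 6000k for compression at <tt class="computeroutput">-7</tt>, slightly below the 6100k shown in the summary table later in this section.</p>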
550<p>Larger block sizes give rapidly diminishing marginal
551returns. Most of the compression comes from the first two or
552three hundred k of block size, a fact worth bearing in mind when
553using <tt class="computeroutput">bzip2</tt> on small machines.
554It is also important to appreciate that the decompression memory
555requirement is set at compression time by the choice of block
556size.</p>
557<p>For files compressed with the default 900k block size,
558<tt class="computeroutput">bunzip2</tt> will require about 3700
kbytes to decompress. To support decompression of any file on a
4 megabyte machine, <tt class="computeroutput">bunzip2</tt> has
561an option to decompress using approximately half this amount of
562memory, about 2300 kbytes. Decompression speed is also halved,
563so you should use this option only where necessary. The relevant
564flag is <tt class="computeroutput">-s</tt>.</p>
<p>In general, try to use the largest block size that memory
constraints allow, since that maximises the compression achieved.
567Compression and decompression speed are virtually unaffected by
568block size.</p>
569<p>Another significant point applies to files which fit in a
570single block -- that means most files you'd encounter using a
571large block size. The amount of real memory touched is
572proportional to the size of the file, since the file is smaller
573than a block. For example, compressing a file 20,000 bytes long
574with the flag <tt class="computeroutput">-9</tt> will cause the
compressor to allocate around 7600k of memory, but only touch
400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor
will allocate 3700k but only touch 100k + 20000 * 4 = 180
kbytes.</p>
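<p>The arithmetic in this example generalises to any file that fits in a single block. A minimal sketch, assuming the touched memory follows the same formulas with the file size standing in for the block size once the file is smaller than a block (<tt class="computeroutput">touched_kb</tt> is a hypothetical name):</p>

```python
def touched_kb(file_bytes: int, level: int = 9) -> tuple[float, float]:
    """Estimate kbytes of memory touched when (de)compressing one file.

    Returns (compression, decompression). The working size is the file
    size, capped at the block size selected by level.
    """
    block_bytes = 100_000 * level
    working = min(file_bytes, block_bytes)
    compress = 400 + 8 * working / 1000
    decompress = 100 + 4 * working / 1000
    return compress, decompress
```

<p>Calling <tt class="computeroutput">touched_kb(20_000)</tt> reproduces the 560 and 180 kbyte figures quoted above.</p>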
579<p>Here is a table which summarises the maximum memory usage
580for different block sizes. Also recorded is the total compressed
581size for 14 files of the Calgary Text Compression Corpus
582totalling 3,141,622 bytes. This column gives some feel for how
583compression varies with block size. These figures tend to
584understate the advantage of larger block sizes for larger files,
585since the Corpus is dominated by smaller files.</p>
<pre class="programlisting">        Compress   Decompress   Decompress   Corpus
Flag      usage      usage       -s usage     Size

 -1      1200k       500k         350k      914704
 -2      2000k       900k         600k      877703
 -3      2800k      1300k         850k      860338
 -4      3600k      1700k        1100k      846899
 -5      4400k      2100k        1350k      845160
 -6      5200k      2500k        1600k      838626
 -7      6100k      2900k        1850k      834096
 -8      6800k      3300k        2100k      828642
 -9      7600k      3700k        2350k      828642</pre>
598</div>
599<div class="sect1" lang="en">
600<div class="titlepage">
601<div><div><h2 class="title" style="clear: both">
602<a name="recovering"></a>2.6. RECOVERING DATA FROM DAMAGED FILES</h2></div></div>
603<div></div>
604</div>
605<p><tt class="computeroutput">bzip2</tt> compresses files in
blocks, usually 900 kbytes long. Each block is handled
607independently. If a media or transmission error causes a
608multi-block <tt class="computeroutput">.bz2</tt> file to become
609damaged, it may be possible to recover data from the undamaged
610blocks in the file.</p>
611<p>The compressed representation of each block is delimited by
612a 48-bit pattern, which makes it possible to find the block
613boundaries with reasonable certainty. Each block also carries
614its own 32-bit CRC, so damaged blocks can be distinguished from
615undamaged ones.</p>
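<p>To make the idea of searching for block boundaries concrete, here is a small Python sketch that scans a compressed stream bit by bit for the 48-bit pattern. The value <tt class="computeroutput">0x314159265359</tt> is not given in this manual; it is the block header pattern as commonly documented for the bzip2 format, so treat it as an assumption. The sketch only locates candidate block starts and is not how <tt class="computeroutput">bzip2recover</tt> is implemented.</p>

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359  # assumed 48-bit block header pattern


def find_block_starts(data: bytes) -> list[int]:
    """Return the bit offsets at which the block header pattern occurs."""
    mask = (1 << 48) - 1
    window = 0  # the most recent 48 bits seen
    hits = []
    for i, byte in enumerate(data):
        for bit in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> bit) & 1)) & mask
            consumed = i * 8 + (7 - bit) + 1
            if consumed >= 48 and window == BLOCK_MAGIC:
                hits.append(consumed - 48)
    return hits


# Build a multi-block stream: 150,000 incompressible bytes with 100k blocks.
rng = random.Random(0)
payload = bytes(rng.getrandbits(8) for _ in range(150_000))
stream = bz2.compress(payload, 1)

starts = find_block_starts(stream)
```

<p>Blocks after the first need not begin on a byte boundary, which is why the search has to work at bit granularity. The first block follows the four-byte stream header, so the first offset found is bit 32.</p>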
616<p><tt class="computeroutput">bzip2recover</tt> is a simple
617program whose purpose is to search for blocks in
618<tt class="computeroutput">.bz2</tt> files, and write each block
619out into its own <tt class="computeroutput">.bz2</tt> file. You
620can then use <tt class="computeroutput">bzip2 -t</tt> to test
621the integrity of the resulting files, and decompress those which
622are undamaged.</p>
623<p><tt class="computeroutput">bzip2recover</tt> takes a
624single argument, the name of the damaged file, and writes a
625number of files <tt class="computeroutput">rec0001file.bz2</tt>,
626<tt class="computeroutput">rec0002file.bz2</tt>, etc, containing
627the extracted blocks. The output filenames are designed so that
628the use of wildcards in subsequent processing -- for example,
629<tt class="computeroutput">bzip2 -dc rec*file.bz2 &gt;
630recovered_data</tt> -- lists the files in the correct
631order.</p>
<p><tt class="computeroutput">bzip2recover</tt> is most useful
when dealing with large <tt class="computeroutput">.bz2</tt>
634files, as these will contain many blocks. It is clearly futile
635to use it on damaged single-block files, since a damaged block
636cannot be recovered. If you wish to minimise any potential data
637loss through media or transmission errors, you might consider
638compressing with a smaller block size.</p>
639</div>
640<div class="sect1" lang="en">
641<div class="titlepage">
642<div><div><h2 class="title" style="clear: both">
643<a name="performance"></a>2.7. PERFORMANCE NOTES</h2></div></div>
644<div></div>
645</div>
646<p>The sorting phase of compression gathers together similar
647strings in the file. Because of this, files containing very long
648runs of repeated symbols, like "aabaabaabaab ..." (repeated
649several hundred times) may compress more slowly than normal.
650Versions 0.9.5 and above fare much better than previous versions
651in this respect. The ratio between worst-case and average-case
652compression time is in the region of 10:1. For previous
653versions, this figure was more like 100:1. You can use the
654<tt class="computeroutput">-vvvv</tt> option to monitor progress
655in great detail, if you want.</p>
656<p>Decompression speed is unaffected by these
657phenomena.</p>
658<p><tt class="computeroutput">bzip2</tt> usually allocates
659several megabytes of memory to operate in, and then charges all
660over it in a fairly random fashion. This means that performance,
661both for compressing and decompressing, is largely determined by
662the speed at which your machine can service cache misses.
663Because of this, small changes to the code to reduce the miss
664rate have been observed to give disproportionately large
665performance improvements. I imagine
666<tt class="computeroutput">bzip2</tt> will perform best on
667machines with very large caches.</p>
668</div>
669<div class="sect1" lang="en">
670<div class="titlepage">
671<div><div><h2 class="title" style="clear: both">
672<a name="caveats"></a>2.8. CAVEATS</h2></div></div>
673<div></div>
674</div>
675<p>I/O error messages are not as helpful as they could be.
676<tt class="computeroutput">bzip2</tt> tries hard to detect I/O
677errors and exit cleanly, but the details of what the problem is
678sometimes seem rather misleading.</p>
679<p>This manual page pertains to version 1.0.3 of
680<tt class="computeroutput">bzip2</tt>. Compressed data created
681by this version is entirely forwards and backwards compatible
with the previous public releases, versions 0.1pl2, 0.9.0,
0.9.5, 1.0.0, 1.0.1 and 1.0.2, with the following exception: 0.9.0
684and above can correctly decompress multiple concatenated
685compressed files. 0.1pl2 cannot do this; it will stop after
686decompressing just the first file in the stream.</p>
687<p><tt class="computeroutput">bzip2recover</tt> versions
688prior to 1.0.2 used 32-bit integers to represent bit positions in
compressed files, so they could not handle compressed files more
690than 512 megabytes long. Versions 1.0.2 and above use 64-bit ints
691on some platforms which support them (GNU supported targets, and
692Windows). To establish whether or not
693<tt class="computeroutput">bzip2recover</tt> was built with such
694a limitation, run it without arguments. In any event you can
695build yourself an unlimited version if you can recompile it with
696<tt class="computeroutput">MaybeUInt64</tt> set to be an
697unsigned 64-bit integer.</p>
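<p>The 512 megabyte figure follows from the width of the bit
counter: a 32-bit unsigned integer can index 2^32 bit positions,
and 2^32 bits / 8 = 2^29 bytes. A small sketch of that arithmetic
(the function name is invented for illustration):</p>

```c
#include <stdint.h>

/* Size in bytes of the largest file whose every bit position can be
   indexed by an unsigned integer of index_bits bits.
   index_bits is assumed to be between 3 and 63. */
uint64_t max_bytes_for_bit_index(unsigned index_bits)
{
    uint64_t addressable_bits = (uint64_t)1 << index_bits;
    return addressable_bits / 8;    /* 8 bits per byte */
}
```

<p>For <tt class="computeroutput">index_bits</tt> equal to 32 this
yields 536870912 bytes, that is, 512 megabytes.</p>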
698</div>
699<div class="sect1" lang="en">
700<div class="titlepage">
701<div><div><h2 class="title" style="clear: both">
702<a name="author"></a>2.9. AUTHOR</h2></div></div>
703<div></div>
704</div>
705<p>Julian Seward,
706<tt class="computeroutput">jseward@bzip.org</tt></p>
707<p>The ideas embodied in
708<tt class="computeroutput">bzip2</tt> are due to (at least) the
709following people: Michael Burrows and David Wheeler (for the
710block sorting transformation), David Wheeler (again, for the
711Huffman coder), Peter Fenwick (for the structured coding model in
712the original <tt class="computeroutput">bzip</tt>, and many
713refinements), and Alistair Moffat, Radford Neal and Ian Witten
714(for the arithmetic coder in the original
715<tt class="computeroutput">bzip</tt>). I am much indebted for
716their help, support and advice. See the manual in the source
717distribution for pointers to sources of documentation. Christian
718von Roques encouraged me to look for faster sorting algorithms,
719so as to speed up compression. Bela Lubkin encouraged me to
720improve the worst-case compression performance.
721Donna Robinson XMLised the documentation.
722Many people sent
723patches, helped with portability problems, lent machines, gave
724advice and were generally helpful.</p>
725</div>
726</div>
727<div class="chapter" lang="en">
728<div class="titlepage">
729<div><div><h2 class="title">
730<a name="libprog"></a>3. 
731Programming with <tt class="computeroutput">libbzip2</tt>
732</h2></div></div>
733<div></div>
734</div>
735<div class="toc">
736<p><b>Table of Contents</b></p>
737<dl>
738<dt><span class="sect1"><a href="#top-level">3.1. Top-level structure</a></span></dt>
739<dd><dl>
740<dt><span class="sect2"><a href="#ll-summary">3.1.1. Low-level summary</a></span></dt>
741<dt><span class="sect2"><a href="#hl-summary">3.1.2. High-level summary</a></span></dt>
742<dt><span class="sect2"><a href="#util-fns-summary">3.1.3. Utility functions summary</a></span></dt>
743</dl></dd>
744<dt><span class="sect1"><a href="#err-handling">3.2. Error handling</a></span></dt>
745<dt><span class="sect1"><a href="#low-level">3.3. Low-level interface</a></span></dt>
746<dd><dl>
747<dt><span class="sect2"><a href="#bzcompress-init">3.3.1. BZ2_bzCompressInit</a></span></dt>
748<dt><span class="sect2"><a href="#bzCompress">3.3.2. BZ2_bzCompress</a></span></dt>
749<dt><span class="sect2"><a href="#bzCompress-end">3.3.3. BZ2_bzCompressEnd</a></span></dt>
750<dt><span class="sect2"><a href="#bzDecompress-init">3.3.4. BZ2_bzDecompressInit</a></span></dt>
751<dt><span class="sect2"><a href="#bzDecompress">3.3.5. BZ2_bzDecompress</a></span></dt>
752<dt><span class="sect2"><a href="#bzDecompress-end">3.3.6. BZ2_bzDecompressEnd</a></span></dt>
753</dl></dd>
754<dt><span class="sect1"><a href="#hl-interface">3.4. High-level interface</a></span></dt>
755<dd><dl>
756<dt><span class="sect2"><a href="#bzreadopen">3.4.1. BZ2_bzReadOpen</a></span></dt>
757<dt><span class="sect2"><a href="#bzread">3.4.2. BZ2_bzRead</a></span></dt>
758<dt><span class="sect2"><a href="#bzreadgetunused">3.4.3. BZ2_bzReadGetUnused</a></span></dt>
759<dt><span class="sect2"><a href="#bzreadclose">3.4.4. BZ2_bzReadClose</a></span></dt>
760<dt><span class="sect2"><a href="#bzwriteopen">3.4.5. BZ2_bzWriteOpen</a></span></dt>
761<dt><span class="sect2"><a href="#bzwrite">3.4.6. BZ2_bzWrite</a></span></dt>
762<dt><span class="sect2"><a href="#bzwriteclose">3.4.7. BZ2_bzWriteClose</a></span></dt>
763<dt><span class="sect2"><a href="#embed">3.4.8. Handling embedded compressed data streams</a></span></dt>
764<dt><span class="sect2"><a href="#std-rdwr">3.4.9. Standard file-reading/writing code</a></span></dt>
765</dl></dd>
766<dt><span class="sect1"><a href="#util-fns">3.5. Utility functions</a></span></dt>
767<dd><dl>
768<dt><span class="sect2"><a href="#bzbufftobuffcompress">3.5.1. BZ2_bzBuffToBuffCompress</a></span></dt>
769<dt><span class="sect2"><a href="#bzbufftobuffdecompress">3.5.2. BZ2_bzBuffToBuffDecompress</a></span></dt>
770</dl></dd>
771<dt><span class="sect1"><a href="#zlib-compat">3.6. zlib compatibility functions</a></span></dt>
772<dt><span class="sect1"><a href="#stdio-free">3.7. Using the library in a stdio-free environment</a></span></dt>
773<dd><dl>
774<dt><span class="sect2"><a href="#stdio-bye">3.7.1. Getting rid of stdio</a></span></dt>
775<dt><span class="sect2"><a href="#critical-error">3.7.2. Critical error handling</a></span></dt>
776</dl></dd>
777<dt><span class="sect1"><a href="#win-dll">3.8. Making a Windows DLL</a></span></dt>
778</dl>
779</div>
780<p>This chapter describes the programming interface to
781<tt class="computeroutput">libbzip2</tt>.</p>
782<p>For general background information, particularly about
783memory use and performance aspects, you'd be well advised to read
784<a href="#using">How to use bzip2</a> as well.</p>
785<div class="sect1" lang="en">
786<div class="titlepage">
787<div><div><h2 class="title" style="clear: both">
788<a name="top-level"></a>3.1. Top-level structure</h2></div></div>
789<div></div>
790</div>
791<p><tt class="computeroutput">libbzip2</tt> is a flexible
792library for compressing and decompressing data in the
793<tt class="computeroutput">bzip2</tt> data format. Although
794packaged as a single entity, it helps to regard the library as
three separate parts: the low-level interface, the high-level
interface, and some utility functions.</p>
797<p>The structure of
798<tt class="computeroutput">libbzip2</tt>'s interfaces is similar
799to that of Jean-loup Gailly's and Mark Adler's excellent
800<tt class="computeroutput">zlib</tt> library.</p>
801<p>All externally visible symbols have names beginning
802<tt class="computeroutput">BZ2_</tt>. This is new in version
8031.0. The intention is to minimise pollution of the namespaces of
804library clients.</p>
805<p>To use any part of the library, you need to
806<tt class="computeroutput">#include &lt;bzlib.h&gt;</tt>
807into your sources.</p>
808<div class="sect2" lang="en">
809<div class="titlepage">
810<div><div><h3 class="title">
811<a name="ll-summary"></a>3.1.1. Low-level summary</h3></div></div>
812<div></div>
813</div>
814<p>This interface provides services for compressing and
815decompressing data in memory. There's no provision for dealing
816with files, streams or any other I/O mechanisms, just straight
817memory-to-memory work. In fact, this part of the library can be
818compiled without inclusion of
819<tt class="computeroutput">stdio.h</tt>, which may be helpful
820for embedded applications.</p>
821<p>The low-level part of the library has no global variables
822and is therefore thread-safe.</p>
823<p>Six routines make up the low level interface:
824<tt class="computeroutput">BZ2_bzCompressInit</tt>,
825<tt class="computeroutput">BZ2_bzCompress</tt>, and
826<tt class="computeroutput">BZ2_bzCompressEnd</tt> for
827compression, and a corresponding trio
828<tt class="computeroutput">BZ2_bzDecompressInit</tt>,
829<tt class="computeroutput">BZ2_bzDecompress</tt> and
830<tt class="computeroutput">BZ2_bzDecompressEnd</tt> for
831decompression. The <tt class="computeroutput">*Init</tt>
832functions allocate memory for compression/decompression and do
833other initialisations, whilst the
834<tt class="computeroutput">*End</tt> functions close down
835operations and release memory.</p>
836<p>The real work is done by
837<tt class="computeroutput">BZ2_bzCompress</tt> and
838<tt class="computeroutput">BZ2_bzDecompress</tt>. These
839compress and decompress data from a user-supplied input buffer to
840a user-supplied output buffer. These buffers can be any size;
841arbitrary quantities of data are handled by making repeated calls
842to these functions. This is a flexible mechanism allowing a
843consumer-pull style of activity, or producer-push, or a mixture
844of both.</p>
845</div>
846<div class="sect2" lang="en">
847<div class="titlepage">
848<div><div><h3 class="title">
849<a name="hl-summary"></a>3.1.2. High-level summary</h3></div></div>
850<div></div>
851</div>
852<p>This interface provides some handy wrappers around the
853low-level interface to facilitate reading and writing
854<tt class="computeroutput">bzip2</tt> format files
855(<tt class="computeroutput">.bz2</tt> files). The routines
856provide hooks to facilitate reading files in which the
857<tt class="computeroutput">bzip2</tt> data stream is embedded
858within some larger-scale file structure, or where there are
859multiple <tt class="computeroutput">bzip2</tt> data streams
860concatenated end-to-end.</p>
861<p>For reading files,
862<tt class="computeroutput">BZ2_bzReadOpen</tt>,
863<tt class="computeroutput">BZ2_bzRead</tt>,
864<tt class="computeroutput">BZ2_bzReadClose</tt> and
865<tt class="computeroutput">BZ2_bzReadGetUnused</tt> are
866supplied. For writing files,
867<tt class="computeroutput">BZ2_bzWriteOpen</tt>,
868<tt class="computeroutput">BZ2_bzWrite</tt> and
<tt class="computeroutput">BZ2_bzWriteClose</tt> are
870available.</p>
871<p>As with the low-level library, no global variables are used
872so the library is per se thread-safe. However, if I/O errors
873occur whilst reading or writing the underlying compressed files,
874you may have to consult <tt class="computeroutput">errno</tt> to
875determine the cause of the error. In that case, you'd need a C
876library which correctly supports
877<tt class="computeroutput">errno</tt> in a multithreaded
878environment.</p>
879<p>To make the library a little simpler and more portable,
880<tt class="computeroutput">BZ2_bzReadOpen</tt> and
881<tt class="computeroutput">BZ2_bzWriteOpen</tt> require you to
882pass them file handles (<tt class="computeroutput">FILE*</tt>s)
883which have previously been opened for reading or writing
884respectively. That avoids portability problems associated with
885file operations and file attributes, whilst not being much of an
886imposition on the programmer.</p>
887</div>
888<div class="sect2" lang="en">
889<div class="titlepage">
890<div><div><h3 class="title">
891<a name="util-fns-summary"></a>3.1.3. Utility functions summary</h3></div></div>
892<div></div>
893</div>
894<p>For very simple needs,
895<tt class="computeroutput">BZ2_bzBuffToBuffCompress</tt> and
896<tt class="computeroutput">BZ2_bzBuffToBuffDecompress</tt> are
897provided. These compress data in memory from one buffer to
898another buffer in a single function call. You should assess
899whether these functions fulfill your memory-to-memory
900compression/decompression requirements before investing effort in
901understanding the more general but more complex low-level
902interface.</p>
903<p>Yoshioka Tsuneo
904(<tt class="computeroutput">QWF00133@niftyserve.or.jp</tt> /
905<tt class="computeroutput">tsuneo-y@is.aist-nara.ac.jp</tt>) has
906contributed some functions to give better
907<tt class="computeroutput">zlib</tt> compatibility. These
908functions are <tt class="computeroutput">BZ2_bzopen</tt>,
909<tt class="computeroutput">BZ2_bzread</tt>,
910<tt class="computeroutput">BZ2_bzwrite</tt>,
911<tt class="computeroutput">BZ2_bzflush</tt>,
912<tt class="computeroutput">BZ2_bzclose</tt>,
913<tt class="computeroutput">BZ2_bzerror</tt> and
914<tt class="computeroutput">BZ2_bzlibVersion</tt>. You may find
915these functions more convenient for simple file reading and
916writing, than those in the high-level interface. These functions
917are not (yet) officially part of the library, and are minimally
918documented here. If they break, you get to keep all the pieces.
919I hope to document them properly when time permits.</p>
920<p>Yoshioka also contributed modifications to allow the
921library to be built as a Windows DLL.</p>
922</div>
923</div>
924<div class="sect1" lang="en">
925<div class="titlepage">
926<div><div><h2 class="title" style="clear: both">
927<a name="err-handling"></a>3.2. Error handling</h2></div></div>
928<div></div>
929</div>
930<p>The library is designed to recover cleanly in all
931situations, including the worst-case situation of decompressing
932random data. I'm not 100% sure that it can always do this, so
933you might want to add a signal handler to catch segmentation
934violations during decompression if you are feeling especially
935paranoid. I would be interested in hearing more about the
936robustness of the library to corrupted compressed data.</p>
<p>Version 1.0.3 is more robust in this respect than any
938previous version. Investigations with Valgrind (a tool for detecting
939problems with memory management) indicate
940that, at least for the few files I tested, all single-bit errors
in the compressed data are caught properly, with no
942segmentation faults, no uses of uninitialised data, no out of
943range reads or writes, and no infinite looping in the decompressor.
944So it's certainly pretty robust, although
945I wouldn't claim it to be totally bombproof.</p>
946<p>The file <tt class="computeroutput">bzlib.h</tt> contains
947all definitions needed to use the library. In particular, you
948should definitely not include
949<tt class="computeroutput">bzlib_private.h</tt>.</p>
950<p>In <tt class="computeroutput">bzlib.h</tt>, the various
951return values are defined. The following list is not intended as
952an exhaustive description of the circumstances in which a given
953value may be returned -- those descriptions are given later.
954Rather, it is intended to convey the rough meaning of each return
value. The first five values are normal and not intended to
956denote an error situation.</p>
957<div class="variablelist"><dl>
958<dt><span class="term"><tt class="computeroutput">BZ_OK</tt></span></dt>
959<dd><p>The requested action was completed
960 successfully.</p></dd>
961<dt><span class="term"><tt class="computeroutput">BZ_RUN_OK, BZ_FLUSH_OK,
962 BZ_FINISH_OK</tt></span></dt>
963<dd><p>In
964 <tt class="computeroutput">BZ2_bzCompress</tt>, the requested
965 flush/finish/nothing-special action was completed
966 successfully.</p></dd>
967<dt><span class="term"><tt class="computeroutput">BZ_STREAM_END</tt></span></dt>
968<dd><p>Compression of data was completed, or the
969 logical stream end was detected during
970 decompression.</p></dd>
971</dl></div>
972<p>The following return values indicate an error of some
973kind.</p>
974<div class="variablelist"><dl>
975<dt><span class="term"><tt class="computeroutput">BZ_CONFIG_ERROR</tt></span></dt>
976<dd><p>Indicates that the library has been improperly
977 compiled on your platform -- a major configuration error.
978 Specifically, it means that
979 <tt class="computeroutput">sizeof(char)</tt>,
980 <tt class="computeroutput">sizeof(short)</tt> and
981 <tt class="computeroutput">sizeof(int)</tt> are not 1, 2 and
982 4 respectively, as they should be. Note that the library
983 should still work properly on 64-bit platforms which follow
984 the LP64 programming model -- that is, where
985 <tt class="computeroutput">sizeof(long)</tt> and
986 <tt class="computeroutput">sizeof(void*)</tt> are 8. Under
987 LP64, <tt class="computeroutput">sizeof(int)</tt> is still 4,
988 so <tt class="computeroutput">libbzip2</tt>, which doesn't
989 use the <tt class="computeroutput">long</tt> type, is
990 OK.</p></dd>
991<dt><span class="term"><tt class="computeroutput">BZ_SEQUENCE_ERROR</tt></span></dt>
992<dd><p>When using the library, it is important to call
993 the functions in the correct sequence and with data structures
994 (buffers etc) in the correct states.
995 <tt class="computeroutput">libbzip2</tt> checks as much as it
996 can to ensure this is happening, and returns
997 <tt class="computeroutput">BZ_SEQUENCE_ERROR</tt> if not.
998 Code which complies precisely with the function semantics, as
999 detailed below, should never receive this value; such an event
1000 denotes buggy code which you should
1001 investigate.</p></dd>
1002<dt><span class="term"><tt class="computeroutput">BZ_PARAM_ERROR</tt></span></dt>
1003<dd><p>Returned when a parameter to a function call is
1004 out of range or otherwise manifestly incorrect. As with
1005 <tt class="computeroutput">BZ_SEQUENCE_ERROR</tt>, this
1006 denotes a bug in the client code. The distinction between
1007 <tt class="computeroutput">BZ_PARAM_ERROR</tt> and
1008 <tt class="computeroutput">BZ_SEQUENCE_ERROR</tt> is a bit
1009 hazy, but still worth making.</p></dd>
1010<dt><span class="term"><tt class="computeroutput">BZ_MEM_ERROR</tt></span></dt>
1011<dd><p>Returned when a request to allocate memory
1012 failed. Note that the quantity of memory needed to decompress
1013 a stream cannot be determined until the stream's header has
1014 been read. So
1015 <tt class="computeroutput">BZ2_bzDecompress</tt> and
1016 <tt class="computeroutput">BZ2_bzRead</tt> may return
1017 <tt class="computeroutput">BZ_MEM_ERROR</tt> even though some
1018 of the compressed data has been read. The same is not true
1019 for compression; once
1020 <tt class="computeroutput">BZ2_bzCompressInit</tt> or
1021 <tt class="computeroutput">BZ2_bzWriteOpen</tt> have
1022 successfully completed,
1023 <tt class="computeroutput">BZ_MEM_ERROR</tt> cannot
1024 occur.</p></dd>
1025<dt><span class="term"><tt class="computeroutput">BZ_DATA_ERROR</tt></span></dt>
1026<dd><p>Returned when a data integrity error is
1027 detected during decompression. Most importantly, this means
1028 when stored and computed CRCs for the data do not match. This
1029 value is also returned upon detection of any other anomaly in
1030 the compressed data.</p></dd>
1031<dt><span class="term"><tt class="computeroutput">BZ_DATA_ERROR_MAGIC</tt></span></dt>
1032<dd><p>As a special case of
1033 <tt class="computeroutput">BZ_DATA_ERROR</tt>, it is
1034 sometimes useful to know when the compressed stream does not
1035 start with the correct magic bytes (<tt class="computeroutput">'B' 'Z'
1036 'h'</tt>).</p></dd>
1037<dt><span class="term"><tt class="computeroutput">BZ_IO_ERROR</tt></span></dt>
1038<dd><p>Returned by
1039 <tt class="computeroutput">BZ2_bzRead</tt> and
1040 <tt class="computeroutput">BZ2_bzWrite</tt> when there is an
1041 error reading or writing in the compressed file, and by
1042 <tt class="computeroutput">BZ2_bzReadOpen</tt> and
1043 <tt class="computeroutput">BZ2_bzWriteOpen</tt> for attempts
1044 to use a file for which the error indicator (viz,
1045 <tt class="computeroutput">ferror(f)</tt>) is set. On
1046 receipt of <tt class="computeroutput">BZ_IO_ERROR</tt>, the
1047 caller should consult <tt class="computeroutput">errno</tt>
1048 and/or <tt class="computeroutput">perror</tt> to acquire
1049 operating-system specific information about the
1050 problem.</p></dd>
1051<dt><span class="term"><tt class="computeroutput">BZ_UNEXPECTED_EOF</tt></span></dt>
1052<dd><p>Returned by
1053 <tt class="computeroutput">BZ2_bzRead</tt> when the
1054 compressed file finishes before the logical end of stream is
1055 detected.</p></dd>
1056<dt><span class="term"><tt class="computeroutput">BZ_OUTBUFF_FULL</tt></span></dt>
1057<dd><p>Returned by
1058 <tt class="computeroutput">BZ2_bzBuffToBuffCompress</tt> and
1059 <tt class="computeroutput">BZ2_bzBuffToBuffDecompress</tt> to
1060 indicate that the output data will not fit into the output
1061 buffer provided.</p></dd>
1062</dl></div>
1063</div>
1064<div class="sect1" lang="en">
1065<div class="titlepage">
1066<div><div><h2 class="title" style="clear: both">
1067<a name="low-level"></a>3.3. Low-level interface</h2></div></div>
1068<div></div>
1069</div>
1070<div class="sect2" lang="en">
1071<div class="titlepage">
1072<div><div><h3 class="title">
1073<a name="bzcompress-init"></a>3.3.1. <tt class="computeroutput">BZ2_bzCompressInit</tt></h3></div></div>
1074<div></div>
1075</div>
<pre class="programlisting">typedef struct {
  char *next_in;
  unsigned int avail_in;
  unsigned int total_in_lo32;
  unsigned int total_in_hi32;

  char *next_out;
  unsigned int avail_out;
  unsigned int total_out_lo32;
  unsigned int total_out_hi32;

  void *state;

  void *(*bzalloc)(void *,int,int);
  void (*bzfree)(void *,void *);
  void *opaque;
} bz_stream;

int BZ2_bzCompressInit ( bz_stream *strm,
                         int blockSize100k,
                         int verbosity,
                         int workFactor );</pre>
1098<p>Prepares for compression. The
1099<tt class="computeroutput">bz_stream</tt> structure holds all
1100data pertaining to the compression activity. A
1101<tt class="computeroutput">bz_stream</tt> structure should be
1102allocated and initialised prior to the call. The fields of
1103<tt class="computeroutput">bz_stream</tt> comprise the entirety
1104of the user-visible data. <tt class="computeroutput">state</tt>
1105is a pointer to the private data structures required for
1106compression.</p>
1107<p>Custom memory allocators are supported, via fields
1108<tt class="computeroutput">bzalloc</tt>,
1109<tt class="computeroutput">bzfree</tt>, and
1110<tt class="computeroutput">opaque</tt>. The value
<tt class="computeroutput">opaque</tt> is passed as the first
1112argument to all calls to <tt class="computeroutput">bzalloc</tt>
1113and <tt class="computeroutput">bzfree</tt>, but is otherwise
1114ignored by the library. The call <tt class="computeroutput">bzalloc (
1115opaque, n, m )</tt> is expected to return a pointer
1116<tt class="computeroutput">p</tt> to <tt class="computeroutput">n *
1117m</tt> bytes of memory, and <tt class="computeroutput">bzfree (
1118opaque, p )</tt> should free that memory.</p>
1119<p>If you don't want to use a custom memory allocator, set
1120<tt class="computeroutput">bzalloc</tt>,
1121<tt class="computeroutput">bzfree</tt> and
1122<tt class="computeroutput">opaque</tt> to
1123<tt class="computeroutput">NULL</tt>, and the library will then
1124use the standard <tt class="computeroutput">malloc</tt> /
1125<tt class="computeroutput">free</tt> routines.</p>
1126<p>Before calling
1127<tt class="computeroutput">BZ2_bzCompressInit</tt>, fields
1128<tt class="computeroutput">bzalloc</tt>,
1129<tt class="computeroutput">bzfree</tt> and
1130<tt class="computeroutput">opaque</tt> should be filled
1131appropriately, as just described. Upon return, the internal
1132state will have been allocated and initialised, and
1133<tt class="computeroutput">total_in_lo32</tt>,
1134<tt class="computeroutput">total_in_hi32</tt>,
1135<tt class="computeroutput">total_out_lo32</tt> and
1136<tt class="computeroutput">total_out_hi32</tt> will have been
1137set to zero. These four fields are used by the library to inform
1138the caller of the total amount of data passed into and out of the
1139library, respectively. You should not try to change them. As of
1140version 1.0, 64-bit counts are maintained, even on 32-bit
1141platforms, using the <tt class="computeroutput">_hi32</tt>
1142fields to store the upper 32 bits of the count. So, for example,
1143the total amount of data in is <tt class="computeroutput">(total_in_hi32
1144&lt;&lt; 32) + total_in_lo32</tt>.</p>
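<p>Read as literal C, the expression above needs one precaution:
where <tt class="computeroutput">unsigned int</tt> is 32 bits wide,
shifting it left by 32 is undefined, so the high word should be
widened first. A minimal sketch, with a helper name invented for
this example:</p>

```c
#include <stdint.h>

/* Combine the two 32-bit halves of a libbzip2 byte counter, for
   example strm.total_in_hi32 and strm.total_in_lo32, into one
   64-bit value. The cast widens hi32 before the shift. */
uint64_t bz_total64(unsigned int hi32, unsigned int lo32)
{
    return ((uint64_t)hi32 << 32) | lo32;
}
```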
1145<p>Parameter <tt class="computeroutput">blockSize100k</tt>
1146specifies the block size to be used for compression. It should
1147be a value between 1 and 9 inclusive, and the actual block size
1148used is 100000 x this figure. 9 gives the best compression but
1149takes most memory.</p>
1150<p>Parameter <tt class="computeroutput">verbosity</tt> should
1151be set to a number between 0 and 4 inclusive. 0 is silent, and
1152greater numbers give increasingly verbose monitoring/debugging
1153output. If the library has been compiled with
1154<tt class="computeroutput">-DBZ_NO_STDIO</tt>, no such output
1155will appear for any verbosity setting.</p>
1156<p>Parameter <tt class="computeroutput">workFactor</tt>
1157controls how the compression phase behaves when presented with
1158worst case, highly repetitive, input data. If compression runs
1159into difficulties caused by repetitive data, the library switches
1160from the standard sorting algorithm to a fallback algorithm. The
1161fallback is slower than the standard algorithm by perhaps a
1162factor of three, but always behaves reasonably, no matter how bad
1163the input.</p>
1164<p>Lower values of <tt class="computeroutput">workFactor</tt>
1165reduce the amount of effort the standard algorithm will expend
before resorting to the fallback. You should set this parameter
carefully: too low, and many inputs will be handled by the
fallback algorithm and so compress rather slowly; too high, and
1169your average-to-worst case compression times can become very
1170large. The default value of 30 gives reasonable behaviour over a
1171wide range of circumstances.</p>
1172<p>Allowable values range from 0 to 250 inclusive. 0 is a
1173special case, equivalent to using the default value of 30.</p>
1174<p>Note that the compressed output generated is the same
1175regardless of whether or not the fallback algorithm is
1176used.</p>
1177<p>Be aware also that this parameter may disappear entirely in
1178future versions of the library. In principle it should be
1179possible to devise a good way to automatically choose which
1180algorithm to use. Such a mechanism would render the parameter
1181obsolete.</p>
1182<p>Possible return values:</p>
<pre class="programlisting">BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if strm is NULL
  or blockSize100k &lt; 1 or blockSize100k &gt; 9
  or verbosity &lt; 0 or verbosity &gt; 4
  or workFactor &lt; 0 or workFactor &gt; 250
BZ_MEM_ERROR
  if not enough memory is available
BZ_OK
  otherwise</pre>
1194<p>Allowable next actions:</p>
<pre class="programlisting">BZ2_bzCompress
  if BZ_OK is returned
  no specific action needed in case of error</pre>
1198</div>
1199<div class="sect2" lang="en">
1200<div class="titlepage">
1201<div><div><h3 class="title">
1202<a name="bzCompress"></a>3.3.2. <tt class="computeroutput">BZ2_bzCompress</tt></h3></div></div>
1203<div></div>
1204</div>
<pre class="programlisting">int BZ2_bzCompress ( bz_stream *strm, int action );</pre>
1206<p>Provides more input and/or output buffer space for the
1207library. The caller maintains input and output buffers, and
1208calls <tt class="computeroutput">BZ2_bzCompress</tt> to transfer
1209data between them.</p>
1210<p>Before each call to
1211<tt class="computeroutput">BZ2_bzCompress</tt>,
1212<tt class="computeroutput">next_in</tt> should point at the data
1213to be compressed, and <tt class="computeroutput">avail_in</tt>
1214should indicate how many bytes the library may read.
1215<tt class="computeroutput">BZ2_bzCompress</tt> updates
1216<tt class="computeroutput">next_in</tt>,
1217<tt class="computeroutput">avail_in</tt> and
<tt class="computeroutput">total_in_lo32</tt>/<tt class="computeroutput">total_in_hi32</tt> to reflect the number
1219of bytes it has read.</p>
1220<p>Similarly, <tt class="computeroutput">next_out</tt> should
1221point to a buffer in which the compressed data is to be placed,
1222with <tt class="computeroutput">avail_out</tt> indicating how
1223much output space is available.
1224<tt class="computeroutput">BZ2_bzCompress</tt> updates
1225<tt class="computeroutput">next_out</tt>,
1226<tt class="computeroutput">avail_out</tt> and
<tt class="computeroutput">total_out_lo32</tt>/<tt class="computeroutput">total_out_hi32</tt> to reflect the number
1228of bytes output.</p>
1229<p>You may provide and remove as little or as much data as you
1230like on each call of
1231<tt class="computeroutput">BZ2_bzCompress</tt>. In the limit,
1232it is acceptable to supply and remove data one byte at a time,
1233although this would be terribly inefficient. You should always
1234ensure that at least one byte of output space is available at
1235each call.</p>
1236<p>A second purpose of
1237<tt class="computeroutput">BZ2_bzCompress</tt> is to request a
1238change of mode of the compressed stream.</p>
1239<p>Conceptually, a compressed stream can be in one of four
1240states: IDLE, RUNNING, FLUSHING and FINISHING. Before
1241initialisation
1242(<tt class="computeroutput">BZ2_bzCompressInit</tt>) and after
1243termination (<tt class="computeroutput">BZ2_bzCompressEnd</tt>),
1244a stream is regarded as IDLE.</p>
1245<p>Upon initialisation
1246(<tt class="computeroutput">BZ2_bzCompressInit</tt>), the stream
1247is placed in the RUNNING state. Subsequent calls to
1248<tt class="computeroutput">BZ2_bzCompress</tt> should pass
1249<tt class="computeroutput">BZ_RUN</tt> as the requested action;
1250other actions are illegal and will result in
1251<tt class="computeroutput">BZ_SEQUENCE_ERROR</tt>.</p>
1252<p>At some point, the calling program will have provided all
1253the input data it wants to. It will then want to finish up -- in
1254effect, asking the library to process any data it might have
1255buffered internally. In this state,
1256<tt class="computeroutput">BZ2_bzCompress</tt> will no longer
1257attempt to read data from
1258<tt class="computeroutput">next_in</tt>, but it will want to
1259write data to <tt class="computeroutput">next_out</tt>. Because
1260the output buffer supplied by the user can be arbitrarily small,
1261the finishing-up operation cannot necessarily be done with a
1262single call of
1263<tt class="computeroutput">BZ2_bzCompress</tt>.</p>
1264<p>Instead, the calling program passes
1265<tt class="computeroutput">BZ_FINISH</tt> as an action to
1266<tt class="computeroutput">BZ2_bzCompress</tt>. This changes
1267the stream's state to FINISHING. Any remaining input (ie,
1268<tt class="computeroutput">next_in[0 .. avail_in-1]</tt>) is
1269compressed and transferred to the output buffer. To do this,
1270<tt class="computeroutput">BZ2_bzCompress</tt> must be called
1271repeatedly until all the output has been consumed. At that
1272point, <tt class="computeroutput">BZ2_bzCompress</tt> returns
1273<tt class="computeroutput">BZ_STREAM_END</tt>, and the stream's
1274state is set back to IDLE.
1275<tt class="computeroutput">BZ2_bzCompressEnd</tt> should then be
1276called.</p>
1277<p>Just to make sure the calling program does not cheat, the
1278library makes a note of <tt class="computeroutput">avail_in</tt>
1279at the time of the first call to
1280<tt class="computeroutput">BZ2_bzCompress</tt> which has
1281<tt class="computeroutput">BZ_FINISH</tt> as an action (ie, at
1282the time the program has announced its intention to not supply
1283any more input). By comparing this value with that of
1284<tt class="computeroutput">avail_in</tt> over subsequent calls
1285to <tt class="computeroutput">BZ2_bzCompress</tt>, the library
1286can detect any attempts to slip in more data to compress. Any
1287calls for which this is detected will return
1288<tt class="computeroutput">BZ_SEQUENCE_ERROR</tt>. This
1289indicates a programming mistake which should be corrected.</p>
1290<p>Instead of asking to finish, the calling program may ask
1291<tt class="computeroutput">BZ2_bzCompress</tt> to take all the
1292remaining input, compress it and terminate the current
1293(Burrows-Wheeler) compression block. This could be useful for
1294error control purposes. The mechanism is analogous to that for
1295finishing: call <tt class="computeroutput">BZ2_bzCompress</tt>
1296with an action of <tt class="computeroutput">BZ_FLUSH</tt>,
1297remove output data, and persist with the
1298<tt class="computeroutput">BZ_FLUSH</tt> action until the value
<tt class="computeroutput">BZ_RUN_OK</tt> is returned. As with
1300finishing, <tt class="computeroutput">BZ2_bzCompress</tt>
1301detects any attempt to provide more input data once the flush has
1302begun.</p>
1303<p>Once the flush is complete, the stream returns to the
1304normal RUNNING state.</p>
1305<p>This all sounds pretty complex, but isn't really. Here's a
1306table which shows which actions are allowable in each state, what
1307action will be taken, what the next state is, and what the
1308non-error return values are. Note that you can't explicitly ask
1309what state the stream is in, but nor do you need to -- it can be
1310inferred from the values returned by
1311<tt class="computeroutput">BZ2_bzCompress</tt>.</p>
<pre class="programlisting">IDLE/any
   Illegal.  IDLE state only exists after BZ2_bzCompressEnd or
   before BZ2_bzCompressInit.
   Return value = BZ_SEQUENCE_ERROR

RUNNING/BZ_RUN
   Compress from next_in to next_out as much as possible.
   Next state = RUNNING
   Return value = BZ_RUN_OK

RUNNING/BZ_FLUSH
   Remember current value of avail_in.  Compress from next_in
   to next_out as much as possible, but do not accept any more input.
   Next state = FLUSHING
   Return value = BZ_FLUSH_OK

RUNNING/BZ_FINISH
   Remember current value of avail_in.  Compress from next_in
   to next_out as much as possible, but do not accept any more input.
   Next state = FINISHING
   Return value = BZ_FINISH_OK

FLUSHING/BZ_FLUSH
   Compress from next_in to next_out as much as possible,
   but do not accept any more input.
   If all the existing input has been used up and all compressed
   output has been removed
      Next state = RUNNING; Return value = BZ_RUN_OK
   else
      Next state = FLUSHING; Return value = BZ_FLUSH_OK

FLUSHING/other
   Illegal.
   Return value = BZ_SEQUENCE_ERROR

FINISHING/BZ_FINISH
   Compress from next_in to next_out as much as possible,
   but do not accept any more input.
   If all the existing input has been used up and all compressed
   output has been removed
      Next state = IDLE; Return value = BZ_STREAM_END
   else
      Next state = FINISHING; Return value = BZ_FINISH_OK

FINISHING/other
   Illegal.
   Return value = BZ_SEQUENCE_ERROR</pre>
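<p>For readers who prefer code to tables, the transitions can be modelled in a few lines. This is a toy model of the table only, not the library's implementation: every name in it (<tt class="computeroutput">StreamState</tt>, <tt class="computeroutput">Action</tt>, <tt class="computeroutput">Result</tt>, <tt class="computeroutput">step</tt>) is invented for illustration, and the <tt class="computeroutput">drained</tt> flag stands for "all existing input has been used up and all compressed output has been removed".</p>

```c
#include <stdbool.h>

/* Illustrative names only; none of these exist in libbzip2. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } StreamState;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } Action;
typedef enum { RET_SEQUENCE_ERROR, RET_RUN_OK, RET_FLUSH_OK,
               RET_FINISH_OK, RET_STREAM_END } Result;

/* Apply one action to the modelled stream. On an illegal action the
   state is left unchanged and RET_SEQUENCE_ERROR is returned. */
static Result step(StreamState *state, Action action, bool drained)
{
    switch (*state) {
    case ST_RUNNING:
        if (action == ACT_FLUSH)  { *state = ST_FLUSHING;  return RET_FLUSH_OK; }
        if (action == ACT_FINISH) { *state = ST_FINISHING; return RET_FINISH_OK; }
        return RET_RUN_OK;                   /* ACT_RUN: stay RUNNING */
    case ST_FLUSHING:
        if (action != ACT_FLUSH) return RET_SEQUENCE_ERROR;
        if (!drained) return RET_FLUSH_OK;   /* keep flushing */
        *state = ST_RUNNING;
        return RET_RUN_OK;
    case ST_FINISHING:
        if (action != ACT_FINISH) return RET_SEQUENCE_ERROR;
        if (!drained) return RET_FINISH_OK;  /* keep finishing */
        *state = ST_IDLE;
        return RET_STREAM_END;
    case ST_IDLE:
    default:
        return RET_SEQUENCE_ERROR;           /* IDLE/any is illegal */
    }
}
```

<p>The model also shows why the state never needs to be queried: each return value identifies the state the stream has just entered.</p>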
1359<p>That still looks complicated? Well, fair enough. The
1360usual sequence of calls for compressing a load of data is:</p>
1361<div class="orderedlist"><ol type="1">
1362<li><p>Get started with
1363 <tt class="computeroutput">BZ2_bzCompressInit</tt>.</p></li>
1364<li><p>Shovel data in and shlurp out its compressed form
1365 using zero or more calls of
1366 <tt class="computeroutput">BZ2_bzCompress</tt> with action =
1367 <tt class="computeroutput">BZ_RUN</tt>.</p></li>
1368<li><p>Finish up. Repeatedly call
1369 <tt class="computeroutput">BZ2_bzCompress</tt> with action =
1370 <tt class="computeroutput">BZ_FINISH</tt>, copying out the
1371 compressed output, until
1372 <tt class="computeroutput">BZ_STREAM_END</tt> is
1373 returned.</p></li>
1374<li><p>Close up and go home. Call
1375 <tt class="computeroutput">BZ2_bzCompressEnd</tt>.</p></li>
1376</ol></div>
1377<p>If the data you want to compress fits into your input
1378buffer all at once, you can skip the calls of
1379<tt class="computeroutput">BZ2_bzCompress ( ..., BZ_RUN )</tt>
1380and just do the <tt class="computeroutput">BZ2_bzCompress ( ..., BZ_FINISH
1381)</tt> calls.</p>
1382<p>All required memory is allocated by
1383<tt class="computeroutput">BZ2_bzCompressInit</tt>. The
1384compression library can accept any data at all (obviously). So
1385you shouldn't get any error return values from the
1386<tt class="computeroutput">BZ2_bzCompress</tt> calls. If you
1387do, they will be
1388<tt class="computeroutput">BZ_SEQUENCE_ERROR</tt>, and indicate
1389a bug in your programming.</p>
<p>The only other possible return value is trivial:</p>
1391<pre class="programlisting">BZ_PARAM_ERROR
1392 if strm is NULL, or strm-&gt;s is NULL</pre>
1393</div>
1394<div class="sect2" lang="en">
1395<div class="titlepage">
1396<div><div><h3 class="title">
1397<a name="bzCompress-end"></a>3.3.3. <tt class="computeroutput">BZ2_bzCompressEnd</tt></h3></div></div>
1398<div></div>
1399</div>
1400<pre class="programlisting">int BZ2_bzCompressEnd ( bz_stream *strm );</pre>
1401<p>Releases all memory associated with a compression
1402stream.</p>
1403<p>Possible return values:</p>
1404<pre class="programlisting">BZ_PARAM_ERROR if strm is NULL or strm-&gt;s is NULL
1405BZ_OK otherwise</pre>
1406</div>
1407<div class="sect2" lang="en">
1408<div class="titlepage">
1409<div><div><h3 class="title">
1410<a name="bzDecompress-init"></a>3.3.4. <tt class="computeroutput">BZ2_bzDecompressInit</tt></h3></div></div>
1411<div></div>
1412</div>
1413<pre class="programlisting">int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );</pre>
1414<p>Prepares for decompression. As with
1415<tt class="computeroutput">BZ2_bzCompressInit</tt>, a
1416<tt class="computeroutput">bz_stream</tt> record should be
1417allocated and initialised before the call. Fields
1418<tt class="computeroutput">bzalloc</tt>,
1419<tt class="computeroutput">bzfree</tt> and
1420<tt class="computeroutput">opaque</tt> should be set if a custom
1421memory allocator is required, or made
1422<tt class="computeroutput">NULL</tt> for the normal
1423<tt class="computeroutput">malloc</tt> /
1424<tt class="computeroutput">free</tt> routines. Upon return, the
1425internal state will have been initialised, and
1426<tt class="computeroutput">total_in</tt> and
1427<tt class="computeroutput">total_out</tt> will be zero.</p>
1428<p>For the meaning of parameter
1429<tt class="computeroutput">verbosity</tt>, see
1430<tt class="computeroutput">BZ2_bzCompressInit</tt>.</p>
1431<p>If <tt class="computeroutput">small</tt> is nonzero, the
1432library will use an alternative decompression algorithm which
1433uses less memory but at the cost of decompressing more slowly
1434(roughly speaking, half the speed, but the maximum memory
1435requirement drops to around 2300k). See <a href="#using">How to use bzip2</a>
1436for more information on memory management.</p>
1437<p>Note that the amount of memory needed to decompress a
1438stream cannot be determined until the stream's header has been
1439read, so even if
1440<tt class="computeroutput">BZ2_bzDecompressInit</tt> succeeds, a
1441subsequent <tt class="computeroutput">BZ2_bzDecompress</tt>
1442could fail with
1443<tt class="computeroutput">BZ_MEM_ERROR</tt>.</p>
1444<p>Possible return values:</p>
<pre class="programlisting">BZ_CONFIG_ERROR
   if the library has been mis-compiled
BZ_PARAM_ERROR
   if ( small != 0 &amp;&amp; small != 1 )
   or (verbosity &lt; 0 || verbosity &gt; 4)
BZ_MEM_ERROR
   if insufficient memory is available</pre>
1452<p>Allowable next actions:</p>
1453<pre class="programlisting">BZ2_bzDecompress
1454 if BZ_OK was returned
1455 no specific action required in case of error</pre>
1456</div>
1457<div class="sect2" lang="en">
1458<div class="titlepage">
1459<div><div><h3 class="title">
1460<a name="bzDecompress"></a>3.3.5. <tt class="computeroutput">BZ2_bzDecompress</tt></h3></div></div>
1461<div></div>
1462</div>
1463<pre class="programlisting">int BZ2_bzDecompress ( bz_stream *strm );</pre>
<p>Provides more input and/or output buffer space for the
1465library. The caller maintains input and output buffers, and uses
1466<tt class="computeroutput">BZ2_bzDecompress</tt> to transfer
1467data between them.</p>
1468<p>Before each call to
1469<tt class="computeroutput">BZ2_bzDecompress</tt>,
1470<tt class="computeroutput">next_in</tt> should point at the
1471compressed data, and <tt class="computeroutput">avail_in</tt>
1472should indicate how many bytes the library may read.
1473<tt class="computeroutput">BZ2_bzDecompress</tt> updates
1474<tt class="computeroutput">next_in</tt>,
1475<tt class="computeroutput">avail_in</tt> and
1476<tt class="computeroutput">total_in</tt> to reflect the number
1477of bytes it has read.</p>
1478<p>Similarly, <tt class="computeroutput">next_out</tt> should
1479point to a buffer in which the uncompressed output is to be
1480placed, with <tt class="computeroutput">avail_out</tt>
1481indicating how much output space is available.
<tt class="computeroutput">BZ2_bzDecompress</tt> updates
1483<tt class="computeroutput">next_out</tt>,
1484<tt class="computeroutput">avail_out</tt> and
1485<tt class="computeroutput">total_out</tt> to reflect the number
1486of bytes output.</p>
1487<p>You may provide and remove as little or as much data as you
1488like on each call of
1489<tt class="computeroutput">BZ2_bzDecompress</tt>. In the limit,
1490it is acceptable to supply and remove data one byte at a time,
1491although this would be terribly inefficient. You should always
1492ensure that at least one byte of output space is available at
1493each call.</p>
1494<p>Use of <tt class="computeroutput">BZ2_bzDecompress</tt> is
1495simpler than
1496<tt class="computeroutput">BZ2_bzCompress</tt>.</p>
1497<p>You should provide input and remove output as described
1498above, and repeatedly call
1499<tt class="computeroutput">BZ2_bzDecompress</tt> until
1500<tt class="computeroutput">BZ_STREAM_END</tt> is returned.
1501Appearance of <tt class="computeroutput">BZ_STREAM_END</tt>
1502denotes that <tt class="computeroutput">BZ2_bzDecompress</tt>
1503has detected the logical end of the compressed stream.
1504<tt class="computeroutput">BZ2_bzDecompress</tt> will not
1505produce <tt class="computeroutput">BZ_STREAM_END</tt> until all
1506output data has been placed into the output buffer, so once
1507<tt class="computeroutput">BZ_STREAM_END</tt> appears, you are
1508guaranteed to have available all the decompressed output, and
1509<tt class="computeroutput">BZ2_bzDecompressEnd</tt> can safely
1510be called.</p>
<p>In case of an error return value, you should call
1512<tt class="computeroutput">BZ2_bzDecompressEnd</tt> to clean up
1513and release memory.</p>
1514<p>Possible return values:</p>
<pre class="programlisting">BZ_PARAM_ERROR
   if strm is NULL or strm-&gt;s is NULL
   or strm-&gt;avail_out &lt; 1
BZ_DATA_ERROR
   if a data integrity error is detected in the compressed stream
BZ_DATA_ERROR_MAGIC
   if the compressed stream doesn't begin with the right magic bytes
BZ_MEM_ERROR
   if there wasn't enough memory available
BZ_STREAM_END
   if the logical end of the data stream was detected and all
   output has been consumed, eg strm-&gt;avail_out &gt; 0
BZ_OK
   otherwise</pre>
1529<p>Allowable next actions:</p>
1530<pre class="programlisting">BZ2_bzDecompress
1531 if BZ_OK was returned
1532BZ2_bzDecompressEnd
1533 otherwise</pre>
1534</div>
1535<div class="sect2" lang="en">
1536<div class="titlepage">
1537<div><div><h3 class="title">
1538<a name="bzDecompress-end"></a>3.3.6. <tt class="computeroutput">BZ2_bzDecompressEnd</tt></h3></div></div>
1539<div></div>
1540</div>
1541<pre class="programlisting">int BZ2_bzDecompressEnd ( bz_stream *strm );</pre>
1542<p>Releases all memory associated with a decompression
1543stream.</p>
1544<p>Possible return values:</p>
1545<pre class="programlisting">BZ_PARAM_ERROR
1546 if strm is NULL or strm-&gt;s is NULL
1547BZ_OK
1548 otherwise</pre>
1549<p>Allowable next actions:</p>
1550<pre class="programlisting"> None.</pre>
1551</div>
1552</div>
1553<div class="sect1" lang="en">
1554<div class="titlepage">
1555<div><div><h2 class="title" style="clear: both">
1556<a name="hl-interface"></a>3.4. High-level interface</h2></div></div>
1557<div></div>
1558</div>
1559<p>This interface provides functions for reading and writing
1560<tt class="computeroutput">bzip2</tt> format files. First, some
1561general points.</p>
1562<div class="itemizedlist"><ul type="bullet">
1563<li style="list-style-type: disc"><p>All of the functions take an
1564 <tt class="computeroutput">int*</tt> first argument,
1565 <tt class="computeroutput">bzerror</tt>. After each call,
1566 <tt class="computeroutput">bzerror</tt> should be consulted
1567 first to determine the outcome of the call. If
1568 <tt class="computeroutput">bzerror</tt> is
1569 <tt class="computeroutput">BZ_OK</tt>, the call completed
1570 successfully, and only then should the return value of the
1571 function (if any) be consulted. If
1572 <tt class="computeroutput">bzerror</tt> is
1573 <tt class="computeroutput">BZ_IO_ERROR</tt>, there was an
1574 error reading/writing the underlying compressed file, and you
1575 should then consult <tt class="computeroutput">errno</tt> /
1576 <tt class="computeroutput">perror</tt> to determine the cause
1577 of the difficulty. <tt class="computeroutput">bzerror</tt>
1578 may also be set to various other values; precise details are
1579 given on a per-function basis below.</p></li>
1580<li style="list-style-type: disc"><p>If <tt class="computeroutput">bzerror</tt> indicates
1581 an error (ie, anything except
1582 <tt class="computeroutput">BZ_OK</tt> and
1583 <tt class="computeroutput">BZ_STREAM_END</tt>), you should
1584 immediately call
1585 <tt class="computeroutput">BZ2_bzReadClose</tt> (or
1586 <tt class="computeroutput">BZ2_bzWriteClose</tt>, depending on
1587 whether you are attempting to read or to write) to free up all
1588 resources associated with the stream. Once an error has been
1589 indicated, behaviour of all calls except
1590 <tt class="computeroutput">BZ2_bzReadClose</tt>
1591 (<tt class="computeroutput">BZ2_bzWriteClose</tt>) is
1592 undefined. The implication is that (1)
1593 <tt class="computeroutput">bzerror</tt> should be checked
1594 after each call, and (2) if
1595 <tt class="computeroutput">bzerror</tt> indicates an error,
1596 <tt class="computeroutput">BZ2_bzReadClose</tt>
1597 (<tt class="computeroutput">BZ2_bzWriteClose</tt>) should then
1598 be called to clean up.</p></li>
1599<li style="list-style-type: disc"><p>The <tt class="computeroutput">FILE*</tt> arguments
1600 passed to <tt class="computeroutput">BZ2_bzReadOpen</tt> /
1601 <tt class="computeroutput">BZ2_bzWriteOpen</tt> should be set
1602 to binary mode. Most Unix systems will do this by default, but
1603 other platforms, including Windows and Mac, will not. If you
1604 omit this, you may encounter problems when moving code to new
1605 platforms.</p></li>
1606<li style="list-style-type: disc"><p>Memory allocation requests are handled by
1607 <tt class="computeroutput">malloc</tt> /
1608 <tt class="computeroutput">free</tt>. At present there is no
1609 facility for user-defined memory allocators in the file I/O
1610 functions (could easily be added, though).</p></li>
1611</ul></div>
1612<div class="sect2" lang="en">
1613<div class="titlepage">
1614<div><div><h3 class="title">
1615<a name="bzreadopen"></a>3.4.1. <tt class="computeroutput">BZ2_bzReadOpen</tt></h3></div></div>
1616<div></div>
1617</div>
1618<pre class="programlisting">typedef void BZFILE;
1619
1620BZFILE *BZ2_bzReadOpen( int *bzerror, FILE *f,
1621 int verbosity, int small,
1622 void *unused, int nUnused );</pre>
1623<p>Prepare to read compressed data from file handle
1624<tt class="computeroutput">f</tt>.
1625<tt class="computeroutput">f</tt> should refer to a file which
1626has been opened for reading, and for which the error indicator
(<tt class="computeroutput">ferror(f)</tt>) is not set. If
1628<tt class="computeroutput">small</tt> is 1, the library will try
1629to decompress using less memory, at the expense of speed.</p>
1630<p>For reasons explained below,
1631<tt class="computeroutput">BZ2_bzRead</tt> will decompress the
1632<tt class="computeroutput">nUnused</tt> bytes starting at
1633<tt class="computeroutput">unused</tt>, before starting to read
1634from the file <tt class="computeroutput">f</tt>. At most
1635<tt class="computeroutput">BZ_MAX_UNUSED</tt> bytes may be
1636supplied like this. If this facility is not required, you should
1637pass <tt class="computeroutput">NULL</tt> and
1638<tt class="computeroutput">0</tt> for
1639<tt class="computeroutput">unused</tt> and
<tt class="computeroutput">nUnused</tt> respectively.</p>
1641<p>For the meaning of parameters
1642<tt class="computeroutput">small</tt> and
1643<tt class="computeroutput">verbosity</tt>, see
1644<tt class="computeroutput">BZ2_bzDecompressInit</tt>.</p>
1645<p>The amount of memory needed to decompress a file cannot be
1646determined until the file's header has been read. So it is
1647possible that <tt class="computeroutput">BZ2_bzReadOpen</tt>
1648returns <tt class="computeroutput">BZ_OK</tt> but a subsequent
1649call of <tt class="computeroutput">BZ2_bzRead</tt> will return
1650<tt class="computeroutput">BZ_MEM_ERROR</tt>.</p>
1651<p>Possible assignments to
1652<tt class="computeroutput">bzerror</tt>:</p>
1653<pre class="programlisting">BZ_CONFIG_ERROR
1654 if the library has been mis-compiled
1655BZ_PARAM_ERROR
1656 if f is NULL
1657 or small is neither 0 nor 1
1658 or ( unused == NULL &amp;&amp; nUnused != 0 )
1659 or ( unused != NULL &amp;&amp; !(0 &lt;= nUnused &lt;= BZ_MAX_UNUSED) )
1660BZ_IO_ERROR
1661 if ferror(f) is nonzero
1662BZ_MEM_ERROR
1663 if insufficient memory is available
1664BZ_OK
1665 otherwise.</pre>
1666<p>Possible return values:</p>
1667<pre class="programlisting">Pointer to an abstract BZFILE
1668 if bzerror is BZ_OK
1669NULL
1670 otherwise</pre>
1671<p>Allowable next actions:</p>
<pre class="programlisting">BZ2_bzRead
   if bzerror is BZ_OK
BZ2_bzReadClose
   otherwise</pre>
1676</div>
1677<div class="sect2" lang="en">
1678<div class="titlepage">
1679<div><div><h3 class="title">
1680<a name="bzread"></a>3.4.2. <tt class="computeroutput">BZ2_bzRead</tt></h3></div></div>
1681<div></div>
1682</div>
1683<pre class="programlisting">int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );</pre>
1684<p>Reads up to <tt class="computeroutput">len</tt>
1685(uncompressed) bytes from the compressed file
1686<tt class="computeroutput">b</tt> into the buffer
1687<tt class="computeroutput">buf</tt>. If the read was
1688successful, <tt class="computeroutput">bzerror</tt> is set to
1689<tt class="computeroutput">BZ_OK</tt> and the number of bytes
1690read is returned. If the logical end-of-stream was detected,
1691<tt class="computeroutput">bzerror</tt> will be set to
1692<tt class="computeroutput">BZ_STREAM_END</tt>, and the number of
1693bytes read is returned. All other
1694<tt class="computeroutput">bzerror</tt> values denote an
1695error.</p>
1696<p><tt class="computeroutput">BZ2_bzRead</tt> will supply
1697<tt class="computeroutput">len</tt> bytes, unless the logical
1698stream end is detected or an error occurs. Because of this, it
1699is possible to detect the stream end by observing when the number
1700of bytes returned is less than the number requested.
1701Nevertheless, this is regarded as inadvisable; you should instead
1702check <tt class="computeroutput">bzerror</tt> after every call
1703and watch out for
1704<tt class="computeroutput">BZ_STREAM_END</tt>.</p>
1705<p>Internally, <tt class="computeroutput">BZ2_bzRead</tt>
1706copies data from the compressed file in chunks of size
1707<tt class="computeroutput">BZ_MAX_UNUSED</tt> bytes before
1708decompressing it. If the file contains more bytes than strictly
1709needed to reach the logical end-of-stream,
1710<tt class="computeroutput">BZ2_bzRead</tt> will almost certainly
1711read some of the trailing data before signalling
<tt class="computeroutput">BZ_STREAM_END</tt>. To collect the
read but unused data once
<tt class="computeroutput">BZ_STREAM_END</tt> has appeared,
1715call <tt class="computeroutput">BZ2_bzReadGetUnused</tt>
1716immediately before
1717<tt class="computeroutput">BZ2_bzReadClose</tt>.</p>
1718<p>Possible assignments to
1719<tt class="computeroutput">bzerror</tt>:</p>
1720<pre class="programlisting">BZ_PARAM_ERROR
1721 if b is NULL or buf is NULL or len &lt; 0
1722BZ_SEQUENCE_ERROR
1723 if b was opened with BZ2_bzWriteOpen
1724BZ_IO_ERROR
1725 if there is an error reading from the compressed file
1726BZ_UNEXPECTED_EOF
1727 if the compressed file ended before
1728 the logical end-of-stream was detected
1729BZ_DATA_ERROR
1730 if a data integrity error was detected in the compressed stream
1731BZ_DATA_ERROR_MAGIC
1732 if the stream does not begin with the requisite header bytes
1733 (ie, is not a bzip2 data file). This is really
1734 a special case of BZ_DATA_ERROR.
1735BZ_MEM_ERROR
1736 if insufficient memory was available
1737BZ_STREAM_END
1738 if the logical end of stream was detected.
1739BZ_OK
1740 otherwise.</pre>
1741<p>Possible return values:</p>
1742<pre class="programlisting">number of bytes read
1743 if bzerror is BZ_OK or BZ_STREAM_END
1744undefined
1745 otherwise</pre>
1746<p>Allowable next actions:</p>
<pre class="programlisting">collect data from buf, then BZ2_bzRead or BZ2_bzReadClose
   if bzerror is BZ_OK
collect data from buf, then BZ2_bzReadClose or BZ2_bzReadGetUnused
   if bzerror is BZ_STREAM_END
BZ2_bzReadClose
   otherwise</pre>
1753</div>
1754<div class="sect2" lang="en">
1755<div class="titlepage">
1756<div><div><h3 class="title">
1757<a name="bzreadgetunused"></a>3.4.3. <tt class="computeroutput">BZ2_bzReadGetUnused</tt></h3></div></div>
1758<div></div>
1759</div>
1760<pre class="programlisting">void BZ2_bzReadGetUnused( int* bzerror, BZFILE *b,
1761 void** unused, int* nUnused );</pre>
1762<p>Returns data which was read from the compressed file but
1763was not needed to get to the logical end-of-stream.
1764<tt class="computeroutput">*unused</tt> is set to the address of
1765the data, and <tt class="computeroutput">*nUnused</tt> to the
1766number of bytes. <tt class="computeroutput">*nUnused</tt> will
1767be set to a value between <tt class="computeroutput">0</tt> and
1768<tt class="computeroutput">BZ_MAX_UNUSED</tt> inclusive.</p>
1769<p>This function may only be called once
1770<tt class="computeroutput">BZ2_bzRead</tt> has signalled
1771<tt class="computeroutput">BZ_STREAM_END</tt> but before
1772<tt class="computeroutput">BZ2_bzReadClose</tt>.</p>
1773<p>Possible assignments to
1774<tt class="computeroutput">bzerror</tt>:</p>
1775<pre class="programlisting">BZ_PARAM_ERROR
1776 if b is NULL
1777 or unused is NULL or nUnused is NULL
1778BZ_SEQUENCE_ERROR
1779 if BZ_STREAM_END has not been signalled
1780 or if b was opened with BZ2_bzWriteOpen
1781BZ_OK
1782 otherwise</pre>
1783<p>Allowable next actions:</p>
1784<pre class="programlisting">BZ2_bzReadClose</pre>
1785</div>
1786<div class="sect2" lang="en">
1787<div class="titlepage">
1788<div><div><h3 class="title">
1789<a name="bzreadclose"></a>3.4.4. <tt class="computeroutput">BZ2_bzReadClose</tt></h3></div></div>
1790<div></div>
1791</div>
1792<pre class="programlisting">void BZ2_bzReadClose ( int *bzerror, BZFILE *b );</pre>
1793<p>Releases all memory pertaining to the compressed file
1794<tt class="computeroutput">b</tt>.
1795<tt class="computeroutput">BZ2_bzReadClose</tt> does not call
1796<tt class="computeroutput">fclose</tt> on the underlying file
1797handle, so you should do that yourself if appropriate.
1798<tt class="computeroutput">BZ2_bzReadClose</tt> should be called
1799to clean up after all error situations.</p>
1800<p>Possible assignments to
1801<tt class="computeroutput">bzerror</tt>:</p>
<pre class="programlisting">BZ_SEQUENCE_ERROR
   if b was opened with BZ2_bzWriteOpen
BZ_OK
   otherwise</pre>
1806<p>Allowable next actions:</p>
1807<pre class="programlisting">none</pre>
1808</div>
1809<div class="sect2" lang="en">
1810<div class="titlepage">
1811<div><div><h3 class="title">
1812<a name="bzwriteopen"></a>3.4.5. <tt class="computeroutput">BZ2_bzWriteOpen</tt></h3></div></div>
1813<div></div>
1814</div>
1815<pre class="programlisting">BZFILE *BZ2_bzWriteOpen( int *bzerror, FILE *f,
1816 int blockSize100k, int verbosity,
1817 int workFactor );</pre>
1818<p>Prepare to write compressed data to file handle
1819<tt class="computeroutput">f</tt>.
1820<tt class="computeroutput">f</tt> should refer to a file which
1821has been opened for writing, and for which the error indicator
(<tt class="computeroutput">ferror(f)</tt>) is not set.</p>
1823<p>For the meaning of parameters
1824<tt class="computeroutput">blockSize100k</tt>,
1825<tt class="computeroutput">verbosity</tt> and
1826<tt class="computeroutput">workFactor</tt>, see
1827<tt class="computeroutput">BZ2_bzCompressInit</tt>.</p>
1828<p>All required memory is allocated at this stage, so if the
1829call completes successfully,
1830<tt class="computeroutput">BZ_MEM_ERROR</tt> cannot be signalled
1831by a subsequent call to
1832<tt class="computeroutput">BZ2_bzWrite</tt>.</p>
1833<p>Possible assignments to
1834<tt class="computeroutput">bzerror</tt>:</p>
1835<pre class="programlisting">BZ_CONFIG_ERROR
1836 if the library has been mis-compiled
1837BZ_PARAM_ERROR
1838 if f is NULL
1839 or blockSize100k &lt; 1 or blockSize100k &gt; 9
1840BZ_IO_ERROR
1841 if ferror(f) is nonzero
1842BZ_MEM_ERROR
1843 if insufficient memory is available
1844BZ_OK
1845 otherwise</pre>
1846<p>Possible return values:</p>
1847<pre class="programlisting">Pointer to an abstract BZFILE
1848 if bzerror is BZ_OK
1849NULL
1850 otherwise</pre>
1851<p>Allowable next actions:</p>
1852<pre class="programlisting">BZ2_bzWrite
1853 if bzerror is BZ_OK
1854 (you could go directly to BZ2_bzWriteClose, but this would be pretty pointless)
1855BZ2_bzWriteClose
1856 otherwise</pre>
1857</div>
1858<div class="sect2" lang="en">
1859<div class="titlepage">
1860<div><div><h3 class="title">
1861<a name="bzwrite"></a>3.4.6. <tt class="computeroutput">BZ2_bzWrite</tt></h3></div></div>
1862<div></div>
1863</div>
1864<pre class="programlisting">void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );</pre>
1865<p>Absorbs <tt class="computeroutput">len</tt> bytes from the
1866buffer <tt class="computeroutput">buf</tt>, eventually to be
1867compressed and written to the file.</p>
1868<p>Possible assignments to
1869<tt class="computeroutput">bzerror</tt>:</p>
1870<pre class="programlisting">BZ_PARAM_ERROR
1871 if b is NULL or buf is NULL or len &lt; 0
1872BZ_SEQUENCE_ERROR
1873 if b was opened with BZ2_bzReadOpen
1874BZ_IO_ERROR
1875 if there is an error writing the compressed file.
1876BZ_OK
1877 otherwise</pre>
1878</div>
1879<div class="sect2" lang="en">
1880<div class="titlepage">
1881<div><div><h3 class="title">
1882<a name="bzwriteclose"></a>3.4.7. <tt class="computeroutput">BZ2_bzWriteClose</tt></h3></div></div>
1883<div></div>
1884</div>
<pre class="programlisting">void BZ2_bzWriteClose( int *bzerror, BZFILE* b,
                       int abandon,
                       unsigned int* nbytes_in,
                       unsigned int* nbytes_out );

void BZ2_bzWriteClose64( int *bzerror, BZFILE* b,
                         int abandon,
                         unsigned int* nbytes_in_lo32,
                         unsigned int* nbytes_in_hi32,
                         unsigned int* nbytes_out_lo32,
                         unsigned int* nbytes_out_hi32 );</pre>
1896<p>Compresses and flushes to the compressed file all data so
1897far supplied by <tt class="computeroutput">BZ2_bzWrite</tt>.
1898The logical end-of-stream markers are also written, so subsequent
1899calls to <tt class="computeroutput">BZ2_bzWrite</tt> are
1900illegal. All memory associated with the compressed file
1901<tt class="computeroutput">b</tt> is released.
1902<tt class="computeroutput">fflush</tt> is called on the
1903compressed file, but it is not
1904<tt class="computeroutput">fclose</tt>'d.</p>
1905<p>If <tt class="computeroutput">BZ2_bzWriteClose</tt> is
1906called to clean up after an error, the only action is to release
1907the memory. The library records the error codes issued by
1908previous calls, so this situation will be detected automatically.
1909There is no attempt to complete the compression operation, nor to
1910<tt class="computeroutput">fflush</tt> the compressed file. You
1911can force this behaviour to happen even in the case of no error,
1912by passing a nonzero value to
1913<tt class="computeroutput">abandon</tt>.</p>
1914<p>If <tt class="computeroutput">nbytes_in</tt> is non-null,
1915<tt class="computeroutput">*nbytes_in</tt> will be set to be the
1916total volume of uncompressed data handled. Similarly,
1917<tt class="computeroutput">nbytes_out</tt> will be set to the
1918total volume of compressed data written. For compatibility with
1919older versions of the library,
1920<tt class="computeroutput">BZ2_bzWriteClose</tt> only yields the
1921lower 32 bits of these counts. Use
1922<tt class="computeroutput">BZ2_bzWriteClose64</tt> if you want
1923the full 64 bit counts. These two functions are otherwise
1924absolutely identical.</p>
1925<p>Possible assignments to
1926<tt class="computeroutput">bzerror</tt>:</p>
1927<pre class="programlisting">BZ_SEQUENCE_ERROR
1928 if b was opened with BZ2_bzReadOpen
1929BZ_IO_ERROR
1930 if there is an error writing the compressed file
1931BZ_OK
1932 otherwise</pre>
1933</div>
1934<div class="sect2" lang="en">
1935<div class="titlepage">
1936<div><div><h3 class="title">
1937<a name="embed"></a>3.4.8. Handling embedded compressed data streams</h3></div></div>
1938<div></div>
1939</div>
1940<p>The high-level library facilitates use of
1941<tt class="computeroutput">bzip2</tt> data streams which form
1942some part of a surrounding, larger data stream.</p>
1943<div class="itemizedlist"><ul type="bullet">
1944<li style="list-style-type: disc"><p>For writing, the library takes an open file handle,
1945 writes compressed data to it,
1946 <tt class="computeroutput">fflush</tt>es it but does not
1947 <tt class="computeroutput">fclose</tt> it. The calling
1948 application can write its own data before and after the
1949 compressed data stream, using that same file handle.</p></li>
1950<li style="list-style-type: disc"><p>Reading is more complex, and the facilities are not as
1951 general as they could be since generality is hard to reconcile
1952 with efficiency. <tt class="computeroutput">BZ2_bzRead</tt>
1953 reads from the compressed file in blocks of size
1954 <tt class="computeroutput">BZ_MAX_UNUSED</tt> bytes, and in
1955 doing so probably will overshoot the logical end of compressed
1956 stream. To recover this data once decompression has ended,
1957 call <tt class="computeroutput">BZ2_bzReadGetUnused</tt> after
1958 the last call of <tt class="computeroutput">BZ2_bzRead</tt>
1959 (the one returning
1960 <tt class="computeroutput">BZ_STREAM_END</tt>) but before
1961 calling
1962 <tt class="computeroutput">BZ2_bzReadClose</tt>.</p></li>
1963</ul></div>
<p>This mechanism makes it easy to decompress multiple
<tt class="computeroutput">bzip2</tt> streams placed end-to-end.
At the end of one stream, when
<tt class="computeroutput">BZ2_bzRead</tt> returns
<tt class="computeroutput">BZ_STREAM_END</tt>, call
<tt class="computeroutput">BZ2_bzReadGetUnused</tt> to collect
the unused data (copy it into your own buffer somewhere). That
data forms the start of the next compressed stream. To start
uncompressing that next stream, call
<tt class="computeroutput">BZ2_bzReadOpen</tt> again, feeding in
the unused data via the <tt class="computeroutput">unused</tt> /
<tt class="computeroutput">nUnused</tt> parameters. Keep doing
this until a <tt class="computeroutput">BZ_STREAM_END</tt> return
coincides with the physical end of file
(<tt class="computeroutput">feof(f)</tt>). In this situation
<tt class="computeroutput">BZ2_bzReadGetUnused</tt> will of
course return no data.</p>
1981<p>This should give some feel for how the high-level interface
1982can be used. If you require extra flexibility, you'll have to
1983bite the bullet and get to grips with the low-level
1984interface.</p>
1985</div>
1986<div class="sect2" lang="en">
1987<div class="titlepage">
1988<div><div><h3 class="title">
1989<a name="std-rdwr"></a>3.4.9. Standard file-reading/writing code</h3></div></div>
1990<div></div>
1991</div>
1992<p>Here's how you'd write data to a compressed file:</p>
<pre class="programlisting">FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "wb" );
if ( !f ) {
  /* handle error */
}
/* blockSize100k = 9, verbosity = 0, workFactor = 0 (library default) */
b = BZ2_bzWriteOpen ( &amp;bzerror, f, 9, 0, 0 );
if ( bzerror != BZ_OK ) {
  BZ2_bzWriteClose ( &amp;bzerror, b, 0, NULL, NULL );
  /* handle error */
}

while ( /* condition */ ) {
  /* get data to write into buf, and set nBuf appropriately */
  BZ2_bzWrite ( &amp;bzerror, b, buf, nBuf );   /* returns void */
  if ( bzerror == BZ_IO_ERROR ) {
    BZ2_bzWriteClose ( &amp;bzerror, b, 0, NULL, NULL );
    /* handle error */
  }
}

/* abandon = 0; pass NULL for the byte counts if they are not wanted */
BZ2_bzWriteClose ( &amp;bzerror, b, 0, NULL, NULL );
if ( bzerror == BZ_IO_ERROR ) {
  /* handle error */
}</pre>
2023<p>And to read from a compressed file:</p>
<pre class="programlisting">FILE*   f;
BZFILE* b;
int     nBuf;
char    buf[ /* whatever size you like */ ];
int     bzerror;

f = fopen ( "myfile.bz2", "rb" );
if ( !f ) {
  /* handle error */
}
/* verbosity = 0, small = 0, no unused data carried over */
b = BZ2_bzReadOpen ( &amp;bzerror, f, 0, 0, NULL, 0 );
if ( bzerror != BZ_OK ) {
  BZ2_bzReadClose ( &amp;bzerror, b );
  /* handle error */
}

bzerror = BZ_OK;
while ( bzerror == BZ_OK &amp;&amp; /* arbitrary other conditions */) {
  nBuf = BZ2_bzRead ( &amp;bzerror, b, buf, /* size of buf */ );
  /* the call that reports BZ_STREAM_END may still return data */
  if ( bzerror == BZ_OK || bzerror == BZ_STREAM_END ) {
    /* do something with buf[0 .. nBuf-1] */
  }
}
if ( bzerror != BZ_STREAM_END ) {
  BZ2_bzReadClose ( &amp;bzerror, b );
  /* handle error */
} else {
  BZ2_bzReadClose ( &amp;bzerror, b );
}</pre>
2054</div>
2055</div>
2056<div class="sect1" lang="en">
2057<div class="titlepage">
2058<div><div><h2 class="title" style="clear: both">
2059<a name="util-fns"></a>3.5. Utility functions</h2></div></div>
2060<div></div>
2061</div>
2062<div class="sect2" lang="en">
2063<div class="titlepage">
2064<div><div><h3 class="title">
2065<a name="bzbufftobuffcompress"></a>3.5.1. <tt class="computeroutput">BZ2_bzBuffToBuffCompress</tt></h3></div></div>
2066<div></div>
2067</div>
<pre class="programlisting">int BZ2_bzBuffToBuffCompress( char*         dest,
                              unsigned int* destLen,
                              char*         source,
                              unsigned int  sourceLen,
                              int           blockSize100k,
                              int           verbosity,
                              int           workFactor );</pre>
2075<p>Attempts to compress the data in <tt class="computeroutput">source[0
2076.. sourceLen-1]</tt> into the destination buffer,
2077<tt class="computeroutput">dest[0 .. *destLen-1]</tt>. If the
2078destination buffer is big enough,
2079<tt class="computeroutput">*destLen</tt> is set to the size of
2080the compressed data, and <tt class="computeroutput">BZ_OK</tt>
2081is returned. If the compressed data won't fit,
2082<tt class="computeroutput">*destLen</tt> is unchanged, and
2083<tt class="computeroutput">BZ_OUTBUFF_FULL</tt> is
2084returned.</p>
2085<p>Compression in this manner is a one-shot event, done with a
2086single call to this function. The resulting compressed data is a
2087complete <tt class="computeroutput">bzip2</tt> format data
2088stream. There is no mechanism for making additional calls to
2089provide extra input data. If you want that kind of mechanism,
2090use the low-level interface.</p>
2091<p>For the meaning of parameters
2092<tt class="computeroutput">blockSize100k</tt>,
2093<tt class="computeroutput">verbosity</tt> and
2094<tt class="computeroutput">workFactor</tt>, see
2095<tt class="computeroutput">BZ2_bzCompressInit</tt>.</p>
2096<p>To guarantee that the compressed data will fit in its
2097buffer, allocate an output buffer of size 1% larger than the
2098uncompressed data, plus six hundred extra bytes.</p>
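<p>The sizing rule above can be written as integer arithmetic. The
helper name <tt class="computeroutput">bz_dest_bound</tt> is
hypothetical, and rounding the 1% term up rather than down is an
assumption made here to stay on the safe side.</p>

```c
#include <stddef.h>

/* Hypothetical helper, not part of libbzip2: output buffer size that
   follows the "1% larger plus 600 bytes" rule, with the 1% rounded up. */
static size_t bz_dest_bound(size_t sourceLen)
{
    return sourceLen + (sourceLen + 99) / 100 + 600;
}
```

<p>The result would serve both as the allocation size for
<tt class="computeroutput">dest</tt> and as the initial value of
<tt class="computeroutput">*destLen</tt>. Since
<tt class="computeroutput">destLen</tt> is an
<tt class="computeroutput">unsigned int</tt>, the caller still has to
check that the bound fits in that type.</p>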
<p><tt class="computeroutput">BZ2_bzBuffToBuffCompress</tt>
will not write data at or beyond
<tt class="computeroutput">dest[*destLen]</tt>, even in case of
buffer overflow.</p>
2103<p>Possible return values:</p>
<pre class="programlisting">BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if dest is NULL or destLen is NULL
  or blockSize100k &lt; 1 or blockSize100k &gt; 9
  or verbosity &lt; 0 or verbosity &gt; 4
  or workFactor &lt; 0 or workFactor &gt; 250
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OUTBUFF_FULL
  if the size of the compressed data exceeds *destLen
BZ_OK
  otherwise</pre>
2117</div>
2118<div class="sect2" lang="en">
2119<div class="titlepage">
2120<div><div><h3 class="title">
2121<a name="bzbufftobuffdecompress"></a>3.5.2. <tt class="computeroutput">BZ2_bzBuffToBuffDecompress</tt></h3></div></div>
2122<div></div>
2123</div>
<pre class="programlisting">int BZ2_bzBuffToBuffDecompress( char*         dest,
                                unsigned int* destLen,
                                char*         source,
                                unsigned int  sourceLen,
                                int           small,
                                int           verbosity );</pre>
<p>Attempts to decompress the data in <tt class="computeroutput">source[0
.. sourceLen-1]</tt> into the destination buffer,
<tt class="computeroutput">dest[0 .. *destLen-1]</tt>. If the
destination buffer is big enough,
<tt class="computeroutput">*destLen</tt> is set to the size of
the uncompressed data, and <tt class="computeroutput">BZ_OK</tt>
is returned. If the uncompressed data won't fit,
<tt class="computeroutput">*destLen</tt> is unchanged, and
<tt class="computeroutput">BZ_OUTBUFF_FULL</tt> is
returned.</p>
2140<p><tt class="computeroutput">source</tt> is assumed to hold
2141a complete <tt class="computeroutput">bzip2</tt> format data
2142stream.
2143<tt class="computeroutput">BZ2_bzBuffToBuffDecompress</tt> tries
2144to decompress the entirety of the stream into the output
2145buffer.</p>
2146<p>For the meaning of parameters
2147<tt class="computeroutput">small</tt> and
2148<tt class="computeroutput">verbosity</tt>, see
2149<tt class="computeroutput">BZ2_bzDecompressInit</tt>.</p>
2150<p>Because the compression ratio of the compressed data cannot
2151be known in advance, there is no easy way to guarantee that the
2152output buffer will be big enough. You may of course make
2153arrangements in your code to record the size of the uncompressed
2154data, but such a mechanism is beyond the scope of this
2155library.</p>
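<p>One possible arrangement for recording the uncompressed size,
sketched under the assumption that the application controls its own
container format, is to prefix the compressed data with the original
length as four little-endian bytes. The helper names
<tt class="computeroutput">put_u32le</tt> and
<tt class="computeroutput">get_u32le</tt> are hypothetical, and this
framing is not part of the <tt class="computeroutput">bzip2</tt>
format or of the library.</p>

```c
#include <stdint.h>

/* Hypothetical framing helpers, not part of libbzip2: store and load
   a 32-bit length as four little-endian bytes. */
static void put_u32le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

static uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

<p>A writer would emit the four bytes produced by
<tt class="computeroutput">put_u32le</tt> followed by the compressed
data. A reader would call <tt class="computeroutput">get_u32le</tt>
first, allocate that many bytes for
<tt class="computeroutput">dest</tt>, and set
<tt class="computeroutput">*destLen</tt> to the same value before
decompressing.</p>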
2156<p><tt class="computeroutput">BZ2_bzBuffToBuffDecompress</tt>
2157will not write data at or beyond
2158<tt class="computeroutput">dest[*destLen]</tt>, even in case of
2159buffer overflow.</p>
2160<p>Possible return values:</p>
<pre class="programlisting">BZ_CONFIG_ERROR
  if the library has been mis-compiled
BZ_PARAM_ERROR
  if dest is NULL or destLen is NULL
  or small != 0 &amp;&amp; small != 1
  or verbosity &lt; 0 or verbosity &gt; 4
BZ_MEM_ERROR
  if insufficient memory is available
BZ_OUTBUFF_FULL
  if the size of the uncompressed data exceeds *destLen
BZ_DATA_ERROR
  if a data integrity error was detected in the compressed data
BZ_DATA_ERROR_MAGIC
  if the compressed data doesn't begin with the right magic bytes
BZ_UNEXPECTED_EOF
  if the compressed data ends unexpectedly
BZ_OK
  otherwise</pre>
2179</div>
2180</div>
2181<div class="sect1" lang="en">
2182<div class="titlepage">
2183<div><div><h2 class="title" style="clear: both">
2184<a name="zlib-compat"></a>3.6. <tt class="computeroutput">zlib</tt> compatibility functions</h2></div></div>
2185<div></div>
2186</div>
2187<p>Yoshioka Tsuneo has contributed some functions to give
2188better <tt class="computeroutput">zlib</tt> compatibility.
2189These functions are <tt class="computeroutput">BZ2_bzopen</tt>,
2190<tt class="computeroutput">BZ2_bzread</tt>,
2191<tt class="computeroutput">BZ2_bzwrite</tt>,
2192<tt class="computeroutput">BZ2_bzflush</tt>,
2193<tt class="computeroutput">BZ2_bzclose</tt>,
2194<tt class="computeroutput">BZ2_bzerror</tt> and
2195<tt class="computeroutput">BZ2_bzlibVersion</tt>. These
2196functions are not (yet) officially part of the library. If they
2197break, you get to keep all the pieces. Nevertheless, I think
2198they work ok.</p>
<pre class="programlisting">typedef void BZFILE;

const char * BZ2_bzlibVersion ( void );</pre>
2202<p>Returns a string indicating the library version.</p>
<pre class="programlisting">BZFILE * BZ2_bzopen  ( const char *path, const char *mode );
BZFILE * BZ2_bzdopen ( int        fd,    const char *mode );</pre>
2205<p>Opens a <tt class="computeroutput">.bz2</tt> file for
2206reading or writing, using either its name or a pre-existing file
2207descriptor. Analogous to <tt class="computeroutput">fopen</tt>
2208and <tt class="computeroutput">fdopen</tt>.</p>
<pre class="programlisting">int BZ2_bzread  ( BZFILE* b, void* buf, int len );
int BZ2_bzwrite ( BZFILE* b, void* buf, int len );</pre>
2211<p>Reads/writes data from/to a previously opened
2212<tt class="computeroutput">BZFILE</tt>. Analogous to
2213<tt class="computeroutput">fread</tt> and
2214<tt class="computeroutput">fwrite</tt>.</p>
<pre class="programlisting">int  BZ2_bzflush ( BZFILE* b );
void BZ2_bzclose ( BZFILE* b );</pre>
2217<p>Flushes/closes a <tt class="computeroutput">BZFILE</tt>.
2218<tt class="computeroutput">BZ2_bzflush</tt> doesn't actually do
2219anything. Analogous to <tt class="computeroutput">fflush</tt>
2220and <tt class="computeroutput">fclose</tt>.</p>
<pre class="programlisting">const char * BZ2_bzerror ( BZFILE *b, int *errnum );</pre>
<p>Returns a string describing the most recent error status of
<tt class="computeroutput">b</tt>, and also sets
<tt class="computeroutput">*errnum</tt> to its numerical
value.</p>
2226</div>
2227<div class="sect1" lang="en">
2228<div class="titlepage">
2229<div><div><h2 class="title" style="clear: both">
2230<a name="stdio-free"></a>3.7. Using the library in a <tt class="computeroutput">stdio</tt>-free environment</h2></div></div>
2231<div></div>
2232</div>
2233<div class="sect2" lang="en">
2234<div class="titlepage">
2235<div><div><h3 class="title">
2236<a name="stdio-bye"></a>3.7.1. Getting rid of <tt class="computeroutput">stdio</tt></h3></div></div>
2237<div></div>
2238</div>
2239<p>In a deeply embedded application, you might want to use
2240just the memory-to-memory functions. You can do this
2241conveniently by compiling the library with preprocessor symbol
2242<tt class="computeroutput">BZ_NO_STDIO</tt> defined. Doing this
2243gives you a library containing only the following eight
2244functions:</p>
<p><tt class="computeroutput">BZ2_bzCompressInit</tt>,
<tt class="computeroutput">BZ2_bzCompress</tt>,
<tt class="computeroutput">BZ2_bzCompressEnd</tt>,
<tt class="computeroutput">BZ2_bzDecompressInit</tt>,
<tt class="computeroutput">BZ2_bzDecompress</tt>,
<tt class="computeroutput">BZ2_bzDecompressEnd</tt>,
<tt class="computeroutput">BZ2_bzBuffToBuffCompress</tt>,
<tt class="computeroutput">BZ2_bzBuffToBuffDecompress</tt></p>
2253<p>When compiled like this, all functions will ignore
2254<tt class="computeroutput">verbosity</tt> settings.</p>
2255</div>
2256<div class="sect2" lang="en">
2257<div class="titlepage">
2258<div><div><h3 class="title">
2259<a name="critical-error"></a>3.7.2. Critical error handling</h3></div></div>
2260<div></div>
2261</div>
2262<p><tt class="computeroutput">libbzip2</tt> contains a number
2263of internal assertion checks which should, needless to say, never
2264be activated. Nevertheless, if an assertion should fail,
2265behaviour depends on whether or not the library was compiled with
2266<tt class="computeroutput">BZ_NO_STDIO</tt> set.</p>
2267<p>For a normal compile, an assertion failure yields the
2268message:</p>
2269<div class="blockquote"><blockquote class="blockquote">
2270<p>bzip2/libbzip2: internal error number N.</p>
2271<p>This is a bug in bzip2/libbzip2, 1.0.3 of 15 February 2005.
2272Please report it to me at: jseward@bzip.org. If this happened
2273when you were using some program which uses libbzip2 as a
2274component, you should also report this bug to the author(s)
2275of that program. Please make an effort to report this bug;
2276timely and accurate bug reports eventually lead to higher
2277quality software. Thanks. Julian Seward, 15 February 2005.
2278</p>
2279</blockquote></div>
<p>where <tt class="computeroutput">N</tt> is some error code
number. If <tt class="computeroutput">N == 1007</tt>, it also
prints some extra text advising the reader that unreliable memory
is often associated with internal error 1007. (This is a
frequently observed phenomenon with versions 1.0.0/1.0.1.)</p>
2285<p><tt class="computeroutput">exit(3)</tt> is then
2286called.</p>
2287<p>For a <tt class="computeroutput">stdio</tt>-free library,
2288assertion failures result in a call to a function declared
2289as:</p>
<pre class="programlisting">extern void bz_internal_error ( int errcode );</pre>
2291<p>The relevant code is passed as a parameter. You should
2292supply such a function.</p>
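<p>One way such a function might be supplied is sketched below. It
records the code and jumps back to a recovery point chosen by the
application, so that execution does not resume inside the library.
The names <tt class="computeroutput">bz_recovery_point</tt> and
<tt class="computeroutput">bz_last_internal_error</tt> are
hypothetical, and the sketch assumes single-threaded use.</p>

```c
#include <setjmp.h>

/* Assumptions for this sketch: single-threaded use, and the
   application calls setjmp(bz_recovery_point) before calling into
   libbzip2. Neither variable is part of the library. */
static jmp_buf bz_recovery_point;
static volatile int bz_last_internal_error = 0;

/* Called by a BZ_NO_STDIO build of libbzip2 when an internal
   assertion fails. */
void bz_internal_error(int errcode)
{
    bz_last_internal_error = errcode;  /* keep the code for reporting */
    longjmp(bz_recovery_point, 1);     /* do not return into the library */
}
```

<p>Jumping out this way abandons whatever the library was doing, so
memory it had allocated for the affected
<tt class="computeroutput">bz_stream</tt> is not reclaimed. On a
system where that is unacceptable, halting or resetting may be the
better choice.</p>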
2293<p>In either case, once an assertion failure has occurred, any
2294<tt class="computeroutput">bz_stream</tt> records involved can
2295be regarded as invalid. You should not attempt to resume normal
2296operation with them.</p>
2297<p>You may, of course, change critical error handling to suit
2298your needs. As I said above, critical errors indicate bugs in
2299the library and should not occur. All "normal" error situations
2300are indicated via error return codes from functions, and can be
2301recovered from.</p>
2302</div>
2303</div>
2304<div class="sect1" lang="en">
2305<div class="titlepage">
2306<div><div><h2 class="title" style="clear: both">
2307<a name="win-dll"></a>3.8. Making a Windows DLL</h2></div></div>
2308<div></div>
2309</div>
2310<p>Everything related to Windows has been contributed by
2311Yoshioka Tsuneo
2312(<tt class="computeroutput">QWF00133@niftyserve.or.jp</tt> /
2313<tt class="computeroutput">tsuneo-y@is.aist-nara.ac.jp</tt>), so
2314you should send your queries to him (but perhaps Cc: me,
2315<tt class="computeroutput">jseward@bzip.org</tt>).</p>
2316<p>My vague understanding of what to do is: using Visual C++
23175.0, open the project file
2318<tt class="computeroutput">libbz2.dsp</tt>, and build. That's
2319all.</p>
2320<p>If you can't open the project file for some reason, make a
2321new one, naming these files:
2322<tt class="computeroutput">blocksort.c</tt>,
2323<tt class="computeroutput">bzlib.c</tt>,
2324<tt class="computeroutput">compress.c</tt>,
2325<tt class="computeroutput">crctable.c</tt>,
2326<tt class="computeroutput">decompress.c</tt>,
2327<tt class="computeroutput">huffman.c</tt>,
2328<tt class="computeroutput">randtable.c</tt> and
2329<tt class="computeroutput">libbz2.def</tt>. You will also need
2330to name the header files <tt class="computeroutput">bzlib.h</tt>
2331and <tt class="computeroutput">bzlib_private.h</tt>.</p>
<p>If you don't use VC++, you may need to define the
preprocessor symbol
<tt class="computeroutput">_WIN32</tt>.</p>
2335<p>Finally, <tt class="computeroutput">dlltest.c</tt> is a
2336sample program using the DLL. It has a project file,
2337<tt class="computeroutput">dlltest.dsp</tt>.</p>
2338<p>If you just want a makefile for Visual C, have a look at
2339<tt class="computeroutput">makefile.msc</tt>.</p>
2340<p>Be aware that if you compile
2341<tt class="computeroutput">bzip2</tt> itself on Win32, you must
2342set <tt class="computeroutput">BZ_UNIX</tt> to 0 and
2343<tt class="computeroutput">BZ_LCCWIN32</tt> to 1, in the file
2344<tt class="computeroutput">bzip2.c</tt>, before compiling.
2345Otherwise the resulting binary won't work correctly.</p>
2346<p>I haven't tried any of this stuff myself, but it all looks
2347plausible.</p>
2348</div>
2349</div>
2350<div class="chapter" lang="en">
2351<div class="titlepage">
2352<div><div><h2 class="title">
2353<a name="misc"></a>4. Miscellanea</h2></div></div>
2354<div></div>
2355</div>
2356<div class="toc">
2357<p><b>Table of Contents</b></p>
2358<dl>
2359<dt><span class="sect1"><a href="#limits">4.1. Limitations of the compressed file format</a></span></dt>
2360<dt><span class="sect1"><a href="#port-issues">4.2. Portability issues</a></span></dt>
2361<dt><span class="sect1"><a href="#bugs">4.3. Reporting bugs</a></span></dt>
2362<dt><span class="sect1"><a href="#package">4.4. Did you get the right package?</a></span></dt>
2363<dt><span class="sect1"><a href="#reading">4.5. Further Reading</a></span></dt>
2364</dl>
2365</div>
2366<p>These are just some random thoughts of mine. Your mileage
2367may vary.</p>
2368<div class="sect1" lang="en">
2369<div class="titlepage">
2370<div><div><h2 class="title" style="clear: both">
2371<a name="limits"></a>4.1. Limitations of the compressed file format</h2></div></div>
2372<div></div>
2373</div>
2374<p><tt class="computeroutput">bzip2-1.0.X</tt>,
2375<tt class="computeroutput">0.9.5</tt> and
2376<tt class="computeroutput">0.9.0</tt> use exactly the same file
2377format as the original version,
2378<tt class="computeroutput">bzip2-0.1</tt>. This decision was
2379made in the interests of stability. Creating yet another
2380incompatible compressed file format would create further
2381confusion and disruption for users.</p>
2382<p>Nevertheless, this is not a painless decision. Development
2383work since the release of
2384<tt class="computeroutput">bzip2-0.1</tt> in August 1997 has
2385shown complexities in the file format which slow down
2386decompression and, in retrospect, are unnecessary. These
2387are:</p>
2388<div class="itemizedlist"><ul type="bullet">
2389<li style="list-style-type: disc"><p>The run-length encoder, which is the first of the
2390 compression transformations, is entirely irrelevant. The
2391 original purpose was to protect the sorting algorithm from the
2392 very worst case input: a string of repeated symbols. But
2393 algorithm steps Q6a and Q6b in the original Burrows-Wheeler
2394 technical report (SRC-124) show how repeats can be handled
2395 without difficulty in block sorting.</p></li>
2396<li style="list-style-type: disc">
2397<p>The randomisation mechanism doesn't really need to be
2398 there. Udi Manber and Gene Myers published a suffix array
2399 construction algorithm a few years back, which can be employed
2400 to sort any block, no matter how repetitive, in O(N log N)
2401 time. Subsequent work by Kunihiko Sadakane has produced a
2402 derivative O(N (log N)^2) algorithm which usually outperforms
2403 the Manber-Myers algorithm.</p>
2404<p>I could have changed to Sadakane's algorithm, but I find
2405 it to be slower than <tt class="computeroutput">bzip2</tt>'s
2406 existing algorithm for most inputs, and the randomisation
2407 mechanism protects adequately against bad cases. I didn't
2408 think it was a good tradeoff to make. Partly this is due to
2409 the fact that I was not flooded with email complaints about
2410 <tt class="computeroutput">bzip2-0.1</tt>'s performance on
2411 repetitive data, so perhaps it isn't a problem for real
2412 inputs.</p>
2413<p>Probably the best long-term solution, and the one I have
2414 incorporated into 0.9.5 and above, is to use the existing
2415 sorting algorithm initially, and fall back to a O(N (log N)^2)
2416 algorithm if the standard algorithm gets into
2417 difficulties.</p>
2418</li>
<li style="list-style-type: disc"><p>The compressed file format was never designed to be
 handled by a library, and I have had to jump through some hoops
 to produce an efficient implementation of decompression. It's
 a bit hairy. Try passing
 <tt class="computeroutput">decompress.c</tt> through the C
 preprocessor and you'll see what I mean. Much of this
 complexity could have been avoided if the compressed size of
 each block of data was recorded in the data stream.</p></li>
2427<li style="list-style-type: disc"><p>An Adler-32 checksum, rather than a CRC32 checksum,
2428 would be faster to compute.</p></li>
2429</ul></div>
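<p>To give a feel for the last point in the list above, here is a
straightforward Adler-32 written out in full. It is illustrative
only: <tt class="computeroutput">bzip2</tt> uses CRC32, and the
function name <tt class="computeroutput">adler32_sketch</tt> is
hypothetical. The per-byte work is two additions with no table
lookup; the modulo taken on every byte here is a simplification that
production implementations usually defer.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative Adler-32; bzip2 itself uses CRC32, not this checksum. */
static uint32_t adler32_sketch(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;  /* 65521: largest prime below 2^16 */
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```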
2430<p>It would be fair to say that the
2431<tt class="computeroutput">bzip2</tt> format was frozen before I
2432properly and fully understood the performance consequences of
2433doing so.</p>
2434<p>Improvements which I was able to incorporate into 0.9.0,
2435despite using the same file format, are:</p>
2436<div class="itemizedlist"><ul type="bullet">
2437<li style="list-style-type: disc"><p>Single array implementation of the inverse BWT. This
2438 significantly speeds up decompression, presumably because it
2439 reduces the number of cache misses.</p></li>
2440<li style="list-style-type: disc"><p>Faster inverse MTF transform for large MTF values.
2441 The new implementation is based on the notion of sliding blocks
2442 of values.</p></li>
2443<li style="list-style-type: disc"><p><tt class="computeroutput">bzip2-0.9.0</tt> now reads
2444 and writes files with <tt class="computeroutput">fread</tt>
2445 and <tt class="computeroutput">fwrite</tt>; version 0.1 used
2446 <tt class="computeroutput">putc</tt> and
2447 <tt class="computeroutput">getc</tt>. Duh! Well, you live
2448 and learn.</p></li>
2449</ul></div>
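<p>The single-array idea in the first item above can be sketched as
follows. This is a simplified illustration rather than the code in
<tt class="computeroutput">decompress.c</tt>: the function name
<tt class="computeroutput">inverse_bwt</tt> and its interface are
assumptions. Each array entry packs a symbol into its low 8 bits and
a link into the upper 24 bits, so one array serves for both and the
decoding loop touches a single memory location per output byte.</p>

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative single-array inverse BWT. `last` holds the n bytes of
   the transformed block and `origPtr` is the row of the original
   string among the sorted rotations. Each tt[] entry packs a symbol
   (low 8 bits) and a link (upper 24 bits), so n must be below 2^24.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
static int inverse_bwt(const unsigned char *last, uint32_t n,
                       uint32_t origPtr, unsigned char *out)
{
    uint32_t cftab[256], count[256] = {0};
    uint32_t i, sum = 0, tPos;
    uint32_t *tt;

    if (n == 0 || n >= (1u << 24) || origPtr >= n)
        return -1;
    tt = malloc((size_t)n * sizeof *tt);
    if (tt == NULL)
        return -1;

    /* Load the symbols and count how often each byte value occurs. */
    for (i = 0; i < n; i++) { tt[i] = last[i]; count[last[i]]++; }
    /* cftab[c] = number of symbols in the block smaller than c. */
    for (i = 0; i < 256; i++) { cftab[i] = sum; sum += count[i]; }

    /* Thread the links through the same array that holds the symbols. */
    for (i = 0; i < n; i++) {
        unsigned char uc = (unsigned char)(tt[i] & 0xFF);
        tt[cftab[uc]++] |= i << 8;
    }

    /* Follow the links, emitting one byte per step. */
    tPos = tt[origPtr] >> 8;
    for (i = 0; i < n; i++) {
        tPos = tt[tPos];
        out[i] = (unsigned char)(tPos & 0xFF);
        tPos >>= 8;
    }
    free(tt);
    return 0;
}
```

<p>For example, the rotations of "banana" sort so that the last
column reads "nnbaaa" and the original string sits in row 3; feeding
those values to the function yields "banana" again.</p>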
2450<p>Further ahead, it would be nice to be able to do random
2451access into files. This will require some careful design of
2452compressed file formats.</p>
2453</div>
2454<div class="sect1" lang="en">
2455<div class="titlepage">
2456<div><div><h2 class="title" style="clear: both">
2457<a name="port-issues"></a>4.2. Portability issues</h2></div></div>
2458<div></div>
2459</div>
2460<p>After some consideration, I have decided not to use GNU
2461<tt class="computeroutput">autoconf</tt> to configure 0.9.5 or
24621.0.</p>
2463<p><tt class="computeroutput">autoconf</tt>, admirable and
2464wonderful though it is, mainly assists with portability problems
2465between Unix-like platforms. But
2466<tt class="computeroutput">bzip2</tt> doesn't have much in the
2467way of portability problems on Unix; most of the difficulties
2468appear when porting to the Mac, or to Microsoft's operating
2469systems. <tt class="computeroutput">autoconf</tt> doesn't help
2470in those cases, and brings in a whole load of new
2471complexity.</p>
2472<p>Most people should be able to compile the library and
2473program under Unix straight out-of-the-box, so to speak,
2474especially if you have a version of GNU C available.</p>
2475<p>There are a couple of
2476<tt class="computeroutput">__inline__</tt> directives in the
2477code. GNU C (<tt class="computeroutput">gcc</tt>) should be
2478able to handle them. If you're not using GNU C, your C compiler
2479shouldn't see them at all. If your compiler does, for some
2480reason, see them and doesn't like them, just
2481<tt class="computeroutput">#define</tt>
2482<tt class="computeroutput">__inline__</tt> to be
2483<tt class="computeroutput">/* */</tt>. One easy way to do this
2484is to compile with the flag
2485<tt class="computeroutput">-D__inline__=</tt>, which should be
2486understood by most Unix compilers.</p>
2487<p>If you still have difficulties, try compiling with the
2488macro <tt class="computeroutput">BZ_STRICT_ANSI</tt> defined.
2489This should enable you to build the library in a strictly ANSI
2490compliant environment. Building the program itself like this is
2491dangerous and not supported, since you remove
2492<tt class="computeroutput">bzip2</tt>'s checks against
2493compressing directories, symbolic links, devices, and other
2494not-really-a-file entities. This could cause filesystem
2495corruption!</p>
2496<p>One other thing: if you create a
2497<tt class="computeroutput">bzip2</tt> binary for public distribution,
2498please consider linking it statically (<tt class="computeroutput">gcc
2499-static</tt>). This avoids all sorts of library-version
2500issues that others may encounter later on.</p>
2501<p>If you build <tt class="computeroutput">bzip2</tt> on
2502Win32, you must set <tt class="computeroutput">BZ_UNIX</tt> to 0
2503and <tt class="computeroutput">BZ_LCCWIN32</tt> to 1, in the
2504file <tt class="computeroutput">bzip2.c</tt>, before compiling.
2505Otherwise the resulting binary won't work correctly.</p>
2506</div>
2507<div class="sect1" lang="en">
2508<div class="titlepage">
2509<div><div><h2 class="title" style="clear: both">
2510<a name="bugs"></a>4.3. Reporting bugs</h2></div></div>
2511<div></div>
2512</div>
2513<p>I tried pretty hard to make sure
2514<tt class="computeroutput">bzip2</tt> is bug free, both by
2515design and by testing. Hopefully you'll never need to read this
2516section for real.</p>
<p>Nevertheless, if <tt class="computeroutput">bzip2</tt> dies
with a segmentation fault, a bus error or an internal assertion
failure, it will ask you to email me a bug report. Experience from
years of feedback from bzip2 users indicates that almost all these
problems can be traced to either compiler bugs or hardware
problems.</p>
2523<div class="itemizedlist"><ul type="bullet">
2524<li style="list-style-type: disc">
2525<p>Recompile the program with no optimisation, and
2526 see if it works. And/or try a different compiler. I heard all
2527 sorts of stories about various flavours of GNU C (and other
2528 compilers) generating bad code for
2529 <tt class="computeroutput">bzip2</tt>, and I've run across two
2530 such examples myself.</p>
2531<p>2.7.X versions of GNU C are known to generate bad code
2532 from time to time, at high optimisation levels. If you get
2533 problems, try using the flags
2534 <tt class="computeroutput">-O2</tt>
2535 <tt class="computeroutput">-fomit-frame-pointer</tt>
2536 <tt class="computeroutput">-fno-strength-reduce</tt>. You
2537 should specifically <span class="emphasis"><em>not</em></span> use
2538 <tt class="computeroutput">-funroll-loops</tt>.</p>
2539<p>You may notice that the Makefile runs six tests as part
2540 of the build process. If the program passes all of these, it's
2541 a pretty good (but not 100%) indication that the compiler has
2542 done its job correctly.</p>
2543</li>
2544<li style="list-style-type: disc">
2545<p>If <tt class="computeroutput">bzip2</tt>
2546 crashes randomly, and the crashes are not repeatable, you may
2547 have a flaky memory subsystem.
2548 <tt class="computeroutput">bzip2</tt> really hammers your
2549 memory hierarchy, and if it's a bit marginal, you may get these
2550 problems. Ditto if your disk or I/O subsystem is slowly
2551 failing. Yup, this really does happen.</p>
2552<p>Try using a different machine of the same type, and see
2553 if you can repeat the problem.</p>
2554</li>
2555<li style="list-style-type: disc"><p>This isn't really a bug, but ... If
2556 <tt class="computeroutput">bzip2</tt> tells you your file is
2557 corrupted on decompression, and you obtained the file via FTP,
2558 there is a possibility that you forgot to tell FTP to do a
2559 binary mode transfer. That absolutely will cause the file to
2560 be non-decompressible. You'll have to transfer it
2561 again.</p></li>
2562</ul></div>
<p>If you've incorporated
<tt class="computeroutput">libbzip2</tt> into your own program
and are getting problems, please, please, please, check that the
parameters you are passing in calls to the library are correct
and in accordance with what the documentation says is allowable.
I have tried to make the library robust against such problems,
but I'm sure I haven't succeeded.</p>
2570<p>Finally, if the above comments don't help, you'll have to
2571send me a bug report. Now, it's just amazing how many people
2572will send me a bug report saying something like:</p>
<pre class="programlisting">bzip2 crashed with segmentation fault on my machine</pre>
<p>and absolutely nothing else. Needless to say, such a
report is <span class="emphasis"><em>totally, utterly, completely and
comprehensively 100% useless; a waste of your time, my time, and
net bandwidth</em></span>. With no details at all, there's no way
I can possibly begin to figure out what the problem is.</p>
2579<p>The rules of the game are: facts, facts, facts. Don't omit
2580them because "oh, they won't be relevant". At the bare
2581minimum:</p>
<pre class="programlisting">Machine type.  Operating system version.
Exact version of bzip2 (do bzip2 -V).
Exact version of the compiler used.
Flags passed to the compiler.</pre>
<p>However, the most important single thing that will help me
is the file that you were trying to compress or decompress at the
time the problem happened. Without that, my ability to do
anything more than speculate about the cause is limited.</p>
2590</div>
2591<div class="sect1" lang="en">
2592<div class="titlepage">
2593<div><div><h2 class="title" style="clear: both">
2594<a name="package"></a>4.4. Did you get the right package?</h2></div></div>
2595<div></div>
2596</div>
<p><tt class="computeroutput">bzip2</tt> is a resource hog.
It soaks up large amounts of CPU cycles and memory. It also
has very high latency. In the worst case, you can feed many
megabytes of uncompressed data into the library before getting
any compressed output, so this probably rules out applications
requiring interactive behaviour.</p>
<p>These aren't faults of my implementation, I hope, but rather
intrinsic properties of the Burrows-Wheeler transform
(unfortunately). Maybe this isn't what you want.</p>
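<p>To see why the latency comes from the transform itself, here is a
minimal sketch of a naive Burrows-Wheeler transform that sorts all
rotations of a block. It is an illustration only, not the sorting code
used in <tt class="computeroutput">bzip2</tt>; the names
<tt class="computeroutput">bwt_naive</tt> and
<tt class="computeroutput">cmp_rotation</tt> are hypothetical and are
not part of the library.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Block currently being sorted. File-scope state is used only because
   qsort() offers no context argument; this is a sketch, not libbzip2 code. */
static const unsigned char *g_block;
static size_t g_len;

/* Compare two rotations of g_block, each identified by its start offset. */
static int cmp_rotation(const void *a, const void *b)
{
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;
    size_t k;
    for (k = 0; k < g_len; k++) {
        unsigned char ci = g_block[(i + k) % g_len];
        unsigned char cj = g_block[(j + k) % g_len];
        if (ci != cj)
            return ci < cj ? -1 : 1;
    }
    return 0; /* identical rotations (periodic input) */
}

/* Naive BWT of in[0..len-1]. Writes the last column of the sorted
   rotation matrix to out[0..len-1] and returns the row holding the
   original block (the primary index), or -1 on failure. */
long bwt_naive(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t *rot;
    size_t r;
    long primary = -1;

    assert(in != NULL && out != NULL);
    if (len == 0)
        return -1;

    rot = malloc(len * sizeof *rot);
    if (rot == NULL)
        return -1;
    for (r = 0; r < len; r++)
        rot[r] = r;

    g_block = in;
    g_len = len;
    /* The sort needs every byte of the block before it can start. */
    qsort(rot, len, sizeof *rot, cmp_rotation);

    for (r = 0; r < len; r++) {
        /* Last byte of the rotation that starts at offset rot[r]. */
        out[r] = in[(rot[r] + len - 1) % len];
        if (rot[r] == 0)
            primary = (long)r;
    }
    free(rot);
    return primary;
}
```

<p>For the 6-byte block <tt class="computeroutput">banana</tt> this
sketch produces <tt class="computeroutput">nnbaaa</tt> with primary
index 3, while <tt class="computeroutput">bananz</tt> produces
<tt class="computeroutput">bnzaan</tt> with primary index 2. Changing
only the last input byte changes the first output byte, so no output
can be emitted until the whole block has been read.</p>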
<p>If you want a compressor and/or library which is faster and
uses less memory but still gets pretty good compression, and has
minimal latency, consider Jean-loup Gailly's and Mark Adler's
work, <tt class="computeroutput">zlib-1.2.1</tt> and
<tt class="computeroutput">gzip-1.2.4</tt>. Look for them at
<a href="http://www.zlib.org" target="_top">http://www.zlib.org</a> and
<a href="http://www.gzip.org" target="_top">http://www.gzip.org</a>
respectively.</p>
<p>For something faster and lighter still, you might try Markus F
X J Oberhumer's <tt class="computeroutput">LZO</tt> real-time
compression/decompression library, at
<a href="http://www.oberhumer.com/opensource" target="_top">http://www.oberhumer.com/opensource</a>.</p>
</div>
<div class="sect1" lang="en">
<div class="titlepage">
<div><div><h2 class="title" style="clear: both">
<a name="reading"></a>4.5. Further Reading</h2></div></div>
<div></div>
</div>
<p><tt class="computeroutput">bzip2</tt> is not research
work, in the sense that it doesn't present any new ideas.
Rather, it's an engineering exercise based on existing
ideas.</p>
<p>Four documents describe essentially all the ideas behind
<tt class="computeroutput">bzip2</tt>:</p>
<div class="literallayout"><p>Michael Burrows and D. J. Wheeler:<br>
  "A block-sorting lossless data compression algorithm"<br>
   10th May 1994. <br>
   Digital SRC Research Report 124.<br>
   ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz<br>
   If you have trouble finding it, try searching at the<br>
   New Zealand Digital Library, http://www.nzdl.org.<br>
<br>
Daniel S. Hirschberg and Debra A. Lelewer<br>
  "Efficient Decoding of Prefix Codes"<br>
   Communications of the ACM, April 1990, Vol 33, Number 4.<br>
   You might be able to get an electronic copy of this<br>
   from the ACM Digital Library.<br>
<br>
David J. Wheeler<br>
   Program bred3.c and accompanying document bred3.ps.<br>
   This contains the idea behind the multi-table Huffman coding scheme.<br>
   ftp://ftp.cl.cam.ac.uk/users/djw3/<br>
<br>
Jon L. Bentley and Robert Sedgewick<br>
  "Fast Algorithms for Sorting and Searching Strings"<br>
   Available from Sedgewick's web page,<br>
   www.cs.princeton.edu/~rs<br>
</p></div>
<p>The following paper gives valuable additional insights into
the algorithm, but is not directly the basis of any code used
in bzip2.</p>
<div class="literallayout"><p>Peter Fenwick:<br>
   Block Sorting Text Compression<br>
   Proceedings of the 19th Australasian Computer Science Conference,<br>
     Melbourne, Australia.  Jan 31 - Feb 2, 1996.<br>
   ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps</p></div>
<p>Kunihiko Sadakane's sorting algorithm, mentioned above, is
available from:</p>
<div class="literallayout"><p>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz<br>
</p></div>
<p>The Manber-Myers suffix array construction algorithm is
described in a paper available from:</p>
<div class="literallayout"><p>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps<br>
</p></div>
<p>Finally, the following papers document some
investigations I made into the performance of sorting
and decompression algorithms:</p>
<div class="literallayout"><p>Julian Seward<br>
   On the Performance of BWT Sorting Algorithms<br>
   Proceedings of the IEEE Data Compression Conference 2000<br>
     Snowbird, Utah.  28-30 March 2000.<br>
<br>
Julian Seward<br>
   Space-time Tradeoffs in the Inverse B-W Transform<br>
   Proceedings of the IEEE Data Compression Conference 2001<br>
     Snowbird, Utah.  27-29 March 2001.<br>
</p></div>
</div>
</div>
</div></body>
</html>