Last change on this file since 10 was 9, checked in by Mattia Monga, 14 years ago
Minix 3.1.2a

bzip2(1)                                                  bzip2(1)



NAME
bzip2, bunzip2 - a block-sorting file compressor, v1.0.3
bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files


SYNOPSIS
bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ... ]
bunzip2 [ -fkvsVL ] [ filenames ... ]
bzcat [ -s ] [ filenames ... ]
bzip2recover filename


DESCRIPTION
bzip2 compresses files using the Burrows-Wheeler block-sorting
text compression algorithm and Huffman coding. Compression is
generally considerably better than that achieved by more
conventional LZ77/LZ78-based compressors, and approaches the
performance of the PPM family of statistical compressors.

The command-line options are deliberately very similar to
those of GNU gzip, but they are not identical.

bzip2 expects a list of file names to accompany the
command-line flags. Each file is replaced by a compressed
version of itself, with the name "original_name.bz2".
Each compressed file has the same modification date,
permissions, and, when possible, ownership as the
corresponding original, so that these properties can be
correctly restored at decompression time. File name handling
is naive in the sense that there is no mechanism for
preserving original file names, permissions, ownerships or
dates in filesystems which lack these concepts, or have
serious file name length restrictions, such as MS-DOS.

bzip2 and bunzip2 will not, by default, overwrite existing
files. If you want this to happen, specify the -f flag.

If no file names are specified, bzip2 compresses from
standard input to standard output. In this case, bzip2
will decline to write compressed output to a terminal, as
this would be entirely incomprehensible and therefore
pointless.

bunzip2 (or bzip2 -d) decompresses all specified files.
Files which were not created by bzip2 will be detected and
ignored, and a warning issued. bzip2 attempts to guess
the filename for the decompressed file from that of the
compressed file as follows:

filename.bz2     becomes   filename
filename.bz      becomes   filename
filename.tbz2    becomes   filename.tar
filename.tbz     becomes   filename.tar
anyothername     becomes   anyothername.out

If the file does not end in one of the recognised endings,
.bz2, .bz, .tbz2 or .tbz, bzip2 complains that it cannot
guess the name of the original file, and uses the original
name with .out appended.
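
The name-guessing rules above can be sketched in C as follows. This is an illustrative reimplementation, not bzip2's own code: guess_output_name is a hypothetical helper that returns a malloc'd string the caller must free, and refusing to strip a suffix when nothing would remain (a file named just ".bz2") is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Recognised compressed endings and what replaces each of them. */
static const char *const k_suffix[]      = { ".bz2", ".bz", ".tbz2", ".tbz" };
static const char *const k_replacement[] = { "",     "",    ".tar",  ".tar" };

/* 1 if name ends with suffix and has a non-empty stem before it
   (the non-empty-stem rule is an assumption of this sketch). */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n > s && strcmp(name + n - s, suffix) == 0;
}

/* Hypothetical helper: newly allocated output name, or NULL if
   allocation fails. The caller must free() the result. */
char *guess_output_name(const char *compressed)
{
    size_t n = strlen(compressed);
    for (size_t i = 0; i < sizeof k_suffix / sizeof k_suffix[0]; i++) {
        if (has_suffix(compressed, k_suffix[i])) {
            size_t stem = n - strlen(k_suffix[i]);
            char *out = malloc(stem + strlen(k_replacement[i]) + 1);
            if (out == NULL)
                return NULL;
            memcpy(out, compressed, stem);            /* keep the stem */
            strcpy(out + stem, k_replacement[i]);     /* add "" or ".tar" */
            return out;
        }
    }
    /* Unrecognised ending: bzip2 warns and appends ".out". */
    char *out = malloc(n + strlen(".out") + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, compressed, n);
    strcpy(out + n, ".out");
    return out;
}
```

With this sketch, guess_output_name("src.tbz2") yields "src.tar" and guess_output_name("notes.txt") yields "notes.txt.out". None of the four endings is a trailing part of another (".tbz2" does not end in ".bz2", because the character before "bz2" is 't'), so the order in which they are tried does not matter.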

As with compression, supplying no filenames causes
decompression from standard input to standard output.

bunzip2 will correctly decompress a file which is the
concatenation of two or more compressed files. The result is
the concatenation of the corresponding uncompressed files.
Integrity testing (-t) of concatenated compressed files is
also supported.

You can also compress or decompress files to the standard
output by giving the -c flag. Multiple files may be
compressed and decompressed like this. The resulting outputs
are fed sequentially to stdout. Compression of multiple
files in this manner generates a stream containing multiple
compressed file representations. Such a stream can be
decompressed correctly only by bzip2 version 0.9.0 or
later. Earlier versions of bzip2 will stop after
decompressing the first file in the stream.

bzcat (or bzip2 -dc) decompresses all specified files to
the standard output.

bzip2 will read arguments from the environment variables
BZIP2 and BZIP, in that order, and will process them
before any arguments read from the command line. This
gives a convenient way to supply default arguments.
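
A minimal sketch of the argument order described above: words from BZIP2 first, then words from BZIP, then the command line. It assumes that each variable's value is split into words on blanks, and the names ArgList, MAX_ARGS, build_args and free_args are hypothetical. The variable values are passed in as parameters so the function can be exercised without touching the real environment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 64   /* arbitrary limit chosen for this sketch */

/* Hypothetical argument list; each entry is a malloc'd copy. */
typedef struct {
    char *arg[MAX_ARGS];
    int count;
} ArgList;

/* Append a copy of the first len bytes of s; returns 0 on failure. */
static int push(ArgList *list, const char *s, size_t len)
{
    if (list->count == MAX_ARGS)
        return 0;
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, s, len);
    copy[len] = '\0';
    list->arg[list->count++] = copy;
    return 1;
}

/* Split an environment value on blanks and append each word.
   A NULL value (variable unset) contributes nothing. */
static int push_env_words(ArgList *list, const char *value)
{
    const char *blanks = " \t\n";
    if (value == NULL)
        return 1;
    for (;;) {
        value += strspn(value, blanks);        /* skip leading blanks */
        if (*value == '\0')
            return 1;
        size_t len = strcspn(value, blanks);   /* length of this word */
        if (!push(list, value, len))
            return 0;
        value += len;
    }
}

/* Effective order: words of $BZIP2, then words of $BZIP, then argv[1..].
   On failure, call free_args() to release whatever was copied. */
int build_args(ArgList *list, const char *bzip2_env, const char *bzip_env,
               int argc, char **argv)
{
    list->count = 0;
    if (!push_env_words(list, bzip2_env) || !push_env_words(list, bzip_env))
        return 0;
    for (int i = 1; i < argc; i++)
        if (!push(list, argv[i], strlen(argv[i])))
            return 0;
    return 1;
}

/* Release every copied argument. */
void free_args(ArgList *list)
{
    for (int i = 0; i < list->count; i++)
        free(list->arg[i]);
    list->count = 0;
}
```

In a real program the call would be build_args(&list, getenv("BZIP2"), getenv("BZIP"), argc, argv). Because the command-line words come last, anything typed on the command line is processed after the defaults taken from the environment.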

Compression is always performed, even if the compressed
file is slightly larger than the original. Files smaller
than about one hundred bytes tend to get larger, since the
compression mechanism has a constant overhead in the
region of 50 bytes. Random data (including the output of
most file compressors) is coded at about 8.05 bits per
byte, giving an expansion of around 0.5%.

As a self-check for your protection, bzip2 uses 32-bit
CRCs to make sure that the decompressed version of a file
is identical to the original. This guards against
corruption of the compressed data, and against undetected
bugs in bzip2 (hopefully very unlikely). The chance of data
corruption going undetected is microscopic, about one
chance in four billion for each file processed. Be aware,
though, that the check occurs upon decompression, so it
can only tell you that something is wrong. It can't help
you recover the original uncompressed data. You can use
bzip2recover to try to recover data from damaged files.
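
For the curious, the CRC in question uses the 32-bit polynomial 0x04C11DB7, processed most-significant bit first, with an initial value of 0xFFFFFFFF and a final inversion: the parameter set usually catalogued as CRC-32/BZIP2. These parameters come from bzip2's sources and the standard CRC catalogues rather than from this page. The sketch below computes the CRC one bit at a time for clarity (bzip2 itself uses a lookup table) and also shows how per-block CRCs are folded into the single stream-wide CRC. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32/BZIP2: polynomial 0x04C11DB7, most-significant bit first,
   initial value 0xFFFFFFFF, result inverted. Bit-at-a-time for clarity. */
uint32_t crc32_bzip2(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;   /* bring the next byte in at the top */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u
                                      : crc << 1;
    }
    return ~crc;
}

/* Fold one block's CRC into the stream-wide CRC:
   rotate the running value left by one bit, then XOR. */
uint32_t combine_stream_crc(uint32_t stream_crc, uint32_t block_crc)
{
    return ((stream_crc << 1) | (stream_crc >> 31)) ^ block_crc;
}
```

As a sanity check, crc32_bzip2 over the nine ASCII bytes "123456789" gives 0xFC891918, the published check value for CRC-32/BZIP2. On decompression, the CRC is recomputed over the data actually produced and compared with the stored value; a mismatch is what gets reported as a corrupt file.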

Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, etc.),
2 to indicate a corrupt compressed file, 3 for an internal
consistency error (e.g., a bug) which caused bzip2 to panic.
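
A program or script that runs bzip2 can branch on these codes. The helper below, whose name is hypothetical, maps each documented value to a short description. In a C program on a POSIX system, the code would typically be obtained by applying WEXITSTATUS() to the status returned by waitpid() or system().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: describe a bzip2 exit status, using the
   four values documented above. */
const char *bzip2_exit_meaning(int code)
{
    switch (code) {
    case 0:  return "normal exit";
    case 1:  return "environmental problem (file not found, invalid flags, I/O error)";
    case 2:  return "corrupt compressed file";
    case 3:  return "internal consistency error";
    default: return "undocumented exit status";
    }
}
```

The distinction between 1 and 2 is the useful one in practice. A 1 usually means the input could not be read at all, while a 2 means it was read but failed the integrity check, which is the case where bzip2recover may help.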


OPTIONS
-c --stdout
Compress or decompress to standard output.

-d --decompress
Force decompression. bzip2, bunzip2 and bzcat are
really the same program, and the decision about
what actions to take is made on the basis of the
name under which it is invoked. This flag overrides
that mechanism, and forces bzip2 to decompress.

-z --compress
The complement to -d: forces compression,
regardless of the invocation name.

-t --test
Check integrity of the specified file(s), but don't
decompress them. This really performs a trial
decompression and throws away the result.

-f --force
Force overwrite of output files. Normally, bzip2
will not overwrite existing output files. Also
forces bzip2 to break hard links to files, which it
otherwise wouldn't do.

bzip2 normally declines to decompress files which
don't have the correct magic header bytes. If
forced (-f), however, it will pass such files
through unmodified. This is how GNU gzip behaves.

-k --keep
Keep (don't delete) input files during compression
or decompression.

-s --small
Reduce memory usage for compression, decompression
and testing. Files are decompressed and tested
using a modified algorithm which only requires 2.5
bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit at about
half the normal speed.

During compression, -s selects a block size of
200k, which limits memory use to around the same
figure, at the expense of your compression ratio.
In short, if your machine is low on memory (8
megabytes or less), use -s for everything. See
MEMORY MANAGEMENT below.

-q --quiet
Suppress non-essential warning messages. Messages
pertaining to I/O errors and other critical events
will not be suppressed.

-v --verbose
Verbose mode: show the compression ratio for each
file processed. Further -v's increase the verbosity
level, spewing out lots of information which is
primarily of interest for diagnostic purposes.

-L --license -V --version
Display the software version, license terms and
conditions.

-1 (or --fast) to -9 (or --best)
Set the block size to 100k, 200k, ..., 900k when
compressing. Has no effect when decompressing.
See MEMORY MANAGEMENT below. The --fast and --best
aliases are primarily for GNU gzip compatibility.
In particular, --fast doesn't make things
significantly faster. And --best merely selects the
default behaviour.

--
Treats all subsequent arguments as file names, even
if they start with a dash. This is so you can handle
files with names beginning with a dash, for example:
bzip2 -- -myfilename

--repetitive-fast --repetitive-best
These flags are redundant in versions 0.9.5 and
above. They provided some coarse control over the
behaviour of the sorting algorithm in earlier
versions, which was sometimes useful. 0.9.5 and above
have an improved algorithm which renders these
flags irrelevant.


MEMORY MANAGEMENT
bzip2 compresses large files in blocks. The block size
affects both the compression ratio achieved, and the
amount of memory needed for compression and decompression.
The flags -1 through -9 specify the block size to be
100,000 bytes through 900,000 bytes (the default)
respectively. At decompression time, the block size used
for compression is read from the header of the compressed
file, and bunzip2 then allocates itself just enough memory
to decompress the file. Since block sizes are stored in
compressed files, it follows that the flags -1 to -9 are
irrelevant to and so ignored during decompression.
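
Because the block size travels in the header, a decompressor can size its buffers before reading any block data. The sketch below assumes the usual .bz2 stream header layout, which this page does not itself describe: the three bytes "BZh" followed by an ASCII digit '1' to '9' giving the block size in units of 100,000 bytes. header_block_size is a hypothetical name, and a real reader would go on to validate the block data itself.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the block size in bytes (100000..900000) encoded in a .bz2
   stream header, or 0 if the first bytes are not such a header.
   Assumed layout: 'B' 'Z' 'h' followed by an ASCII digit '1'..'9'. */
long header_block_size(const unsigned char *buf, size_t len)
{
    if (len < 4 || buf[0] != 'B' || buf[1] != 'Z' || buf[2] != 'h')
        return 0;                      /* not bzip2 magic bytes */
    if (buf[3] < '1' || buf[3] > '9')
        return 0;                      /* block-size digit out of range */
    return (buf[3] - '0') * 100000L;
}
```

This is the same kind of magic-byte test that lets bzip2 refuse (or, with -f, pass through) files it did not create.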

Compression and decompression requirements, in bytes, can
be estimated as:

Compression:   400k + ( 8 x block size )

Decompression: 100k + ( 4 x block size ), or
               100k + ( 2.5 x block size ) with -s
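
The formulas above can be turned into a small calculator. This sketch assumes, as the description of -1 to -9 suggests, that "k" means 1000 bytes and that flag -n selects a block of n x 100k. MemEstimate and memory_estimate_k are hypothetical names. Since the formulas are estimates, individual entries in the table below may differ slightly from what this computes.

```c
#include <assert.h>

/* Estimated memory needs, in k (taken here as 1000 bytes). */
typedef struct {
    long compress_k;          /* 400k + 8 x block size            */
    long decompress_k;        /* 100k + 4 x block size            */
    long decompress_small_k;  /* 100k + 2.5 x block size, with -s */
} MemEstimate;

/* level is the -1..-9 flag; the block size is level x 100k. */
MemEstimate memory_estimate_k(int level)
{
    assert(level >= 1 && level <= 9);
    long block_k = 100L * level;
    MemEstimate m;
    m.compress_k = 400 + 8 * block_k;
    m.decompress_k = 100 + 4 * block_k;
    /* 2.5 x block size; exact because block_k is a multiple of 100 */
    m.decompress_small_k = 100 + block_k * 5 / 2;
    return m;
}
```

For the default -9 this gives 7600k to compress, 3700k to decompress and 2350k to decompress with -s, in line with the figures quoted elsewhere on this page.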

Larger block sizes give rapidly diminishing marginal
returns. Most of the compression comes from the first two
or three hundred k of block size, a fact worth bearing in
mind when using bzip2 on small machines. It is also
important to appreciate that the decompression memory
requirement is set at compression time by the choice of
block size.

For files compressed with the default 900k block size,
bunzip2 will require about 3700 kbytes to decompress. To
support decompression of any file on a 4 megabyte machine,
bunzip2 has an option to decompress using approximately
half this amount of memory, about 2300 kbytes.
Decompression speed is also halved, so you should use this
option only where necessary. The relevant flag is -s.

In general, try to use the largest block size memory
constraints allow, since that maximises the compression
achieved. Compression and decompression speed are
virtually unaffected by block size.

Another significant point applies to files which fit in a
single block, which means most files you'd encounter
when using a large block size. The amount of real memory
touched is proportional to the size of the file, since the
file is smaller than a block. For example, compressing a
file 20,000 bytes long with the flag -9 will cause the
compressor to allocate around 7600k of memory, but only
touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
decompressor will allocate 3700k but only touch 100k +
20000 * 4 = 180 kbytes.
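
The worked example above generalises: when the input is smaller than a block, only the part of the block arrays that the data occupies is touched. The sketch below follows the same assumptions as the text (k = 1000 bytes, 8 bytes touched per input byte when compressing, 4 when decompressing without -s, and a block of level x 100,000 bytes). The function names are hypothetical, and how much of this becomes resident real memory ultimately depends on the operating system.

```c
#include <assert.h>

/* Bytes of original (uncompressed) data that land in one block. */
static long bytes_in_block(long original_bytes, int level)
{
    assert(level >= 1 && level <= 9);
    long block = 100000L * level;
    return original_bytes < block ? original_bytes : block;
}

/* Approximate memory touched while compressing, in bytes. */
long touched_compress_bytes(long original_bytes, int level)
{
    return 400000L + 8L * bytes_in_block(original_bytes, level);
}

/* Approximate memory touched while decompressing (without -s), in bytes. */
long touched_decompress_bytes(long original_bytes, int level)
{
    return 100000L + 4L * bytes_in_block(original_bytes, level);
}
```

For the 20,000-byte file at -9 this reproduces the 560 kbytes and 180 kbytes computed above. For inputs of a full block or more, it climbs back to the 7600k and 3700k allocations.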

Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total
compressed size for 14 files of the Calgary Text
Compression Corpus totalling 3,141,622 bytes. This column
gives some feel for how compression varies with block size.
These figures tend to understate the advantage of larger
block sizes for larger files, since the Corpus is
dominated by smaller files.

        Compress   Decompress   Decompress   Corpus
Flag    usage      usage        -s usage     Size

-1      1200k       500k         350k        914704
-2      2000k       900k         600k        877703
-3      2800k      1300k         850k        860338
-4      3600k      1700k        1100k        846899
-5      4400k      2100k        1350k        845160
-6      5200k      2500k        1600k        838626
-7      6100k      2900k        1850k        834096
-8      6800k      3300k        2100k        828642
-9      7600k      3700k        2350k        828642


RECOVERING DATA FROM DAMAGED FILES
bzip2 compresses files in blocks, usually 900kbytes long.
Each block is handled independently. If a media or
transmission error causes a multi-block .bz2 file to become
damaged, it may be possible to recover data from the
undamaged blocks in the file.

The compressed representation of each block is delimited
by a 48-bit pattern, which makes it possible to find the
block boundaries with reasonable certainty. Each block
also carries its own 32-bit CRC, so damaged blocks can be
distinguished from undamaged ones.
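
The block delimiters are not aligned to byte boundaries, so finding them means sliding a 48-bit window along the file one bit at a time. The sketch below shows that search. The two marker values, 0x314159265359 for a block start and 0x177245385090 for end of stream, are taken from the bzip2 file format rather than from this page. find_marker is a hypothetical name, and this is not bzip2recover's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_MAGIC UINT64_C(0x314159265359)  /* block start (BCD digits of pi)      */
#define EOS_MAGIC   UINT64_C(0x177245385090)  /* end of stream (BCD digits of sqrt pi) */

/* Return the bit offset, counted from the most significant bit of
   data[0], at which the 48-bit `pattern` first occurs at or after bit
   `start`, or -1 if it does not occur. */
long find_marker(const unsigned char *data, size_t len, long start,
                 uint64_t pattern)
{
    assert(start >= 0);
    const uint64_t mask = (UINT64_C(1) << 48) - 1;
    const long total_bits = (long)len * 8;
    uint64_t window = 0;   /* the most recent 48 bits read */

    for (long bit = start; bit < total_bits; bit++) {
        unsigned b = (data[bit / 8] >> (7 - bit % 8)) & 1u;
        window = ((window << 1) | b) & mask;
        /* Only compare once 48 bits from `start` have been shifted in. */
        if (bit - start >= 47 && window == pattern)
            return bit - 47;
    }
    return -1;
}
```

Calling find_marker(data, len, 0, BLOCK_MAGIC) gives the first block's offset. Calling it again from that offset plus one finds the next, and the stretch between two markers is one candidate block to copy out and test.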

bzip2recover is a simple program whose purpose is to
search for blocks in .bz2 files, and write each block out
into its own .bz2 file. You can then use bzip2 -t to test
the integrity of the resulting files, and decompress those
which are undamaged.

bzip2recover takes a single argument, the name of the
damaged file, and writes a number of files
"rec00001file.bz2", "rec00002file.bz2", etc., containing
the extracted blocks. The output filenames are
designed so that the use of wildcards in subsequent
processing (for example, "bzip2 -dc rec*file.bz2 >
recovered_data") processes the files in the correct order.
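
The naming scheme can be sketched as follows. Zero-padding the block number to five digits is what makes a plain lexical sort, such as the one a shell applies when expanding rec*file.bz2, agree with block order, at least up to 99999 blocks. recovered_name is a hypothetical helper. It simply prefixes the name it is given and ignores any directory components, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "rec" + five-digit block number + input name into out.
   Returns 1 on success, 0 if the result would not fit in size bytes. */
int recovered_name(char *out, size_t size, unsigned block, const char *name)
{
    assert(block >= 1 && block <= 99999);
    int written = snprintf(out, size, "rec%05u%s", block, name);
    return written >= 0 && (size_t)written < size;
}
```

Without the padding, "rec10file.bz2" would sort before "rec2file.bz2" and the recovered data would be reassembled out of order.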

bzip2recover should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to
minimise any potential data loss through media or
transmission errors, you might consider compressing with a
smaller block size.


PERFORMANCE NOTES
The sorting phase of compression gathers together similar
strings in the file. Because of this, files containing
very long runs of repeated symbols, like "aabaabaabaab
..." (repeated several hundred times) may compress more
slowly than normal. Versions 0.9.5 and above fare much
better than previous versions in this respect. The ratio
between worst-case and average-case compression time is in
the region of 10:1. For previous versions, this figure
was more like 100:1. You can use the -vvvv option to
monitor progress in great detail, if you want.

Decompression speed is unaffected by these phenomena.

bzip2 usually allocates several megabytes of memory to
operate in, and then charges all over it in a fairly
random fashion. This means that performance, both for
compressing and decompressing, is largely determined by the
speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the
miss rate have been observed to give disproportionately
large performance improvements. I imagine bzip2 will
perform best on machines with very large caches.


CAVEATS
I/O error messages are not as helpful as they could be.
bzip2 tries hard to detect I/O errors and exit cleanly,
but the details of what the problem is sometimes seem
rather misleading.

This manual page pertains to version 1.0.3 of bzip2.
Compressed data created by this version is entirely forwards
and backwards compatible with the previous public
releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1 and
1.0.2, but with the following exception: 0.9.0 and above
can correctly decompress multiple concatenated compressed
files. 0.1pl2 cannot do this; it will stop after
decompressing just the first file in the stream.

bzip2recover versions prior to 1.0.2 used 32-bit integers
to represent bit positions in compressed files, so they
could not handle compressed files more than 512 megabytes
long. Versions 1.0.2 and above use 64-bit integers on some
platforms which support them (GNU supported targets, and
Windows). To establish whether or not bzip2recover was
built with such a limitation, run it without arguments.
In any event you can build yourself an unlimited version
if you can recompile it with MaybeUInt64 set to be an
unsigned 64-bit integer.
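
The 512-megabyte figure follows directly from the counter width. A 32-bit unsigned counter can name 2^32 distinct bit positions, and 2^32 / 8 = 2^29 = 536,870,912 bytes, which is 512 MiB. The sketch below makes the arithmetic explicit and shows the kind of typedef the text has in mind. The MaybeUInt64 definition here is an illustration using <stdint.h>, not bzip2recover's own declaration, and max_file_bytes is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Illustration only: an unsigned 64-bit type for bit positions. */
typedef uint64_t MaybeUInt64;

/* Largest file size, in bytes, whose bit positions all fit in an
   unsigned counter of the given width (4..64 bits). */
MaybeUInt64 max_file_bytes(unsigned counter_bits)
{
    assert(counter_bits >= 4 && counter_bits <= 64);
    /* 2^counter_bits positions / 8 bits per byte = 2^(counter_bits - 3) */
    return (MaybeUInt64)1 << (counter_bits - 3);
}
```

With a 64-bit counter the limit becomes 2^61 bytes, which is effectively unlimited for any real file.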



AUTHOR
Julian Seward, jseward@bzip.org.

http://www.bzip.org

The ideas embodied in bzip2 are due to (at least) the
following people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for
the Huffman coder), Peter Fenwick (for the structured
coding model in the original bzip, and many refinements),
and Alistair Moffat, Radford Neal and Ian Witten (for the
arithmetic coder in the original bzip). I am much
indebted for their help, support and advice. See the
manual in the source distribution for pointers to sources
of documentation. Christian von Roques encouraged me to
look for faster sorting algorithms, so as to speed up
compression. Bela Lubkin encouraged me to improve the
worst-case compression performance. Donna Robinson XMLised
the documentation. The bz* scripts are derived from those
of GNU gzip. Many people sent patches, helped with
portability problems, lent machines, gave advice and were
generally helpful.



bzip2(1)
