bzip2.1.preformatted@ 9

Last change on this file since 9 was 9, checked in by Mattia Monga, 15 years ago
Minix 3.1.2a
File size: 20.4 KB

bzip2(1)                                                    bzip2(1)

NAME
bzip2, bunzip2 - a block-sorting file compressor, v1.0.3
bzcat - decompresses files to stdout
bzip2recover - recovers data from damaged bzip2 files


SYNOPSIS
bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ... ]
bunzip2 [ -fkvsVL ] [ filenames ... ]
bzcat [ -s ] [ filenames ... ]
bzip2recover filename


DESCRIPTION
bzip2 compresses files using the Burrows-Wheeler block
sorting text compression algorithm, and Huffman coding.
Compression is generally considerably better than that
achieved by more conventional LZ77/LZ78-based compressors,
and approaches the performance of the PPM family of
statistical compressors.

The command-line options are deliberately very similar to
those of GNU gzip, but they are not identical.

bzip2 expects a list of file names to accompany the
command-line flags. Each file is replaced by a compressed
version of itself, with the name "original_name.bz2".
Each compressed file has the same modification date,
permissions, and, when possible, ownership as the
corresponding original, so that these properties can be
correctly restored at decompression time. File name handling
is naive in the sense that there is no mechanism for
preserving original file names, permissions, ownerships or
dates in filesystems which lack these concepts, or have
serious file name length restrictions, such as MS-DOS.

bzip2 and bunzip2 will by default not overwrite existing
files. If you want this to happen, specify the -f flag.

If no file names are specified, bzip2 compresses from
standard input to standard output. In this case, bzip2
will decline to write compressed output to a terminal, as
this would be entirely incomprehensible and therefore
pointless.

bunzip2 (or bzip2 -d) decompresses all specified files.
Files which were not created by bzip2 will be detected and
ignored, and a warning issued. bzip2 attempts to guess
the filename for the decompressed file from that of the
compressed file as follows:

    filename.bz2    becomes   filename
    filename.bz     becomes   filename
    filename.tbz2   becomes   filename.tar
    filename.tbz    becomes   filename.tar
    anyothername    becomes   anyothername.out

If the file does not end in one of the recognised endings,
.bz2, .bz, .tbz2 or .tbz, bzip2 complains that it cannot
guess the name of the original file, and uses the original
name with .out appended.

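The renaming rules above can be sketched in a few lines of Python. This is
only an illustration of the table, not the code bzip2 itself uses; the names
SUFFIX_RULES and guess_output_name are made up for the example.

```python
# Suffix rules copied from the table above, as (ending, replacement) pairs.
SUFFIX_RULES = [
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
]


def guess_output_name(compressed_name: str) -> str:
    """Return the output name implied by the table (hypothetical helper)."""
    for suffix, replacement in SUFFIX_RULES:
        if compressed_name.endswith(suffix):
            # Drop the recognised ending and append its replacement, if any.
            return compressed_name[: -len(suffix)] + replacement
    # Unrecognised ending: keep the whole name and append ".out".
    return compressed_name + ".out"


for name in ("notes.bz2", "backup.tbz2", "mystery.dat"):
    print(name, "becomes", guess_output_name(name))
```

None of the four endings is a suffix of another, so the order of the rules
does not matter here.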
As with compression, supplying no filenames causes
decompression from standard input to standard output.

bunzip2 will correctly decompress a file which is the
concatenation of two or more compressed files. The result is
the concatenation of the corresponding uncompressed files.
Integrity testing (-t) of concatenated compressed files is
also supported.

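The concatenation property is easy to see with Python's standard bz2
module, which wraps the same compression library. This is a small sketch
using that module rather than the bunzip2 program; the sample byte strings
are arbitrary.

```python
import bz2

# Compress two "files" independently, as two separate bzip2 runs would.
first = bz2.compress(b"first file\n")
second = bz2.compress(b"second file\n")

# Joining the compressed streams is plain byte concatenation.
combined = first + second

# Decompressing the joined stream yields the joined originals.
restored = bz2.decompress(combined)
print(restored)
```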
You can also compress or decompress files to the standard
output by giving the -c flag. Multiple files may be
compressed and decompressed like this. The resulting outputs
are fed sequentially to stdout. Compression of multiple
files in this manner generates a stream containing
multiple compressed file representations. Such a stream can be
decompressed correctly only by bzip2 version 0.9.0 or
later. Earlier versions of bzip2 will stop after
decompressing the first file in the stream.

bzcat (or bzip2 -dc) decompresses all specified files to
the standard output.

bzip2 will read arguments from the environment variables
BZIP2 and BZIP, in that order, and will process them
before any arguments read from the command line. This
gives a convenient way to supply default arguments.

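The ordering described above can be pictured as follows. This is a sketch of
the ordering only: the helper name effective_arguments is hypothetical, and
splitting the variables on whitespace is an assumption made for the example.

```python
import os


def effective_arguments(argv, environ=None):
    """Arguments in processing order: BZIP2, then BZIP, then the command line."""
    env = os.environ if environ is None else environ
    args = []
    for name in ("BZIP2", "BZIP"):
        # Assumption: each variable holds whitespace-separated arguments.
        args.extend(env.get(name, "").split())
    args.extend(argv)
    return args


example_env = {"BZIP2": "-9 -v", "BZIP": "-q"}
print(effective_arguments(["-k", "data.txt"], example_env))
```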
Compression is always performed, even if the compressed
file is slightly larger than the original. Files of less
than about one hundred bytes tend to get larger, since the
compression mechanism has a constant overhead in the
region of 50 bytes. Random data (including the output of
most file compressors) is coded at about 8.05 bits per
byte, giving an expansion of around 0.5%.

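You can observe both effects with Python's standard bz2 module. This is a
rough experiment, not a benchmark: the helper name expansion_percent is made
up, and the exact figures will vary from run to run because the input is
random.

```python
import bz2
import os


def expansion_percent(data: bytes) -> float:
    """Size change after compression, as a percentage of the input size."""
    compressed = bz2.compress(data)
    return 100.0 * (len(compressed) - len(data)) / len(data)


# Random bytes are incompressible, so the output comes out slightly larger.
random_block = os.urandom(1_000_000)
print(f"random 1 MB input: {expansion_percent(random_block):+.2f}%")

# A tiny input is dominated by the fixed per-file overhead.
tiny_input = b"hi"
print(f"{len(tiny_input)}-byte input becomes {len(bz2.compress(tiny_input))} bytes")
```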
As a self-check for your protection, bzip2 uses 32-bit
CRCs to make sure that the decompressed version of a file
is identical to the original. This guards against
corruption of the compressed data, and against undetected bugs
in bzip2 (hopefully very unlikely). The chance of data
corruption going undetected is microscopic, about one
chance in four billion for each file processed. Be aware,
though, that the check occurs upon decompression, so it
can only tell you that something is wrong. It can't help
you recover the original uncompressed data. You can use
bzip2recover to try to recover data from damaged files.

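Here is a small demonstration of the check firing, using Python's standard
bz2 module. It flips one bit of the stored block CRC. The byte offset used
is an assumption about the stream layout (a 4-byte "BZh9" header, a 6-byte
block marker, then the 4-byte CRC), and is_intact is a made-up helper name.

```python
import bz2


def is_intact(data: bytes) -> bool:
    """True if the stream decompresses cleanly, False if the library rejects it."""
    try:
        bz2.decompress(data)
        return True
    except (OSError, ValueError):
        return False


original = b"important data " * 1000
good_stream = bz2.compress(original)

# Assumed layout: bytes 0-3 stream header, 4-9 block marker, 10-13 block CRC.
damaged = bytearray(good_stream)
damaged[10] ^= 0x01  # flip a single bit in the stored CRC

print("undamaged stream ok:", is_intact(good_stream))
print("damaged stream ok:  ", is_intact(bytes(damaged)))
```

As the text says, the failure only tells you that something is wrong; it
gives no help in recovering the data.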
Return values: 0 for a normal exit, 1 for environmental
problems (file not found, invalid flags, I/O errors, etc.),
2 to indicate a corrupt compressed file, 3 for an internal
consistency error (e.g., a bug) which caused bzip2 to panic.


OPTIONS
-c --stdout
    Compress or decompress to standard output.

-d --decompress
    Force decompression. bzip2, bunzip2 and bzcat are
    really the same program, and the decision about
    what actions to take is done on the basis of which
    name is used. This flag overrides that mechanism,
    and forces bzip2 to decompress.

-z --compress
    The complement to -d: forces compression,
    regardless of the invocation name.

-t --test
    Check integrity of the specified file(s), but don't
    decompress them. This really performs a trial
    decompression and throws away the result.

-f --force
    Force overwrite of output files. Normally, bzip2
    will not overwrite existing output files. Also
    forces bzip2 to break hard links to files, which it
    otherwise wouldn't do.

    bzip2 normally declines to decompress files which
    don't have the correct magic header bytes. If
    forced (-f), however, it will pass such files
    through unmodified. This is how GNU gzip behaves.

-k --keep
    Keep (don't delete) input files during compression
    or decompression.

-s --small
    Reduce memory usage, for compression, decompression
    and testing. Files are decompressed and tested
    using a modified algorithm which only requires 2.5
    bytes per block byte. This means any file can be
    decompressed in 2300k of memory, albeit at about
    half the normal speed.

    During compression, -s selects a block size of
    200k, which limits memory use to around the same
    figure, at the expense of your compression ratio.
    In short, if your machine is low on memory (8
    megabytes or less), use -s for everything. See
    MEMORY MANAGEMENT below.

-q --quiet
    Suppress non-essential warning messages. Messages
    pertaining to I/O errors and other critical events
    will not be suppressed.

-v --verbose
    Verbose mode -- show the compression ratio for each
    file processed. Further -v's increase the
    verbosity level, spewing out lots of information which
    is primarily of interest for diagnostic purposes.

-L --license -V --version
    Display the software version, license terms and
    conditions.

-1 (or --fast) to -9 (or --best)
    Set the block size to 100 k, 200 k .. 900 k when
    compressing. Has no effect when decompressing.
    See MEMORY MANAGEMENT below. The --fast and --best
    aliases are primarily for GNU gzip compatibility.
    In particular, --fast doesn't make things
    significantly faster. And --best merely selects the
    default behaviour.

--
    Treats all subsequent arguments as file names, even
    if they start with a dash. This is so you can
    handle files with names beginning with a dash, for
    example: bzip2 -- -myfilename.

--repetitive-fast --repetitive-best
    These flags are redundant in versions 0.9.5 and
    above. They provided some coarse control over the
    behaviour of the sorting algorithm in earlier
    versions, which was sometimes useful. 0.9.5 and above
    have an improved algorithm which renders these
    flags irrelevant.


MEMORY MANAGEMENT
bzip2 compresses large files in blocks. The block size
affects both the compression ratio achieved, and the
amount of memory needed for compression and decompression.
The flags -1 through -9 specify the block size to be
100,000 bytes through 900,000 bytes (the default)
respectively. At decompression time, the block size used for
compression is read from the header of the compressed
file, and bunzip2 then allocates itself just enough memory
to decompress the file. Since block sizes are stored in
compressed files, it follows that the flags -1 to -9 are
irrelevant to and so ignored during decompression.

Compression and decompression requirements, in bytes, can
be estimated as:

    Compression:   400k + ( 8 x block size )

    Decompression: 100k + ( 4 x block size ), or
                   100k + ( 2.5 x block size )

Larger block sizes give rapidly diminishing marginal
returns. Most of the compression comes from the first two
or three hundred k of block size, a fact worth bearing in
mind when using bzip2 on small machines. It is also
important to appreciate that the decompression memory
requirement is set at compression time by the choice of
block size.

For files compressed with the default 900k block size,
bunzip2 will require about 3700 kbytes to decompress. To
support decompression of any file on a 4 megabyte machine,
bunzip2 has an option to decompress using approximately
half this amount of memory, about 2300 kbytes.
Decompression speed is also halved, so you should use this
option only where necessary. The relevant flag is -s.

In general, try to use the largest block size memory
constraints allow, since that maximises the compression
achieved. Compression and decompression speed are
virtually unaffected by block size.

Another significant point applies to files which fit in a
single block -- that means most files you'd encounter
using a large block size. The amount of real memory
touched is proportional to the size of the file, since the
file is smaller than a block. For example, compressing a
file 20,000 bytes long with the flag -9 will cause the
compressor to allocate around 7600k of memory, but only
touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
decompressor will allocate 3700k but only touch 100k +
20000 * 4 = 180 kbytes.

Here is a table which summarises the maximum memory usage
for different block sizes. Also recorded is the total
compressed size for 14 files of the Calgary Text
Compression Corpus totalling 3,141,622 bytes. This column gives
some feel for how compression varies with block size.
These figures tend to understate the advantage of larger
block sizes for larger files, since the Corpus is
dominated by smaller files.

           Compress   Decompress   Decompress   Corpus
    Flag     usage      usage       -s usage     Size

     -1      1200k       500k         350k      914704
     -2      2000k       900k         600k      877703
     -3      2800k      1300k         850k      860338
     -4      3600k      1700k        1100k      846899
     -5      4400k      2100k        1350k      845160
     -6      5200k      2500k        1600k      838626
     -7      6100k      2900k        1850k      834096
     -8      6800k      3300k        2100k      828642
     -9      7600k      3700k        2350k      828642


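The estimates in this section are simple enough to write down as code. The
sketch below takes "k" to mean 1000 bytes, to match the "100,000 bytes"
wording above (an assumption), and the function names are made up for the
example.

```python
K = 1000  # assumption: "k" means 1000 bytes, as in "100,000 bytes" for -1


def block_size(flag: int) -> int:
    """Block size in bytes selected by the flags -1 .. -9."""
    if not 1 <= flag <= 9:
        raise ValueError("flag must be between 1 and 9")
    return flag * 100 * K


def compress_memory(flag: int) -> int:
    return 400 * K + 8 * block_size(flag)


def decompress_memory(flag: int, small: bool = False) -> int:
    # 2.5 bytes per block byte with -s, 4 bytes per block byte otherwise.
    factor = 2.5 if small else 4
    return int(100 * K + factor * block_size(flag))


def touched_when_compressing(file_size: int, flag: int) -> int:
    # Only the part of the block actually filled by the file is touched.
    return 400 * K + 8 * min(file_size, block_size(flag))


def touched_when_decompressing(file_size: int, flag: int) -> int:
    return 100 * K + 4 * min(file_size, block_size(flag))


for flag in range(1, 10):
    print(f"-{flag}  {compress_memory(flag) // K:5d}k"
          f"  {decompress_memory(flag) // K:5d}k"
          f"  {decompress_memory(flag, small=True) // K:5d}k")

print(touched_when_compressing(20_000, 9) // K, "kbytes touched compressing")
print(touched_when_decompressing(20_000, 9) // K, "kbytes touched decompressing")
```

The formulas reproduce the memory columns of the table, with one exception:
for -7 the compression formula gives 6000k, where the table lists 6100k.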
RECOVERING DATA FROM DAMAGED FILES
bzip2 compresses files in blocks, usually 900kbytes long.
Each block is handled independently. If a media or
transmission error causes a multi-block .bz2 file to become
damaged, it may be possible to recover data from the
undamaged blocks in the file.

The compressed representation of each block is delimited
by a 48-bit pattern, which makes it possible to find the
block boundaries with reasonable certainty. Each block
also carries its own 32-bit CRC, so damaged blocks can be
distinguished from undamaged ones.

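To give a feel for how block boundaries can be located, here is a sketch
that searches a compressed stream for the block marker at every bit offset,
since blocks after the first need not start on a byte boundary. It is not
the bzip2recover source. The marker value 0x314159265359 is an assumption
brought in from the file format, and find_block_bit_offsets is a made-up
name.

```python
import bz2
import random
from typing import List

# Assumed value of the 48-bit block marker.
BLOCK_MAGIC = bytes.fromhex("314159265359")


def find_block_bit_offsets(stream: bytes) -> List[int]:
    """Return the bit offsets at which the block marker occurs."""
    total_bits = 8 * len(stream)
    as_int = int.from_bytes(stream, "big")
    mask = (1 << total_bits) - 1
    offsets = []
    for shift in range(8):
        # Shifting left by `shift` bits byte-aligns any marker that began
        # `shift` bits into a byte, so a plain byte search can find it.
        shifted = ((as_int << shift) & mask).to_bytes(len(stream), "big")
        pos = shifted.find(BLOCK_MAGIC)
        while pos != -1:
            offsets.append(8 * pos + shift)
            pos = shifted.find(BLOCK_MAGIC, pos + 1)
    return sorted(offsets)


# 250,000 pseudo-random bytes compressed with 100k blocks span several blocks.
payload = random.Random(0).getrandbits(8 * 250_000).to_bytes(250_000, "big")
stream = bz2.compress(payload, 1)
print(find_block_bit_offsets(stream))
```

The first block always starts at bit 32, directly after the 4-byte stream
header.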
bzip2recover is a simple program whose purpose is to
search for blocks in .bz2 files, and write each block out
into its own .bz2 file. You can then use bzip2 -t to test
the integrity of the resulting files, and decompress those
which are undamaged.

bzip2recover takes a single argument, the name of the
damaged file, and writes a number of files
"rec00001file.bz2", "rec00002file.bz2", etc, containing
the extracted blocks. The output filenames are
designed so that the use of wildcards in subsequent
processing -- for example, "bzip2 -dc rec*file.bz2 >
recovered_data" -- processes the files in the correct order.

bzip2recover should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to
minimise any potential data loss through media or
transmission errors, you might consider compressing with a
smaller block size.


PERFORMANCE NOTES
The sorting phase of compression gathers together similar
strings in the file. Because of this, files containing
very long runs of repeated symbols, like "aabaabaabaab
..." (repeated several hundred times) may compress more
slowly than normal. Versions 0.9.5 and above fare much
better than previous versions in this respect. The ratio
between worst-case and average-case compression time is in
the region of 10:1. For previous versions, this figure
was more like 100:1. You can use the -vvvv option to
monitor progress in great detail, if you want.

Decompression speed is unaffected by these phenomena.

bzip2 usually allocates several megabytes of memory to
operate in, and then charges all over it in a fairly
random fashion. This means that performance, both for
compressing and decompressing, is largely determined by the
speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the
miss rate have been observed to give disproportionately
large performance improvements. I imagine bzip2 will
perform best on machines with very large caches.


CAVEATS
I/O error messages are not as helpful as they could be.
bzip2 tries hard to detect I/O errors and exit cleanly,
but the details of what the problem is sometimes seem
rather misleading.

This manual page pertains to version 1.0.3 of bzip2.
Compressed data created by this version is entirely forwards
and backwards compatible with the previous public
releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1 and
1.0.2, but with the following exception: 0.9.0 and above
can correctly decompress multiple concatenated compressed
files. 0.1pl2 cannot do this; it will stop after
decompressing just the first file in the stream.

bzip2recover versions prior to 1.0.2 used 32-bit integers
to represent bit positions in compressed files, so they
could not handle compressed files more than 512 megabytes
long. Versions 1.0.2 and above use 64-bit ints on some
platforms which support them (GNU supported targets, and
Windows). To establish whether or not bzip2recover was
built with such a limitation, run it without arguments.
In any event you can build yourself an unlimited version
if you can recompile it with MaybeUInt64 set to be an
unsigned 64-bit integer.


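The 512 megabyte figure follows directly from counting bit positions. A
quick check of the arithmetic, assuming the 32-bit counter is unsigned and
that "megabyte" means 2**20 bytes:

```python
# A 32-bit unsigned counter can address 2**32 distinct bit positions.
addressable_bits = 2 ** 32

# Eight bits per byte gives the largest file the counter can cover.
max_bytes = addressable_bits // 8
print(max_bytes // 2 ** 20, "megabytes")

# The same calculation for a 64-bit counter, for comparison.
max_bytes_64 = 2 ** 64 // 8
print(max_bytes_64 // 2 ** 60, "exbibytes")
```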
AUTHOR
Julian Seward, jseward@bzip.org.

http://www.bzip.org

The ideas embodied in bzip2 are due to (at least) the
following people: Michael Burrows and David Wheeler (for the
block sorting transformation), David Wheeler (again, for
the Huffman coder), Peter Fenwick (for the structured
coding model in the original bzip, and many refinements), and
Alistair Moffat, Radford Neal and Ian Witten (for the
arithmetic coder in the original bzip). I am much
indebted for their help, support and advice. See the
manual in the source distribution for pointers to sources of
documentation. Christian von Roques encouraged me to look
for faster sorting algorithms, so as to speed up
compression. Bela Lubkin encouraged me to improve the worst-case
compression performance. Donna Robinson XMLised the
documentation. The bz* scripts are derived from those of GNU
gzip. Many people sent patches, helped with portability
problems, lent machines, gave advice and were generally
helpful.