Archiving Data
A file-archiving tool collects a group of files into a single package file that you can easily move around.
- Linux has several
tar
tar is a program used to archive various data files into a single file, called an archive file.
- stands for tape archiver
- original files remain on disk
- archive file can optionally be compressed on the fly; the result is often called a tarball
- often used for transferring multiple files between computers in one step
- -c to create a new archive
- -x to extract the archive
- -t to list the contents of an archive file to stdout
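The three options above can be tried out with a short script. This is a minimal sketch, not a definitive recipe: the directory and file names are hypothetical, and it relies on two options the notes do not cover, -f (names the archive file) and -C (chooses the extraction directory), both of which are available in GNU tar and bsdtar.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample files to archive.
mkdir project
printf 'hello\n' > project/notes.txt
printf 'a,b,c\n' > project/data.csv

# -c creates a new archive; -f names the archive file.
tar -cf project.tar project

# -t lists the archive's contents on stdout.
tar -tf project.tar

# -x extracts the archive; -C selects the directory to extract into.
mkdir restored
tar -xf project.tar -C restored

# The originals remain on disk, and the extracted copies match them.
cmp project/notes.txt restored/project/notes.txt
cmp project/data.csv restored/project/data.csv
```

Because the script runs with set -e, a failing cmp stops it immediately, so reaching the end means the round trip preserved both files.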
Compression
The gzip, bzip2, and xz programs on Linux compress individual files.
- generally, gzip provides the least compression, xz the most
- tar supports all three compression standards
  - uses unique filename extensions for each standard:
    - .tgz for tarballs compressed with gzip
    - .tbz or .tbz2 for tarballs compressed with bzip2
    - .txz for tarballs compressed with xz
| Compression program | Uncompression program | Filename extension | tar Option |
|---|---|---|---|
| gzip | gunzip | .gz | -z |
| bzip2 | bunzip2 | .bz2 | -j |
| xz | unxz | .xz | -J |
- all three programs apply lossless compression
  - data recovered by uncompressing the file is identical to the original
- some graphics, audio, and audiovisual file formats apply lossy compression
  - some data is discarded
  - lossy compression should never be used on program files, system configuration files, or most user data files
- tar supports only lossless compression
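The lossless round trip and the tar -z option can be checked with a short script. This is a sketch under stated assumptions: the file and directory names are hypothetical, only gzip is exercised because bzip2 and xz are not installed everywhere, and the -f and -C options of tar (archive file name and extraction directory) are not covered in the notes above.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file: repetitive text compresses well.
i=0
while [ "$i" -lt 1000 ]; do
    printf 'the same line over and over\n'
    i=$((i + 1))
done > sample.txt
cp sample.txt original.txt

# By default gzip replaces sample.txt with sample.txt.gz.
gzip sample.txt
ls -l original.txt sample.txt.gz

# gunzip restores sample.txt; cmp fails if even one byte differs.
gunzip sample.txt.gz
cmp original.txt sample.txt

# -z tells tar to run the archive through gzip, giving a .tgz tarball.
mkdir docs
cp original.txt docs/
tar -czf docs.tgz docs

# Extract with -z as well, then confirm the file came back unchanged.
mkdir restored
tar -xzf docs.tgz -C restored
cmp docs/original.txt restored/docs/original.txt
```

Swapping -z for -j or -J (and .tgz for .tbz or .txz) should give the bzip2 and xz variants, provided those programs are installed.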
zip
zip is an archiving and compression tool whose output is similar to a compressed tarball; the format is commonly used outside of Unix and Linux.
- also available on Linux
- zip files have an extension of .zip
- syntax: $ zip newzip.zip afile.txt figure.tiff
- use unzip to uncompress
  - unzip -l to list files within the archive without uncompressing
- zip -r to recurse through directories
- zip -0 through -9 to set the compression amount (-0 stores files without compressing, -9 compresses the most)
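The zip and unzip options above fit together roughly as follows. This is a hedged sketch: the directory and file names are hypothetical, the script skips itself when zip or unzip is not installed (they are separate packages on many distributions), and -d is an unzip option not mentioned in the notes that chooses the extraction directory.

```shell
#!/bin/sh
set -eu

# zip and unzip may be missing on a minimal system; skip quietly if so.
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed; skipping" >&2
    exit 0
fi

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample directory to archive.
mkdir report
printf 'some text\n' > report/afile.txt
printf 'more text\n' > report/appendix.txt

# -r recurses into the directory; -9 requests the highest compression level.
zip -r -9 report.zip report

# unzip -l lists the contents without extracting anything.
unzip -l report.zip

# -d selects the directory to extract into.
mkdir restored
unzip report.zip -d restored
cmp report/afile.txt restored/report/afile.txt
```

Unlike gzip, zip leaves the original files in place, so the final cmp compares the untouched originals with the extracted copies.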