Archiving Data


A file-archiving tool collects a group of files into a single package file that you can easily move around.

  • Linux has several such tools, including tar and zip

tar

tar is a program used to archive various data files into a single file, called an archive file.

  • stands for tape archiver
  • original files remain on disk
  • archive file can optionally be compressed on the fly; the resulting file is often called a tarball
    • often used for transferring multiple files between computers in one step
  • -c to create a new archive
  • -x to extract the archive
  • -t to list the contents of an archive file to stdout
  • -f to name the archive file to operate on
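
The options above can be sketched as a short shell session. This is a minimal example, not a definitive recipe: the scratch directory and the names project, a.txt, b.txt, and restore are hypothetical, and -f and -C are standard tar options used to name the archive and to change directory before archiving or extracting.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
workdir=$(mktemp -d)
mkdir "$workdir/project" "$workdir/restore"
printf 'first file\n'  > "$workdir/project/a.txt"
printf 'second file\n' > "$workdir/project/b.txt"

# -c creates a new archive; -f names the archive file; -C changes into
# the given directory first so that stored paths are relative.
tar -cf "$workdir/project.tar" -C "$workdir" project

# -t lists the archive contents on stdout without extracting anything.
listing=$(tar -tf "$workdir/project.tar")
printf '%s\n' "$listing"

# -x extracts the archive; the original files remain on disk.
tar -xf "$workdir/project.tar" -C "$workdir/restore"

# Remove the scratch directory when done: rm -rf "$workdir"
```

Extracting into a separate restore directory keeps the originals untouched, so the two copies can be compared afterwards.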

Compression

The gzip, bzip2, and xz programs on Linux compress individual files.

  • generally, gzip provides the least compression, xz the most
  • tar supports all three compression standards
    • uses unique filename extensions for each standard:
      • .tgz for tarballs compressed with gzip
      • .tbz or .tbz2 for tarballs compressed with bzip2
      • .txz for tarballs compressed with xz

Compression program   Uncompression program   Filename extension   tar option
gzip                  gunzip                  .gz                  -z
bzip2                 bunzip2                 .bz2                 -j
xz                    unxz                    .xz                  -J

  • gzip, bzip2, and xz all apply lossless compression
    • data recovered by uncompressing the file is identical to the original
  • some graphics, audio, and audiovisual file formats apply lossy compression
    • some data is discarded
    • lossy compression should never be used on program files, system configuration files, or most user data files
  • tar supports only lossless compression
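
A minimal sketch of the lossless round trip described above, assuming gzip and gunzip are installed. The scratch directory and the names data, sample.txt, copy.txt, and out are hypothetical. Only the gzip variant is run here; swapping -z for -j or -J in the tar commands would use bzip2 or xz instead.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
workdir=$(mktemp -d)
mkdir "$workdir/data" "$workdir/out"

# Repetitive text compresses well, which makes a convenient sample.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i of some repetitive sample text"
    i=$((i + 1))
done > "$workdir/data/sample.txt"

# gzip replaces the file it compresses, so work on a copy and keep the
# original available for comparison.
cp "$workdir/data/sample.txt" "$workdir/copy.txt"
gzip "$workdir/copy.txt"        # produces copy.txt.gz
gunzip "$workdir/copy.txt.gz"   # restores copy.txt

# Lossless: the round-tripped file is byte-for-byte identical.
cmp "$workdir/data/sample.txt" "$workdir/copy.txt" && roundtrip=identical

# tar -z passes the archive through gzip while creating or extracting it.
tar -czf "$workdir/data.tgz" -C "$workdir" data
tar -xzf "$workdir/data.tgz" -C "$workdir/out"

# Remove the scratch directory when done: rm -rf "$workdir"
```

Comparing with cmp rather than by eye is a simple way to confirm that uncompressing recovers exactly the original data.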

zip

zip is an archiving and compression tool that produces files similar to compressed tarballs; the format is commonly used outside of Unix and Linux.

  • also available on Linux
  • zip files have extension of .zip
  • syntax: $ zip newzip.zip afile.txt figure.tiff
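  • use unzip to extract and uncompress
  • unzip -l to list files within the archive without uncompressing
  • zip -r to recurse through directories
  • zip -0 through -9 to set the compression level (-0 stores files without compressing, -9 compresses the most)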