Cryptographic Hash


A cryptographic hash algorithm produces a fixed-length string of bits from an input plaintext that can be of any length.

  • aka keyless cryptography or hash function
  • produces a digital fingerprint as output
    • output value is called a hash or checksum or digest or tag
    • messages encrypted this way are called message digests
  • doesn’t use a key
  • properties of a hash function:
    1. different inputs are unlikely to produce the same output (collision)
      • one bit change in input produces a completely different hash
      • a one or more bit change in input changes the output in about half of its bits
    2. it is impossible to recover the plaintext data from the digest (one-way)

How It Works

  • hashes can be thought of as fingerprints because they’re unique identifiers of a message
  • hashes of similar messages look completely different
  • can’t be used used to discover the contents of the original message
    • but can use to see of the original message has been changed
    • send the hash with a file or message, then receiver can use the same hash function to see if the hashes are the same

Collision is a matching hash for two different sets of data.

  • theoretically possible to engineer this, but difficult and usually only happens when using a broken algorithm
  • Message-Digest algorithm 5 (MD5) and SHA-1 have been attacked in this fashion
  • when a collision occurs, you generally stop using the algorithm

Properties of Hash Functions

  • should have the following properties so that security or performance is not compromised:
    • Determinism
      • the same input always results in the same output
    • Efficiency
      • computation of the hash value should be low
    • Security
      • should be practically irreversible (i.e., one way) in order to be attack resistant

Preimage Resistance

Preimage resistance property means given a hash value , it should be computationally infeasible to find an input value that hashes to .

  • aka one-way property
  • finding the original message from a message digest is computationally infeasible
  • protects against an attacker who has a hash and wants to find the associated data

Second preimage resistance property means given an input , it is computationally infeasible to find an input distinct from with the same hash.

  • aka weak collision resistance
  • finding a second input that has the same digest as any other specified input is computationally infeasible
  • i.e. it is not feasible to modify a message without changing its hash value
  • protects against an attacker who has an input value and its hash and wants to substitute a forged value to the original input value
  • preserves data integrity

Collision Resistance

Collision resistance property means it should be computationally infeasible to find any two inputs that result in the same hash.

  • aka strong collision resistance
  • if a function is collision resistant, it is also second preimage resistant
  • prevents creating two distinct data with the same hash

Avalanche Effect

Avalanche effect property means that a small change to a message should result in a digest that is significantly different and uncorrelated with the digest of the original message.

Uses

  • to prove integrity
  • for password storage and authentication solutions
  • anonymization of user (subject) IDs and credentials
  • Error detection and integrity checking for files, messages, or data blocks
  • Efficient table or database lookup
    • compressing or collapsing a large set of possible values into a smaller but unique set of lookup key values makes the search space much smaller
  • Secure message digests or hashes
  • digital signatures
  • message authentication codes (MAC)

Integrity Verification Using a Hash Function

Hash Algorithms