
Hashing — Crypto 101

2026-03-19 EASY Cryptography


Hello everyone, this is NotAlive.
On this page, I will be summarizing the Hashing - Crypto 101 room from TryHackMe.
If you are new, I would recommend not reading this, as it is mainly meant for revision.
Let's get started.

Table of Contents

Key Terminology
What is Hashing?
Hash Collisions & The Pigeonhole Principle
Password Storage & Why It Matters
Salting — Defeating Rainbow Tables
Identifying Hash Types
Cracking Hashes
Integrity Checking & HMAC

1. Key Terminology

Plaintext

Data before encryption or hashing. Often text, but can be any file — a photograph, binary, etc.

Encoding

NOT encryption. Just a data representation format (e.g. base64, hex). Immediately and trivially reversible — like translation.

Decoding

NOT decryption. Simply reverses a data representation format, e.g. converting base64 back into the original bytes.

Hash

The fixed-size output of a hash function. Also used as a verb — "to hash" means to produce the digest of some input data.

Brute Force

Attacking cryptography by systematically trying every possible password or key combination.

Cryptanalysis

Attacking cryptography by finding and exploiting a mathematical weakness in the underlying algorithm.

Note

Encoding (base64, hex) is NOT a security measure. Anyone can reverse it instantly. Never confuse encoding with encryption or hashing.
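
To make the note above concrete, here is a minimal Python sketch showing that base64 and hex are reversible without any key. The sample value is just an illustration I picked, not something from the room.

```python
import base64

secret = b"password123"  # illustrative sample value

# Encoding needs no key, so anyone who sees the output can reverse it.
encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)

print(encoded.decode())    # cGFzc3dvcmQxMjM=
print(decoded == secret)   # True: the original bytes come straight back

# Hex is just another representation of the same bytes.
as_hex = secret.hex()
print(bytes.fromhex(as_hex) == secret)  # True
```

If a "protected" value decodes this easily, it was only encoded, never encrypted or hashed.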

2. What is Hashing?

A hash function takes input data of any size and produces a fixed-size output called a digest. Unlike encryption, it uses no key and is designed to be strictly one-way — reconstructing the original input from the output should be computationally infeasible.

⚡

Fast to Compute

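General-purpose hash functions such as SHA-256 are designed to be computed quickly, enabling efficient verification at scale. Password hashing is the exception: algorithms like bcrypt are deliberately slow.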

🔒

One-Way

The original input cannot be feasibly reconstructed from the hash output alone.

🌊

Avalanche Effect

A single bit change in the input produces a dramatically different output — unpredictable and non-linear.

📏

Fixed Output Size

Regardless of input length, the output digest is always the same length for a given algorithm.

The raw binary output is typically encoded as hexadecimal or base64 for display. Decoding this representation does not reveal the original input — it's just presentation.
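
The fixed output size and the avalanche effect are easy to check yourself. The sketch below uses Python's `hashlib`; the helper names are my own, and the inputs are arbitrary examples.

```python
import hashlib

def sha256_hex(text):
    """Hex-encoded SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a, hex_b):
    """Count how many bits differ between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

a = sha256_hex("password123")
b = sha256_hex("password124")          # one character changed
long_digest = sha256_hex("x" * 1_000_000)

# Fixed output size: 64 hex characters (256 bits) regardless of input length.
print(len(a), len(b), len(long_digest))

# Avalanche effect: roughly half of the 256 output bits are expected to flip.
print(differing_bits(a, b), "of 256 bits differ")
```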

How It Works in Login Systems

When you log in to a well-designed service, your password is never stored directly. The system stores only its hash. On login, it hashes what you typed and compares the result against the stored hash. If they match, you are authenticated without the real password ever being kept on the server.

$ echo -n "password123" | sha256sum
ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

$ echo -n "password124" | sha256sum
7ee5a5c90c4fd9f6b04b2f92bc34f54e4b7d55aee6827a09f7e4a22ee48f0e0

3. Hash Collisions & The Pigeonhole Principle

A hash collision occurs when two different inputs produce the same hash output. Because a hash function accepts an effectively unlimited set of inputs but can only produce a finite set of outputs, collisions are mathematically unavoidable.

The Pigeonhole Principle

If you have more pigeons than pigeonholes, at least one pigeonhole must contain more than one pigeon. Since there are more possible inputs than possible hash outputs, some inputs must share the same output. It's not a flaw, it's math.

The goal of secure hash algorithms is not to eliminate collisions (impossible), but to make intentional / engineered collisions computationally infeasible for an attacker to produce.
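
You can watch the pigeonhole principle in action by shrinking the number of pigeonholes. The sketch below is a toy of my own: it keeps only the first 16 bits of SHA-256 (65,536 possible outputs), so a collision has to show up within 65,537 inputs. It says nothing about the difficulty of colliding the full 256-bit digest.

```python
import hashlib

def tiny_hash(data):
    """Toy hash: first 4 hex characters (16 bits) of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:4]

def find_collision():
    """Hash "0", "1", "2", ... until two inputs share a tiny_hash value."""
    seen = {}
    counter = 0
    while True:
        candidate = str(counter).encode()
        digest = tiny_hash(candidate)
        if digest in seen:
            return seen[digest], candidate, digest
        seen[digest] = candidate
        counter += 1

first, second, shared = find_collision()
print(first, second, "both map to", shared)

# The full digests still differ; only the truncated output collides.
print(hashlib.sha256(first).hexdigest() != hashlib.sha256(second).hexdigest())
```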

Algorithm Status

Algorithm	Output Size	Collision Status	Safe for Security Use?
MD5	128-bit	Practical collisions demonstrated	❌ No
SHA-1	160-bit	Practical collisions demonstrated	❌ No
SHA-256	256-bit	No known collision	✅ Yes
SHA-512	512-bit	No known collision	✅ Yes
bcrypt	184-bit	No known collision	✅ Yes (passwords)

Interesting

No published attack has produced a pair of inputs that collide under both MD5 and SHA-1 at the same time. Comparing both hashes can therefore still help detect differences, though a modern algorithm like SHA-256 is the better choice.

4. Password Storage & Why It Matters

💀

Plaintext Storage

Catastrophic. Any database breach immediately exposes every user's password. Never acceptable.

⚠️

Encrypted Storage

Requires storing the decryption key. If the key is compromised, all passwords are decryptable at once.

✅

Hashed Storage

One-way. No key to steal. With a unique salt per user, each password must be cracked individually. The correct approach.

Real-World Breach — rockyou.txt

The infamous rockyou.txt wordlist originates from the 2009 RockYou breach, in which millions of passwords were stored in plaintext and later leaked publicly. It is now one of the most commonly used wordlists in password cracking, and the textbook example of why plaintext storage is unacceptable.

Where Hashes Are Stored

Linux: /etc/shadow, normally readable by root only.
Windows: the SAM (Security Account Manager) database. Hashes are stored in NTLM format, which is 32 hex characters long just like MD4 and MD5 digests, so context is needed to identify them.

$ sudo cat /etc/shadow

username:$6$rounds=5000$salt$hash:lastchange:min:max:warn:inactive:expire:

5. Salting — Defeating Rainbow Tables

The Rainbow Table Problem

Without a salt, two users with the same password have the same hash. An attacker with a rainbow table, a precomputed structure that maps hashes back to plaintexts, can reverse large numbers of common-password hashes almost instantly instead of cracking each one from scratch. Sites like CrackStation serve lookups from massive precomputed tables for exactly this purpose.
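
Here is a rough sketch of the precomputation idea. It builds a plain hash-to-plaintext dictionary, which is simpler than a real rainbow table (those use hash chains to save space), and the four-word list is a made-up stand-in for something like rockyou.txt.

```python
import hashlib

# Hypothetical mini wordlist standing in for a real one.
wordlist = ["123456", "password", "hunter2", "letmein"]

# Precompute once: unsalted MD5 digest -> plaintext.
lookup = {hashlib.md5(word.encode()).hexdigest(): word for word in wordlist}

# An unsalted hash from a breach is reversed with a single dictionary lookup.
leaked = hashlib.md5(b"hunter2").hexdigest()
print(lookup.get(leaked))   # hunter2

# The same password hashed together with a salt is not in the table.
salted = hashlib.md5(b"hunter2" + b"xK9mQ2").hexdigest()
print(lookup.get(salted))   # None
```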

The Salt Solution

A salt is a random value generated uniquely per user and combined with their password before hashing. Even if two users have the same password, their salts differ — producing completely different hashes. The salt is stored alongside the hash (it doesn't need to be secret) and its purpose is to guarantee uniqueness and invalidate precomputed tables.

How It Works

hash( password + salt ) → unique hash per user, even for identical passwords. Rainbow tables become useless because they would need to be recomputed for every possible salt value.

User A: hash( "hunter2" + "xK9mQ2" ) = a1b2c3d4e5f6...
User B: hash( "hunter2" + "pL7nR1" ) = f9e8d7c6b5a4...

bcrypt      → auto-salted, slow by design (resistant to GPU cracking)
sha512crypt → auto-salted, configurable rounds
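
The scheme above can be sketched with Python's standard library. bcrypt is not part of the standard library, so this sketch swaps in PBKDF2-HMAC-SHA256 (`hashlib.pbkdf2_hmac`) as a stand-in for a slow, salted password hash. The function names and the iteration count are assumptions for illustration, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for this example

def hash_password(password, salt=None):
    """Return (salt, digest). A fresh 16-byte random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected):
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users pick the same password but get different salts and hashes.
salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")
print(hash_a != hash_b)                            # True

# The salt is stored next to the hash and reused at login.
print(verify_password("hunter2", salt_a, hash_a))  # True
print(verify_password("hunter3", salt_a, hash_a))  # False
```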

6. Identifying Hash Types

Tools like hashID can help, but they aren't always reliable — MD5, MD4, and NTLM hashes can look visually identical. The most reliable method combines: tools + hash length/prefix + context (where did the hash come from?).

Unix-Style Hash Prefixes

Unix/Linux password hashes include a prefix that identifies the algorithm, following the format:

$format$rounds$salt$hash
── format identifier ── optional rounds ── per-user salt ── the actual hash ──

Prefix	Algorithm	Notes
$1$	MD5crypt	Weak — avoid
$2$ / $2a$ / $2b$	bcrypt	Strong — recommended for passwords
$5$	SHA-256crypt	Acceptable
$6$	SHA-512crypt	Strong — common on modern Linux

Context is Everything

A 32-character hex hash could be MD5, NTLM, or MD4. If you pulled it from a web app database, it's likely MD5. If it's from a Windows SAM dump, it's NTLM. Tools alone can't determine this — context does.
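
For unprefixed hex hashes, length narrows things down and context does the rest. This is a toy sketch of that reasoning. The source labels (`web_app_db`, `windows_sam`) are names I made up, and the length table only covers algorithms mentioned in this writeup.

```python
import re

# Hex-digest lengths for algorithms named in this writeup. Other algorithms
# share these lengths too, so treat the lists as non-exhaustive.
CANDIDATES_BY_HEX_LENGTH = {
    32: ["MD5", "NTLM", "MD4"],
    40: ["SHA-1"],
    64: ["SHA-256"],
    128: ["SHA-512"],
}

# Hypothetical context labels mapped to the most likely algorithm.
LIKELY_BY_SOURCE = {"web_app_db": "MD5", "windows_sam": "NTLM"}

def guess_hash_type(value, source=None):
    """Return candidate algorithms, most likely first when the source is known."""
    value = value.strip()
    if not re.fullmatch(r"[0-9a-fA-F]+", value):
        return []
    candidates = list(CANDIDATES_BY_HEX_LENGTH.get(len(value), []))
    likely = LIKELY_BY_SOURCE.get(source)
    if likely in candidates:
        candidates.remove(likely)
        candidates.insert(0, likely)
    return candidates

sample = "5f4dcc3b5aa765d61d8327deb882cf99"  # any 32-character hex string
print(guess_hash_type(sample))                        # length alone: three candidates
print(guess_hash_type(sample, source="windows_sam"))  # context puts NTLM first
```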

7. Cracking Hashes

Since hashes cannot be decrypted, cracking requires brute-force or dictionary attacks: take a candidate password, apply the hash function (with salt if present), compare to the target. Repeat for millions of candidates.
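
The loop described above fits in a few lines of Python. This is only a sketch of the idea (hash each candidate, compare, repeat) using unsalted or suffix-salted MD5. Real tools like hashcat do the same thing far faster, and the wordlist here is invented.

```python
import hashlib

def crack(target_hex, candidates, salt=b""):
    """Return the first candidate whose MD5(candidate + salt) matches, else None."""
    for word in candidates:
        digest = hashlib.md5(word.encode() + salt).hexdigest()
        if digest == target_hex:
            return word
    return None

# Hypothetical mini wordlist; a real attack would stream rockyou.txt instead.
wordlist = ["123456", "password", "qwerty", "letmein", "hunter2"]

target = hashlib.md5(b"letmein").hexdigest()
print(crack(target, wordlist))          # letmein

# With a salt, the attacker must include it in every guess.
salted_target = hashlib.md5(b"hunter2" + b"xK9mQ2").hexdigest()
print(crack(salted_target, wordlist))                  # None: salt not applied
print(crack(salted_target, wordlist, salt=b"xK9mQ2"))  # hunter2
```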

GPUs are preferred because they have thousands of parallel cores and can compute billions of hashes per second for fast algorithms such as MD5. Running cracking tools inside a VM is usually slower since VMs often lack direct GPU access, so prefer running them on the host system.

Hashcat & John the Ripper

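# MD5 (-m 0), straight dictionary attack (-a 0)
$ hashcat -m 0 -a 0 hash.txt /usr/share/wordlists/rockyou.txt

# sha512crypt (-m 1800), dictionary attack plus the best64 mangling rules
$ hashcat -m 1800 hash.txt rockyou.txt -r rules/best64.rule

# John the Ripper with the same wordlist; it tries to detect the hash format itself
$ john --wordlist=/usr/share/wordlists/rockyou.txt hash.txt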

bcrypt is Special

bcrypt is intentionally slow, with a tunable cost factor, and its memory access pattern makes it a poor fit for GPU acceleration. This is a feature, not a bug: it makes brute-force attacks orders of magnitude more expensive, even with powerful hardware.
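
Some back-of-the-envelope arithmetic shows what "orders of magnitude" means here. Both guess rates below are round numbers I assumed for illustration (a billion guesses per second for a fast hash, ten thousand for a slow one); they are not benchmarks of any real hardware.

```python
def seconds_to_exhaust(alphabet_size, length, guesses_per_second):
    """Worst-case time to try every password of the given length."""
    keyspace = alphabet_size ** length
    return keyspace / guesses_per_second

# Every 8-character lowercase password: 26**8 = 208,827,064,576 candidates.
FAST_RATE = 1e9   # assumed rate for a fast hash on a GPU
SLOW_RATE = 1e4   # assumed rate for a deliberately slow hash

fast_seconds = seconds_to_exhaust(26, 8, FAST_RATE)
slow_seconds = seconds_to_exhaust(26, 8, SLOW_RATE)

print(f"fast hash: {fast_seconds / 60:.1f} minutes")   # about 3.5 minutes
print(f"slow hash: {slow_seconds / 86400:.0f} days")   # about 242 days
```

The keyspace is identical in both cases; only the cost per guess changes.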

8. Integrity Checking & HMAC

Data Integrity

Because the same input always produces the same hash, comparing a file's digest against a trusted published value shows whether the data has been altered. Even a single bit change produces a completely different digest.

Verifying downloaded file integrity
Detecting tampering in transmitted data
Identifying duplicate files (same hash means the files are almost certainly identical)

$ sha256sum kali-linux-2024.iso
abc123...def456  kali-linux-2024.iso
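
The same check can be scripted. Below is a minimal sketch that hashes a file in chunks (so large ISOs don't need to fit in memory) and compares the result with a published digest. The helper names are mine, and the demo runs against a throwaway temp file rather than a real download.

```python
import hashlib
import os
import tempfile

def sha256_file(path, chunk_size=65536):
    """Hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """True if the file's digest equals the digest the publisher lists."""
    return sha256_file(path) == published_hex.strip().lower()

# Demo: a temp file stands in for the downloaded ISO.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"message")
    demo_path = tmp.name

published = hashlib.sha256(b"message").hexdigest()
intact = matches_published(demo_path, published)
tampered = matches_published(demo_path, "0" * 64)
os.remove(demo_path)

print(intact)    # True: digest matches the published value
print(tampered)  # False: digest does not match
```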

HMAC — Hash-Based Message Authentication Code

HMAC combines a hash function with a secret key. It provides two guarantees, the second of which plain hashing cannot offer:

🔐

Integrity

Data hasn't been modified since the HMAC was computed. Same guarantee as regular hashing.

🪪

Authenticity

The message came from someone who holds the secret key. This is what plain hashing cannot provide.

Real-World Use

VPNs commonly use an HMAC such as HMAC-SHA512 to authenticate traffic, ensuring messages are neither modified in transit nor sent by an impersonator. Only a party holding the shared secret key can generate a valid HMAC.

$ echo -n "message" | sha256sum
ab530a13e45914982b79f9b7e3fba994...

$ echo -n "message" | openssl dgst -sha256 -hmac "secretkey"
HMAC-SHA256(stdin)= 8b5f48702995c1598c573db1e21866a9...
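
The same idea in Python, using the standard `hmac` module. This sketch reuses the message and key from the commands above; the helper names are my own.

```python
import hashlib
import hmac

def make_tag(key, message):
    """HMAC-SHA256 of the message under the shared secret key, hex-encoded."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_tag(key, message, tag):
    """Recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"secretkey"
tag = make_tag(key, b"message")

print(verify_tag(key, b"message", tag))          # True: intact and authentic
print(verify_tag(key, b"message!", tag))         # False: message was modified
print(verify_tag(b"wrongkey", b"message", tag))  # False: sender lacks the key

# A plain hash needs no key, so anyone can recompute it after tampering.
print(hashlib.sha256(b"message").hexdigest() != tag)  # True
```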