Hashing — Crypto 101

Hello everyone, this is NotAlive.
On this page, I will be summarizing the Hashing - Crypto 101 room from TryHackMe.
If you are new, I would recommend not reading this, as it is mainly meant for revision.
Let's get started.

Table of Contents
  1. Key Terminology
  2. What is Hashing?
  3. Hash Collisions & The Pigeonhole Principle
  4. Password Storage & Why It Matters
  5. Salting — Defeating Rainbow Tables
  6. Identifying Hash Types
  7. Cracking Hashes
  8. Integrity Checking & HMAC

1. Key Terminology

Plaintext

Data before encryption or hashing. Often text, but can be any file — a photograph, binary, etc.

Encoding

NOT encryption. Just a data representation format (e.g. base64, hex). Immediately and trivially reversible — like translation.

Hash

The fixed-size output of a hash function. Also used as a verb — "to hash" means to produce the digest of some input data.

Brute Force

Attacking cryptography by systematically trying every possible password or key combination.

Cryptanalysis

Attacking cryptography by finding and exploiting a mathematical weakness in the underlying algorithm.

Note
Encoding (base64, hex) is NOT a security measure. Anyone can reverse it instantly. Never confuse encoding with encryption or hashing.
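To make the note above concrete, here is a minimal sketch using Python's standard library. The sample value `password123` is only an illustration; the point is that both encodings are undone with a single call and no key.

```python
import base64

secret = b"password123"  # illustrative value only

# Encode: this changes the representation, not the secrecy.
b64 = base64.b64encode(secret)
hexed = secret.hex()

# Decode: anyone can do this, no key or guessing required.
decoded_from_b64 = base64.b64decode(b64)
decoded_from_hex = bytes.fromhex(hexed)

print(b64.decode(), "->", decoded_from_b64.decode())
print(hexed, "->", decoded_from_hex.decode())
```

Both lines print the original value again, which is exactly why encoding offers no protection.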

2. What is Hashing?

A hash function takes input data of any size and produces a fixed-size output called a digest. Unlike encryption, it uses no key and is designed to be strictly one-way — reconstructing the original input from the output should be computationally infeasible.

Fast to Compute

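General-purpose hash functions such as SHA-256 are designed to be computed quickly, enabling efficient verification at scale. Password-hashing functions such as bcrypt are the exception: they are deliberately slow.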

🔒
One-Way

The original input cannot be feasibly reconstructed from the hash output alone.

🌊
Avalanche Effect

A single bit change in the input produces a dramatically different output — unpredictable and non-linear.

📏
Fixed Output Size

Regardless of input length, the output digest is always the same length for a given algorithm.

The raw binary output is typically encoded as hexadecimal or base64 for display. Decoding this representation does not reveal the original input — it's just presentation.
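The properties above can be observed directly. The following is a rough sketch using `hashlib` from the Python standard library; the helper names `sha256_hex` and `differing_bits` are made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Fixed output size: 1 byte of input and 1,000,000 bytes of input
# both give a 64-character hex digest (256 bits).
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)
print(len(short_digest), len(long_digest))

# Avalanche effect: a one-character change flips a large share of the output bits.
d1 = hashlib.sha256(b"password123").digest()
d2 = hashlib.sha256(b"password124").digest()
print(differing_bits(d1, d2), "of 256 bits differ")
```

For a well-behaved hash function the second number should land somewhere near half of the 256 output bits.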

How It Works in Login Systems

When you log in to a well-designed service, your password is not stored directly; the system stores its hash instead. On login, it hashes what you typed and compares the result against the stored hash. If they match, you are authenticated, and the service never needs to keep the real password.

$ echo -n "password123" | sha256sum
ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

$ echo -n "password124" | sha256sum
7ee5a5c90c4fd9f6b04b2f92bc34f54e4b7d55aee6827a09f7e4a22ee48f0e0
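The login flow described above could look roughly like this in Python. It is a simplified sketch: the in-memory `users` dictionary and the `register` / `login` functions are hypothetical, and plain unsalted SHA-256 is used only to mirror the shell example. A real system should use a salted, slow password hash.

```python
import hashlib
import hmac

# Hypothetical in-memory "user table": username -> stored hex digest.
users = {}

def register(username: str, password: str) -> None:
    """Store only the hash of the password, never the password itself."""
    users[username] = hashlib.sha256(password.encode()).hexdigest()

def login(username: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    candidate = hashlib.sha256(password.encode()).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, candidate)

register("alice", "password123")
print(login("alice", "password123"))
print(login("alice", "password124"))
```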

3. Hash Collisions & The Pigeonhole Principle

A hash collision occurs when two different inputs produce the same hash output. Because a hash function accepts an effectively unlimited set of inputs but can produce only a finite number of outputs, collisions are mathematically unavoidable.

The Pigeonhole Principle
If you have more pigeons than pigeonholes, at least one pigeonhole must contain more than one pigeon. Since there are more possible inputs than possible hash outputs, some inputs must share the same output. It's not a flaw, it's math.

The goal of secure hash algorithms is not to eliminate collisions (impossible), but to make intentional / engineered collisions computationally infeasible for an attacker to produce.
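The pigeonhole argument is easy to see with a deliberately tiny hash. In this sketch, `tiny_hash` is an invented 16-bit function (the first two bytes of SHA-256), so it has only 65,536 possible outputs and a collision turns up almost immediately. This says nothing about the strength of full SHA-256.

```python
import hashlib

def tiny_hash(data: bytes) -> bytes:
    """A deliberately weak 16-bit 'hash': the first 2 bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:2]

def find_collision():
    """Return two different inputs that share the same tiny_hash output."""
    seen = {}
    # 2**16 + 1 distinct inputs into 2**16 possible outputs:
    # by the pigeonhole principle a repeat must occur.
    for i in range(2**16 + 1):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg
        seen[h] = msg
    raise AssertionError("unreachable: more inputs than possible outputs")

a, b = find_collision()
print(a, b, tiny_hash(a).hex())
```

With a 256-bit output the same search would need an astronomically large number of attempts, which is what "computationally infeasible" means in practice.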

Algorithm Status

Algorithm   Output Size   Collision Status     Safe for Security Use?
MD5         128-bit       Collisions proven    ❌ No
SHA-1       160-bit       Collisions proven    ❌ No
SHA-256     256-bit       No known collision   ✅ Yes
SHA-512     512-bit       No known collision   ✅ Yes
bcrypt      184-bit       No known collision   ✅ Yes (passwords)

Interesting
No publicly known attack has produced a single pair of inputs that collides under both MD5 and SHA-1 at once, so comparing both digests can still reveal a difference. Modern algorithms such as SHA-256 remain the better choice.

4. Password Storage & Why It Matters

💀
Plaintext Storage

Catastrophic. Any database breach immediately exposes every user's password. Never acceptable.

⚠️
Encrypted Storage

Requires storing the decryption key. If the key is compromised, all passwords are decryptable at once.

Hashed Storage

One-way. No key to steal. With per-user salts, each password must be cracked individually. The correct approach when paired with a slow, salted algorithm.

Real-World Breach — rockyou.txt
The infamous rockyou.txt wordlist originates from the RockYou breach, in which millions of passwords were stored in plaintext and later dumped publicly. It is now one of the most commonly used wordlists in password cracking, and a textbook example of why plaintext storage is unacceptable.

Where Hashes Are Stored

$ sudo cat /etc/shadow

username:$6$rounds=5000$salt$hash:days:min:max:warn:inactive:expire:
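As a rough illustration of that layout, the colon-separated fields can be split apart in Python. The field names in `SHADOW_FIELDS` are labels chosen for this sketch (following the usual shadow file description), and the sample line is the placeholder from above, not a real entry.

```python
# Labels for the nine colon-separated fields (names chosen for this sketch).
SHADOW_FIELDS = [
    "name", "password", "last_change", "min_age", "max_age",
    "warn", "inactive", "expire", "reserved",
]

def parse_shadow_line(line: str) -> dict:
    """Split one shadow-style line into a dict of named fields."""
    parts = line.rstrip("\n").split(":")
    return dict(zip(SHADOW_FIELDS, parts))

entry = parse_shadow_line(
    "username:$6$rounds=5000$salt$hash:days:min:max:warn:inactive:expire:"
)
print(entry["name"])
print(entry["password"])
```

The second field is the part that matters for cracking: it carries the algorithm prefix, the optional rounds, the salt, and the hash itself.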

5. Salting — Defeating Rainbow Tables

The Rainbow Table Problem

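If two users have the same password and no salt is used, they have the same hash. An attacker with a precomputed table of hash-to-plaintext mappings can look up millions of hashes almost instantly instead of hashing each guess again. A rainbow table is a space-efficient form of such a table: it stores chains of hashes and trades a little lookup computation for a much smaller file. Sites like CrackStation rely on massive precomputed tables for exactly this.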
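A precomputed lookup can be sketched in a few lines. Note that this is a plain dictionary, not a true rainbow table (which stores hash chains to save space), and that the four-word `wordlist` and the use of unsalted SHA-256 are assumptions made purely for illustration.

```python
import hashlib

# Hypothetical tiny wordlist; real tables cover billions of candidates.
wordlist = ["123456", "password", "hunter2", "letmein"]

# Precompute once: hex digest -> plaintext.
lookup = {hashlib.sha256(w.encode()).hexdigest(): w for w in wordlist}

# Later, "cracking" a leaked unsalted hash is a single dictionary lookup.
leaked = hashlib.sha256(b"hunter2").hexdigest()
print(lookup.get(leaked))

# A password that was never precomputed is simply not found.
unknown = hashlib.sha256(b"correct horse battery staple").hexdigest()
print(lookup.get(unknown))
```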

The Salt Solution

A salt is a random value generated uniquely per user and combined with their password before hashing. Even if two users have the same password, their salts differ — producing completely different hashes. The salt is stored alongside the hash (it doesn't need to be secret) and its purpose is to guarantee uniqueness and invalidate precomputed tables.

How It Works
hash( password + salt ) → unique hash per user, even for identical passwords. Rainbow tables become useless because they would need to be recomputed for every possible salt value.
User A: hash( "hunter2" + "xK9mQ2" ) = a1b2c3d4e5f6...
User B: hash( "hunter2" + "pL7nR1" ) = f9e8d7c6b5a4...

bcrypt      → auto-salted, slow by design (resistant to GPU cracking)
sha512crypt → auto-salted, configurable rounds
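Here is a small sketch of per-user salting. Since bcrypt and sha512crypt are not part of Python's standard library, this example swaps in PBKDF2-HMAC-SHA256 via `hashlib.pbkdf2_hmac`; the iteration count, salt length, and function names are assumptions for the example, not recommendations from the room.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch

def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users choose the same password but get different salts and hashes.
salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")
print(hash_a.hex())
print(hash_b.hex())
```

The salt is stored next to the hash, so verification still works, but a table precomputed without that exact salt no longer matches.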

6. Identifying Hash Types

Tools like hashID can help, but they aren't always reliable — MD5, MD4, and NTLM hashes can look visually identical. The most reliable method combines: tools + hash length/prefix + context (where did the hash come from?).

Unix-Style Hash Prefixes

Unix/Linux password hashes include a prefix that identifies the algorithm, following the format:

$format$rounds$salt$hash
format = algorithm identifier, rounds = optional rounds, salt = per-user salt, hash = the actual hash

Prefix              Algorithm      Notes
$1$                 MD5crypt       Weak, avoid
$2$ / $2a$ / $2b$   bcrypt         Strong, recommended for passwords
$5$                 SHA-256crypt   Acceptable
$6$                 SHA-512crypt   Strong, common on modern Linux

Context is Everything
A 32-character hex hash could be MD5, NTLM, or MD4. If you pulled it from a web app database, it's likely MD5. If it's from a Windows SAM dump, it's almost certainly NTLM. Tools alone can't determine this; context does.
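A very rough identifier along these lines might combine the prefix with the hex length. The `guess_hash_type` function and its two lookup tables are invented for this sketch and only cover the cases mentioned in these notes; like hashID, it returns candidates, not certainties.

```python
import re

# Prefix -> algorithm, limited to the prefixes listed in these notes.
PREFIXES = {
    "1": "md5crypt",
    "2": "bcrypt", "2a": "bcrypt", "2b": "bcrypt",
    "5": "sha256crypt",
    "6": "sha512crypt",
}

# Hex length -> possible raw hash types (length alone is ambiguous).
HEX_LENGTHS = {
    32: ["MD5", "NTLM", "MD4"],
    40: ["SHA-1"],
    64: ["SHA-256"],
    128: ["SHA-512"],
}

def guess_hash_type(h: str) -> list:
    """Return a list of candidate algorithms for a hash string."""
    h = h.strip()
    if h.startswith("$"):
        parts = h.split("$")  # "$6$salt$hash" -> ["", "6", "salt", "hash"]
        name = PREFIXES.get(parts[1]) if len(parts) > 1 else None
        return [name] if name else []
    if re.fullmatch(r"[0-9a-fA-F]+", h):
        return HEX_LENGTHS.get(len(h), [])
    return []

print(guess_hash_type("$6$rounds=5000$salt$hash"))
print(guess_hash_type("a" * 32))
```

The 32-character case returns three candidates on purpose: only the context of where the hash came from can narrow it down.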

7. Cracking Hashes

Since hashes cannot be decrypted, cracking requires brute-force or dictionary attacks: take a candidate password, apply the hash function (with salt if present), compare to the target. Repeat for millions of candidates.
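The loop described above is short enough to sketch directly. This toy version assumes unsalted SHA-256 and a four-word candidate list; the target is the `password123` digest shown earlier in these notes. Real tools such as hashcat do the same thing at vastly higher speed.

```python
import hashlib

def dictionary_attack(target_hex: str, candidates, salt: bytes = b""):
    """Hash each candidate (plus optional salt) and return the first match, or None."""
    for word in candidates:
        digest = hashlib.sha256(word.encode() + salt).hexdigest()
        if digest == target_hex:
            return word
    return None

# SHA-256 of "password123", as shown earlier in these notes.
target = "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"

# Hypothetical tiny wordlist standing in for rockyou.txt.
wordlist = ["123456", "letmein", "password123", "qwerty"]

found = dictionary_attack(target, wordlist)
print(found)
```

If the password is not in the wordlist the function returns `None`, which is why wordlist quality and rules matter so much in practice.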

GPUs are preferred because they have thousands of parallel cores and can compute billions of hashes per second. Running cracking tools inside a VM is usually slower since VMs often lack direct GPU access — always prefer running on the host system.

Hashcat & John the Ripper

$ hashcat -m 0 -a 0 hash.txt /usr/share/wordlists/rockyou.txt

$ hashcat -m 1800 hash.txt rockyou.txt -r rules/best64.rule

$ john --wordlist=/usr/share/wordlists/rockyou.txt hash.txt

bcrypt is Special
bcrypt is intentionally slow and designed to resist GPU acceleration. This is a feature, not a bug — it makes brute-force attacks orders of magnitude more expensive, even with powerful hardware.

8. Integrity Checking & HMAC

Data Integrity

Because the same input always produces the same hash, hashing verifies that data has not been altered. Even a single bit change produces a completely different digest.

$ sha256sum kali-linux-2024.iso
abc123...def456  kali-linux-2024.iso
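The same check can be scripted. Below is a minimal sketch that hashes a file in chunks and compares the result with a published digest; `sha256_file` and `verify_file` are names made up here, and the demo uses a small temporary file in place of a real ISO.

```python
import hashlib
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_file(path: str, expected_hex: str) -> bool:
    """Return True if the file's digest matches the published one."""
    return sha256_file(path) == expected_hex.strip().lower()

# Demo with a small temporary file standing in for a download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example download")
    demo_path = tmp.name

published = hashlib.sha256(b"example download").hexdigest()
altered = hashlib.sha256(b"example downl0ad").hexdigest()

ok = verify_file(demo_path, published)
mismatch = verify_file(demo_path, altered)
os.remove(demo_path)
print(ok, mismatch)
```

This only proves the file matches the published digest; if an attacker can change both the file and the digest, a keyed construction such as HMAC is needed.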

HMAC — Hash-Based Message Authentication Code

HMAC combines a hash function with a secret key, providing two guarantees:

🔐
Integrity

Data hasn't been modified since the HMAC was computed. Same guarantee as regular hashing.

🪪
Authenticity

The message came from someone who holds the secret key. This is what plain hashing cannot provide.

Real-World Use
VPNs often use an HMAC such as HMAC-SHA512 to authenticate traffic, ensuring messages are neither modified in transit nor sent by an impersonator. Only a party with the shared secret key can generate a valid HMAC.

$ echo -n "message" | sha256sum
ab530a13e45914982b79f9b7e3fba994...

$ echo -n "message" | openssl dgst -sha256 -hmac "secretkey"
HMAC-SHA256(stdin)= 8b5f48702995c1598c573db1e21866a9...
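The two commands above have a close equivalent in Python's standard `hmac` module. This sketch reuses the same message and key from the shell example; the `verify` helper is a name invented here.

```python
import hashlib
import hmac

key = b"secretkey"
message = b"message"

# Plain hash: anyone can recompute it, so it proves nothing about the sender.
plain = hashlib.sha256(message).hexdigest()

# HMAC: requires the shared secret key to produce or check.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag_hex: str) -> bool:
    """Recompute the HMAC and compare it with the received tag in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag_hex)

print(plain)
print(tag)
print(verify(key, message, tag))
print(verify(key, b"tampered message", tag))
print(verify(b"wrongkey", message, tag))
```

A changed message or a wrong key both make verification fail, which covers integrity and authenticity in one check.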

NotAlive · 2026