Hashing — Crypto 101
Hello everyone, this is NotAlive.
On this page, I will be summarizing the Hashing - Crypto 101 room from TryHackMe.
If you are new, I would recommend not reading this, as it is mainly meant for revision.
Let's get started.
1. Key Terminology
- Plaintext: Data before encryption or hashing. Often text, but it can be any file, such as a photograph or a binary.
- Encoding: NOT encryption. Just a data representation format (e.g. base64, hex). Immediately and trivially reversible, like translation.
- Hash: The fixed-size output of a hash function, also called a digest. Also used as a verb: "to hash" means to produce the digest of some input data.
- Brute force: Attacking cryptography by systematically trying every possible password or key combination.
- Cryptanalysis: Attacking cryptography by finding and exploiting a mathematical weakness in the underlying algorithm.
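To make the point about encoding concrete, here is a minimal Python sketch. The sample value is just an assumption for illustration. It shows that base64 and hex can be undone by anyone, with no key and no guessing.

```python
import base64

secret = b"password123"  # sample value, chosen only for illustration

# Encoding changes the representation, not the secrecy.
encoded = base64.b64encode(secret)
as_hex = secret.hex()

# Both representations decode straight back to the original bytes.
decoded_from_b64 = base64.b64decode(encoded)
decoded_from_hex = bytes.fromhex(as_hex)

print(encoded.decode())
print(as_hex)
print(decoded_from_b64 == secret and decoded_from_hex == secret)
```

If you ever see a "protected" value that decodes cleanly like this, it was encoded, not encrypted or hashed.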
2. What is Hashing?
A hash function takes input data of any size and produces a fixed-size output called a digest. Unlike encryption, it uses no key and is designed to be strictly one-way — reconstructing the original input from the output should be computationally infeasible.
- Fast to compute: General-purpose hash functions are designed to be computed quickly, enabling efficient verification at scale. Password-hashing schemes such as bcrypt are the deliberate exception (see section 5).
- One-way: The original input cannot be feasibly reconstructed from the hash output alone.
- Avalanche effect: A single-bit change in the input flips roughly half of the output bits, in an unpredictable and non-linear way.
- Fixed-size output: Regardless of input length, the output digest is always the same length for a given algorithm.
- Encoded for display: The raw binary output is typically encoded as hexadecimal or base64. Decoding this representation does not reveal the original input; it is only presentation.
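The fixed-size and avalanche properties are easy to check yourself. Below is a small sketch using Python's standard `hashlib`. The helper names `sha256_hex` and `differing_bits` are my own, and the two sample inputs are chosen so that they differ in exactly one bit.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 digest bits differ between two inputs."""
    da = hashlib.sha256(a).digest()
    db = hashlib.sha256(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(da, db))

# Fixed-size output: every digest is 64 hex characters (256 bits).
for sample in (b"", b"a", b"a" * 1_000_000):
    print(len(sample), "bytes in ->", len(sha256_hex(sample)), "hex chars out")

# Avalanche effect: "2" (0x32) and "3" (0x33) differ in a single bit,
# yet around half of the 256 output bits are expected to change.
print(differing_bits(b"password122", b"password123"), "of 256 bits differ")
```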
How It Works in Login Systems
When you log in to a well-designed service, your password is never stored directly. The system stores only its hash. On login, it hashes what you typed and compares the result against the stored hash. If they match, you are authenticated, and the service never had to keep the real password.
$ echo -n "password123" | sha256sum
ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
$ echo -n "password124" | sha256sum
(prints a different, unrelated-looking 64-character hex digest)
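The login flow described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `users` dictionary stands in for a real database, and plain SHA-256 is used to mirror the commands above. A real system should use a salted, slow scheme as covered in section 5.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Hash a password with plain SHA-256 (illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical user store: only the hash is kept, never the password.
users = {"alice": hash_password("password123")}

def login(username: str, attempt: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(stored, hash_password(attempt))

print(login("alice", "password123"))
print(login("alice", "password124"))
```

The first call succeeds and the second fails, even though the two attempts differ by one character.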
3. Hash Collisions & The Pigeonhole Principle
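A hash collision occurs when two different inputs produce the same hash output. A hash function accepts an effectively unlimited number of possible inputs but can only produce a finite number of outputs (2^256 for SHA-256). By the pigeonhole principle, if there are more inputs than outputs, at least two inputs must share an output, so collisions are mathematically unavoidable.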
The goal of secure hash algorithms is not to eliminate collisions (impossible), but to make intentional / engineered collisions computationally infeasible for an attacker to produce.
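To see the pigeonhole principle in action, here is a sketch that deliberately weakens SHA-256 by keeping only its first 16 bits. The `tiny_hash` function is a toy of my own, not a real algorithm. With only 65,536 possible outputs, a collision is guaranteed within 65,537 inputs and in practice shows up much sooner.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes, bits: int = 16) -> int:
    """Toy hash: the first `bits` bits of SHA-256. Deliberately weak."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int = 16):
    """Hash "0", "1", "2", ... until two inputs share a toy hash."""
    seen = {}
    for i in count():
        msg = str(i).encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg

first, second, shared = find_collision()
print(first, second, hex(shared))
```

The same search against the full 256-bit output is what secure algorithms make infeasible: the collisions exist, but nobody can afford to find them.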
Algorithm Status
| Algorithm | Output Size | Collision Status | Safe for Security Use? |
|---|---|---|---|
| MD5 | 128-bit | Practical collisions demonstrated | ❌ No |
| SHA-1 | 160-bit | Practical collisions demonstrated | ❌ No |
| SHA-256 | 256-bit | No known collision | ✅ Yes |
| SHA-512 | 512-bit | No known collision | ✅ Yes |
| bcrypt | 184-bit | No known collision | ✅ Yes (passwords) |
4. Password Storage & Why It Matters
- Plaintext: Catastrophic. Any database breach immediately exposes every user's password. Never acceptable.
- Encrypted: Requires storing the decryption key. If the key is compromised, all passwords are decryptable at once.
- Hashed: One-way. No key to steal. With a unique salt per user, each password must be cracked individually. The correct approach.
Where Hashes Are Stored
- Linux: /etc/shadow, readable by root only.
- Windows: the SAM (Security Account Manager) database. Hashes are NTLM-based and look visually similar to MD4/MD5, so context is needed to identify them.
$ sudo cat /etc/shadow
username:$6$rounds=5000$salt$hash:days:min:max:warn:inactive:expire:
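If you want to pull the pieces out of a shadow entry programmatically, something like the sketch below works for the layout shown above. The function name `parse_shadow_line` and the sample entry (with placeholder salt and hash) are my own assumptions, and the parser only handles the `$id$[rounds=N$]salt$hash` layout, not bcrypt's slightly different one.

```python
def parse_shadow_line(line: str) -> dict:
    """Split one /etc/shadow entry into its named parts."""
    fields = line.strip().split(":")
    entry = {"username": fields[0], "algorithm_id": None,
             "rounds": None, "salt": None, "hash": None}
    password_field = fields[1]
    # Locked or passwordless accounts hold "!", "*" or "" instead of a hash.
    if not password_field.startswith("$"):
        return entry
    # Drop the empty string that precedes the first "$".
    parts = password_field.split("$")[1:]
    entry["algorithm_id"] = parts[0]
    if len(parts) == 4 and parts[1].startswith("rounds="):
        entry["rounds"] = int(parts[1].split("=", 1)[1])
        entry["salt"], entry["hash"] = parts[2], parts[3]
    elif len(parts) == 3:
        entry["salt"], entry["hash"] = parts[1], parts[2]
    return entry

# Hypothetical entry with placeholder salt and hash values.
sample = "alice:$6$rounds=5000$examplesalt$examplehash:19000:0:99999:7:::"
print(parse_shadow_line(sample))
```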
5. Salting — Defeating Rainbow Tables
The Rainbow Table Problem
If two users have the same password and no salt is used, they have the same hash. An attacker with a rainbow table, a precomputed table that maps hashes back to plaintexts, can reverse millions of common hashes almost instantly instead of cracking each one from scratch. Sites like CrackStation rely on massive precomputed tables for exactly this.
The Salt Solution
A salt is a random value generated uniquely per user and combined with their password before hashing. Even if two users have the same password, their salts differ — producing completely different hashes. The salt is stored alongside the hash (it doesn't need to be secret) and its purpose is to guarantee uniqueness and invalidate precomputed tables.
hash( password + salt ) → unique hash per user, even for identical passwords. Rainbow tables become useless because they would need to be recomputed for every possible salt value.
User A: hash( "hunter2" + "xK9mQ2" ) = a1b2c3d4e5f6...
User B: hash( "hunter2" + "pL7nR1" ) = f9e8d7c6b5a4...
bcrypt → auto-salted, slow by design (resistant to GPU cracking)
sha512crypt → auto-salted, configurable rounds
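Here is a rough Python sketch of the `hash( password + salt )` idea. It uses a single round of SHA-256 purely to keep the example short. That is my simplification, and real systems should rely on bcrypt or sha512crypt, which also add deliberate slowness. The function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Tuple

def new_record(password: str) -> Tuple[bytes, bytes]:
    """Create a (salt, digest) pair to store for a new user."""
    salt = os.urandom(16)  # random, unique per user, not secret
    digest = hashlib.sha256(password.encode("utf-8") + salt).digest()
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare."""
    candidate = hashlib.sha256(password.encode("utf-8") + salt).digest()
    return hmac.compare_digest(candidate, digest)

# Two users pick the same password but end up with different hashes.
salt_a, hash_a = new_record("hunter2")
salt_b, hash_b = new_record("hunter2")
print(hash_a.hex())
print(hash_b.hex())
print(verify("hunter2", salt_a, hash_a))
```

Because the salt is stored next to the hash, verification still works, but a precomputed table built without that exact salt is useless.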
6. Identifying Hash Types
Tools like hashID can help, but they aren't always reliable — MD5, MD4, and NTLM hashes can look visually identical. The most reliable method combines: tools + hash length/prefix + context (where did the hash come from?).
Unix-Style Hash Prefixes
Unix/Linux password hashes include a prefix that identifies the algorithm, following the format:
$format$rounds$salt$hash
Here format is the algorithm identifier, rounds is the optional iteration count, salt is the per-user salt, and hash is the actual hash.
| Prefix | Algorithm | Notes |
|---|---|---|
| $1$ | MD5crypt | Weak, avoid |
| $2$ / $2a$ / $2b$ | bcrypt | Strong, recommended for passwords |
| $5$ | SHA-256crypt | Acceptable |
| $6$ | SHA-512crypt | Strong, common on modern Linux |
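The prefix table translates naturally into a small lookup. The sketch below is my own helper, not how hashID works. It checks the Unix prefix first and falls back to the hex length, which can only narrow things down. As noted earlier, a 32-character hex string could be MD5, MD4 or NTLM.

```python
import re

# Prefixes taken from the table above.
PREFIXES = {
    "$1$": "MD5crypt",
    "$2$": "bcrypt",
    "$2a$": "bcrypt",
    "$2b$": "bcrypt",
    "$5$": "SHA-256crypt",
    "$6$": "SHA-512crypt",
}

# Hex length is only a hint: several algorithms share each length.
HEX_LENGTHS = {
    32: "MD5, MD4 or NTLM (ambiguous, context needed)",
    40: "possibly SHA-1",
    64: "possibly SHA-256",
    128: "possibly SHA-512",
}

def guess_hash_type(value: str) -> str:
    """Guess a hash type from its prefix, then from its hex length."""
    value = value.strip()
    for prefix, name in PREFIXES.items():
        if value.startswith(prefix):
            return name
    if re.fullmatch(r"[0-9a-fA-F]+", value):
        return HEX_LENGTHS.get(len(value), "unknown hex digest")
    return "unknown"

print(guess_hash_type("$6$rounds=5000$salt$hash"))
print(guess_hash_type("0" * 32))
```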
7. Cracking Hashes
Since hashes cannot be decrypted, cracking requires brute-force or dictionary attacks: take a candidate password, apply the hash function (with salt if present), compare to the target. Repeat for millions of candidates.
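The hash, compare, repeat loop can be sketched in Python as follows. The function name `crack` and the tiny wordlist are assumptions for illustration. The target is the SHA-256 digest of "password123" shown in section 2, and the salt handling assumes the simple password-then-salt layout used in section 5.

```python
import hashlib
from typing import Iterable, Optional

def crack(target_hex: str, candidates: Iterable[str],
          salt: bytes = b"") -> Optional[str]:
    """Hash each candidate (plus salt, if any) and compare to the target."""
    for word in candidates:
        digest = hashlib.sha256(word.encode("utf-8") + salt).hexdigest()
        if digest == target_hex:
            return word
    return None  # wordlist exhausted without a match

# Hypothetical tiny wordlist; real ones such as rockyou.txt hold millions.
wordlist = ["123456", "letmein", "password123", "hunter2"]
target = "ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f"
print(crack(target, wordlist))
```

Tools like Hashcat do exactly this, only massively in parallel and with rules that mutate each candidate.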
GPUs are preferred because they have thousands of parallel cores and, for fast algorithms such as MD5, can compute billions of hashes per second. Running cracking tools inside a VM is usually slower since VMs often lack direct GPU access, so prefer running them on the host system when you can.
Hashcat & John the Ripper
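$ hashcat -m 0 -a 0 hash.txt /usr/share/wordlists/rockyou.txt   # -m 0 = MD5, -a 0 = dictionary attack
$ hashcat -m 1800 hash.txt rockyou.txt -r rules/best64.rule   # -m 1800 = sha512crypt ($6$), -r applies a rule file to each word
$ john --wordlist=/usr/share/wordlists/rockyou.txt hash.txt   # John the Ripper tries to detect the hash format itself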
8. Integrity Checking & HMAC
Data Integrity
Because the same input always produces the same hash, hashing verifies that data has not been altered. Even a single bit change produces a completely different digest.
- Verifying downloaded file integrity
- Detecting tampering in transmitted data
- Identifying duplicate files (with a secure algorithm, the same hash almost certainly means identical contents)
$ sha256sum kali-linux-2024.iso
abc123...def456 kali-linux-2024.iso
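The same check can be scripted. Below is a minimal Python sketch. The function names are mine, and the published hash is whatever the download page lists. It reads the file in chunks so that a large ISO does not have to fit in memory.

```python
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 of a file, reading it 1 MiB at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file's digest with the one published by the vendor."""
    return sha256_file(path) == published_hex.strip().lower()
```

A typical use would be `matches_published("kali-linux-2024.iso", published)`, where `published` is the checksum copied from the official site. Note that this only proves integrity if the checksum itself came from a trusted source.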
HMAC — Hash-Based Message Authentication Code
HMAC combines a hash function with a secret key and provides two guarantees:
- Integrity: The data hasn't been modified since the HMAC was computed. Same guarantee as regular hashing.
- Authenticity: The message came from someone who holds the secret key. This is what plain hashing cannot provide.
$ echo -n "message" | sha256sum
ab530a13e45914982b79f9b7e3fba994...
$ echo -n "message" | openssl dgst -sha256 -hmac "secretkey"
HMAC-SHA256(stdin)= 8b5f48702995c1598c573db1e21866a9...
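For comparison, here is roughly the same thing in Python using the standard `hmac` module. The function names `sign` and `verify` are my own. The key and message mirror the openssl example above.

```python
import hashlib
import hmac

def sign(message: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag for the message as a hex string."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, key: bytes, tag_hex: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(message, key), tag_hex)

tag = sign(b"message", b"secretkey")
print(tag)
print(verify(b"message", b"secretkey", tag))   # right key, untouched message
print(verify(b"message!", b"secretkey", tag))  # tampered message
print(verify(b"message", b"wrongkey", tag))    # wrong key
```

Only the first check passes. Anyone can recompute a plain SHA-256, but without the key nobody can produce a valid tag for a modified message.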