Single-byte XOR cipher¶
The hex encoded string:
... has been XOR'd against a single character. Find the key, decrypt the message.
You can do this by hand. But don't: write code to do it for you.
How? Devise some method for "scoring" a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.
Solution¶
1. Single-byte XOR Function¶
def single_byte_xor(input_bytes: bytes, key: int) -> bytes:
"""Perform a single-byte XOR operation on a byte string."""
if not (0 <= key <= 255):
raise ValueError("Key must be int (0-255).")
return bytes(b ^ key for b in input_bytes)
2. Text Scoring Algorithm¶
The scoring function uses English letter frequency analysis:
def score_text(text: bytes) -> float:
"""Score a byte string based on letter frequency analysis."""
# Count letter frequencies in the text
counts_text = Counter()
for letter in lowercase_frequencies:
counts_text[letter] = text.count(letter.encode())
total = sum(counts_text.values())
if total == 0:
return float("inf")
# Calculate frequency differences from expected English frequencies
frequencies_text = {letter: counts_text[letter] / total for letter in lowercase_frequencies}
errors = {abs(lowercase_frequencies[letter] - frequencies_text[letter]) for letter in lowercase_frequencies}
score = sum(errors)
return score
The scoring compares the frequency of letters in the decrypted text against known English letter frequencies. A lower score indicates text that's more likely to be English.
3. Brute Force Key Guessing¶
def guess_single_key_xor(ciphertext: bytes) -> Guess:
"""Guess the single-byte XOR key for a given ciphertext."""
best_guess = Guess.empty()
for key in range(256):
current_guess = Guess(ciphertext, key)
best_guess = min(best_guess, current_guess)
return best_guess
4. Optimized Quick Guess¶
For faster results, there's also a heuristic approach:
def quick_guess_single_byte_xor(ciphertext: bytes) -> Guess:
"""Quickly guess a single-byte XOR key using frequency heuristics."""
frequencies = Counter(ciphertext)
most_common_byte = frequencies.most_common(1)[0][0]
common_chars = set(" etaoinshrdlu") # Most common English characters
best_guess = Guess.empty()
for char in common_chars:
current_guess = Guess(ciphertext, most_common_byte ^ ord(char))
best_guess = min(best_guess, current_guess)
return best_guess
This assumes the most frequent byte in the ciphertext corresponds to a common English character.