Information Theory Basics
The quantification of information, surprise, and entropy
Origins
Claude Shannon’s 1948 paper “A Mathematical Theory of Communication” founded information theory. Working at Bell Labs on the engineering of telephone and telegraph transmission, he sought a general, quantitative measure of information. The result was a theory applicable to communication in any medium.
“The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point.” — Shannon
Core Concepts
Information as Surprise
Information is not meaning — it’s surprise or uncertainty reduction.
- Certain event (sun will rise): 0 information
- Likely event (rain in Seattle): little information
- Unlikely event (snow in Sahara): lots of information
The less expected, the more informative.
The Bit
The basic unit of information is the bit (binary digit).
- One bit = the information in one yes/no decision
- A fair coin flip: 1 bit
- Two coin flips: 2 bits
- n coin flips: n bits
Logarithmic scale:
Information = -log₂(probability)
| Probability | Information (bits) |
|---|---|
| 1 (certain) | 0 |
| 0.5 | 1 |
| 0.25 | 2 |
| 0.125 | 3 |
| 0.001 | ~10 |
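The table follows directly from the formula; a minimal Python check (the function name is mine, not standard):

```python
import math

def self_information(p: float) -> float:
    """Information content (surprise) of an event with probability p, in bits."""
    return math.log2(1 / p)

print(self_information(1.0))    # certain event: 0.0 bits
print(self_information(0.5))    # fair coin flip: 1.0 bit
print(self_information(0.001))  # rare event: ~10 bits
```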
Entropy
Entropy = average information content = uncertainty
The entropy H of a probability distribution:
H = -Σ p(x) × log₂(p(x))
Examples
- Fair coin: H = 1 bit (maximum uncertainty)
- Always heads: H = 0 bits (no uncertainty)
- Biased coin (75% heads): H = 0.81 bits (some uncertainty)
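These numbers are easy to verify; a short Python sketch (helper name is mine):

```python
import math

def entropy(dist) -> float:
    """Shannon entropy in bits: sum of p * log2(1/p); terms with p = 0 contribute 0."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1.0
print(entropy([1.0]))         # always heads: 0.0
print(entropy([0.75, 0.25]))  # biased coin: ~0.811
```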
Maximum Entropy
Entropy is maximized when all outcomes are equally likely.
- Uniform distribution = maximum uncertainty
- Skewed distribution = less uncertainty
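Seen numerically, for four outcomes (self-contained sketch, entropy helper as in the formula above):

```python
import math

def entropy(dist) -> float:
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in dist if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: 2.0 bits, the maximum for 4 outcomes
print(entropy([0.7, 0.1, 0.1, 0.1]))      # skewed: ~1.357 bits, strictly less
```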
Channel Capacity
The channel capacity is the maximum rate at which information can be transmitted reliably over a channel.
The Noisy Channel
[Source] → [Encoder] → [Channel (noise)] → [Decoder] → [Destination]
Key insight: Even with noise, reliable communication is possible through error correction and redundancy.
Shannon’s Limit
There is a fundamental limit to communication rate given noise: you cannot transmit more information than the channel capacity allows. Shannon’s noisy-channel coding theorem is two-sided: below capacity, the error rate can be driven arbitrarily close to zero with suitable coding; above capacity, reliable transmission is impossible no matter the scheme.
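One concrete, classical instance (assuming the standard binary symmetric channel model, which the text above does not name): a channel that flips each bit with probability p has capacity C = 1 − H(p), where H is the binary entropy:

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) in bits for a binary event with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))

def bsc_capacity(p: float) -> float:
    """Capacity of a binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

print(bsc_capacity(0.0))  # noiseless channel: 1.0 bit per use
print(bsc_capacity(0.5))  # pure noise: 0.0 bits, output says nothing about input
```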
Applications Beyond Communication
Thermodynamics
Entropy in physics is closely related to information entropy. Landauer’s principle: erasing one bit of information dissipates at least kT ln 2 of heat.
Biology
- DNA as information storage (4 bases = 2 bits per base)
- Genetic code: 64 codons encode 20 amino acids (redundancy/error correction)
- Neural coding: How much information does a spike carry?
Machine Learning
- Cross-entropy as loss function
- Information gain in decision trees
- Mutual information for feature selection
- KL divergence between distributions
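The first and last bullets are one line each given the definitions earlier in this note; a sketch (function names mine):

```python
import math

def cross_entropy(p, q) -> float:
    """H(p, q): expected bits when events from p are encoded with a code built for q."""
    return sum(pi * math.log2(1 / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q) -> float:
    """D_KL(p || q) = H(p, q) - H(p): the extra bits paid for modeling p as q."""
    return cross_entropy(p, q) - cross_entropy(p, p)

true_dist  = [0.5, 0.5]  # e.g. actual label frequencies
model_dist = [0.9, 0.1]  # e.g. a model's predicted probabilities
print(cross_entropy(true_dist, model_dist))  # ~1.737 bits (> 1 bit: the model is off)
print(kl_divergence(true_dist, model_dist))  # ~0.737 bits; zero iff model matches truth
```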
Cognition
- Attention as information bottleneck
- Working memory capacity (~4 chunks = limited bits)
- Surprise drives learning (prediction error)
- Compression and abstraction
Key Relationships
Redundancy
Redundant messages are compressible:
- “The q u i c k…” → each next letter is largely predictable → low information per symbol
- Redundancy enables error correction (if you know what should come next, you can detect and fix errors)
Compression
- Lossless: All information preserved (ZIP, PNG)
- Lossy: Some information discarded (JPEG, MP3)
- Good compression = finding and removing redundancy
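Redundancy removal is observable with any off-the-shelf compressor, e.g. Python's zlib:

```python
import os
import zlib

redundant  = b"abcabcabc" * 200  # 1800 bytes of pure repetition
random_ish = os.urandom(1800)    # 1800 bytes with (almost surely) no structure

print(len(zlib.compress(redundant)))   # tiny: the redundancy compresses away
print(len(zlib.compress(random_ish)))  # roughly 1800 or slightly more: nothing to remove
```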
Mutual Information
How much information X gives you about Y:
- High mutual information: Knowing X tells you a lot about Y
- Zero mutual information: X and Y are independent
In Nosos
Memory as Information Management
- Storage: Compress experiences to key information
- Retrieval: Reconstruct from stored cues
- Search: Find relevant information given query
- Forgetting: Lossy compression (keep gist, lose details)
Semantic Indexing
The vector database approach:
- High-dimensional semantic space
- Nearby vectors = similar information
- Search = finding closest points
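A toy sketch of the idea (the vectors and entries here are made up; real systems use learned embeddings with hundreds of dimensions and approximate nearest-neighbor indexes):

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# toy "semantic space": nearby vectors = similar information
memories = {
    "coffee": [0.9, 0.1, 0.0],
    "tea":    [0.7, 0.3, 0.2],
    "rocket": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # something coffee-like
best = max(memories, key=lambda k: cosine(query, memories[k]))
print(best)  # the stored vector closest in direction to the query
```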
Conversation as Channel
- Kristopher (source) → Language (channel) → Nosos (destination)
- Noise: Ambiguity, missing context, assumed knowledge
- Error correction: Clarification questions, restatement
Related Concepts
- Availability Heuristic — Information availability affects judgment
- Confirmation Bias — Reducing surprise, maintaining low entropy
- Law of Requisite Variety — Matching information capacity to system complexity
- Autopoiesis — Self-maintaining information boundaries
References
- Shannon, C.E. (1948). A mathematical theory of communication
- Shannon, C.E. & Weaver, W. (1949). The Mathematical Theory of Communication
- Cover, T.M. & Thomas, J.A. (2006). Elements of Information Theory
- Gleick, J. (2011). The Information: A History, a Theory, a Flood
Information is the resolution of uncertainty. The surprise that shapes us. 📊