Self-attention
Mathematical analysis of transformer self-attention as an ML architecture — scaling laws, softmax concentration, inverse-temperature, attention-score families. Complements the FP/RM spectral viewpoint via `[[self-attention-scaling-theme]]`.
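A minimal numpy sketch of the inverse-temperature and softmax-concentration ideas named above (the function names and the β sweep are illustrative choices, not taken from this note): scores are scaled dot products, β plays the role of the inverse temperature (the standard transformer choice is β = 1/√d_k), and the mean row entropy of the attention weights drops as β grows, i.e. the softmax concentrates on fewer keys.

```python
import numpy as np

def attention_weights(Q, K, beta=None):
    """Row-wise softmax of scaled dot-product scores.

    beta is the inverse temperature; the usual transformer scaling
    corresponds to beta = 1/sqrt(d_k), i.e. scores Q K^T / sqrt(d_k).
    """
    d_k = Q.shape[-1]
    if beta is None:
        beta = 1.0 / np.sqrt(d_k)
    scores = beta * (Q @ K.T)                      # (n, n) attention scores
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def mean_row_entropy(W):
    """Mean Shannon entropy of the attention rows (nats); low = concentrated."""
    return float(-(W * np.log(W + 1e-12)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
n, d = 64, 32
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

# Sweep the inverse temperature: larger beta -> more concentrated rows.
for beta in [0.01, 1 / np.sqrt(d), 1.0, 10.0]:
    W = attention_weights(Q, K, beta=beta)
    print(f"beta={beta:6.3f}  mean row entropy={mean_row_entropy(W):.3f}  "
          f"mean max weight={W.max(axis=-1).mean():.3f}")
```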
Findings (1)
Connections
This topic bridges Self-attention spectra.