Self-attention
Mathematical analysis of transformer self-attention as an ML architecture — scaling laws, softmax concentration, inverse-temperature, attention-score families. Complements the FP/RM spectral viewpoint via `[[self-attention-scaling-theme]]`.
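A minimal numpy sketch of the inverse-temperature and softmax-concentration ideas named above (the function names and the β sweep are illustrative choices, not taken from this note): scores are scaled dot products, β plays the role of the inverse temperature (the standard transformer choice is β = 1/√d_k), and the mean row entropy of the attention weights drops as β grows, i.e. the softmax concentrates on fewer keys.

```python
import numpy as np

def attention_weights(Q, K, beta=None):
    """Row-wise softmax of scaled dot-product scores.

    beta is the inverse temperature; the usual transformer scaling
    corresponds to beta = 1/sqrt(d_k), i.e. scores Q K^T / sqrt(d_k).
    """
    d_k = Q.shape[-1]
    if beta is None:
        beta = 1.0 / np.sqrt(d_k)
    scores = beta * (Q @ K.T)                      # (n, n) attention scores
    scores -= scores.max(axis=-1, keepdims=True)   # stabilise the softmax
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def mean_row_entropy(W):
    """Mean Shannon entropy of the attention rows (nats); low = concentrated."""
    return float(-(W * np.log(W + 1e-12)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
n, d = 64, 32
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

# Sweep the inverse temperature: larger beta -> more concentrated rows.
for beta in [0.01, 1 / np.sqrt(d), 1.0, 10.0]:
    W = attention_weights(Q, K, beta=beta)
    print(f"beta={beta:6.3f}  mean row entropy={mean_row_entropy(W):.3f}  "
          f"mean max weight={W.max(axis=-1).mean():.3f}")
```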
Findings (1)
Connections
This topic bridges Self-attention spectra.