Machine learning
Subtree: 9 descendants, 7 findings, 1 note
DNN theory through random-matrix lenses — free random projection, dynamical isometry, meta-RL, training dynamics.
Members (9)
- Dynamical isometry (theme)
Conditions under which signal propagation in deep networks preserves norms and gradients; spectral analysis of layerwise Jacobians and Fisher information.
Featured: The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry
- Orthogonal initialization (method)
Initializing weight matrices as random orthogonal matrices. All their singular values equal 1, so each linear layer preserves signal norms exactly.
- DNN architectures as random-matrix systems (theme)
Reading deep architectures (MLP-Mixer, attention, sparse MLPs) through random-matrix and Kronecker-structure lenses to expose implicit regularization.
Featured: Understanding MLP-Mixer as a Wide and Sparse MLP
- Free Random Projection (method)
A projection method built on random representations, applied to in-context and meta-reinforcement learning.
Featured: Free Random Projection for In-Context Reinforcement Learning
- Meta reinforcement learning (theme)
Learning algorithms that adapt to new tasks from limited interaction.
Featured: Free Random Projection for In-Context Reinforcement Learning
- Reinforcement learning (theme)
Sequential decision-making under uncertainty. This is the umbrella topic covering adaptive meta-RL and the VR-scene exploration policies in the adjacent thread.
- Self-attention (theme)
Mathematical analysis of transformer self-attention as an ML architecture. Topics include scaling laws, softmax concentration, the inverse temperature, and families of attention scores. Complements the free-probability / random-matrix (FP/RM) spectral viewpoint via `[[self-attention-scaling-theme]]`.
Featured: A Unified Framework for Critical Scaling of Inverse Temperature in Self-Attention
- Interpretability and training dynamics (theme)
Layer-wise interpretability via identity initialization, implicit bias of gradient regularization, and selective forgetting / unlearning.
Featured: Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias
- State-space models (method)
Structured state-space sequence models (S4 / S5) — linear-time recurrent backbones for long-context sequence modeling, adopted in in-context RL as an alternative to transformer attention.
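The dynamical-isometry and orthogonal-initialization entries can be illustrated with a minimal sketch. It assumes a deep *linear* network, whose input-output Jacobian is just the product of the weight matrices. With Haar-orthogonal weights, every singular value of that product is exactly 1. Gaussian weights of variance 1/width do not have this property: their spectrum spreads over many orders of magnitude. Nonlinear networks additionally need activations that stay close to linear around the operating point. The depth, width, and helper names below are illustrative choices, not taken from the featured paper.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Sample an n x n orthogonal matrix from the Haar measure via QR."""
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Flip column signs so the sample is Haar-distributed rather than QR-biased.
    return q * np.sign(np.diag(r))


def end_to_end_jacobian(weights):
    """Jacobian of the deep linear map x -> W_L ... W_1 x (product of weights)."""
    jac = np.eye(weights[0].shape[1])
    for w in weights:
        jac = w @ jac
    return jac


def jacobian_singular_values(init, depth=10, width=64, seed=0):
    rng = np.random.default_rng(seed)
    if init == "orthogonal":
        weights = [haar_orthogonal(width, rng) for _ in range(depth)]
    else:
        # Gaussian init with variance 1/width: norm-preserving only on average.
        weights = [rng.standard_normal((width, width)) / np.sqrt(width)
                   for _ in range(depth)]
    return np.linalg.svd(end_to_end_jacobian(weights), compute_uv=False)


sv_orth = jacobian_singular_values("orthogonal")
sv_gauss = jacobian_singular_values("gaussian")
print("orthogonal: min %.3f, max %.3f" % (sv_orth.min(), sv_orth.max()))
print("gaussian:   min %.3g, max %.3g" % (sv_gauss.min(), sv_gauss.max()))
```

The orthogonal case stays perfectly conditioned at any depth. The Gaussian case's condition number grows with depth, which is the failure mode dynamical isometry is meant to avoid.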
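The "wide and sparse MLP" reading of MLP-Mixer rests on a standard Kronecker identity, vec(A X B) = (Bᵀ ⊗ A) vec(X), with column-stacking vec. The sketch below assumes only the linear parts of a Mixer layer and omits nonlinearities and layer norm. Under that assumption, token mixing `W_tok @ X` is a single wide layer with weight I_C ⊗ W_tok. Channel mixing `X @ W_ch.T` is a wide layer with weight W_ch ⊗ I_S. The sizes and variable names are hypothetical toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
S, C = 4, 3                          # tokens, channels (hypothetical toy sizes)
X = rng.standard_normal((S, C))      # one input: tokens x channels
W_tok = rng.standard_normal((S, S))  # token-mixing weight, shared across channels
W_ch = rng.standard_normal((C, C))   # channel-mixing weight, shared across tokens


def vec(M):
    """Column-stacking vectorization, so vec(A X B) = kron(B.T, A) @ vec(X)."""
    return M.reshape(-1, order="F")


# Token mixing as one wide layer of width S*C with block-diagonal weight.
big_tok = np.kron(np.eye(C), W_tok)
# Channel mixing as one wide layer with Kronecker-structured weight.
big_ch = np.kron(W_ch, np.eye(S))

tok_ok = np.allclose(vec(W_tok @ X), big_tok @ vec(X))
ch_ok = np.allclose(vec(X @ W_ch.T), big_ch @ vec(X))

# Only a 1/C fraction of the wide token-mixing matrix is nonzero.
density = np.count_nonzero(big_tok) / big_tok.size
print(tok_ok, ch_ok, density)
```

The wide layers are heavily weight-shared and sparse. This structure is what the Kronecker-structure lens exploits when relating Mixer to a much wider MLP.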
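The self-attention entry mentions softmax concentration and the inverse temperature β. A minimal sketch of the concentration effect is shown below for one query attending over n random keys. At β = 0 the attention weights are uniform, so the entropy equals log n. As β grows, the entropy decreases monotonically, because dH/dβ = -β·Var(scores) ≤ 0. This sketch does not reproduce the featured paper's critical scaling of β; it only illustrates the quantity such a scaling is meant to control. The sizes and β grid are arbitrary choices.

```python
import numpy as np


def attention_weights(q, K, beta):
    """Softmax attention of a single query over n keys at inverse temperature beta."""
    scores = beta * (K @ q) / np.sqrt(q.shape[0])
    scores = scores - scores.max()  # numerical stability; softmax is unchanged
    w = np.exp(scores)
    return w / w.sum()


def entropy(p):
    """Shannon entropy in nats, ignoring exact zeros."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


rng = np.random.default_rng(0)
n, d = 256, 32  # sequence length and head dimension (toy values)
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))

betas = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
entropies = [entropy(attention_weights(q, K, b)) for b in betas]
for b, h in zip(betas, entropies):
    print(f"beta={b:4.1f}  entropy={h:.3f}  (uniform: log n = {np.log(n):.3f})")
```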
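The gradient-regularization paper concerns finite-difference computation of the gradient penalty. The core idea can be sketched under simple assumptions. Adding (λ/2)‖∇L‖² to the loss contributes λ·H·g to the gradient, where g = ∇L and H is the Hessian. A forward difference of gradients, (∇L(θ + εg) − g)/ε, replaces the Hessian-vector product H·g at the cost of one extra gradient evaluation. The toy quadratic loss, step sizes, and function names below are hypothetical and are not the paper's code. For a quadratic loss the forward difference is exact up to round-off.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
M = rng.standard_normal((d, d))
A = M @ M.T + np.eye(d)  # positive-definite Hessian of the toy quadratic loss
b = rng.standard_normal(d)


def grad(theta):
    """Gradient of the toy loss L(theta) = 0.5 theta^T A theta - b^T theta."""
    return A @ theta - b


def gr_grad_exact(theta, lam):
    """Gradient of L + (lam/2)||grad L||^2, i.e. g + lam * H g (H = A here)."""
    g = grad(theta)
    return g + lam * (A @ g)


def gr_grad_finite_diff(theta, lam, eps):
    """Same quantity, with H g replaced by a forward difference of gradients.

    Needs two gradient evaluations and no explicit Hessian-vector product."""
    g = grad(theta)
    hvp = (grad(theta + eps * g) - g) / eps
    return g + lam * hvp


theta = rng.standard_normal(d)
exact = gr_grad_exact(theta, lam=0.1)
approx = gr_grad_finite_diff(theta, lam=0.1, eps=1e-3)
print(np.max(np.abs(exact - approx)))

# One gradient-regularized descent step using the finite-difference gradient.
lr = 0.05
theta_next = theta - lr * approx
```

For a general nonlinear loss, the forward difference carries an O(ε) error. The choice of ε and of the difference direction then matters.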
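The state-space entry describes S4/S5 as linear-time recurrent backbones. The sketch below shows the discrete linear recurrence these models build on, together with its equivalent unrolled form as a causal convolution with kernel K_j = C·A^j·B. Roughly speaking, S4 exploits the convolutional view and S5 the recurrent (scan) view. Several details are simplified and are assumptions of this sketch: the discretization step from continuous-time parameters is omitted, the state matrix is taken as diagonal, and the model is single-input single-output.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 4, 16  # state size and sequence length (toy values)

# Discrete-time diagonal SSM: x_k = A x_{k-1} + B u_k,  y_k = C x_k.
# Discretization from continuous-time parameters (as in S4/S5) is omitted.
A = np.diag(rng.uniform(0.5, 0.95, size=N))  # stable: eigenvalues inside unit circle
B = rng.standard_normal(N)
C = rng.standard_normal(N)
u = rng.standard_normal(L)


def ssm_recurrent(u):
    """Linear-time scan over the sequence, carrying an N-dimensional state."""
    x = np.zeros(N)
    ys = []
    for u_k in u:
        x = A @ x + B * u_k
        ys.append(C @ x)
    return np.array(ys)


def ssm_convolutional(u):
    """Same map unrolled as a causal convolution with kernel K_j = C A^j B."""
    K = np.array([C @ np.linalg.matrix_power(A, j) @ B for j in range(len(u))])
    # y_k = sum_{j=0}^{k} K_j u_{k-j}
    return np.array([K[: k + 1][::-1] @ u[: k + 1] for k in range(len(u))])


y_rec = ssm_recurrent(u)
y_conv = ssm_convolutional(u)
print(np.max(np.abs(y_rec - y_conv)))
```

The recurrent form costs O(L·N) per sequence and only needs the current state at inference time. This is the property that makes SSMs a drop-in alternative to attention for long contexts.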
Findings — subtree (7)
Papers (7)
- A Unified Framework for Critical Scaling of Inverse Temperature in Self-Attention
- Free Random Projection for In-Context Reinforcement Learning
- Understanding MLP-Mixer as a Wide and Sparse MLP
- Understanding Gradient Regularization in Deep Learning: Efficient Finite-Difference Computation and Implicit Bias
- Layer-Wise Interpretation of Deep Neural Networks Using Identity Initialization
- The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry
- Selective Forgetting of Deep Networks at a Finer Level than Samples
Notes — subtree (1)