License: CC0 (public domain)
Design Document: Multi-Scale Neural Network Visualization via CA, Voxels, and Fractal Compression
1. Overview
This document defines a high-performance, multi-scale visualization framework for representing the internal state of deep neural networks using:
Cellular automata (CA)
3D voxel grids
Subpixel and multi-resolution compression
Fractal-inspired scaling derived from network weights and dynamics
The framework converts high-dimensional tensors (activations, weights, gradients, attention maps) into structured, recursively compressed visual fields capable of scaling to billion-parameter models.
The system supports:
Static snapshots (single forward pass)
Time evolution (training iterations)
Layer transitions
CA-driven emergent visualizations
Recursive zoom / fractal exploration
The architecture is model-agnostic (CNNs, transformers, MLPs, diffusion models, etc.).
2. Objectives
2.1 Interpretability
Provide structured visibility into:
Activation sparsity patterns
Feature hierarchies
Attention clustering
Gradient flow and vanishing/exploding behavior
Residual path dominance
Spectral structure of weight matrices
Interpretability goal: expose structure, not raw magnitude.
2.2 Scalability
Target constraints:
Handle ≥10⁹ parameters
Maintain interactive performance (30–60 FPS for moderate models)
Support progressive refinement
Strategies:
Hierarchical spatial compression
Tensor factorization (PCA/SVD)
Block quantization
Octree voxelization
Multi-resolution caching
2.3 Artistic and Structural Insight
Neural networks inherently exhibit:
Recursive composition
Hierarchical feature reuse
Spectral decay
Self-similar clustering
Power-law distributions
The system intentionally leverages these properties to produce fractal-like representations grounded in real model statistics.
3. System Architecture
3.1 Data Sources
3.1.1 Activation Capture
Implementation (conceptually, in PyTorch):
Register forward hooks on modules
Capture:
Input tensor
Output tensor
Intermediate states (if needed)
Memory constraints:
For large models, stream activations layer-by-layer.
Use half precision (FP16/BF16).
Optionally detach and move to CPU asynchronously.
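A minimal sketch of this capture path (PyTorch; the activations dict, the leaf-module filter, and the assumption that each module returns a single tensor are illustrative simplifications):

import torch

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Detach, cast to FP16, and move to CPU to respect the memory constraints above.
        activations[name] = output.detach().to(torch.float16).cpu()
    return hook

def register_capture(model):
    handles = []
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:          # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles                                     # call .remove() on each handle when done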
3.1.2 Gradients
Use backward hooks (e.g. register_full_backward_hook on modules).
Store:
dL/dW
dL/dX
Gradient norms
Gradient sign maps
Optionally compute:
[
||\nabla W||_F, \quad ||\nabla X||_2
]
These become color or intensity drivers.
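A sketch of gradient capture along the same lines, storing only scalar summaries to limit memory (the grad_stats dict and leaf-module filter are illustrative):

import torch

grad_stats = {}

def make_grad_hook(name):
    def hook(module, grad_input, grad_output):
        g = grad_output[0]                              # dL/dX at this module's output
        if g is not None:
            grad_stats[name] = {
                "norm": g.norm().item(),
                "sign_ratio": (g > 0).float().mean().item(),
            }
    return hook

def register_grad_capture(model):
    for name, module in model.named_modules():
        if len(list(module.children())) == 0:           # leaf modules only
            module.register_full_backward_hook(make_grad_hook(name))

# Weight gradients (dL/dW) are available on p.grad after loss.backward(), e.g.:
# {n: p.grad.norm().item() for n, p in model.named_parameters() if p.grad is not None}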
3.1.3 Weight Statistics
Precompute per layer:
Frobenius norm
Spectral norm (via power iteration)
Singular values (top-k)
Channel norms
Kernel norms
Sparsity ratio
Weight distribution histogram
Cache results for rendering.
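A sketch of the per-layer precomputation; for simplicity it uses a full SVD (torch.linalg.svdvals) rather than power iteration, and the field names are illustrative:

import torch

def weight_stats(model, top_k=8):
    stats = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:
            continue                                   # skip biases and norm parameters
        W = p.detach().float().flatten(1)              # collapse to 2D: (out, rest)
        svals = torch.linalg.svdvals(W)[:top_k]        # singular values, descending
        stats[name] = {
            "frobenius": W.norm().item(),
            "spectral": svals[0].item(),
            "top_singular_values": svals.tolist(),
            "channel_norms": W.norm(dim=1).tolist(),
            "sparsity": (W.abs() < 1e-6).float().mean().item(),
            "histogram": torch.histc(W, bins=64).tolist(),
        }
    return stats                                       # cache this for rendering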
3.1.4 Attention Matrices
For transformer layers:
Extract:
[
A \in \mathbb{R}^{H \times N \times N}
]
Where:
H = number of heads
N = sequence length
Store:
Mean across heads
Per-head matrices
Symmetrized attention
Eigenvalues of A
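A sketch of these per-layer attention statistics; the eigendecomposition is taken on the symmetrized matrix so the spectrum is real:

import torch

def attention_stats(A):
    # A: (H, N, N) attention weights for one layer.
    mean_heads = A.mean(dim=0)                         # (N, N) mean across heads
    sym = 0.5 * (mean_heads + mean_heads.T)            # symmetrized attention
    eigvals = torch.linalg.eigvalsh(sym)               # real spectrum of the symmetrized matrix
    return {"per_head": A, "mean_heads": mean_heads, "sym": sym, "eigvals": eigvals}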
3.1.5 Jacobians (Optional)
Expensive but powerful.
Approximate Jacobian norm via:
[
||J||_F^2 = \sum_i ||\frac{\partial y}{\partial x_i}||^2
]
Efficient approximation:
Hutchinson trace estimator
Random projection methods
Used to visualize sensitivity fields.
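A sketch of the Hutchinson-style estimate, using the identity ||J||_F^2 = E_v[||Jv||^2] for Rademacher probes v; f and x stand for whatever function and input point are being probed:

import torch

def jacobian_frob_sq(f, x, n_samples=8):
    # Estimate ||J||_F^2 = E_v[ ||J v||^2 ] with Rademacher probes v.
    est = 0.0
    for _ in range(n_samples):
        v = torch.sign(torch.randn_like(x))                     # Rademacher probe
        _, jv = torch.autograd.functional.jvp(f, (x,), (v,))    # forward-mode J v
        est += jv.pow(2).sum().item()
    return est / n_samples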
3.2 Processing Pipeline
Stage 1 — Tensor Acquisition
Normalize tensors per layer:
Options:
Min-max scaling
Z-score normalization
Robust scaling (median + MAD)
Log scaling for heavy-tailed distributions
Recommended default:
[
x' = \tanh(\alpha x)
]
Prevents outlier domination.
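A sketch covering the normalization options above (mode names are illustrative; epsilon terms guard against division by zero):

import torch

def normalize(x, alpha=1.0, mode="tanh"):
    if mode == "tanh":                                  # recommended default
        return torch.tanh(alpha * x)
    if mode == "zscore":
        return (x - x.mean()) / (x.std() + 1e-8)
    if mode == "robust":                                # median + MAD
        med = x.median()
        mad = (x - med).abs().median() + 1e-8
        return (x - med) / mad
    if mode == "log":                                   # heavy-tailed distributions
        return torch.sign(x) * torch.log1p(x.abs())
    return (x - x.min()) / (x.max() - x.min() + 1e-8)   # min-max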
Stage 2 — Dimensionality Compression
CNN Feature Maps
Input shape:
[
B \times C \times H \times W
]
Steps:
Aggregate batch:
mean across B
Compute:
mean activation per channel
variance per channel
Reduce channels:
PCA across C
Top 3 components → RGB
Optional:
Spatial pooling pyramid:
1×
1/2×
1/4×
1/8×
Store as mipmap pyramid.
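A sketch of the channel-PCA-to-RGB reduction plus the pooling pyramid (assumes C >= 3; the global min-max normalization is a simplification):

import torch
import torch.nn.functional as F

def cnn_to_rgb_pyramid(act, levels=4):
    # act: (B, C, H, W). Aggregate the batch, PCA over channels, map top-3 components to RGB.
    x = act.float().mean(dim=0)                          # (C, H, W)
    C, H, W = x.shape
    flat = x.reshape(C, -1).T                            # (H*W, C): pixels as samples
    flat = flat - flat.mean(dim=0, keepdim=True)
    _, _, V = torch.pca_lowrank(flat, q=3, center=False)
    rgb = (flat @ V).T.reshape(3, H, W)
    rgb = (rgb - rgb.amin()) / (rgb.amax() - rgb.amin() + 1e-8)
    pyramid = [rgb]
    for _ in range(levels - 1):                          # 1x, 1/2x, 1/4x, 1/8x mipmap levels
        pyramid.append(F.avg_pool2d(pyramid[-1].unsqueeze(0), 2).squeeze(0))
    return pyramid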
MLP Activations
Vector shape:
[
B \times D
]
Options:
Reshape D into 2D grid (nearest square)
PCA to 3 components
Use block averaging
Spectral embedding
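A sketch of the first option, the nearest-square reshape (zero padding is an arbitrary choice):

import math
import torch
import torch.nn.functional as F

def vector_to_grid(v):
    # v: (D,) activation vector. Zero-pad to the nearest square, reshape to 2D.
    side = math.ceil(math.sqrt(v.numel()))
    padded = F.pad(v, (0, side * side - v.numel()))
    return padded.reshape(side, side)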
Attention Compression
Compute recursive powers:
[
A^{(2^k)} = A^{(2^{k-1})} \cdot A^{(2^{k-1})}
]
Normalize at each step.
This amplifies long-range interactions.
Also compute the graph Laplacian:
[
L = D - A
]
Its eigenvectors are used for cluster visualization.
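A sketch of the recursive squaring and the Laplacian eigenvectors; it assumes a single (N, N) matrix (e.g. the mean over heads) and symmetrizes before forming L:

import torch

def attention_powers(A, k_max=4):
    # Repeated squaring A -> A^2 -> A^4 -> ..., renormalizing rows at each step.
    powers = [A]
    for _ in range(k_max):
        A = A @ A
        A = A / (A.sum(dim=-1, keepdim=True) + 1e-8)
        powers.append(A)
    return powers

def laplacian_eigvecs(A, k=3):
    S = 0.5 * (A + A.T)                                 # symmetrize
    L = torch.diag(S.sum(dim=1)) - S                    # L = D - A
    _, evecs = torch.linalg.eigh(L)                     # eigenvalues in ascending order
    return evecs[:, 1:k + 1]                            # skip the constant null eigenvector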
Stage 3 — Fractal Scaling
3.3.1 Weight Norm Scaling
For each layer:
[
s_L = ||W_L||_F
]
For each channel:
[
s_c = ||W_{L,c}||
]
Use scaling factor:
[
\tilde{x} = x \cdot \frac{s_c}{\max(s_c)}
]
Maps structural importance to visual prominence.
3.3.2 Spectral Scaling
Compute top singular values:
[
\sigma_1 \ge \sigma_2 \ge \dots
]
Define recursive zoom depth:
[
\text{depth} \propto \log(\sigma_1 / \sigma_k)
]
High spectral dominance → deeper fractal recursion.
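One possible mapping from the spectrum to a clamped recursion depth (the gain and cap are tuning parameters, not prescribed values):

import math

def recursion_depth(svals, k=8, gain=1.0, max_depth=6):
    # depth ~ log(sigma_1 / sigma_k), clamped to a renderable range.
    k = min(k, len(svals))
    ratio = svals[0] / (svals[k - 1] + 1e-12)
    return max(1, min(max_depth, round(gain * math.log(ratio))))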
3.3.3 Residual Path Branching
For networks with skip connections:
Represent each residual branch as a child region in CA or voxel tree.
Branch width ∝ branch weight norm.
This creates visible branching trees.
3.3.4 Jacobian Field Visualization
Map:
Jacobian norm → brightness
Largest singular vector direction → color angle
This often reveals ridge-like structures in input space.
4. Compression Techniques
4.1 Subpixel Encoding
Each pixel subdivided into:
2×2 grid or 3×3 microcells
Encode:
Mean
Variance
Gradient magnitude
Sign ratio
Use bit-packing for GPU upload:
Example:
8 bits mean
8 bits variance
8 bits gradient
8 bits sign entropy
Packed into RGBA texture.
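A sketch of the 8-bit packing; it assumes each statistic has already been normalized to [0, 1], and the resulting uint8 array uploads directly as an RGBA8 texture:

import numpy as np

def pack_rgba(mean, var, grad, sign_entropy):
    # Each input: float array in [0, 1], same shape. Quantize to 8 bits and stack as RGBA.
    def q(a):
        return np.clip(a * 255.0, 0, 255).astype(np.uint8)
    return np.stack([q(mean), q(var), q(grad), q(sign_entropy)], axis=-1)   # (..., 4) uint8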
4.2 Octree Voxelization
Data structure:
Node:
bounds
mean_activation
variance
children[8]
Merge rule:
If:
[
|a_i - a_j| < \epsilon
]
And variance below threshold → collapse children.
Provides O(N log N) construction.
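A sketch of the node layout and the merge rule (thresholds are illustrative; tree construction itself is omitted):

from dataclasses import dataclass, field

@dataclass
class OctreeNode:
    bounds: tuple                                   # ((x0, y0, z0), (x1, y1, z1))
    mean_activation: float
    variance: float
    children: list = field(default_factory=list)    # up to 8 children

def try_collapse(node, eps=1e-2, var_thresh=1e-3):
    # Merge rule: if child means agree within eps and variance is low, drop the children.
    if not node.children:
        return
    means = [c.mean_activation for c in node.children]
    if max(means) - min(means) < eps and node.variance < var_thresh:
        node.children = []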
4.3 Density-Aware Merging
Define density:
[
\rho = |\text{activation}|
]
High ρ:
Subdivide
Low ρ:
Merge
Adaptive voxel resolution.
4.4 Multi-Resolution Blending
Algorithm:
Downsample tensor via average pooling
Upsample via bilinear
Blend:
[
x_{\text{blend}} = \lambda x + (1 - \lambda) x_{\text{up}}
]
Repeat recursively.
Produces controlled fractal texture.
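A sketch of the recursive blend (lambda and the level count are tuning parameters):

import torch
import torch.nn.functional as F

def fractal_blend(x, lam=0.7, levels=3):
    # x: (1, C, H, W). Downsample, upsample back to full size, and blend, recursively.
    for _ in range(levels):
        down = F.avg_pool2d(x, 2)
        up = F.interpolate(down, size=x.shape[-2:], mode="bilinear", align_corners=False)
        x = lam * x + (1 - lam) * up
    return x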
5. Cellular Automaton Layer
Each CA cell contains:
struct Cell:
activation_mean
activation_variance
gradient_mean
weight_scale
spectral_scale
Neighborhood:
Moore (8-neighbor)
3D 26-neighbor (voxels)
Update rule example:
[
x_{t+1} = f(x_t, \text{neighbor mean}, \text{gradient}, \text{weight scale})
]
Possible update equations (applied sequentially):
[
x' = x + \alpha \cdot \Delta_{\text{neighbors}}
]
[
x'' = x' \cdot (1 + \beta \cdot \text{weight\_scale})
]
Optionally nonlinear activation (ReLU/tanh).
Can be:
Hand-crafted
Learned (Neural CA)
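A hand-crafted variant of the update rule above, as a sketch: it acts on the activation_mean field only, uses a 2D Moore neighborhood with a circular boundary, and treats alpha and beta as free parameters.

import torch
import torch.nn.functional as F

def ca_step(x, weight_scale, alpha=0.1, beta=0.05):
    # x: (1, 1, H, W) cell activations; weight_scale broadcasts to the same shape.
    kernel = torch.ones(1, 1, 3, 3, device=x.device) / 8.0
    kernel[0, 0, 1, 1] = 0.0                            # exclude the center: neighbor mean only
    neighbor_mean = F.conv2d(F.pad(x, (1, 1, 1, 1), mode="circular"), kernel)
    x = x + alpha * (neighbor_mean - x)                 # x' = x + alpha * Delta_neighbors
    x = x * (1 + beta * weight_scale)                   # modulate by weight scale
    return torch.tanh(x)                                # optional nonlinearity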
6. Voxel Rendering
6.1 Mapping Strategy
Dimension mapping examples:
X,Y → spatial
Z → channel index
Brightness → activation
Hue → gradient direction
Opacity → weight norm
6.2 GPU Rendering
Recommended:
OpenGL / Vulkan
WebGL for browser
CUDA volume ray marching
Techniques:
3D textures
Ray marching with early termination
Transfer functions for opacity
Instanced cube rendering for sparse voxels
Acceleration:
Frustum culling
Level-of-detail switching
Sparse voxel octrees
7. Color Encoding
7.1 Diverging Maps
Map:
[
x < 0 \rightarrow \text{blue}, \quad x > 0 \rightarrow \text{red}
]
Gamma correct before display.
7.2 PCA → RGB
Compute PCA:
[
X \rightarrow U \Sigma V^T
]
Take first 3 columns of UΣ.
Normalize per component.
Map to RGB.
7.3 HSV Gradient Encoding
Hue:
[
\theta = \text{atan2}(g_y, g_x)
]
Saturation:
[
||\nabla||
]
Value:
[
|\text{activation}|
]
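A sketch of the HSV assembly; an HSV-to-RGB conversion is still needed before display, and the per-image normalizations are simplifications:

import math
import torch

def hsv_encode(act, gx, gy):
    # Hue from gradient direction, saturation from gradient magnitude, value from |activation|.
    hue = (torch.atan2(gy, gx) + math.pi) / (2 * math.pi)        # map angle to [0, 1]
    mag = torch.sqrt(gx ** 2 + gy ** 2)
    sat = mag / (mag.amax() + 1e-8)
    val = act.abs() / (act.abs().amax() + 1e-8)
    return torch.stack([hue, sat, val], dim=-1)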
8. Rendering Modes
8.1 Static
Single layer spectral map
Attention fractal heatmap
Weight norm landscape
Voxel activation cloud
8.2 Animated
Training evolution over epochs
Gradient flow over time
CA emergent patterns
Recursive zoom via spectral scale
8.3 Interactive
User controls:
Layer selection
Head selection
Compression threshold
Spectral depth
Toggle raw vs scaled
Voxel slicing plane
Add inspection overlay:
Hover → show tensor statistics
Click → show singular values
9. Performance Considerations
9.1 Memory
Use FP16 where possible
Stream tensors instead of storing entire model
Compress PCA bases
9.2 Parallelism
GPU for voxel + CA
CPU for PCA/SVD (or cuSOLVER)
Async prefetch
9.3 Caching
Cache:
Downsample pyramids
PCA bases per layer
Weight norms
Spectral norms
Invalidate cache when model updates.
10. Stability & Safety
Always normalize before visualization.
Clamp extreme outliers.
Provide legends and numeric scales.
Separate aesthetic exaggeration from faithful mode.
Provide “scientific mode” toggle (no scaling distortions).
11. Future Extensions
Learned Neural CA visualizers
VR exploration of voxel space
Differentiable visualization loss
Integration with experiment tracking systems
Spectral topology analysis
Persistent homology overlays
12. Implementation Roadmap (High-Level)
Phase 1
Activation capture
PCA compression
2D heatmap renderer
Phase 2
Multi-resolution pyramid
Octree voxelization
GPU volume rendering
Phase 3
Spectral scaling
Attention recursion
CA evolution engine
Phase 4
Interactive UI
Training-time animation
VR or WebGL deployment