Saturday, February 14, 2026


license: public domain CC0


Design Document: Multi-Scale Neural Network Visualization via CA, Voxels, and Fractal Compression


1. Overview

This document defines a high-performance, multi-scale visualization framework for representing the internal state of deep neural networks using:

  • Cellular automata (CA)

  • 3D voxel grids

  • Subpixel and multi-resolution compression

  • Fractal-inspired scaling derived from network weights and dynamics

The framework converts high-dimensional tensors (activations, weights, gradients, attention maps) into structured, recursively compressed visual fields capable of scaling to billion-parameter models.

The system supports:

  • Static snapshots (single forward pass)

  • Time evolution (training iterations)

  • Layer transitions

  • CA-driven emergent visualizations

  • Recursive zoom / fractal exploration

The architecture is model-agnostic (CNNs, transformers, MLPs, diffusion models, etc.).


2. Objectives

2.1 Interpretability

Provide structured visibility into:

  • Activation sparsity patterns

  • Feature hierarchies

  • Attention clustering

  • Gradient flow and vanishing/exploding behavior

  • Residual path dominance

  • Spectral structure of weight matrices

Interpretability goal: expose structure, not raw magnitude.


2.2 Scalability

Target constraints:

  • Handle ≥10⁹ parameters

  • Maintain interactive performance (30–60 FPS for moderate models)

  • Support progressive refinement

Strategies:

  • Hierarchical spatial compression

  • Tensor factorization (PCA/SVD)

  • Block quantization

  • Octree voxelization

  • Multi-resolution caching


2.3 Artistic and Structural Insight

Neural networks inherently exhibit:

  • Recursive composition

  • Hierarchical feature reuse

  • Spectral decay

  • Self-similar clustering

  • Power-law distributions

The system intentionally leverages these properties to produce fractal-like representations grounded in real model statistics.


3. System Architecture


3.1 Data Sources

3.1.1 Activation Capture

Implementation sketch (PyTorch):

  • Register forward hooks on modules

  • Capture:

    • Input tensor

    • Output tensor

    • Intermediate states (if needed)

Memory constraints:

  • For large models, stream activations layer-by-layer.

  • Use half precision (FP16/BF16).

  • Optionally detach and move to CPU asynchronously.
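The capture scheme above can be sketched with PyTorch forward hooks. This is a minimal illustration, not the framework's API: the layer names and the tiny `nn.Sequential` model are placeholders.

```python
# Sketch of streaming activation capture with PyTorch forward hooks.
# The model and layer names here are illustrative placeholders.
import torch
import torch.nn as nn

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Detach, downcast to half precision, and move to CPU so the
        # visualization never keeps the autograd graph or GPU memory alive.
        captured[name] = output.detach().to(torch.float16).cpu()
    return hook

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
handles = [m.register_forward_hook(make_hook(f"layer{i}"))
           for i, m in enumerate(model)]

with torch.no_grad():
    model(torch.randn(2, 8))

for h in handles:   # always remove hooks when capture is done
    h.remove()
```

For large models, the same pattern runs layer-by-layer: register one hook, run the forward pass, flush `captured` to disk, and move on.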


3.1.2 Gradients

Use backward hooks, e.g. register_full_backward_hook in PyTorch.

Store:

  • dL/dW

  • dL/dX

  • Gradient norms

  • Gradient sign maps

Optionally compute:

[
||\nabla W||_F, \quad ||\nabla X||_2
]

These become color or intensity drivers.


3.1.3 Weight Statistics

Precompute per layer:

  • Frobenius norm

  • Spectral norm (via power iteration)

  • Singular values (top-k)

  • Channel norms

  • Kernel norms

  • Sparsity ratio

  • Weight distribution histogram

Cache results for rendering.
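A sketch of the per-layer statistics cache, with the spectral norm estimated by power iteration as suggested above (the iteration count and seed are arbitrary choices):

```python
# Power iteration for the spectral norm (largest singular value) of a
# weight matrix, plus a few of the other cached per-layer statistics.
import numpy as np

def spectral_norm(W, iters=50):
    """Estimate sigma_1(W) by power iteration on W^T W."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = W @ v          # one application of W
        v = W.T @ u        # one application of W^T
        v /= np.linalg.norm(v)
    return np.linalg.norm(W @ v)

W = np.diag([3.0, 1.0, 0.5])
stats = {
    "frobenius": float(np.linalg.norm(W)),
    "spectral": float(spectral_norm(W)),
    "sparsity": float(np.mean(W == 0)),
}
```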


3.1.4 Attention Matrices

For transformer layers:

Extract:

[
A \in \mathbb{R}^{H \times N \times N}
]

Where:

  • H = number of heads

  • N = sequence length

Store:

  • Mean across heads

  • Per-head matrices

  • Symmetrized attention

  • Eigenvalues of A


3.1.5 Jacobians (Optional)

Expensive but powerful.

Approximate Jacobian norm via:

[
||J||_F^2 = \sum_i ||\frac{\partial y}{\partial x_i}||^2
]

Efficient approximation:

  • Hutchinson trace estimator

  • Random projection methods

Used to visualize sensitivity fields.
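The Hutchinson estimator above can be demonstrated on a linear map, where the Jacobian-vector product is just a matrix multiply (for a real network, one would use forward-mode AD instead); the probe count is an arbitrary choice:

```python
# Hutchinson-style estimate of the squared Frobenius norm of a Jacobian,
# using only Jacobian-vector products. For Rademacher probes v,
# E[||J v||^2] = ||J||_F^2.
import numpy as np

def hutchinson_frob_sq(jvp, dim, n_probes=2000, seed=0):
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe
        total += np.sum(jvp(v) ** 2)           # ||J v||^2
    return total / n_probes

J = np.array([[1.0, 2.0],
              [0.0, 3.0]])
est = hutchinson_frob_sq(lambda v: J @ v, dim=2)
exact = float(np.sum(J ** 2))  # true ||J||_F^2
```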


3.2 Processing Pipeline


Stage 1 — Tensor Acquisition

Normalize tensors per layer:

Options:

  1. Min-max scaling

  2. Z-score normalization

  3. Robust scaling (median + MAD)

  4. Log scaling for heavy-tailed distributions

Recommended default:

[
x' = \tanh(\alpha x)
]

Prevents outlier domination.
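A minimal sketch of the recommended default, with alpha chosen from a robust scale (the MAD-based choice and the epsilon guard are assumptions, not part of the specification):

```python
# tanh normalization: robust pre-scale, then squash, so outliers
# saturate instead of dominating the color range.
import numpy as np

def tanh_normalize(x, alpha=None):
    if alpha is None:
        # Pick alpha from the median absolute deviation so typical
        # values land in tanh's responsive region (assumed heuristic).
        mad = np.median(np.abs(x - np.median(x)))
        alpha = 1.0 / (mad + 1e-8)
    return np.tanh(alpha * x)

x = np.array([0.0, 0.1, -0.1, 100.0])  # one extreme outlier
y = tanh_normalize(x)
```

The outlier maps to ~1.0 while the typical values stay spread out, instead of being crushed toward zero as min-max scaling would do.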


Stage 2 — Dimensionality Compression


CNN Feature Maps

Input shape:
[
B \times C \times H \times W
]

Steps:

  1. Aggregate batch:

    • mean across B

  2. Compute:

    • mean activation per channel

    • variance per channel

  3. Reduce channels:

    • PCA across C

    • Top 3 components → RGB

Optional:

  • Spatial pooling pyramid:

    • 1/2×

    • 1/4×

    • 1/8×

Store as mipmap pyramid.
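The channel-reduction steps above can be sketched as follows; the SVD-based PCA and the per-component min-max normalization are one reasonable choice, not the only one:

```python
# PCA over channels of a CNN feature map, top-3 components mapped to RGB.
# Shapes follow the document: B x C x H x W, batch-averaged first.
import numpy as np

def feature_map_to_rgb(feats):
    x = feats.mean(axis=0)          # aggregate batch -> C x H x W
    C, H, W = x.shape
    flat = x.reshape(C, -1).T       # (H*W) x C: pixels as samples
    flat = flat - flat.mean(axis=0)
    # PCA via SVD; rows of Vt are principal directions across channels
    U, S, Vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ Vt[:3].T          # project onto top 3 components
    # Normalize each component to [0, 1] for display
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-8)
    return rgb.reshape(H, W, 3)

feats = np.random.default_rng(0).standard_normal((2, 16, 8, 8))
rgb = feature_map_to_rgb(feats)
```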


MLP Activations

Vector shape:
[
B \times D
]

Options:

  • Reshape D into 2D grid (nearest square)

  • PCA to 3 components

  • Use block averaging

  • Spectral embedding


Attention Compression

Compute recursive powers:

[
A^{(2^k)} = A^{(2^{k-1})} \cdot A^{(2^{k-1})}
]

Normalize at each step.

This amplifies long-range interactions.

Also compute:

  • Laplacian:
    [
    L = D - A
    ]

  • Eigenvectors for cluster visualization.
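Both operations can be sketched in a few lines; the small 3×3 attention matrix and the row renormalization scheme are illustrative:

```python
# Recursive attention squaring with per-step row renormalization, plus
# the graph Laplacian of the symmetrized attention for clustering.
import numpy as np

def attention_powers(A, k):
    """Return A^(2^k), renormalizing rows after each squaring."""
    M = A.copy()
    for _ in range(k):
        M = M @ M
        M /= M.sum(axis=1, keepdims=True)  # keep rows stochastic
    return M

A = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
A8 = attention_powers(A, 3)     # A^8: long-range mixing amplified

S = 0.5 * (A + A.T)             # symmetrized attention
L = np.diag(S.sum(axis=1)) - S  # Laplacian: L = D - A
evals = np.linalg.eigvalsh(L)   # small eigenvectors reveal clusters
```

Note how A8[0, 2] becomes nonzero even though A[0, 2] = 0: repeated squaring surfaces indirect attention paths.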


Stage 3 — Fractal Scaling


3.3.1 Weight Norm Scaling

For each layer:

[
s_L = ||W_L||_F
]

For each channel:

[
s_c = ||W_{L,c}||
]

Use scaling factor:

[
\tilde{x} = x \cdot \frac{s_c}{\max_{c'} s_{c'}}
]

Maps structural importance to visual prominence.


3.3.2 Spectral Scaling

Compute top singular values:

[
\sigma_1 \ge \sigma_2 \ge \dots
]

Define recursive zoom depth:

[
\text{depth} \propto \log(\sigma_1 / \sigma_k)
]

High spectral dominance → deeper fractal recursion.
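A small sketch of that mapping; the base-2 logarithm, the choice k = 4, and the depth cap of 8 levels are illustrative assumptions:

```python
# Recursive zoom depth from spectral dominance: strongly low-rank
# layers (sigma_1 >> sigma_k) get deeper fractal recursion.
import numpy as np

def zoom_depth(singular_values, k=4, max_depth=8):
    s = np.sort(np.asarray(singular_values))[::-1]   # descending
    ratio = s[0] / max(s[min(k, len(s)) - 1], 1e-12)
    return int(min(max_depth, np.log2(ratio)))       # depth ~ log dominance

flat_spectrum = [1.0, 0.9, 0.8, 0.7]     # weak dominance -> shallow zoom
peaked_spectrum = [64.0, 4.0, 2.0, 1.0]  # strong dominance -> deep zoom
```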


3.3.3 Residual Path Branching

For networks with skip connections:

Represent each residual branch as a child region in CA or voxel tree.

Branch width ∝ branch weight norm.

This creates visible branching trees.


3.3.4 Jacobian Field Visualization

Map:

  • Jacobian norm → brightness

  • Largest singular vector direction → color angle

Results often produce ridge-like structures in input space.


4. Compression Techniques


4.1 Subpixel Encoding

Each pixel subdivided into:

  • 2×2 grid or 3×3 microcells

Encode:

  • Mean

  • Variance

  • Gradient magnitude

  • Sign ratio

Use bit-packing for GPU upload:

Example:

  • 8 bits mean

  • 8 bits variance

  • 8 bits gradient

  • 8 bits sign entropy

Packed into RGBA texture.
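The packing layout above can be sketched as follows, producing a uint8 H × W × 4 array ready for upload as an RGBA texture (the [0, 1] input convention is an assumption):

```python
# Pack four 8-bit subpixel statistics into one RGBA texel per pixel.
import numpy as np

def pack_rgba(mean, var, grad, sign_entropy):
    """Each input: float in [0, 1], shape H x W. Output: uint8 RGBA."""
    def to_u8(a):
        return np.clip(np.round(a * 255.0), 0, 255).astype(np.uint8)
    # Channel assignment follows the document: R=mean, G=variance,
    # B=gradient magnitude, A=sign entropy.
    return np.stack([to_u8(mean), to_u8(var),
                     to_u8(grad), to_u8(sign_entropy)], axis=-1)

H = W = 4
stats = [np.full((H, W), v) for v in (0.5, 0.1, 1.0, 0.0)]
tex = pack_rgba(*stats)
```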


4.2 Octree Voxelization

Data structure:

Node:
    bounds
    mean_activation
    variance
    children[8]

Merge rule:

If, for all sibling children i, j:
[
|a_i - a_j| < \epsilon
]

and the variance is below a threshold → collapse the children into their parent.

Provides O(N log N) construction.
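A minimal sketch of the node structure and merge rule, using a variance test as the near-uniformity criterion (the epsilon value and power-of-two cubic grids are simplifying assumptions):

```python
# Octree voxelization: recursively subdivide a cubic activation grid,
# keeping children only where the block is not near-uniform.
import numpy as np

class Node:
    def __init__(self, data, eps=0.05):
        self.mean = float(data.mean())
        self.variance = float(data.var())
        self.children = None
        n = data.shape[0]
        # Merge rule: stay a leaf when the block is nearly constant.
        if n > 1 and self.variance >= eps:
            h = n // 2
            self.children = [Node(data[x:x+h, y:y+h, z:z+h], eps)
                             for x in (0, h)
                             for y in (0, h)
                             for z in (0, h)]

def count_nodes(node):
    if node.children is None:
        return 1
    return 1 + sum(count_nodes(c) for c in node.children)

uniform = Node(np.zeros((8, 8, 8)))   # collapses to a single node
mixed = np.zeros((8, 8, 8))
mixed[:4] = 1.0                        # two homogeneous halves
split = Node(mixed)                    # subdivides once, then stops
```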


4.3 Density-Aware Merging

Define density:

[
\rho = |\text{activation}|
]

High ρ:

  • Subdivide

Low ρ:

  • Merge

Adaptive voxel resolution.


4.4 Multi-Resolution Blending

Algorithm:

  1. Downsample tensor via average pooling

  2. Upsample via bilinear

  3. Blend:

[
x_{\text{blend}} = \lambda x + (1-\lambda)\,x_{\text{up}}
]

Repeat recursively.

Produces controlled fractal texture.
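One level of the algorithm can be sketched as below; nearest-neighbor upsampling stands in for the bilinear step to keep the example short, and the blend weight is arbitrary:

```python
# One multi-resolution blend level: average-pool down, upsample back,
# then mix with the original. Repeating this recursively layers
# progressively coarser copies of the field over itself.
import numpy as np

def pool_blend(x, lam=0.6):
    H, W = x.shape
    down = x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))  # 2x avg pool
    up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)     # upsample
    return lam * x + (1.0 - lam) * up

x = np.random.default_rng(0).standard_normal((8, 8))
y = pool_blend(x)
for _ in range(2):      # recurse for fractal texture
    y = pool_blend(y)
```

Because average pooling followed by upsampling preserves the global mean, the blend only redistributes detail across scales; it never shifts the overall level of the field.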


5. Cellular Automaton Layer

Each CA cell contains:

struct Cell:
    activation_mean
    activation_variance
    gradient_mean
    weight_scale
    spectral_scale

Neighborhood:

  • Moore (8-neighbor)

  • 3D 26-neighbor (voxels)

Update rule example:

[
x_{t+1} = f(x_t, \text{neighbor mean}, \text{gradient}, \text{weight scale})
]

Possible update equation:

[
x' = x + \alpha \cdot \Delta_{\text{neighbors}}
]
[
x' = x' \cdot (1 + \beta \cdot \text{weight\_scale})
]

Optionally apply a nonlinear activation (ReLU/tanh).

Can be:

  • Hand-crafted

  • Learned (Neural CA)
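A hand-crafted instance of the update rule above; the coefficients, the tanh nonlinearity, and the toroidal boundary (np.roll wrap-around) are illustrative assumptions:

```python
# One CA step over the activation field: diffuse toward the Moore-
# neighborhood mean, rescale by weight importance, then squash.
import numpy as np

def ca_step(x, weight_scale, alpha=0.25, beta=0.1):
    # Moore-neighborhood mean via the 8 toroidal shifts
    nb = sum(np.roll(np.roll(x, dx, 0), dy, 1)
             for dx in (-1, 0, 1) for dy in (-1, 0, 1)
             if (dx, dy) != (0, 0)) / 8.0
    x = x + alpha * (nb - x)             # x' = x + alpha * Delta_neighbors
    x = x * (1.0 + beta * weight_scale)  # x' = x' * (1 + beta * weight_scale)
    return np.tanh(x)                    # optional nonlinearity

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))
scale = np.full((16, 16), 0.5)           # per-cell weight_scale field
for _ in range(5):
    x = ca_step(x, scale)
```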


6. Voxel Rendering


6.1 Mapping Strategy

Dimension mapping examples:

  • X,Y → spatial

  • Z → channel index

  • Brightness → activation

  • Hue → gradient direction

  • Opacity → weight norm


6.2 GPU Rendering

Recommended:

  • OpenGL / Vulkan

  • WebGL for browser

  • CUDA volume ray marching

Techniques:

  • 3D textures

  • Ray marching with early termination

  • Transfer functions for opacity

  • Instanced cube rendering for sparse voxels

Acceleration:

  • Frustum culling

  • Level-of-detail switching

  • Sparse voxel octrees


7. Color Encoding


7.1 Diverging Maps

Map:

  • x < 0 → blue

  • x > 0 → red

Gamma correct before display.
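A minimal sketch of this diverging map with gamma correction (the pure-blue/pure-red endpoints and gamma of 2.2 are conventional choices, not requirements):

```python
# Diverging color map: blue for negative, red for positive, with
# gamma correction applied to the magnitude before display.
import numpy as np

def diverging_rgb(x, gamma=2.2):
    """x in [-1, 1] -> float RGB array of shape x.shape + (3,)."""
    mag = np.abs(np.clip(x, -1.0, 1.0)) ** (1.0 / gamma)  # gamma correct
    rgb = np.zeros(x.shape + (3,))
    rgb[..., 0] = np.where(x > 0, mag, 0.0)  # red channel: positive
    rgb[..., 2] = np.where(x < 0, mag, 0.0)  # blue channel: negative
    return rgb

img = diverging_rgb(np.array([[-1.0, 0.0, 1.0]]))
```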


7.2 PCA → RGB

Compute PCA:

[
X \rightarrow U \Sigma V^T
]

Take first 3 columns of UΣ.

Normalize per component.

Map to RGB.


7.3 HSV Gradient Encoding

Hue:
[
\theta = \text{atan2}(g_y, g_x)
]

Saturation:
[
||\nabla||
]

Value:
[
|activation|
]


8. Rendering Modes


8.1 Static

  • Single layer spectral map

  • Attention fractal heatmap

  • Weight norm landscape

  • Voxel activation cloud


8.2 Animated

  • Training evolution over epochs

  • Gradient flow over time

  • CA emergent patterns

  • Recursive zoom via spectral scale


8.3 Interactive

User controls:

  • Layer selection

  • Head selection

  • Compression threshold

  • Spectral depth

  • Toggle raw vs scaled

  • Voxel slicing plane

Add inspection overlay:

  • Hover → show tensor statistics

  • Click → show singular values


9. Performance Considerations


9.1 Memory

  • Use FP16 where possible

  • Stream tensors instead of storing entire model

  • Compress PCA bases


9.2 Parallelism

  • GPU for voxel + CA

  • CPU for PCA/SVD (or cuSOLVER)

  • Async prefetch


9.3 Caching

Cache:

  • Downsample pyramids

  • PCA bases per layer

  • Weight norms

  • Spectral norms

Invalidate cache when model updates.


10. Stability & Safety

  • Always normalize before visualization.

  • Clamp extreme outliers.

  • Provide legends and numeric scales.

  • Separate aesthetic exaggeration from faithful mode.

  • Provide “scientific mode” toggle (no scaling distortions).


11. Future Extensions

  • Learned Neural CA visualizers

  • VR exploration of voxel space

  • Differentiable visualization loss

  • Integration with experiment tracking systems

  • Spectral topology analysis

  • Persistent homology overlays


12. Implementation Roadmap (High-Level)

Phase 1

  • Activation capture

  • PCA compression

  • 2D heatmap renderer

Phase 2

  • Multi-resolution pyramid

  • Octree voxelization

  • GPU volume rendering

Phase 3

  • Spectral scaling

  • Attention recursion

  • CA evolution engine

Phase 4

  • Interactive UI

  • Training-time animation

  • VR or WebGL deployment


