`USE_ADAPTIVE_THRESHOLD`

#ifndef USE_ADAPTIVE_THRESHOLD
#  define USE_ADAPTIVE_THRESHOLD 1
#endif
#ifndef ADAPTIVE_THRESHOLD_ALPHA
#  define ADAPTIVE_THRESHOLD_ALPHA 0.1f
#endif

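When the flag is enabled, each fitness difference whose magnitude is below `ADAPTIVE_THRESHOLD_ALPHA` times the mean absolute difference (MAD) over all `num_fitness` values is zeroed, and the remaining differences are reduced to their sign. Otherwise the original pairwise comparison is used: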

#if USE_ADAPTIVE_THRESHOLD
            // Adaptive Thresholding: Filter out weak signals based on Mean Absolute Difference
            double sum_abs_diff = 0.0;
            for(int i=0; i<num_fitness; i++) {
                sum_abs_diff += std::abs((double)diffs[i]);
            }
            
            double mad = sum_abs_diff / num_fitness;
            double threshold = mad * ADAPTIVE_THRESHOLD_ALPHA;
            
            for(int i=0; i<num_fitness; i++) {
                if (std::abs((double)diffs[i]) < threshold) {
                    h_fit[i] = 0;
                } else {
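                    // A zero diff maps to 0, as in the original path (only reachable when threshold == 0)
                    h_fit[i] = (diffs[i] > 0) ? 1 : ((diffs[i] < 0) ? -1 : 0);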
                }
            }
#else
            // Original Pairwise Comparison
            for(int i=0; i<num_fitness; i++) {
                h_fit[i] = (diffs[i] > 0) ? 1 : ((diffs[i] < 0) ? -1 : 0);
            }
#endif
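
The thresholding step above can be tried in isolation with a host-side sketch. The function name `adaptive_threshold_signs`, the use of `std::vector`, and the explicit handling of zero differences and empty input are assumptions made for illustration; only the MAD and threshold arithmetic are taken from the snippet.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone version of the adaptive-threshold branch.
// Returns -1, 0, or +1 for every fitness difference.
std::vector<int> adaptive_threshold_signs(const std::vector<int32_t>& diffs, double alpha) {
    assert(alpha >= 0.0);  // a negative alpha would disable the filter entirely
    std::vector<int> h_fit(diffs.size(), 0);
    if (diffs.empty()) return h_fit;  // assumption: avoid dividing by zero

    // Mean absolute difference over all values.
    double sum_abs_diff = 0.0;
    for (int32_t d : diffs) sum_abs_diff += std::abs(static_cast<double>(d));
    const double mad = sum_abs_diff / static_cast<double>(diffs.size());
    const double threshold = mad * alpha;

    for (std::size_t i = 0; i < diffs.size(); ++i) {
        const double magnitude = std::abs(static_cast<double>(diffs[i]));
        if (magnitude < threshold || diffs[i] == 0) {
            h_fit[i] = 0;  // weak (or zero) signal is dropped
        } else {
            h_fit[i] = (diffs[i] > 0) ? 1 : -1;
        }
    }
    return h_fit;
}

// Worked example: MAD = (100 + 2 + 50 + 48) / 4 = 50, threshold = 50 * 0.1 = 5,
// so only the difference of -2 is filtered out.
inline void example_adaptive_threshold() {
    const std::vector<int> signs = adaptive_threshold_signs({100, -2, 50, -48}, 0.1);
    assert((signs == std::vector<int>{1, 0, 1, -1}));
}
```

With the default `ADAPTIVE_THRESHOLD_ALPHA` of 0.1, a difference has to reach a tenth of the average magnitude before it contributes a sign.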

`CHUNK_MEAN_FILTER`

#if CHUNK_MEAN_FILTER
            double mean_diff = sum_diff_val / num_fitness;
            
            if (mean_diff != 0.0) {
                double sign = (mean_diff > 0) ? 1.0 : -1.0;
                mean_diff = sign * std::pow(std::abs(mean_diff), (double)CHUNK_MEAN_EXPONENT);
            }

            for(int i=0; i<num_fitness; i++) {
                diffs[i] += (int32_t)mean_diff;
            }
#endif
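
A host-side sketch of the same filter makes the arithmetic easier to follow. It assumes that `sum_diff_val` is the plain sum of the raw differences, which the snippet does not show, and it passes the exponent as a parameter instead of reading `CHUNK_MEAN_EXPONENT`; the function name `apply_chunk_mean_filter` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical standalone version of the CHUNK_MEAN_FILTER branch.
// Adds a sign-preserving power of the mean difference to every entry.
void apply_chunk_mean_filter(std::vector<int32_t>& diffs, double exponent) {
    if (diffs.empty()) return;  // assumption: nothing to do for an empty chunk

    // Assumption: sum_diff_val is the sum of the raw differences.
    double sum_diff_val = 0.0;
    for (int32_t d : diffs) sum_diff_val += static_cast<double>(d);
    double mean_diff = sum_diff_val / static_cast<double>(diffs.size());

    if (mean_diff != 0.0) {
        const double sign = (mean_diff > 0) ? 1.0 : -1.0;
        mean_diff = sign * std::pow(std::abs(mean_diff), exponent);
    }
    assert(std::isfinite(mean_diff));  // casting NaN or infinity to an integer is undefined

    // The cast truncates toward zero, so an offset of magnitude below 1 has no effect.
    const int32_t offset = static_cast<int32_t>(mean_diff);
    for (int32_t& d : diffs) d += offset;
}

// Worked example with exponent 0.5: mean = (10 - 2 + 7) / 3 = 5,
// sqrt(5) is about 2.236, which truncates to an offset of 2.
inline void example_chunk_mean_filter() {
    std::vector<int32_t> diffs{10, -2, 7};
    apply_chunk_mean_filter(diffs, 0.5);
    assert((diffs == std::vector<int32_t>{12, 0, 9}));
}
```

Because the offset is truncated to an integer, an exponent below 1 compresses large mean shifts, and any transformed mean with magnitude below 1 leaves the differences unchanged.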

`ADAPTIVE_NOISE_ENABLED`

__device__ __forceinline__ float get_adaptive_scale(WeightType ov) {
#if ADAPTIVE_NOISE_ENABLED
    if (ov < 0) return 0.0f;
    if (ov < 64) return (float)ov / 64.0f;
    return 1.0f;
#else
    return 1.0f;
#endif
}

Activity Tracking:
- The system monitors the Adam optimizer updates.
- If a weight row or column receives a significant update, it is marked as "active".

Hysteresis Mechanism (Rank-1 Overlay):
- We maintain int8_t counters for each row and column (the AdaptiveScales).
- Reinforcement: Active features increment their counter (+5), increasing the noise scale for future steps.
- Decay: Inactive features decrement their counter (-1), gradually reducing noise.
- Dead Zone: Counter values of 0 or below result in zero noise, effectively "freezing" stable weights until a strong signal reactivates them.
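
The counter rule in the list above can be sketched as a small host-side function. The +5 and -1 steps come from the description; the function name, the `active` flag as an input, and saturation at the int8_t limits are assumptions, since the source does not show how the counters are bounded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

constexpr int kActiveIncrement = 5;    // reinforcement step from the description
constexpr int kInactiveDecrement = 1;  // decay step from the description

// Hypothetical update for one row or column counter.
// Assumption: the counter saturates at the int8_t limits instead of wrapping.
inline int8_t update_activity_counter(int8_t counter, bool active) {
    const int next = active ? counter + kActiveIncrement : counter - kInactiveDecrement;
    const int clamped = std::max<int>(INT8_MIN, std::min<int>(INT8_MAX, next));
    return static_cast<int8_t>(clamped);
}

// One active step outweighs five inactive ones, which gives the hysteresis.
inline void example_activity_counter() {
    int8_t counter = 0;
    counter = update_activity_counter(counter, true);   // 0 -> 5
    assert(counter == 5);
    for (int i = 0; i < 5; ++i) {
        counter = update_activity_counter(counter, false);  // 5 -> 0
    }
    assert(counter == 0);
    counter = update_activity_counter(counter, false);  // 0 -> -1, inside the dead zone
    assert(counter == -1);
}
```

Under these assumptions a feature starting at 0 needs 13 consecutive active steps to reach the full noise scale at 64, while a single inactive step from 0 already puts it in the dead zone.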

get_adaptive_scale() is the transfer function that maps the integer "activity counter" (stored in AdaptiveScales) to a floating-point noise multiplier ($0.0$ to $1.0$): negative counters give $0.0$, counters from 0 to 63 ramp linearly as counter/64, and counters of 64 or more give $1.0$. This scale factor is applied in the noise generation logic of every layer (Attention, MLP, Norms).
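
To check the mapping without a GPU, the enabled branch can be mirrored on the host. This is a sketch: the name `get_adaptive_scale_host` is hypothetical, and `WeightType` is assumed to be int8_t because the counters are described as int8_t.

```cpp
#include <cassert>
#include <cstdint>

using WeightType = int8_t;  // assumption: matches the int8_t counters described above

// Host-side mirror of the ADAPTIVE_NOISE_ENABLED branch of get_adaptive_scale().
inline float get_adaptive_scale_host(WeightType ov) {
    if (ov < 0) return 0.0f;                              // dead zone
    if (ov < 64) return static_cast<float>(ov) / 64.0f;   // linear ramp
    return 1.0f;                                          // saturated
}

// Sample points on the transfer curve.
inline void example_adaptive_scale() {
    assert(get_adaptive_scale_host(-5) == 0.0f);
    assert(get_adaptive_scale_host(0) == 0.0f);
    assert(get_adaptive_scale_host(16) == 0.25f);
    assert(get_adaptive_scale_host(32) == 0.5f);
    assert(get_adaptive_scale_host(64) == 1.0f);
    assert(get_adaptive_scale_host(127) == 1.0f);
}
```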

- Retrieval: In kernels like compute_mlp or compute_attention, the code fetches the integer overlay value for the specific weight row/column (e.g., scales->w_q_row[l][tid]).
- Conversion: It calls get_adaptive_scale() to convert this integer into a float scale.
- Modulation: This scale multiplies the random noise term before it is added to the weights or activations.

Example (Linear Projection):

// acc = dot_product(input, weights)
// noise = random_hash() * scale_out
// acc += noise * global_noise_strength
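
The pseudocode above can be turned into a runnable host-side sketch. Everything beyond the three commented steps is an assumption: `random_hash` here is a stand-in integer mixer that returns a value in [-1, 1), not the hash used by the kernels, and `project_with_noise`, its `seed` parameter, and the `std::vector` inputs are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side mirror of get_adaptive_scale() with adaptive noise enabled.
inline float adaptive_scale(int ov) {
    if (ov < 0) return 0.0f;
    if (ov < 64) return static_cast<float>(ov) / 64.0f;
    return 1.0f;
}

// Hypothetical stand-in for random_hash(): mixes the bits of x and maps
// the top 24 bits to a float in [-1, 1).
inline float random_hash(uint32_t x) {
    x ^= x >> 16; x *= 0x7feb352dU;
    x ^= x >> 15; x *= 0x846ca68bU;
    x ^= x >> 16;
    return static_cast<float>(x >> 8) * (1.0f / 8388608.0f) - 1.0f;
}

// One output element of a linear projection with activity-scaled noise.
inline float project_with_noise(const std::vector<float>& input,
                                const std::vector<float>& weights,
                                int activity_counter,
                                uint32_t seed,
                                float global_noise_strength) {
    assert(input.size() == weights.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < input.size(); ++i) {
        acc += input[i] * weights[i];                          // acc = dot_product(input, weights)
    }
    const float scale_out = adaptive_scale(activity_counter);  // conversion step
    const float noise = random_hash(seed) * scale_out;         // modulation step
    acc += noise * global_noise_strength;
    return acc;
}
```

With a negative counter the noise term is exactly zero, so the function returns the plain dot product; with a counter of 64 or more the perturbation is bounded by `global_noise_strength` in this sketch.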

d0rc/eggroll-hacks.md
