```cpp
#ifndef USE_ADAPTIVE_THRESHOLD
# define USE_ADAPTIVE_THRESHOLD 1
#endif
#ifndef ADAPTIVE_THRESHOLD_ALPHA
# define ADAPTIVE_THRESHOLD_ALPHA 0.1f
#endif
```

It works like this:
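```cpp
#if USE_ADAPTIVE_THRESHOLD
// Adaptive thresholding: filter out weak signals based on the mean absolute difference (MAD)
double sum_abs_diff = 0.0;
for (int i = 0; i < num_fitness; i++) {
    sum_abs_diff += std::abs((double)diffs[i]);
}
// Guard against num_fitness == 0 to avoid dividing by zero
double mad = (num_fitness > 0) ? sum_abs_diff / num_fitness : 0.0;
double threshold = mad * ADAPTIVE_THRESHOLD_ALPHA;
for (int i = 0; i < num_fitness; i++) {
    if (std::abs((double)diffs[i]) < threshold) {
        h_fit[i] = 0;  // weak signal: treat as a tie
    } else {
        // Keep the sign; an exact zero (reachable when every diff is 0, so threshold == 0) stays 0
        h_fit[i] = (diffs[i] > 0) ? 1 : ((diffs[i] < 0) ? -1 : 0);
    }
}
#else
// Original pairwise comparison: the sign of each difference
for (int i = 0; i < num_fitness; i++) {
    h_fit[i] = (diffs[i] > 0) ? 1 : ((diffs[i] < 0) ? -1 : 0);
}
#endif
```

```cpp
#if CHUNK_MEAN_FILTER
// sum_diff_val (presumably the sum of diffs[]) and CHUNK_MEAN_EXPONENT are defined elsewhere (not shown)
double mean_diff = (num_fitness > 0) ? (double)sum_diff_val / num_fitness : 0.0;
if (mean_diff != 0.0) {
    // Raise |mean| to the exponent and restore the sign, so std::pow never sees a negative base
    double sign = (mean_diff > 0) ? 1.0 : -1.0;
    mean_diff = sign * std::pow(std::abs(mean_diff), (double)CHUNK_MEAN_EXPONENT);
}
for (int i = 0; i < num_fitness; i++) {
    diffs[i] += (int32_t)mean_diff;  // the cast truncates toward zero
}
#endif
```

```cpp
// Maps an integer activity counter to a noise multiplier in [0, 1]
__device__ __forceinline__ float get_adaptive_scale(WeightType ov) {
#if ADAPTIVE_NOISE_ENABLED
    if (ov < 0) return 0.0f;                // dead zone: no noise
    if (ov < 64) return (float)ov / 64.0f;  // linear ramp from 0 to 1
    return 1.0f;                            // saturated: full noise
#else
    return 1.0f;                            // adaptive noise disabled: constant scale
#endif
}
```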
- Activity Tracking:
  - The system monitors the updates produced by the Adam optimizer.
  - If a weight row or column receives a significant update, it is marked as "active".
- Hysteresis Mechanism (Rank-1 Overlay):
  - We maintain `int8_t` counters for each row and column (the `AdaptiveScales`).
  - Reinforcement: Active features increment their counter (+5), increasing the noise scale for future steps.
  - Decay: Inactive features decrement their counter (-1), gradually reducing noise.
  - Dead Zone: Values below 0 result in zero noise, effectively "freezing" stable weights until a strong signal reactivates them.
`get_adaptive_scale()` acts as a transfer function that maps the integer "activity counter" (stored in `AdaptiveScales`) to a floating-point noise multiplier (0.0 for negative counters, rising linearly to 1.0 at 64 and above):

- Retrieval: In kernels like `compute_mlp` or `compute_attention`, the code fetches the integer overlay value for the specific weight row/column (e.g., `scales->w_q_row[l][tid]`).
- Conversion: It calls `get_adaptive_scale()` to convert this integer into a float `scale`.
- Modulation: This `scale` multiplies the random noise term before it is added to the weights or activations.
Example (Linear Projection):

```cpp
// acc = dot_product(input, weights)
// noise = random_hash() * scale_out
// acc += noise * global_noise_strength
```