If we could take a low-res sample of the network's hidden layers, then with the right sampling algorithm it might be possible to compare those samples and see how they relate to each other.
If so, it could tell us all kinds of useful things about the labels used in the training data: let us visualise them, group and graph them, find interesting intersections and areas of the network that seem suboptimal, or serve as the basis for prompt preprocessors and weight translators, linking bad prompts to weird combination effects, and so on.
This is the crudest example idea at the moment, but it's my first thought in this space.
Every G generations, take a sample like the one below, and maybe combine these somehow:
from itertools import chain

import numpy as np

# get the features out of the latent space
feature_groups = [net.features for net in latent_nets]
features = list(chain.from_iterable(feature_groups))

# split into groups; np.array_split handles the uneven
# bucket width (len(features) / bucket_count) for us
bucket_count = 512
grouper = np.array_split  # might need to be better
groups = grouper(features, bucket_count)

# downsample to a compact fingerprint (bucket_count 16-bit values, roughly 1 KB),
# assuming activations are normalised to 0..1
aggregate = lambda group: sum(group) / len(group)
bucket = lambda group: int(aggregate(group) * 65535)
buckets = [bucket(group) for group in groups]
class Fingerprint:
    def __init__(self, ngram, data=None, count=0):
        self.ngram = ngram
        self.data = data or [0] * bucket_count
        self.count = count

    load = save = NotImplemented  # persistence layer still to be written
def get_ngrams(words):
    # every contiguous sub-phrase, from single words up to the whole phrase
    ngrams = []
    for length in range(1, len(words) + 1):
        # sliding_window is the itertools-docs recipe (more_itertools.sliding_window)
        ngrams_this_long = sliding_window(words, length)
        ngrams.extend(ngrams_this_long)
    return ngrams
words = phrase.split()
ngrams = get_ngrams(words)

for ngram in ngrams:
    fingerprint = Fingerprint.load(ngram)
    # weight function might need a better choice:
    # longer ngrams cover more of the phrase, so they count for more
    weight = len(ngram) / len(words)
    for idx, score in enumerate(buckets):
        fingerprint.data[idx] += score * weight
    fingerprint.count += weight
    fingerprint.save()
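Once fingerprints are stored, the "compare them and see how they relate" part could start very simply. A minimal sketch, assuming the Fingerprint class above with load implemented; cosine similarity is just a placeholder metric here, not a settled choice.

import math

def compare(a, b):
    # normalise by accumulated weight so frequently-seen ngrams don't dominate
    va = [x / max(a.count, 1) for x in a.data]
    vb = [x / max(b.count, 1) for x in b.data]
    dot = sum(x * y for x, y in zip(va, vb))
    mag = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(x * x for x in vb))
    return dot / mag if mag else 0.0

# e.g. similarity = compare(Fingerprint.load(("red", "car")), Fingerprint.load(("blue", "car")))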
- Explore better grouper, aggregate, weight and reduce functions, possibly using ML itself (a couple of drop-in alternatives are sketched after this list).
- Try to find stability across image sizes and permutations.
- Get real-world phrases to test with.
- Prevent duplicate (seed, ngram) pairs from skewing data.
- Filter ngrams before fingerprints run out of precision.
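For the first and fourth items above, some hedged starting points; the function names and approach are illustrative, nothing here is settled:

import statistics

# drop-in alternatives for aggregate(): max emphasises peak activations,
# median resists outliers, RMS sits somewhere in between
aggregate_max = max
aggregate_median = statistics.median
aggregate_rms = lambda group: (sum(x * x for x in group) / len(group)) ** 0.5

# one possible guard against duplicate (seed, ngram) pairs skewing the data
seen_pairs = set()

def should_accumulate(seed, ngram):
    key = (seed, tuple(ngram))
    if key in seen_pairs:
        return False
    seen_pairs.add(key)
    return True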
Statistically? Max? Store as 0..1, start at 0.5, and convert to a bitmap + mask of convergent/chaotic regions?
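One way that last idea could look, purely as a sketch: keep each bucket as a running value in 0..1 that starts at 0.5, nudge it toward new scores, then threshold into a bitmap plus a mask marking buckets that have drifted far enough from 0.5 to call convergent. The EMA update rule and the threshold value are assumptions, not anything worked out above.

def update(values, new_scores, rate=0.1):
    # simple exponential moving average (assumed update rule)
    return [(1 - rate) * v + rate * s for v, s in zip(values, new_scores)]

def to_bitmap_and_mask(values, convergence_threshold=0.2):
    # bitmap: which side of 0.5 each bucket settled on
    # mask: 1 where the bucket has moved far enough from 0.5 to call it convergent
    bitmap = [1 if v >= 0.5 else 0 for v in values]
    mask = [1 if abs(v - 0.5) >= convergence_threshold else 0 for v in values]
    return bitmap, mask

# values = [0.5] * bucket_count  # start at 0.5, update as fingerprints accumulate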