AbstractPhil
Geometric Memory III: Resonant Optimization, Consensus Distillation, and Evolutionary Training Paradigms
I've noticed definite data overlap in the system: a percentage of the validation data has been trained on, so the R1 constraints aren't perfect in their current state. Even without the overlap, the early-stage model can be trained directly to R1 = 100% within 3 epochs, so this isn't the crucial failure point. The failure point is the large-scale tests, which may contain overlap, and those cannot have any overlap for the big training run.
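Reading "R1" as retrieval recall@1 (R@1) is an assumption here. A minimal, dependency-free sketch of how R@1 is usually computed (the helper names `cosine` and `recall_at_1` are hypothetical) also shows why leaked validation pairs inflate it: a memorized pair will almost always rank its own partner first.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recall_at_1(queries, gallery):
    """Fraction of queries whose most similar gallery item is their own pair.

    Assumes queries[i] and gallery[i] form a matched pair (e.g. image i, caption i).
    """
    hits = 0
    for i, q in enumerate(queries):
        sims = [cosine(q, g) for g in gallery]
        # Index of the highest-scoring gallery item (first one wins on ties).
        best = max(range(len(gallery)), key=lambda j: sims[j])
        hits += int(best == i)
    return hits / len(queries)
```

A validation set on which this returns 1.0 every run says little on its own; it only becomes informative once the pairs are guaranteed to be unseen.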
The overlap happened when I switched from the 200k set to the 500k set, so the full 12 million set will need a validation target other than itself. I only ran it twice, but both runs likely bled about 20k images of mixed origin from one set into the other. That is probably under 8% bleedover, but it's enough to taint the outcome.
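One cheap guard against this kind of bleed, sketched under the assumption that each sample can be reduced to raw bytes (image file bytes or caption text), is to fingerprint the training set and drop any validation item whose fingerprint appears in it. The helpers `fingerprint`, `overlap_fraction`, and `filter_leaked` are hypothetical names, not part of any existing pipeline.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Exact-duplicate fingerprint: SHA-256 hex digest of the raw bytes."""
    return hashlib.sha256(data).hexdigest()

def overlap_fraction(train_items, val_items):
    """Fraction of validation items whose fingerprint also occurs in training."""
    train_hashes = {fingerprint(x) for x in train_items}
    leaked = sum(1 for x in val_items if fingerprint(x) in train_hashes)
    return leaked / len(val_items)

def filter_leaked(train_items, val_items):
    """Keep only the validation items that never appear in the training set."""
    train_hashes = {fingerprint(x) for x in train_items}
    return [x for x in val_items if fingerprint(x) not in train_hashes]
```

Exact hashing only catches byte-identical copies; re-encoded or resized duplicates of the same image would need a perceptual hash or an embedding nearest-neighbour check instead.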
I'll need another dataset for validation, one completely separate from the training sources and clearly distinct from them. I'm likely going to use my own dataset for validation, which is essentially a billion trash prompts that can't simply be solved and often make zero sense.
Even a small percentage of trained-on validation data is enough for me to resort to extreme measures, not to mention the damn thing reports R1 100% all the time, which is annoying me. I want to see a legitimate series of impossible combinations that cannot be represented: essentially garbage noise mixed with pure captions that the model has never learned from.
The model cannot easily solve these, which will give a perfect measure. I'd say maybe a million of these will be the best possible impossible goal.
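As an illustration only, here is one way such "garbage noise mixed with pure captions" could be generated reproducibly: replace a chosen fraction of each caption's tokens with tokens drawn from a noise vocabulary, using a seeded generator so the control set is identical across runs. Whitespace tokenization, the `noise_vocab` list, and the helper names are all assumptions, not the author's actual procedure.

```python
import random

def corrupt_caption(caption, noise_vocab, noise_rate, rng):
    """Replace roughly `noise_rate` of the whitespace tokens with noise tokens."""
    out = []
    for tok in caption.split():
        if rng.random() < noise_rate:
            out.append(rng.choice(noise_vocab))  # swap in a noise token
        else:
            out.append(tok)  # keep the original token
    return " ".join(out)

def build_control_set(captions, noise_vocab, noise_rate=0.5, seed=0):
    """Deterministically corrupt every caption with the same seeded generator."""
    rng = random.Random(seed)
    return [corrupt_caption(c, noise_vocab, noise_rate, rng) for c in captions]
```

`noise_rate=0.0` returns the captions unchanged and `noise_rate=1.0` replaces every token, so the same helper spans the range from clean captions to pure noise.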
AbstractPhil/bulk-coco-features
This... is going to be an odd one to describe. Based on the research with BERT, creating a unified patchwork from a multitude of ViT composites will be very achievable. It shouldn't be soup, which is really hard to explain, but by creating a second geometric anchor, the system will align in a way I couldn't predict without much more model analysis, so it must be tested. I simply didn't test all these ViTs for geometry, so this will be the test.
This is essentially 34 directly extracted views of COCO, already prepared as feature data. With this data, we have 34 experts that can distill into a single unified ViT. I'm hesitant to even call this distillation anymore; it's more interpolative data alignment, and it's absurdly retentive.
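A minimal sketch of what aligning one student to many experts can look like, assuming every teacher's feature for an image has already been mapped to a shared dimension (the 34 ViTs have different widths, so some projection, not shown, is assumed). It contrasts two hypothetical objectives: averaging the cosine distance to each teacher, and matching a single consensus target built from the normalized teacher features. Neither is claimed to be the method used here.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def per_teacher_loss(student, teachers):
    """Mean cosine distance between the student feature and each teacher feature."""
    return sum(1.0 - cosine(student, t) for t in teachers) / len(teachers)

def consensus_target(teachers):
    """Unit-normalized mean of the unit-normalized teacher features."""
    units = [normalize(t) for t in teachers]
    dim = len(units[0])
    mean = [sum(u[d] for u in units) / len(units) for d in range(dim)]
    return normalize(mean)

def consensus_loss(student, teachers):
    """Cosine distance between the student and the teachers' consensus direction."""
    return 1.0 - cosine(student, consensus_target(teachers))
```

With two orthogonal teachers, a student pointing exactly between them has zero consensus loss but still pays about 0.29 under the per-teacher loss, which is the practical difference between the two objectives.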
ADDITIONALLY, we can anchor to a frozen geolip-bert and create cross-contrast between the two anchors for a learned anchor median, which will allow further integrations directly into the geometric core.
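The post doesn't specify how the cross-contrast is computed. As a stand-in, here is a standard CLIP-style symmetric InfoNCE loss between two anchor spaces, where row i of `anchor_a` (say, the ViT-side anchor) should match row i of `anchor_b` (say, the frozen geolip-bert anchor). The pairing, the 0.07 temperature, and the function names are assumptions for illustration.

```python
import math

def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _logsumexp(xs):
    m = max(xs)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in xs))

def symmetric_info_nce(anchor_a, anchor_b, temperature=0.07):
    """Average of a->b and b->a cross-entropy over the pairwise cosine logits."""
    a = [_unit(v) for v in anchor_a]
    b = [_unit(v) for v in anchor_b]
    n = len(a)
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Rows: each a_i must pick out b_i among all b_j.
    loss_ab = sum(_logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Columns: each b_j must pick out a_j among all a_i.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_ba = sum(_logsumexp(cols[j]) - cols[j][j] for j in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)
```

Keeping one side frozen simply means only the other side's parameters receive gradients from this loss; the loss itself is symmetric in the two anchors.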
This will require a few overlapping internal mechanisms to guarantee ViT differentiation; however, I believe the full unified patchwork will be... different from what is currently known as a ViT.
geolip-bert-vit will likely be cooking within the month. The alignment statistics say it will be... 100% accurate to the specifications.
I CAN prepare 34 ViTs' worth of ImageNet, but I would probably need 34 ViTs' worth of LAION Aesthetics, which is substantially more than I currently have. In the process I would need to ensure nothing is corrupt and that the captions are correctly synthesized in our expert student BERT with the correct anchoring rotation.
Probably 3 ViTs are enough for the full-version prototype, and 34 ViTs for the bulk experiment.