Wow, it’s amazing that just 3.3% of the training set coming from the same model can already start to mess it up.
I’ve read some snippets of AI-written books and it really does feel like my brain is short-circuiting.
At least in this case, we can be pretty confident that there’s no higher function going on. It’s true that AI models are a bit of a black box that can’t really be examined to understand why exactly they produce the results they do, but they are still just a finite amount of data. The black box doesn’t “think” any more than a river decides its course, though the eventual state of both is hard to predict or control. In the case of model collapse, we know exactly what’s going on: each new generation repeats and amplifies the little mistakes the previous one made. There’s no mystery about that part; it’s just that we lack the ability to directly tune those mistakes out of the model.
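If it helps to see the amplification concretely, here’s a toy sketch (my own simplification, nothing to do with how LLMs are actually trained): fit a simple model to some data, generate new data from that model, fit the next generation only on the generated output, and repeat.

```python
import numpy as np

# Toy model-collapse demo (a simplification, not real LLM training):
# each "generation" fits a Gaussian to the previous generation's output,
# then produces the next generation's training data by sampling from it.
rng = np.random.default_rng(0)

# Generation 0 trains on "real" data: a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=100)

for generation in range(15):
    # "Training": estimate the distribution's parameters from the data.
    mu, sigma = data.mean(), data.std()
    print(f"gen {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
    # The next generation only ever sees this model's output.
    data = rng.normal(loc=mu, scale=sigma, size=100)
```

Each generation’s estimates are slightly off, and because every generation only ever sees the previous one’s output, those small errors accumulate instead of washing out: run it and you can watch the printed mean and std wander further from the true 0 and 1.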
Thank you!!
Being a Python simp, I find GDScript just different enough to nag at me. There’s a lot of QoL stuff it doesn’t have that the maintainers aren’t (currently) looking to add, in order to keep the language simple. Honestly, it has me looking to use C# instead.
Honestly, C# has grown on me quite a bit. It shakes off some of the bloat of Java, and LINQ is pretty handy. God knows I can’t tell you what the distinction is between C# and .NET Core and whatever the hell ASP is, though.
I mean, we’ve seen already that AI companies are forced to be reactive when people exploit loopholes in their models or some unexpected behavior occurs. Not that they aren’t smart people, but these things are very hard to predict, and hard to fix once they go wrong.
Also, what do you mean by synthetic data? If it’s made by AI, that’s how collapse happens.
The problem with curated data is that you have to, well, curate it, and that’s hard to do at scale. We no longer have a few decades’ worth of unpoisoned data to work with; the only way to guarantee training data didn’t come from your own model is to make it yourself.