SmolLM was mostly trained on synthetic and educational content – not scraped web data. They used AI-generated textbooks,

But doesn’t that just mean they trained SmolLM with output from another model that’s presumably trained on scraped web data? If so...
As a website owner fighting against the spread of misinformation and proliferation of AI-generated slop sites, can I prevent this so-called “tool” from generating its supposed “key points” of my pages?