Building an Internet That Speaks to AI: The Race to Standardize “Do Not Train”
In just five years, the question of who controls the web data used by AI has moved from the corners of technical forums to the center of global policy debates. Between 2020 and 2025, the rise of generative AI forced the Internet to evolve, not in how it looks but in how it talks to machines. From early stop-gaps like robots.txt and “no AI” meta tags to more formal efforts such as TDMRep, developed in a W3C Community Group, and the IETF AI Preferences Working Group, we are witnessing the birth of a new digital language: one that lets humans and AI negotiate data use through machine-readable signals.

🧩 From Chaos to Coordination

At first, every AI company made its own rules. OpenAI introduced GPTBot with a robots.txt opt-out; Google followed with Google-Extended. Artists embedded “NoAI” tags to protect their work; news outlets manually blocked crawlers. Each of these measures worked in isolation, but none of them interoperated with the others. The web needed a unified framework: a way for content creators to declare “Yes, index me for search, b...
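As a rough illustration of the crawler-specific opt-outs mentioned above, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt that blocks the GPTBot and Google-Extended tokens while leaving other crawlers alone. The robots.txt contents, the `example.com` URL, and the helper name `build_parser` are assumptions made for illustration, not rules taken from any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of two AI-related tokens,
# keep everything else (including ordinary search crawlers) allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/articles/1"  # placeholder URL

# Report, per crawler token, whether the rules permit fetching the URL.
for agent in ("Googlebot", "GPTBot", "Google-Extended"):
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent}: {verdict}")
```

This only shows how a compliant crawler would read the file. robots.txt is a voluntary convention, so the opt-out has effect only when the crawler chooses to honor it.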