Anthropic, a California-based AI startup co-founded by former OpenAI executives Daniela and Dario Amodei, has announced a new initiative to fund the development of benchmarks for evaluating AI models, including its own family of generative models, Claude. The company, which has received roughly $4 billion in investment from Amazon, says the program aims to improve both the performance and the safety of AI models.
The initiative addresses the growing need for robust AI benchmarks, as existing ones often fail to capture the full range of AI capabilities and impacts. Anthropic’s program will focus on creating evaluations that measure advanced AI capabilities, safety, infrastructure requirements, and climate impact. The company has invited third parties to apply for funding to develop these benchmarks, with applications being reviewed on a rolling basis.
Anthropic’s program stands out for its focus on assessing the risks AI models pose to security, society, and infrastructure. This includes evaluating models’ potential to carry out cyberattacks, deceive or manipulate people through deepfakes, and enable other security threats. The program also seeks to support an early warning system that would help governments flag national security issues related to AI.
The program will also support research into benchmarks that probe AI’s potential to aid scientific study, work across multiple languages, and mitigate biases related to race, gender, and religion, as well as tools that automatically screen model outputs for toxicity.
Anthropic’s initiative could be significant for AI startups, offering them funding and access to Anthropic’s domain experts. The company is positioning itself as a thought leader in AI evaluation, even as it competes with larger rivals such as Google and Microsoft, which disclose less about their models’ environmental impact and other details.
Whether the ambitious project succeeds remains to be seen, but it reflects Anthropic’s commitment to improving how AI systems are evaluated. The company hopes to draw independent researchers and startups into the effort and, in doing so, to set a new industry standard for comprehensive AI evaluation.