Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Anthropic | Charting a Path to AI Accountability, published by Gabriel Mukobi on June 14, 2023 on LessWrong.
Below is the text copied from the linked post (since it's short). I'm not affiliated with Anthropic but wanted to link-post this here for discussion.
This week, Anthropic submitted a response to the National Telecommunications and Information Administration’s (NTIA) Request for Comment on AI Accountability. Today, we want to share our recommendations as they capture some of Anthropic’s core AI policy proposals.
There is currently no robust and comprehensive process for evaluating today’s advanced artificial intelligence (AI) systems, let alone the more capable systems of the future. Our submission presents our perspective on the processes and infrastructure needed to ensure AI accountability. Our recommendations consider the NTIA’s potential role as a coordinating body that sets standards in collaboration with other government agencies like the National Institute of Standards and Technology (NIST).
In our recommendations, we focus on accountability mechanisms suitable for highly capable and general-purpose AI models. Specifically, we recommend:
Fund research to build better evaluations
Increase funding for AI model evaluation research. Developing rigorous, standardized evaluations is difficult and time-consuming work that requires significant resources. Increased funding, especially from government agencies, could help drive progress in this critical area.
Require companies in the near term to disclose evaluation methods and results. Companies deploying AI systems should be required to meet certain disclosure requirements regarding their evaluations, though the disclosed information need not be made public if doing so would compromise intellectual property (IP) or confidential information. This transparency could help researchers and policymakers better understand where existing evaluations may be lacking.
Develop in the long term a set of industry evaluation standards and best practices. Government agencies like NIST could establish standards and benchmarks for evaluating AI models’ capabilities, limitations, and risks, with which companies would be expected to comply.
Create risk-responsive assessments based on model capabilities
Develop standard capabilities evaluations for AI systems. Governments should fund and participate in the development of rigorous capability and safety evaluations targeted at critical risks from advanced AI, such as deception and autonomy. These evaluations can provide an evidence-based foundation for proportionate, risk-responsive regulation.
Develop a risk threshold through further research into, and funding for, safety evaluations. Once a risk threshold has been established, evaluations against it can be mandated for all models (a minimal sketch of the resulting decision flow appears after this list).
If a model falls below this risk threshold, existing safety standards are likely sufficient. Verify compliance and deploy.
If a model exceeds the risk threshold and safety assessments and mitigations are insufficient, halt deployment, significantly strengthen oversight, and notify regulators. Determine appropriate safeguards before allowing deployment.
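To make that flow concrete, here is a minimal, purely illustrative sketch of threshold-based deployment gating. The names here (RiskReport, risk_score, mitigations_sufficient, deployment_decision) are hypothetical and not part of Anthropic's proposal; real thresholds and evaluations would be defined by regulators and standards bodies.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    DEPLOY = "verify compliance and deploy"
    HALT = "halt deployment, strengthen oversight, notify regulators"


@dataclass
class RiskReport:
    """Hypothetical summary of a model's safety evaluation results."""
    risk_score: float             # aggregate score from capability/safety evaluations
    mitigations_sufficient: bool  # whether proposed safeguards address the identified risks


def deployment_decision(report: RiskReport, risk_threshold: float) -> Decision:
    """Apply the risk-responsive gating logic described above."""
    if report.risk_score < risk_threshold:
        # Below the threshold: existing safety standards are likely sufficient.
        return Decision.DEPLOY
    if report.mitigations_sufficient:
        # Above the threshold, but safeguards are judged adequate.
        return Decision.DEPLOY
    # Above the threshold with insufficient mitigations.
    return Decision.HALT
```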
Establish pre-registration for large AI training runs
Establish a process for AI developers to report large training runs, ensuring that regulators are aware of potential risks. This involves determining the appropriate recipient, the required information, and suitable cybersecurity, confidentiality, IP, and privacy safeguards.
Establish a confidential registry for AI developers conducting large training runs to pre-register model details with their home country’s national government (e.g., model specifications, model type, compute infrastructure, intended training completion date, and safety plans) before training commences. Aggregated registry data should be protect...
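As a purely illustrative sketch, a pre-registration record built from the fields listed above might look like the following. The TrainingRunRegistration class, its field names, and the example values are all assumptions for illustration, not part of Anthropic's proposal.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class TrainingRunRegistration:
    """Hypothetical pre-registration record for a large training run."""
    developer: str                      # organization conducting the run
    model_specifications: str           # e.g., parameter count, architecture family
    model_type: str                     # e.g., "large language model"
    compute_infrastructure: str         # e.g., cluster size and accelerator type
    intended_training_completion: date  # expected end date of training
    safety_plans: List[str] = field(default_factory=list)  # planned evaluations and mitigations


# Example record (all values invented for illustration):
registration = TrainingRunRegistration(
    developer="Example Lab",
    model_specifications="~100B-parameter transformer",
    model_type="large language model",
    compute_infrastructure="4,096 accelerators",
    intended_training_completion=date(2024, 1, 1),
    safety_plans=["pre-deployment capability evaluations", "red-teaming"],
)
```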