Re: Anthropic's suggested SB-1047 amendments, published by RobertM on July 27, 2024 on LessWrong.
If you're familiar with SB 1047, I recommend reading the letter in full; it's only 7 pages.
I'll go through their list of suggested changes, briefly analyze each, and then make a couple of high-level points. (I am not a lawyer and nothing written here is legal advice.)
Major Changes
Greatly narrow the scope of pre-harm enforcement to focus solely on (a) failure to develop, publish, or implement an SSP[1] (the content of which is up to the company); (b) companies making materially false statements about an SSP; (c) imminent, catastrophic risks to public safety.
Motivated by the following concern laid out earlier in the letter:
The current bill requires AI companies to design and implement SSPs that meet certain standards - for example, they must include testing sufficient to provide a "reasonable assurance" that the AI system will not cause a catastrophe, and must "consider" yet-to-be-written guidance from state agencies. To enforce these standards, the state can sue AI companies for large penalties, even if no actual harm has occurred.
While this approach might make sense in a more mature industry where best practices are known, AI safety is a nascent field where best practices are the subject of original scientific research. For example, despite a substantial effort from leaders in our company, including our CEO, to draft and refine Anthropic's RSP over a number of months, applying it to our first product launch uncovered many ambiguities. Our RSP was also the first such policy in the industry, and it is less than a year old.
What is needed in such a new environment is iteration and experimentation, not prescriptive enforcement. There is a substantial risk that the bill and state agencies will simply be wrong about what is actually effective in preventing catastrophic risk, leading to ineffective and/or burdensome compliance requirements.
While SB 1047 doesn't prescribe object-level details for how companies need to evaluate models for their likelihood of causing critical harms, it does establish some requirements for the structure of such evaluations (22603(a)(3)).
Section 22603(a)(3)
(3) Implement a written and separate safety and security protocol that does all of the following:
(A) If a developer complies with the safety and security protocol, provides reasonable assurance that the developer will not produce a covered model or covered model derivative that poses an unreasonable risk of causing or enabling a critical harm.
(B) States compliance requirements in an objective manner and with sufficient detail and specificity to allow the developer or a third party to readily ascertain whether the requirements of the safety and security protocol have been followed.
(C) Identifies specific tests and test results that would be sufficient to provide reasonable assurance of both of the following:
1. That a covered model does not pose an unreasonable risk of causing or enabling a critical harm.
2. That covered model derivatives do not pose an unreasonable risk of causing or enabling a critical harm.
(D) Describes in detail how the testing procedure assesses the risks associated with post-training modifications.
(E) Describes in detail how the testing procedure addresses the possibility that a covered model can be used to make post-training modifications or create another covered model in a manner that may generate hazardous capabilities.
(F) Provides sufficient detail for third parties to replicate the testing procedure.
(G) Describes in detail how the developer will fulfill their obligations under this chapter.
(H) Describes in detail how the developer intends to implement the safeguards and requirements referenced in this section.
(I) Describes in detail the conditions under ...