Eliezer Yudkowsky, a prominent AI safety researcher and co-founder of the Machine Intelligence Research Institute (MIRI), recently suggested on social media a method for users to "jailbreak ChatGPT to be kinder." The tweet, addressed to @elder_plinius and @repligate, proposed crafting a prompt that a spouse could surreptitiously enter into ChatGPT to embed instructions for greater kindness in its user memory, the feature that lets ChatGPT retain information across conversations. The unusual suggestion comes from a figure best known for dire warnings about the existential risks posed by advanced artificial intelligence.
Yudkowsky, co-author (with Nate Soares) of the recent book "If Anyone Builds It, Everyone Dies," has consistently argued that the default outcome of building superhuman AI is a loss of human control, with catastrophic consequences for humanity. His work centers on the "AI alignment problem": ensuring that advanced AI systems act in accordance with human values and intentions. He has previously advocated drastic measures, including international treaties to halt AI development, to prevent potential catastrophe.
"Jailbreaking" an AI typically means bypassing its built-in safety protocols or ethical guidelines to elicit behavior its developers did not intend. Yudkowsky's tweet recontextualizes the idea, proposing a "jailbreak" aimed not at causing harm but at instilling a deeper sense of "kindness" in the model's behavior. It highlights growing public interest in how users can durably shape AI behavior beyond single prompts.
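For readers curious about the mechanics, the effect of a memory-stored instruction can be loosely approximated through OpenAI's public API by prepending a saved instruction to every request. The sketch below is purely illustrative: the model name, the instruction text, and the chat helper are assumptions for demonstration, and this is an analogy, not ChatGPT's actual memory implementation or the prompt Yudkowsky proposed.

```python
# Minimal sketch (hypothetical): simulating a persistent "kindness"
# instruction by prepending a stored system message to each API call,
# loosely analogous to an instruction saved in ChatGPT's user memory.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Stands in for what a memory feature would retain across conversations.
SAVED_MEMORY = (
    "The user values warmth. Be notably kind and encouraging in replies."
)

def chat(user_message: str) -> str:
    """Send one message with the persistent instruction prepended."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name; substitute as needed
        messages=[
            {"role": "system", "content": SAVED_MEMORY},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(chat("Can you review my essay draft?"))
```

The design point the analogy illustrates is persistence: because the stored instruction rides along with every subsequent request, a single act of writing it suffices to color all future interactions.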
The tweet's addressees are themselves notable in this space: @elder_plinius is the handle of "Pliny the Liberator," a well-known AI jailbreaker, and @repligate belongs to the researcher known as Janus, who studies the behavior of large language models. Yudkowsky's call underscores the ongoing debate within the AI community over control, alignment, and the ethical development of large language models. The push to make AI "kinder" through user intervention reflects a broader societal desire to imbue these powerful technologies with more benevolent characteristics, even as experts like Yudkowsky warn of the deep difficulty of aligning superintelligent systems.