Recent academic research indicates that prompts given to large language models (LLMs) can be reconstructed from the models' internal states, a discovery that could have significant implications for AI privacy and utility. The findings suggest that distinct prompts map to unique hidden states within an LLM, meaning the mapping can, in principle, be inverted to recover the original input. The concept was highlighted by researcher Ethan Mollick, who colorfully dubbed the "possibility space of prompts" the "Pringle of Prompting" in a recent social media post.
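The core claim can be illustrated with a short experiment: feed two nearly identical prompts through a small open model and compare the hidden state at the final token. The sketch below is illustrative only; the model choice (gpt2) and the simple distance check are assumptions for demonstration, not the papers' actual methodology.

```python
# Minimal sketch: do distinct prompts produce distinct hidden states?
# Assumptions: gpt2 as a stand-in model; L2 distance as a crude distinctness check.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM with accessible hidden states would do
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def last_hidden_state(prompt: str) -> torch.Tensor:
    """Return the final layer's hidden state for the last token of the prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.hidden_states is a tuple of (num_layers + 1) tensors,
    # each shaped (batch, seq_len, hidden_dim)
    return outputs.hidden_states[-1][0, -1]

h1 = last_hidden_state("Summarize the quarterly report in three bullet points.")
h2 = last_hidden_state("Summarize the quarterly report in four bullet points.")

# If prompts map to unique states, even near-identical prompts should differ.
print("L2 distance between hidden states:", torch.dist(h1, h2).item())
```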
A paper titled "Language Models are Injective and Hence Invertible," from institutions including Sapienza University of Rome and EPFL, argues that LLMs are not as "lossy" as previously thought, and that their internal "thoughts" can be inverted to reconstruct original prompts with high accuracy. Complementing this, research on "Reverse Prompt Engineering" (RPE) from Northwestern University details a method for recovering prompts from as few as five text outputs, reporting a 5.8% improvement in cosine similarity over previous state-of-the-art prompt-reconstruction techniques.
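The cosine-similarity figure refers to comparing an embedding of the recovered prompt against an embedding of the original. The sketch below shows that kind of evaluation, assuming a sentence-embedding model such as all-MiniLM-L6-v2 and example prompts of my own; neither is necessarily what the RPE paper uses.

```python
# Hedged sketch of the evaluation metric: cosine similarity between
# embeddings of the original and reconstructed prompts.
# Assumption: all-MiniLM-L6-v2 is an illustrative embedding model choice.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

original = "Write a 200-word marketing plan for a sustainable coffee brand."
reconstructed = "Draft a short marketing plan for an eco-friendly coffee company."

emb = embedder.encode([original, reconstructed], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"Cosine similarity: {similarity:.3f}")  # closer to 1.0 means a closer reconstruction
```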
The ability to reverse-engineer prompts from an LLM's internal workings or outputs is a double-edged sword. On one hand, it raises concerns about the privacy and security of user inputs, since proprietary or sensitive prompts might be retrievable; as one article put it, "Your LLM Prompts Are Not Safe." On the other hand, this reversibility opens new avenues for AI development, such as generating high-quality content by inferring and adapting prompts from exemplary outputs.
The RPE method, which leverages the LLM itself as an optimizer, requires no prior training data and offers flexibility in generating free-form, natural language prompts. This contrasts with earlier methods that sometimes produced non-linguistic sequences or were constrained by specific output formats. Human evaluators consistently favored RPE-generated content over template-generated alternatives in various use cases, including marketing plans, video game designs, and song lyrics, indicating its potential for creating more nuanced and high-quality data.
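A rough sketch of the LLM-as-optimizer idea is below, under loose assumptions: `llm` stands in for any text-completion call and `similarity` for any text-similarity score (both hypothetical helpers, not the paper's code). The loop guesses a prompt from the observed outputs, checks how well the guess regenerates them, and asks the model to refine it.

```python
# Generic prompt-recovery loop in the spirit of RPE (a sketch, not the paper's algorithm).
# `llm` and `similarity` are caller-supplied, hypothetical helpers.
from typing import Callable, List

def recover_prompt(
    outputs: List[str],
    llm: Callable[[str], str],
    similarity: Callable[[str, str], float],
    n_rounds: int = 3,
) -> str:
    """Iteratively ask the model to guess, then refine, the hidden prompt."""
    joined = "\n---\n".join(outputs)
    candidate = llm(
        "The following texts were all produced by one hidden instruction.\n"
        f"{joined}\nWrite the most likely instruction."
    )
    for _ in range(n_rounds):
        # Score the candidate by regenerating an output and comparing it
        # to the observed ones, then ask the model to improve the guess.
        regenerated = llm(candidate)
        score = max(similarity(regenerated, o) for o in outputs)
        candidate = llm(
            f"Current guess (similarity {score:.2f}): {candidate}\n"
            f"Observed outputs:\n{joined}\n"
            "Revise the guess so it would better produce these outputs."
        )
    return candidate
```

Because the scoring and revision are both delegated to the model itself, no training data or gradient access is needed, which is consistent with the free-form, natural-language prompts the method is described as producing.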