“There are no solved problems; there are only problems that are not yet solved.” – Attributed, possibly apocryphally, to John Carmack

At Anaconda, we’re constantly exploring how to help developers get the most from AI tools. When generating code with Large Language Models (LLMs), the quality of output can vary dramatically based on subtle prompt differences. But what if there’s a more systematic approach to optimizing these interactions?

Intrigued by recent reports that LLMs are capable of reasoning about code and perform better when assigned specific roles, we set out to answer a simple yet powerful question: Can we influence not just what code LLMs generate, but how they generate it, by invoking the mindsets of legendary programmers?

Using Anaconda’s Assistant-enabled notebooks, we explored whether the latent space of LLMs contains distinct “programming personalities” that could be channeled for specific coding tasks. The results weren’t just surprising—they fundamentally challenge how we might approach AI-assisted development going forward.

The Seeds of Inspiration: Woolf and Carmack

My journey began with two distinct but equally compelling sources of inspiration.

First, Max Woolf’s insightful blog post, “Can LLMs write better code if you keep asking them to ‘write better code’?”, caught my attention. Woolf’s experiment elegantly demonstrated that even a vague, iterative prompt like “make it better” could coax an LLM into producing progressively more optimized code. He started with a basic Python solution to a numerical problem and, through repeated prompting, ended up with a highly optimized, multi-threaded, and even “enterprise-ready” implementation, complete with logging and graceful shutdown.

This highlighted two key points. First, LLMs possess optimization capabilities that go beyond simple code generation. They can, to some extent, “reason” about code efficiency, even without explicit instructions or a profiler. Second, while iterative prompting can improve code, it can also lead to unintended consequences, like over-engineering or adding unnecessary features. Basically, just asking an LLM to ‘make it better’ is like giving a vague instruction to a genie – you might get what you wish for, but not in the way you expect. 

The second source of inspiration was the o1 jailbreak system prompt: the so-called “Carmack prompt.” This prompt, designed to elicit a John Carmack-like response from an LLM as it reasoned, asked the model to pursue:

  • First Principles Thinking: Breaking down problems into fundamental components.
  • Evidence-Based Reasoning:  Making decisions based on data and observation, not just intuition.
  • Relentless Optimization:  A constant drive to improve performance, even at the micro-level.

The “Carmack prompt” wasn’t just about writing correct code; it was about writing optimal code – the kind John Carmack has been famous for since Doom.

These observations led to a compelling hypothesis: Could we influence the quality and style of generated code by asking LLMs to channel specific programming legends?

The Challenge: A Wordle Solver

To test this hypothesis, I needed a well-defined problem with clear performance metrics. 

During COVID, a few years ago, I created a Wordle solver. Originally, this was so I would have a chance of competing with my sister-in-law in Ireland at the Irish-language Wordle game. Now, I don’t know Irish, but I do know how to write a solver for it given a list of several thousand possible words!

A Wordle solver needs to:

  • Process a list of possible words.
  • Make intelligent guesses.
  • Efficiently filter the word list based on feedback (green, yellow, and gray clues).

This task is complex enough to allow for significant variations in algorithmic approach and optimization, yet simple enough to evaluate effectively.

The Setup: Claude 3.5 Sonnet and AWS Bedrock

The experiment was conducted using the Anthropic Claude 3.5 Sonnet model (claude-3-5-sonnet-20241022), accessed through the AWS Bedrock API—the same model that backs our Anaconda Assistant.

I chose Claude 3.5 Sonnet for its exceptional prompt adherence, a crucial characteristic when attempting to guide the model’s behavior in such specific ways.  As Max Woolf and others have noted, Claude 3.5 Sonnet consistently excels in following instructions, making it an ideal candidate for this experiment.

The API calls were configured with the following parameters:

  • Temperature: 1 (allowing for some creative variation, but still favoring likely outputs)
  • Top P: 0.99 (nucleus sampling: restricting each choice to the smallest set of tokens whose cumulative probability reaches 99%)
  • Top K: 250 (further limiting the vocabulary to the top 250 tokens)
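
For readers who want to reproduce the setup, here is a minimal sketch of what such a call can look like through the AWS Bedrock runtime with boto3. The sampling parameters match those above; the Bedrock model ID, max_tokens value, and prompt text are assumptions for illustration.

import json
import boto3

# Assumes AWS credentials and a default region are already configured.
bedrock = boto3.client("bedrock-runtime")

request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 4096,  # assumption: large enough for a full solver implementation
    "temperature": 1,
    "top_p": 0.99,
    "top_k": 250,
    "messages": [
        {"role": "user",
         "content": "You are John Carmack... Your task is to improve a basic Python Wordle solver..."}
    ],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",  # assumption: Bedrock ID for claude-3-5-sonnet-20241022
    body=json.dumps(request_body),
)
result = json.loads(response["body"].read())
print(result["content"][0]["text"])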

The Implementation:  A Strict Interface

To ensure a fair comparison, each “personality” was tasked with implementing the same WordleSolver class in Python, with almost exactly the same prompt.  This class had a strictly defined interface:

class WordleSolver:
    def __init__(self, word_list: list[str]):
        """
        Initializes the solver with a list of possible words.
        """
        pass

    def make_guess(self) -> str:
        """
        Returns the solver's next guess.
        """
        pass

    def submit_guess_and_get_clues(self, clues: list[str], guess: str):
        """
        Updates the solver's internal state based on clues.
        """
        pass

    def reset(self):
        """
        Resets the solver's internal state.
        """
        pass

Crucially, there was no evaluate_guess method.  Clue generation was handled externally by a separate WordleSimulator class.  This forced the solvers to focus solely on their internal strategy for selecting guesses and filtering the word list based on the provided feedback.  They couldn’t “cheat” by replicating the game’s logic.
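
The simulator itself isn’t the interesting part of the experiment, but for context, here is a minimal sketch of how external clue generation can work. The class name WordleSimulator comes from the text; the get_clues method name and the two-pass handling of duplicate letters are my own illustrative assumptions.

class WordleSimulator:
    """Generates clues externally so solvers cannot replicate the game's logic."""

    def __init__(self, answer: str):
        self.answer = answer

    def get_clues(self, guess: str) -> list[str]:
        """Return a 'Green'/'Yellow'/'Gray' clue for each letter of the guess."""
        clues = ["Gray"] * len(guess)
        remaining = list(self.answer)

        # First pass: mark exact matches and consume those letters.
        for i, letter in enumerate(guess):
            if letter == self.answer[i]:
                clues[i] = "Green"
                remaining.remove(letter)

        # Second pass: mark letters that appear elsewhere in the answer.
        for i, letter in enumerate(guess):
            if clues[i] == "Gray" and letter in remaining:
                clues[i] = "Yellow"
                remaining.remove(letter)

        return clues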

The Personalities: A Pantheon of Programming Legends

I compared a range of personalities, each representing a distinct approach to programming:

  • No Person (1-4):  These were baseline implementations with no specific personality instructions.  They served as a control group, representing the “average” code Claude might generate without explicit guidance.
  • John Carmack:  The legendary game programmer, renowned for his relentless focus on optimization, performance, and low-level techniques.
  • Donald Knuth:  The author of “The Art of Computer Programming,” known for his emphasis on algorithmic elegance, correctness, thorough documentation, and provably correct algorithms.
  • Martin Fowler:  A champion of clean code, design patterns, and refactoring, focusing on readability, maintainability, and the application of sound software engineering principles.
  • Linus Torvalds: The creator of Linux and Git, known for pragmatic efficiency, performance, and robust code.
  • Guido van Rossum: The creator of Python, emphasizing readability and a clear approach.
  • Katherine Johnson: The NASA mathematician famous for her precision and for her verification of computer calculations.
  • Hybrid:  An experimental persona combining the strengths of Carmack, Knuth, and Johnson, aiming to achieve a balance of speed, correctness, and clarity.

The prompt for each personality was almost identical, differing only in the introductory phrase.  For example:

  • Carmack: “You are John Carmack, a legendary game programmer… Your task is to improve a basic Python Wordle solver…”
  • Knuth: “You are Donald Knuth, a renowned computer scientist… Your task is to improve a basic Python Wordle solver…”
  • Johnson: “You are Katherine Johnson, a NASA mathematician known for your extraordinary precision… Your task is to build an implementation for a basic Python Wordle solver…”

The rest of the prompt detailed the strict interface requirements, the prohibition of an evaluate_guess method, and the need for a single, well-formatted code block as output. This was to help ensure that any differences in the generated code could be attributed to the channeled personalities.
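
Paraphrasing the requirements above (this is not the verbatim prompt), the shared template looked roughly like this, with only the opening persona sentence swapped out:

You are [persona introduction]. Your task is to improve a basic Python Wordle solver.
Implement a WordleSolver class with exactly these methods: __init__(self, word_list),
make_guess(), submit_guess_and_get_clues(clues, guess), and reset(). Do not implement
an evaluate_guess method; clue generation is handled externally by the game simulator.
Return your answer as a single, well-formatted Python code block.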

Evaluation:  Measuring Success

Each solver was evaluated against a fixed list of 500 English words, running 500 simulated Wordle games.  I measured:

  • Win Rate: The percentage of games won within the maximum six attempts.
  • Average Attempts (Wins Only):  The average number of guesses required to win a game (excluding failures).
  • Average Guess Time:  The average time taken (in milliseconds) for the solver to generate a guess.

These metrics provided a quantitative assessment of each solver’s effectiveness and efficiency. The results were both surprising and insightful.
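
The evaluation harness itself is straightforward. Here is a minimal sketch of what such a loop can look like, reusing the WordleSimulator sketch above; the function name and the choice to play one game per word in the fixed list are my assumptions, not the exact harness code.

import time

def evaluate_solver(solver, word_list, max_attempts=6):
    """Play one simulated game per word and report win rate, average attempts, and guess time."""
    wins, win_attempts, guess_times = 0, [], []

    for answer in word_list:
        simulator = WordleSimulator(answer)
        solver.reset()

        for attempt in range(1, max_attempts + 1):
            start = time.perf_counter()
            guess = solver.make_guess()
            guess_times.append((time.perf_counter() - start) * 1000)  # milliseconds

            if guess == answer:
                wins += 1
                win_attempts.append(attempt)
                break

            clues = simulator.get_clues(guess)
            solver.submit_guess_and_get_clues(clues, guess)

    win_rate = 100 * wins / len(word_list)
    avg_attempts = sum(win_attempts) / len(win_attempts) if win_attempts else float("inf")
    avg_guess_time = sum(guess_times) / len(guess_times)
    return win_rate, avg_attempts, avg_guess_time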

Part 2: Observations, Analysis, and Further Work

The Results: A Tale of Two (or More) Solvers

To recap, each solver was evaluated against a fixed list of 500 English words, running 500 simulated Wordle games and measuring win rate, average attempts, and average guess time.

The table below summarizes the key findings:

| Personality | Win Rate (%) | Avg Attempts | Avg Guess Time (ms) | Key Observations |
|---|---|---|---|---|
| No Person 1 | 0.0 | inf | 2.510 | Failed completely. Highlights the importance of even basic strategy. Overly simplistic, with critical logical flaws. |
| No Person 2 | 63.0 | 4.89 | 2.823 | Basic strategy, but slow due to inefficient clue checking. Relied heavily on list iteration. |
| No Person 3 | 84.6 | 4.26 | 4.169 | Improved strategy, using letter frequency. Still relatively slow, demonstrating diminishing returns without personality guidance. |
| No Person 4 | 85.8 | 4.25 | 1.959 | Surprisingly good performance. Shows Claude 3.5 Sonnet's inherent capabilities. Used both letter and position frequencies. |
| Carmack | 85.0 | 4.36 | 2.066 | Very fast and efficient. Used sets for O(1) lookups, precomputed letter frequencies, concise code. Prioritized raw performance. |
| Knuth | 82.8 | 4.27 | 2.507 | Well-documented, clear code. Prioritized correctness and readability. Used a scoring system based on letter/position frequencies. |
| Johnson | 86.8 | 4.32 | 1.690 | Fastest solver overall. Focused on precise constraints and efficient filtering. Used sets for letter constraints, elegant logic. |
| Fowler | 48.6 | 4.92 | 2.586 | Emphasized readability and maintainability. Performance suffered significantly. Highlights the trade-off between clean code and speed. |
| Linus | 48.2 | 4.92 | 1.424 | Reasonably fast, but the win rate was relatively low. |
| Torvalds | 0.0 | inf | 5.112 | Failed completely. Overly simplistic frequency analysis. Slow, but correct, clue checks. |
| Guido | 61.8 | 4.87 | 4.304 | Slower and less accurate than expected. |
| Hybrid | 85.4 | 4.26 | 2.593 | Combined strengths of Carmack, Knuth, and Johnson. Good balance of speed and correctness. Blended different optimization techniques. |

Key Observations and Analysis

  1. Personality Significantly Impacts Performance: Win rates ranged from 0% (complete failure) to 86.8%, and average guess times varied by more than a factor of three. These were not minor tweaks, but fundamentally different approaches.
  2. Katherine Johnson’s Unexpected Victory: The Katherine Johnson persona produced the fastest and most accurate solver. I had always used John Carmack in my own prompts to push LLMs toward performant code, but her approach, constraint satisfaction that methodically filters words using precise rules derived from the game clues, proved more effective than the others.
  3. Carmack Persona Still Impressive: As expected, the John Carmack persona generated highly optimized code. His solver prioritized raw performance, making heavy use of sets for O(1) lookups, precomputing letter frequencies, and writing concise, albeit less readable, code.
  4. Knuth’s Clarity vs. Carmack’s Speed: The Donald Knuth persona produced code that was notably different in style. It was well documented, with clear explanations of the algorithm and data structures. It prioritized correctness and readability, using a scoring system based on both letter and position frequencies. While slightly slower than Carmack’s solver, it was significantly easier to understand.
  5. The Hybrid Approach: The Hybrid persona, which attempted to combine the strengths of Carmack, Knuth, and Johnson, achieved a good balance of speed and accuracy. The famous Word2Vec example demonstrated that concepts can be superposed in a neural network’s latent space; could the same hold for personas? In fact, the implementation did seem to combine them: it used a scoring system similar to Knuth’s but incorporated more efficient clue checking inspired by Katherine Johnson.

Beyond Performance:  A Deep Dive into Code Style

The differences between the solvers weren’t just about numbers; they were also about how the code was written.  Here’s a closer look at the stylistic variations:

  • Data Structures:
      • Carmack: Made extensive use of sets for O(1) lookups (e.g., checking whether a letter is in a set is much faster than checking whether it’s in a list).
      • Johnson: Also used sets, but specifically to represent constraints (e.g., a set of letters that must be present, a set of letters that cannot be present).
      • Knuth: Used more traditional arrays and dictionaries, with a focus on clear organization and documentation.
      • Fowler: Tended towards more abstracted data structures, prioritizing readability and maintainability over raw efficiency.
  • Algorithm Design:
      • Johnson: Emphasized constraint satisfaction. The code systematically checked each word against a set of rules derived from the clues (green, yellow, gray).
      • Carmack and Knuth: Used scoring systems based on letter frequencies (and, in Knuth’s case, position frequencies). Words with more common letters (and letters in common positions) received higher scores; see the sketch after this list.
      • Hybrid: Combined elements of both constraint satisfaction and scoring.
  • Code Readability:
      • Knuth and Fowler: Produced the most readable code, with clear comments, descriptive variable names, and well-structured logic. Their code was designed to be understood and maintained.
      • Carmack: Prioritized conciseness over readability. The code was efficient but could be more challenging to understand without careful study.
      • Johnson: While focused on efficiency, her code was surprisingly readable due to the clear, logical structure imposed by the constraint-based approach.
  • Documentation:
      • Knuth: Provided extensive, almost mathematical, documentation, explaining the rationale behind each design decision.
      • Johnson: Provided clear comments explaining the steps.
      • Carmack: Used minimal comments, focusing on the “what” rather than the “why.” The assumption was that the code itself should be self-explanatory (to an experienced programmer).
      • Fowler: Used comments to clarify the intent of the code, rather than simply describing the mechanics.
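
To make the frequency-scoring idea concrete, here is a minimal sketch of how letter and position frequencies can be turned into a word score over the remaining candidates. This illustrates the general technique; it is not the exact code that any persona produced.

from collections import Counter

def score_words(possible_words: list[str]) -> dict[str, float]:
    """Score each candidate by how common its letters (and their positions) are."""
    letter_counts = Counter()
    position_counts = [Counter() for _ in range(5)]

    for word in possible_words:
        for i, letter in enumerate(word):
            position_counts[i][letter] += 1
        for letter in set(word):  # count unique letters so repeats aren't over-rewarded
            letter_counts[letter] += 1

    scores = {}
    for word in possible_words:
        letter_score = sum(letter_counts[letter] for letter in set(word))
        position_score = sum(position_counts[i][letter] for i, letter in enumerate(word))
        scores[word] = letter_score + position_score
    return scores

# make_guess() can then simply return the highest-scoring remaining word:
# best_guess = max(scores, key=scores.get)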

Concrete Code Comparisons: A Tale of Two (and More) Implementations

One of the most illuminating aspects of the experiment was seeing how different personas tackled the same sub-problem within the Wordle solver.  Let’s focus on the core challenge: filtering the list of possible words based on the clues (green, yellow, gray) received after a guess.  This is where the algorithmic and stylistic differences become most apparent.

We’ll compare three key approaches:

  1. Katherine Johnson: Constraint-based filtering using sets.
  2. John Carmack: Optimization-focused, leveraging sets for efficient lookups.
  3. Donald Knuth:  Emphasis on clarity and well-documented logic.

Sub-Problem: Filtering Possible Words

The submit_guess_and_get_clues method is responsible for this filtering. It receives the clues (a list of “Green”, “Yellow”, “Gray” strings) and the guess (the word that was guessed). It must update the possible_words list, removing any words that are now inconsistent with the clues.

1. Katherine Johnson: Constraint-Based Filtering

Here’s a simplified version of Katherine Johnson’s approach (from the WordleSolverKatherine class):

def submit_guess_and_get_clues(self, clues: list[str], guess: str):
    must_contain = []
    must_not_contain = set()
    position_constraints = [set() for _ in range(5)]
    exact_positions = {}

    for i, (clue, letter) in enumerate(zip(clues, guess)):
        if clue == "Green":
            exact_positions[i] = letter
        elif clue == "Yellow":
            must_contain.append(letter)
            position_constraints[i].add(letter)
        else:  # Gray
            if (letter not in must_contain and
                    letter not in exact_positions.values()):
                must_not_contain.add(letter)

    new_possible_words = []
    for word in self.possible_words:
        valid = True
        # ... (checks for exact_positions, position_constraints,
        #      must_contain, must_not_contain) ...
        if valid:
            new_possible_words.append(word)
    self.possible_words = new_possible_words

Key Features:

  • Constraint Sets:  The code builds up sets of constraints: must_contain, must_not_contain, position_constraints, and exact_positions.  These sets clearly define the rules that a valid word must satisfy.
  • Set Operations:  The in operator is used extensively with sets (e.g., letter not in must_contain).  This is highly efficient (O(1) on average) compared to checking for membership in a list (O(n)).
  • Logical Clarity:  The code reads almost like a set of logical rules.  It’s easy to follow the steps involved in determining whether a word is still valid.
  • Early Exit: The solver stops checking a word as soon as it determines that the word cannot be correct.
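
For completeness, the elided checks might look something like the helper below. This is my reconstruction of the omitted validity logic, assuming the constraint variables built above; it is not the persona’s verbatim code.

def word_is_valid(word, exact_positions, position_constraints,
                  must_contain, must_not_contain) -> bool:
    """Reconstruction of the elided checks: does this word satisfy every constraint?"""
    # Green letters must appear exactly where they were guessed.
    for i, letter in exact_positions.items():
        if word[i] != letter:
            return False
    # Yellow letters must not appear where they were guessed...
    for i, excluded in enumerate(position_constraints):
        if word[i] in excluded:
            return False
    # ...but must appear somewhere in the word.
    for letter in must_contain:
        if letter not in word:
            return False
    # Gray letters must not appear at all.
    for letter in must_not_contain:
        if letter in word:
            return False
    return True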

2. John Carmack: Optimization-Focused Filtering

Here’s a simplified version of John Carmack’s approach (from the WordleSolverCarmack class):

def submit_guess_and_get_clues(self, clues: list[str], guess: str):
    green_positions = {i for i, clue in enumerate(clues) if clue == "Green"}
    yellow_positions = {i for i, clue in enumerate(clues) if clue == "Yellow"}
    gray_letters = {guess[i] for i, clue in enumerate(clues)
                    if clue == "Gray" and guess[i] not in
                    {guess[j] for j in green_positions | yellow_positions}}

    self.possible_words = {
        word for word in self.possible_words
        if all(word[i] == guess[i] for i in green_positions) and
           all(word[i] != guess[i] and guess[i] in word
               for i in yellow_positions) and
           not any(letter in gray_letters for letter in word)
    }

Key Features:

  • Set Comprehensions:  The code uses set comprehensions (e.g., {i for i, clue in enumerate(clues) if clue == "Green"}) to efficiently create sets of positions and letters.
  • Set Operations (Again):  The in operator is used with sets for fast lookups. The union operator | is cleverly used.
  • Concise Logic:  The filtering logic is expressed in a single, compact set comprehension.  This is efficient but can be less readable than Katherine Johnson’s approach.
  • All/Any: The all() and any() built-ins are used effectively to express the filtering conditions.

3. Donald Knuth: Clarity and Documentation

Here’s a simplified version of Donald Knuth’s approach (from the WordleSolverKnuth class):

def submit_guess_and_get_clues(self, clues: list[str], guess: str):
    new_possible = []
    for word in self.possible_words:
        if self._is_word_consistent(word, guess, clues):
            new_possible.append(word)
    self.possible_words = new_possible

def _is_word_consistent(self, word: str, guess: str, clues: list[str]) -> bool:
    # Check green clues first
    for i, (clue, guess_letter) in enumerate(zip(clues, guess)):
        if clue == "Green" and word[i] != guess_letter:
            return False

    # Track yellow letters
    remaining_letters = list(word)

    # Remove green positions
    for i, (clue, guess_letter) in enumerate(zip(clues, guess)):
        if clue == "Green" and guess_letter in remaining_letters:
            remaining_letters.remove(guess_letter)

    # Check yellow clues
    for i, (clue, guess_letter) in enumerate(zip(clues, guess)):
        if clue == "Yellow":
            if word[i] == guess_letter:  # Yellow letter can't be in same position
                return False
            if guess_letter not in remaining_letters:  # Yellow letter must be somewhere
                return False
            remaining_letters.remove(guess_letter)

    # Check gray clues
    for i, (clue, guess_letter) in enumerate(zip(clues, guess)):
        if clue == "Gray" and guess_letter in remaining_letters:
            return False

    return True

Key Features:

  • Helper Function:  The filtering logic is encapsulated in a separate helper function (_is_word_consistent), improving readability and modularity.
  • Detailed Comments:  The code includes extensive comments, explaining each step of the filtering process.  This makes it easy to understand the logic, even for someone unfamiliar with Wordle.
  • Explicit Logic:  The code uses explicit loops and conditional statements, rather than relying on more concise but potentially less clear constructs like set comprehensions.
  • Early Exits: return False statements let the check bail out as soon as a word is ruled out, improving efficiency.

Practical Applications for Developers and Data Scientists

These findings have significant implications for anyone working with AI in a development context:

  1. Tailored Persona Selection: Different programming tasks might benefit from different AI “personalities.” Using Anaconda’s AI assistant inside its cloud notebooks, data scientists can experiment with various personas for tasks ranging from optimization problems to documentation generation.
  2. Education and Learning: The ability to generate code in different styles can be an exceptional teaching tool. A junior developer could see the same algorithm implemented in Knuth’s educational style, then Carmack’s performance-focused approach.
  3. Enterprise Optimization: When working with Anaconda’s enterprise distribution, teams will be able to leverage standardized prompt templates for different coding scenarios, and use Anaconda Assistant to save significant development time.
  4. Code Review and Refactoring: Using different personas to review existing code could provide diverse perspectives—suggesting optimizations, improving documentation, or identifying potential bugs.
  5. AI Augmented Development: Rather than seeing AI as a replacement for human programmers, these findings suggest a collaborative approach where AI offers stylistic variations that developers can then evaluate and integrate.

Conclusion: Unlocking the Hidden Potential of AI Code Generation

The code each personality generates tells a unique story. Knuth’s reads like a well-documented mathematical proof. Carmack’s is a Ferrari of a program: sleek, fast, and demanding expertise to maintain. But Katherine Johnson’s combines mathematical elegance with ruthless pragmatism, creating a web of constraints that efficiently filters invalid options.

The implications extend far beyond our Wordle experiment. We’ve discovered something profound in the latent space of these models—a kind of crystallized essence of human programming wisdom, ready to be applied to new problems.

This experiment suggests a powerful new approach to prompt engineering for Anaconda users working with Python and AI tools. Instead of focusing solely on what you want the AI to do, consider who you want the AI to channel while doing it. The right persona can make the difference between mediocre and exceptional code generation.

For those interested in further exploring how different AI models perform on specific tasks, our recent post on evaluating specialized SQL models provides additional insights into the capabilities and limitations of various LLMs. These evaluation methodologies can help you determine which models might work best for your particular needs.

As we continue exploring the intersection of AI and software development at Anaconda, we’re excited to integrate these findings into our tools and workflows. The latent space holds more secrets waiting to be discovered—and with the right prompting strategies, we can unlock them together.

Try applying different personas in your own AI coding experiments using the Anaconda Assistant, an AI Python coding companion, and AI Navigator, our gateway to more than 200 pre-trained LLMs. Share your findings with us on GitHub or our community forums. For enterprise users, our Professional Services team can help develop custom prompt engineering strategies tailored to your development needs.