🎵 Can AI Learn Music Theory Like a Human? Teaching ChatGPT, Claude, and Gemini with Step-by-Step Prompts

Artificial intelligence is becoming the ultimate multitasker: writing essays, coding apps, even generating music. But what about learning music theory the way a student would? Can Large Language Models (LLMs) like ChatGPT really understand intervals, scales, or cadences, not just generate them, but actually break them down, analyze them, and pass a real music theory exam?
Researchers Liam Pond and Ichiro Fujinaga from McGill University's Schulich School of Music set out to explore this very question. Their study dives deep into how we can "teach" LLMs complex topics like music theory using smart prompts, specifically in-context learning and chain-of-thought reasoning. Spoiler alert: with the right strategy, these models can learn more than you might think.
In this blog, we'll unpack how they turned robots into students, evaluated their progress using real exam questions from Canada's Royal Conservatory of Music (RCM), and what it could all mean for the future of AI education tools.
🎓 Teaching Music Theory to AI: Why It Matters
Music theory isn't just dry academics; it's the grammar of music. It allows musicians to understand, analyze, and create music with intention. So naturally, if AI is going to meaningfully engage with or assist in music education, it needs to do more than spit out generic, auto-generated Mozart-wannabe sonatas.
Until now, most AI in music has focused on generation (writing songs or recommending playlists), not on understanding the foundations that human musicians spend years mastering.
This study takes a different path. It asks: What if we teach AIs the way we teach students? Could they eventually help us learn better, faster, and more affordably?
By using questions from a certified Level 6 RCM theory exam (which thousands of students take across North America), the research gives us real data on both the capabilities and the limits of top AI models in learning core music theory skills.
🧠 How Do You Teach a Robot? Two Key Strategies
The study used two core teaching techniques repurposed for AIs:
1. In-Context Learning (ICL)
Think of this as giving the model a mini class before asking a question. You don't change or retrain the model itself; you just feed it a prompt that includes instructions and examples. The goal: teach it the rules by showing patterns.
Imagine asking, "What's the interval between C and G?" But before that, you include:
"An interval is the distance between two notes. For example, C to E is a major third..."
This method lets the model learn on the fly, using just the information in the prompt. And thanks to newer models with massive context windows (up to 2 million tokens in Gemini 1.5 Pro!), you can include a ton of info while still keeping it "local" to the prompt.
But there's a tradeoff: long prompts risk overwhelming the model. More details mean more decisions, so writing effective prompts is its own art.
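A minimal sketch of what assembling such a prompt can look like. The lesson text and worked examples below are illustrative stand-ins, not the prompts used in the study:

```python
# Sketch: assembling an in-context-learning prompt. Nothing is retrained;
# a "lesson" and a few worked examples are simply prepended to the question
# so the model can infer the pattern from the prompt alone.

LESSON = (
    "An interval is the distance between two notes, counted inclusively "
    "by letter name. For example, C up to E spans C-D-E: a third."
)

WORKED_EXAMPLES = [
    ("What is the interval from C to E?",
     "C-D-E spans three letter names, so it is a major third."),
    ("What is the interval from D to A?",
     "D-E-F-G-A spans five letter names, so it is a perfect fifth."),
]

def build_icl_prompt(question: str) -> str:
    """Concatenate lesson + worked examples + the new question."""
    parts = [LESSON]
    for q, a in WORKED_EXAMPLES:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

print(build_icl_prompt("What is the interval from C to G?"))
```

The tradeoff mentioned above shows up directly here: every extra example grows the prompt, so a few carefully chosen examples usually beat dumping in everything you have.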
2. Chain-of-Thought Prompting (CoT)
Here, instead of just asking a question and expecting an answer, you encourage the model to "think out loud," and you train it to do so with step-by-step examples.
For example:
"To identify this cadence, I first notice the chord progression ends on V-I. That's an authentic cadence..."
Showing worked examples, even just a few, helps the model mimic human-style reasoningâbreaking problems into baby steps. This is especially helpful in complex multi-rule systems, like music theory.
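To make those baby steps concrete, here is the kind of reasoning a chain-of-thought exemplar spells out, written as a small function. The interval-naming rules are standard music theory; the printed steps mimic what a worked example in the prompt would say:

```python
# Sketch: step-by-step interval naming, printed the way a chain-of-thought
# worked example would narrate it. Covers only a handful of ascending
# intervals within one octave, purely for illustration.

LETTERS = "CDEFGAB"
SEMITONES_FROM_C = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
QUALITY = {  # (letter span, semitones) -> interval name
    (2, 2): "major second", (3, 3): "minor third", (3, 4): "major third",
    (4, 5): "perfect fourth", (5, 7): "perfect fifth", (6, 9): "major sixth",
}

def name_interval(low: str, high: str) -> str:
    span = LETTERS.index(high) - LETTERS.index(low) + 1  # count letters inclusively
    semis = SEMITONES_FROM_C[high] - SEMITONES_FROM_C[low]
    print(f"Step 1: {low} to {high} spans {span} letter names.")
    print(f"Step 2: that distance is {semis} semitones.")
    name = QUALITY[(span, semis)]
    print(f"Step 3: {span} letters and {semis} semitones make a {name}.")
    return name

name_interval("C", "G")  # a perfect fifth
```

The point is not the code itself but the shape of the reasoning: each intermediate step is stated explicitly, which is exactly what CoT exemplars model for the LLM.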
Combined, these two techniques allow you to turn language models into learners capable of tackling unfamiliar, nuanced subjects.
🎼 Not Just Notes: The Role of Music Encoding Formats
Machines donât read sheet music. They need everything encoded as data. The researchers tested four common music encoding formats:
- ABC: Simple, lightweight, originally for folk music.
- Humdrum: Great for detailed analytical work, used in computational musicology.
- MEI (Music Encoding Initiative): Flexible and academic-focused, good for early/non-Western music.
- MusicXML: The most widely adopted format, but geared toward Western notation.
Interestingly, while all four can encode standard sheet music, some work better than others when fed into LLMs, likely depending on how much exposure the models had to each format during training.
MEI turned out to be a winner; more on that below.
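For a feel of what these encodings look like, here is a single quarter-note C sketched in ABC and in MEI. Both fragments are heavily simplified, not complete valid files, and are only meant to show the contrast in verbosity:

```python
# Sketch: the same single note in two of the tested encodings.
# Both fragments are abridged for illustration; real files carry
# headers and boilerplate, especially in XML-based formats.

abc_fragment = "X:1\nL:1/4\nK:C\nC|"

mei_fragment = (
    '<measure n="1">\n'
    '  <layer>\n'
    '    <note pname="c" oct="4" dur="4"/>\n'
    '  </layer>\n'
    '</measure>'
)

# ABC is far terser; XML-style formats spend many more characters (and
# therefore tokens) per note, which matters when an entire score has to
# fit inside a prompt's context window.
print(len(abc_fragment), "chars of ABC vs", len(mei_fragment), "chars of MEI")
```

The verbosity gap is one reason format choice interacts with context-window limits, even before any question of which format the models saw most during training.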
🎹 The Test: A Real RCM Level 6 Exam
Students studying music in Canada and the U.S. often take Royal Conservatory of Music exams as part of their musical journey. Level 6 includes topics like:
- Key signatures
- Intervals and scales
- Chords and cadences
- Transpositions
- Music terms and history
These aren't basic trivia questions; they require analysis, reading music, and applying theory.
The researchers asked ChatGPT, Claude, and Gemini these questions, both with and without context (examples and guides), to see how much the models could figure out on their own versus how much they could learn from the prompts.
All models were evaluated in each of the four encoding formats. Then their answers were reviewed for accuracy, just like a teacher would grade a studentâs test.
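The setup above can be sketched as a simple grading loop: every (format, context) combination answers the same questions and is scored against a key. `ask_model` is a hypothetical placeholder for a real API call, and all data here is fabricated purely to show the mechanics:

```python
# Sketch of the evaluation loop: score a model on every encoding format,
# with and without the teaching context. `ask_model` is a hypothetical
# stand-in, not a real API; the toy answer key is made up.

FORMATS = ["ABC", "Humdrum", "MEI", "MusicXML"]

def grade(answers: list[str], key: list[str]) -> float:
    """Percent of answers matching the key, like marking a paper exam."""
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)

def evaluate(ask_model, questions, key):
    """Collect a score for each (format, with_context) combination."""
    scores = {}
    for fmt in FORMATS:
        for with_context in (False, True):
            answers = [ask_model(q, fmt, with_context) for q in questions]
            scores[(fmt, with_context)] = grade(answers, key)
    return scores

# Toy demo: a fake "model" that answers correctly only when given context.
key = ["perfect fifth", "authentic cadence"]
fake_model = lambda q, fmt, ctx: key[q] if ctx else "unsure"
print(evaluate(fake_model, [0, 1], key))
```

Real grading in the study involved human review of free-form answers rather than exact string matching, so treat this as the skeleton of the experiment, not its scoring rubric.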
📊 What Did the Results Show?
Here's where it gets juicy.
Without any teaching (no context prompts), all models performed poorly; the best score was 52% (ChatGPT using MEI), far from the 60% passing mark for RCM exams.
But after being given chain-of-thought and in-context guidance?
🏆 Claude scored a whopping 75% using MEI, beating out both ChatGPT and Gemini and landing well above the human minimum pass grade.
Let's break it down:
| Model   | Format | No Context | With Context |
|---------|--------|------------|--------------|
| ChatGPT | MEI    | 52%        | 60% (pass)   |
| Claude  | MEI    | 44%        | 75% (best)   |
| Gemini  | MEI    | 30%        | 52% (fail)   |
Claude also scored 74% on both Humdrum and MusicXML when given full context. That's the equivalent of receiving Honors on the exam.
Another key finding: Contextual prompts helped across the board, especially for topics like:
- Intervals
- Scales
- Transposition
- Cadences
These areas responded especially well to examples and worked solutions.
On the flip side, rhythm, rests, and chords remained tricky. Models had a hard time grasping the complex rules of note grouping and time signatures, even with examples. A likely reason? Rhythm involves a lot of hidden nuance, and possibly noisy or incorrect data from their training sources (think amateur sheet music floating on the internet).
Bit of a relief: all models, with or without context, nailed the music history questions. So at least they can memorize composers.
🛠️ Teaching AI Like a Music Student
What's fascinating is that many techniques used to improve LLM responses mirror how we teach humans:
- Give clear, focused explanations
- Show worked examples
- Start simple, then increase complexity
- Encourage step-by-step reasoning
It turns out that the gap between human and machine learning might not be as wide as we thought, at least in structure, if not in cognition. When we treat the AI like a student, it acts more like one.
Even strategies like asking the model to "summarize what it understands so far before answering" (something teachers do with kids) helped these machines stay on track.
🌍 Real-World Impact: Why This Matters
Still wondering why you should care whether ChatGPT knows when to write a G# instead of an A♭?
Hereâs why this research is exciting:
For Students
Imagine having a 24/7 AI music tutor that walks you through theory problems, explains tricky concepts, and gives immediate, personalized feedback. Better than late-night YouTube spirals, right?
For Teachers
Educators could generate custom quizzes, interactive problem sets, and adaptive theory guides tailored to each student's level, all while saving tons of time on grading.
For Developers
Creating smarter music theory apps just got a boost. With refined prompting, developers can build tools that don’t just quiz, but actually teach.
For AI Research
This is a case study in transferable learning. If we can teach AI music theory using human-style pedagogy, the same techniques might be usable across other difficult subject areas: math, logic, even ethics.
💡 Key Takeaways
- LLMs like GPT-4, Claude, and Gemini can learn music theory, to a degree, through well-crafted prompts.
- Context is king: prompting the model with examples and explanations (in-context learning + chain-of-thought) massively improves performance, by as much as 31 percentage points (Claude on MEI went from 44% to 75%).
- Claude outperformed the others, especially when using MEI and MusicXML encoding formats.
- Not all tasks are equally teachable: while models grasp intervals and cadences well, they still struggle with rhythm and complex chord analysis.
- Prompt design matters: effective educational AI isnât just about the modelâitâs about the way you teach it.
- Music education is heading toward an AI-augmented future, with digital tutors and assistants that can scale high-quality instruction across time zones and socioeconomic boundaries.
Want to level up your own prompting game? Try designing CoT prompts for the topics you're helping an LLM understand. Start simple, show step-by-step reasoning, and don't be afraid to "teach" like a human tutor.
And if you've ever struggled through a theory class, well, now you can say you've technically got something in common with ChatGPT. 🎶
Let us know your thoughts: Could an AI music tutor replace (or at least assist) real ones? What topics do you want to see LLMs tackle next?
If you are looking to improve your prompting skills and haven’t already, check out our free Advanced Prompt Engineering course.
This blog post is based on the research article "Teaching LLMs Music Theory with In-Context Learning and Chain-of-Thought Prompting: Pedagogical Strategies for Machines" by Liam Pond and Ichiro Fujinaga. You can find the original article here.