Supercharging REST API Testing: How AI Can Amplify Test Coverage and Catch Bugs

By Stephen Smith · 14 Mar · Blog

REST APIs power the internet. Every app you use—whether it’s a banking app, an e-commerce site, or even your favorite social media platform—relies on APIs to connect data and services. But with great connectivity comes great responsibility: ensuring these APIs function correctly is crucial.

Testing REST APIs thoroughly can be a daunting task, given the sheer number of possible interactions and edge cases. Enter Large Language Models (LLMs) like ChatGPT and GitHub Copilot, which are revolutionizing how developers approach automated testing.

This blog dives into an exciting study that shows how AI-driven test amplification can enhance REST API test coverage, improve readability, and even surface hidden bugs. Let’s break it down.

Why REST API Testing is So Challenging

REST APIs operate as intermediaries between software applications, helping them communicate seamlessly. However, testing these APIs is notoriously difficult for several reasons:

  • Complexity of Interactions – APIs can have many endpoints, each with multiple possible inputs and responses.
  • Boundary Value Testing – Finding the “edge cases” where APIs might break is crucial but complicated.
  • Readability Issues in Automated Testing – Many test amplification tools generate cryptic variable names and unstructured test scripts, making them hard to understand.

Given these challenges, researchers explored whether out-of-the-box AI models could help improve REST API testing—both in terms of quantity (test coverage) and quality (readability and usefulness).

How AI Enhances REST API Testing

What is Test Amplification?

Test amplification is like upgrading your test suite with smarter, harder-hitting tests. Instead of manually crafting each test, developers can start with a well-written “baseline” test and let automation generate variations that cover more scenarios.
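
To make that concrete, here is a minimal sketch of a baseline test plus two amplified variations, written in Python with pytest and requests. The endpoint, payload, and expected status codes are illustrative assumptions, not code from the study:

import requests  # assumed HTTP client

BASE_URL = "https://api.example.com"  # hypothetical API under test

# Baseline "happy-path" test, written by hand: one normal workflow.
def test_create_user_happy_path():
    response = requests.post(f"{BASE_URL}/users", json={"name": "Alice", "age": 30})
    assert response.status_code == 201

# Amplified variations, generated from the baseline to probe edge cases.
def test_create_user_missing_name():
    response = requests.post(f"{BASE_URL}/users", json={"age": 30})
    assert response.status_code == 400  # malformed input should be rejected

def test_create_user_negative_age():
    response = requests.post(f"{BASE_URL}/users", json={"name": "Alice", "age": -1})
    assert response.status_code == 400  # boundary value outside the valid range

Each amplified test reuses the baseline's structure but deliberately violates one assumption, which is exactly the kind of variation a tool (or an LLM) can churn out at scale.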

LLMs are particularly good candidates for this task because they can:

  • Create new tests based on existing ones (instead of starting from scratch).
  • Improve naming conventions and readability of test code.
  • Follow coding best practices automatically.

But how well do these AI models actually perform in generating meaningful and useful tests?

Putting AI to the Test: The Experiment

The researchers evaluated ChatGPT 3.5, ChatGPT 4, and GitHub Copilot by asking them to amplify test cases for PetStore, a well-known open-source demo application that exposes a variety of REST API endpoints.

The Method

They used a step-by-step approach:

  1. Start with a simple test case (aka “the happy-path test”), which follows a normal API workflow. (A sketch of such a test appears after this list.)
  2. Ask AI to generate additional tests beyond this basic case.
  3. Provide extra information (like OpenAPI documentation) to guide the AI models toward better output.
  4. Evaluate the results based on test coverage, readability, and the ability to expose hidden bugs.
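
As a rough illustration of step 1, a happy-path test against the public Swagger Petstore could look like the sketch below. The base URL, payload, and assertions are assumptions based on the standard Petstore API, not the study’s actual test suite:

import requests

BASE_URL = "https://petstore.swagger.io/v2"  # public Swagger Petstore demo

def test_pet_lifecycle_happy_path():
    # Create a pet through the normal, documented workflow...
    pet = {"id": 1001, "name": "rex", "status": "available"}
    create = requests.post(f"{BASE_URL}/pet", json=pet)
    assert create.status_code == 200

    # ...then fetch it back and confirm the data round-trips.
    fetch = requests.get(f"{BASE_URL}/pet/{pet['id']}")
    assert fetch.status_code == 200
    assert fetch.json()["name"] == "rex"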

Prompting Matters: How Different Prompts Affected Results

AI’s effectiveness depended heavily on how it was prompted. Researchers tested three different approaches:

  1. Basic Prompting – “Can you perform test amplification?” This yielded additional test scenarios but was inconsistent across AI models.
  2. Enhanced Prompting with API Documentation – Asking the AI to use structured API documentation (such as an OpenAPI specification) significantly improved API coverage; a sketch of this approach follows below.
  3. Maximized Prompting – Asking the AI to generate as many tests as possible resulted in the highest number of tests, covering more API endpoints.
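
As a sketch of what enhanced prompting might look like in code, the snippet below packs an OpenAPI document and a baseline test into a single request to a chat model via the openai Python package. The file names, prompt wording, and model choice are illustrative assumptions; the study’s exact prompts may differ:

from openai import OpenAI  # assumes the official openai Python package

client = OpenAI()  # reads OPENAI_API_KEY from the environment

openapi_doc = open("petstore-openapi.json").read()  # hypothetical spec file
baseline_test = open("test_happy_path.py").read()   # baseline test to amplify

# Enhanced prompt: pair the baseline test with the API documentation so the
# model can target endpoints and status codes the suite does not yet cover.
prompt = (
    "Perform test amplification on the following pytest suite. "
    "Use the OpenAPI documentation to cover additional endpoints, "
    "boundary values, and error status codes.\n\n"
    f"OpenAPI documentation:\n{openapi_doc}\n\n"
    f"Baseline test:\n{baseline_test}"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # amplified tests, pending review

The model's output still needs human review before it lands in the test suite, which matches the post-processing caveat discussed below.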

What Did AI-Generated Tests Look Like?

  • GPT-4 produced the strongest results, balancing comprehensiveness with readability.
  • Copilot benefited most from structured documentation, which dramatically boosted the number of tests it generated.
  • GPT-3.5 struggled with accuracy and required significantly more manual correction.

Moreover, some AI-generated tests even exposed real bugs. One amplified test sent a request that should have triggered an error, but the API returned a “success” response instead, revealing a genuine flaw!
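
Amplified tests that catch this kind of bug are typically negative tests: send a request the API should reject, then assert that an error code comes back. Here is a minimal sketch, with an illustrative endpoint and expected status codes rather than the study’s actual failing case:

import requests

BASE_URL = "https://petstore.swagger.io/v2"  # illustrative target

def test_get_pet_with_invalid_id_returns_error():
    # An ID that cannot exist should produce a client error...
    response = requests.get(f"{BASE_URL}/pet/-1")
    # ...so a 200 here fails the test and flags a defect in the API.
    assert response.status_code in (400, 404)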

The Pros and Cons of AI-Generated Tests

The Good

✅ Better Test Coverage – AI-generated tests exercised more paths and status codes than manually written ones.
✅ Readable Code – AI models produced clean, well-structured test names and comments.
✅ Bug Discovery – Some amplified test cases revealed actual defects in API behavior.

The Challenges

❌ Post-processing Required – AI-generated tests were helpful but often needed minor human tweaks.
❌ Model Variability – AI responses aren’t 100% consistent and vary slightly over time.
❌ Security Risks – Some API documentation shared with LLMs might be sensitive, raising concerns for enterprise use.

What This Means for Developers

For developers working with REST APIs, these findings open up exciting possibilities. AI tools can:

  • Automatically expand test suites with minimal effort.
  • Identify hidden defects without extensive manual probing.
  • Improve test readability and maintainability over time.

However, effective prompt engineering is essential. Refining AI inputs—such as adding API documentation or requesting maximum test cases—can make a big difference in results.

Future Directions: Where Can We Go from Here?

While this study focused on REST API testing, the potential applications of AI-assisted test amplification could extend further. Researchers suggest future explorations in:

  • UI testing and mobile app testing beyond REST APIs.
  • Using AI to expose even more bugs with better prompting strategies.
  • Refining models for test generation through Retrieval-Augmented Generation (RAG) techniques.

As AI models continue to evolve, automated testing could become even more precise, efficient, and accessible—benefiting companies and developers alike.


🔥 Key Takeaways

📌 AI-powered test amplification helps improve REST API test coverage and readability.
📌 GPT-4 and Copilot performed best, particularly when supplemented with API documentation.
📌 Good prompt engineering makes a huge difference—supplying extra context results in better tests.
📌 AI-generated tests exposed real API defects, showcasing their value in strengthening test suites.
📌 Post-processing is still needed, but the effort is often minimal compared to the benefits.

Final Thought

AI is not replacing developers anytime soon, but it’s making their lives easier—especially when it comes to tedious, time-consuming tasks like API testing. If you’re working on robust API development, integrating AI-based test amplification might just be the edge you need to catch hidden defects and improve overall software quality effortlessly. 🚀

Would you trust AI-generated tests for your next project? Let us know in the comments!

If you are looking to improve your prompting skills and haven’t already, check out our free Advanced Prompt Engineering course.

This blog post is based on the research article “Test Amplification for REST APIs Using ‘Out-of-the-box’ Large Language Models” by Tolgahan Bardakci, Serge Demeyer, and Mutlu Beyazit. You can find the original article here.
