- TDD is a design practice first and a testing practice second — engineers who treat it as only a testing tool miss the point
- 90%+ test coverage is achievable and maintainable in enterprise Java when you build coverage culture from day one
- BDD with Cucumber closes the gap between business requirements and automated tests — when done correctly
- Mutation testing (PIT) tells you the truth about your test suite; line coverage lies
- Claude Code and AI tools change TDD dynamics — writing tests first becomes faster than writing implementation first
I have been doing TDD since 2014. In that time, I have seen it dismissed as academic overhead by delivery managers, cargo-culted by teams chasing coverage metrics, and done correctly by the rare team that treats it as a design discipline. The difference between those outcomes is not the tools. It is the understanding of what TDD is actually for.
This is my manifesto for why TDD + BDD still wins in 2026 — even with AI coding assistants in the picture, even under sprint pressure, even in complex enterprise Java systems. Especially then.
TDD Is a Design Practice, Not a Testing Practice
The most important thing I can say about TDD is the thing that most introductions bury or skip entirely: TDD is primarily a design tool. The tests are not the goal. The design pressure that writing tests first applies to your production code — that is the goal.
When you write a test before you write the implementation, you are forced to answer a question that most engineers skip: How does this code want to be used? Not how it will be implemented — how it will be called, what arguments it takes, what it returns, what errors it signals. You are writing the API from the consumer’s perspective before you have written a single line of implementation.
This is why code written test-first tends to have better interfaces than code written implementation-first. The test is your first client. If the test is ugly to write — if the setup is verbose, if the assertions are convoluted, if you need to mock five collaborators — that ugliness is a signal. Your production code design needs to change. The test is doing its job.
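To make "the test is your first client" concrete, here is a deliberately tiny sketch — the `applyLateFee` API and its 1%-per-day rule are invented for illustration, not taken from any real system. Because its first caller was written before its body, the method takes plain values and returns a plain value; there is nothing to mock.

```java
import java.math.BigDecimal;

// Hypothetical example: an API shaped by writing its first caller before its body.
// Plain values in, plain value out, no collaborators to mock.
class LateFeePolicy {
    static BigDecimal applyLateFee(BigDecimal balance, int daysOverdue) {
        if (daysOverdue <= 0) {
            return balance; // on time: balance unchanged
        }
        // 1% of the balance per overdue day (an illustrative rule, not a real policy)
        BigDecimal fee = balance
                .multiply(new BigDecimal("0.01"))
                .multiply(BigDecimal.valueOf(daysOverdue));
        return balance.add(fee);
    }

    public static void main(String[] args) {
        // The usage we wrote first -- the test-as-first-client:
        BigDecimal result = applyLateFee(new BigDecimal("100.00"), 3);
        System.out.println(result); // prints 103.0000
    }
}
```

If this signature had instead required a mocked repository and a clock just to compute a fee, that friction would be the design feedback TDD is built to deliver.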
The 90%+ Coverage Story — What It Actually Takes
On a healthcare data migration project I led early in my career, we maintained 90%+ Cucumber BDD coverage on Spring Batch pipelines processing EDI 834/835/837 transactions. Millions of records. Complex transformation logic. Multiple downstream systems. The coverage was not a checkbox we hit for a QA report — it was a survival mechanism.
Healthcare data migration is unforgiving. An off-by-one error in an EDI 837 transaction can result in a claim that never gets filed. A type coercion bug in 834 enrollment data can mean a patient’s coverage record is incorrect. In that context, a comprehensive test suite is not nice to have. It is the difference between a safe deployment and a production incident that affects patient care.
What made 90%+ achievable and maintainable:
- Cucumber feature files owned by the business analyst — not by engineers. The BA wrote the scenarios in Gherkin. Engineers wrote the step definitions. This created accountability: if the feature file didn’t describe the requirement, that was a requirements problem, not a test problem.
- Spring Test Slices for isolation — @DataJpaTest, @WebMvcTest, and @MockBean at the right boundaries. We did not load the full application context for unit tests.
- One test class per production class — obvious, but many teams don’t do this systematically. Every production class had a corresponding test class. No exceptions without a documented reason.
- Testing the unhappy path with the same rigour as the happy path — most teams write one test per method and only test the happy path. In an EDI pipeline, the malformed-record path is equally important.
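As an illustration of that last point, here is a toy sketch — the "memberId|planCode" record format and the `EnrollmentRecordParser` class are hypothetical stand-ins, not real EDI parsing — where the malformed-record path gets the same first-class treatment as the happy path:

```java
import java.util.Arrays;

// Illustrative only: a toy pipe-delimited record format, with the
// malformed-record path tested as rigorously as the happy path.
class EnrollmentRecordParser {
    static String[] parse(String line) {
        if (line == null || line.isBlank()) {
            throw new IllegalArgumentException("blank record");
        }
        // limit -1 keeps trailing empty fields, so "M1|" is caught below
        String[] fields = line.split("\\|", -1);
        if (fields.length != 2 || fields[0].isEmpty() || fields[1].isEmpty()) {
            throw new IllegalArgumentException("malformed record: " + line);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Happy path
        System.out.println(Arrays.toString(parse("M1|GOLD"))); // prints [M1, GOLD]
        // Unhappy path: a missing plan code must fail loudly, not slip through
        try {
            parse("M1|");
            throw new AssertionError("expected rejection of malformed record");
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```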
The BDD Loop That Actually Works in Enterprise Java
BDD with Cucumber gets cargo-culted heavily. Teams write feature files that are essentially unit tests in Gherkin syntax, with step definitions that directly call internal service methods. This misses the point entirely.
The loop that actually works:
- Business analyst writes the scenario in Gherkin — in plain English, describing the business behaviour, not the technical implementation.
- Engineer reviews and clarifies — before writing a single line of code, engineer and BA align on the scenario. Ambiguities surface here, not in production.
- Engineer writes the step definitions — these call the public API of the system under test (an HTTP endpoint, a service interface, a batch job runner).
- Engineer writes the failing unit tests + implementation — TDD at the unit level, driving toward the scenario passing.
- Scenario passes — ship it.
Here is what a good Gherkin scenario looks like for a payment processing feature:
Feature: Payment Event Processing
Scenario: Valid payment event updates account balance
Given a payment event for account "ACC-123" with amount 1500.00
When the payment processor receives the event
Then the account balance for "ACC-123" should be 1500.00
And a confirmation event should be published to "balance-updated"
Scenario: Duplicate payment event is idempotent
Given a payment event for account "ACC-123" with idempotency key "PAY-456"
And the event "PAY-456" has already been processed
When the payment processor receives the event again
Then the account balance for "ACC-123" should remain unchanged
And no confirmation event should be published
The step definitions for these scenarios call your actual service or REST API — not mocks of internal methods. The BDD layer tests observable behaviour, not implementation details.
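As a sketch of what that glue looks like — `PaymentApi`, `AccountQuery`, and `PaymentEvent` are hypothetical names, and this assumes cucumber-java with Spring-managed step definitions — the first scenario's steps might be:

```java
// Glue sketch, not runnable as-is: the steps drive the system's public API
// and read observable state back out -- never internal methods or mocks.
import io.cucumber.java.en.Given;
import io.cucumber.java.en.When;
import io.cucumber.java.en.Then;
import java.math.BigDecimal;
import static org.assertj.core.api.Assertions.assertThat;

public class PaymentSteps {
    private final PaymentApi paymentApi;     // public entry point (e.g. a REST client)
    private final AccountQuery accountQuery; // observable state, queried after the fact
    private PaymentEvent event;

    public PaymentSteps(PaymentApi paymentApi, AccountQuery accountQuery) {
        this.paymentApi = paymentApi;
        this.accountQuery = accountQuery;
    }

    @Given("a payment event for account {string} with amount {bigdecimal}")
    public void aPaymentEvent(String accountId, BigDecimal amount) {
        event = new PaymentEvent(accountId, amount);
    }

    @When("the payment processor receives the event")
    public void theProcessorReceivesTheEvent() {
        paymentApi.submit(event); // the real public API, not an internal call
    }

    @Then("the account balance for {string} should be {bigdecimal}")
    public void theBalanceShouldBe(String accountId, BigDecimal expected) {
        assertThat(accountQuery.balanceOf(accountId)).isEqualByComparingTo(expected);
    }
}
```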
Spring Test Slices: The Enterprise Java Secret Weapon
One of the biggest barriers to good TDD in Spring Boot applications is test setup cost. If every test loads the full application context, a test suite of 500 tests becomes painful to run. Spring’s test slice annotations are the solution:
// Test only the web layer — no database, no service beans
@WebMvcTest(PaymentController.class)
class PaymentControllerTest {
@Autowired
private MockMvc mockMvc;
@MockBean
private PaymentService paymentService;
@Test
void shouldReturn200ForValidPayment() throws Exception {
given(paymentService.process(any())).willReturn(PaymentResult.success("PAY-001"));
mockMvc.perform(post("/api/payments")
.contentType(APPLICATION_JSON)
.content("""
        { "accountId": "ACC-123", "amount": 1500.00 }
        """))
.andExpect(status().isOk())
.andExpect(jsonPath("$.paymentId").value("PAY-001"));
}
}
// Test only the JPA layer — uses embedded H2, no web layer
@DataJpaTest
class PaymentRepositoryTest {
@Autowired
private PaymentRepository repository;
@Test
void shouldFindByIdempotencyKey() {
var payment = Payment.builder()
.idempotencyKey("PAY-456")
.amount(BigDecimal.valueOf(1500.00))
.build();
repository.save(payment);
var found = repository.findByIdempotencyKey("PAY-456");
assertThat(found).isPresent();
assertThat(found.get().getAmount()).isEqualByComparingTo("1500.00");
}
}
These slice tests load a fraction of the full context. The full @SpringBootTest integration test is reserved for the BDD scenario layer. This is the architecture that keeps test suites fast enough to run on every commit.
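For completeness, the wiring for that single full-context BDD layer is a small config fragment. This sketch assumes the cucumber-junit-platform-engine and cucumber-spring dependencies are on the test classpath; package and resource names are placeholders:

```java
// Suite entry point: discovers and runs the .feature files on the classpath.
import org.junit.platform.suite.api.IncludeEngines;
import org.junit.platform.suite.api.SelectClasspathResource;
import org.junit.platform.suite.api.Suite;

@Suite
@IncludeEngines("cucumber")
@SelectClasspathResource("features") // src/test/resources/features/*.feature
class CucumberSuiteTest { }

// In a separate file: the one place the full Spring context is booted,
// shared by every scenario.
//
// import io.cucumber.spring.CucumberContextConfiguration;
// import org.springframework.boot.test.context.SpringBootTest;
//
// @CucumberContextConfiguration
// @SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
// class CucumberSpringConfig { }
```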
Mutation Testing: Where Coverage Metrics Tell the Truth
Line coverage is the most lied-about metric in software engineering. A test that calls a method and asserts nothing about its output will still give you 100% line coverage. The method’s logic could be completely wrong and the test would still pass.
Mutation testing (the PIT framework for Java) is the antidote. PIT modifies your production code in small ways — mutants — and checks whether your tests catch the change. If a mutant survives (your tests pass despite the code being wrong), you have a gap in your test suite.
Common mutants PIT introduces:
- Changing > to >= in a conditional
- Negating a boolean expression
- Replacing arithmetic operators (+ to -, * to /)
- Removing method calls
- Changing return values to null or 0
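A toy example makes the survival mechanics concrete — `LateFeeRule` and its rule are invented for illustration. Both calls below execute the conditional, so both earn full line coverage, but only the boundary check would kill the `>` to `>=` mutant:

```java
// Hypothetical rule used to illustrate a surviving mutant.
class LateFeeRule {
    static boolean chargesLateFee(int daysOverdue) {
        return daysOverdue > 0; // PIT mutant: daysOverdue >= 0
    }

    public static void main(String[] args) {
        // Weak test: true for both the original and the mutant -- mutant survives
        System.out.println(chargesLateFee(5));  // prints true
        // Boundary test: false here, but the mutant returns true -- mutant killed
        System.out.println(chargesLateFee(0));  // prints false
    }
}
```

A suite that only exercises `chargesLateFee(5)` reports the line as covered while proving nothing about the boundary, which is exactly the lie in line coverage the section describes.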
In Maven, adding PIT takes two minutes:
<plugin>
<groupId>org.pitest</groupId>
<artifactId>pitest-maven</artifactId>
<version>1.15.3</version>
<configuration>
<targetClasses>
<param>com.example.payments.*</param>
</targetClasses>
<targetTests>
<param>com.example.payments.*Test</param>
</targetTests>
<mutationThreshold>80</mutationThreshold>
</configuration>
</plugin>
Setting mutationThreshold to 80 in CI means your build fails if fewer than 80% of mutants are killed. This is a far more honest quality gate than 80% line coverage.
How Claude Code Changes TDD Dynamics
Since I integrated Claude Code into my development workflow, the TDD cycle has changed in an interesting way: writing the test first has become easier than writing the implementation first.
Here is the prompt pattern that works for me:
I have a PaymentService that needs to process payment events from Kafka. It should: (1) check for duplicate processing using an idempotency key against PostgreSQL, (2) update the account balance if not duplicate, (3) publish a balance-updated event to Kafka, (4) throw a NonRetryableException for malformed events. Write all the JUnit 5 + Mockito test cases first, covering both happy path and all edge cases. Use the @ExtendWith(MockitoExtension.class) pattern. Do not write the implementation yet.
Claude Code generates a comprehensive test class with all edge cases — including ones I might have forgotten (null event, event with missing required fields, database connection failure during idempotency check, Kafka publish failure after successful persistence). I then review the generated tests, add any domain-specific edge cases I know from experience, and then write the implementation.
This workflow preserves all the design benefits of TDD — the test is still my first client, and I still feel the design pressure of hard-to-test code — while dramatically reducing the time cost of the test-first approach. The AI handles the mechanical completeness; I handle the domain knowledge.
One important caveat: Claude Code generates syntactically correct tests, but it does not know your domain rules. It will generate a test for “null payment amount” but it won’t know that amounts of exactly zero are valid for refund reversal events in your specific system. That domain knowledge is yours to add. The AI gives you the test skeleton; you give it the soul.
The TDD Anti-Patterns That Kill Productivity
TDD failure modes I have seen across teams:
Anti-pattern 1: Testing implementation, not behaviour. If your test breaks every time you rename a private method, you are testing implementation. Test what the code does, not how it does it. Rename-resilient tests are a sign of good test design.
Anti-pattern 2: Mocking everything. A test that mocks 8 collaborators is testing the mock framework, not your code. If you have to mock everything to write a test, your design has too many dependencies. TDD is telling you to redesign.
Anti-pattern 3: Writing tests after the fact. Retroactive TDD is not TDD. Writing tests after you have already written the implementation means the implementation will drive the test design, and you lose the design pressure benefit entirely. The test will fit around the implementation rather than the implementation fitting around the test.
Anti-pattern 4: Giant test classes. A test class with 50 test methods is usually a sign that the production class has too many responsibilities. Each test class should test a class with a single, clear responsibility. If your test class feels like it is covering too many things, your production class is doing too many things.
Anti-pattern 5: Ignoring the green-to-refactor phase. TDD has three phases: Red (failing test), Green (minimal passing implementation), Refactor (clean up without breaking tests). Most engineers skip the Refactor phase under time pressure. This is where technical debt accumulates. The Refactor phase is not optional — it is where TDD pays its long-term dividends.
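To illustrate anti-pattern 1 with a concrete (hypothetical) example: the assertions on `slugify` below pin only the observable output, so renaming or restructuring the private steps inside the method can never break them.

```java
// Hypothetical behaviour-level test target: only the returned slug is
// observable; the intermediate steps are free to change under refactoring.
class SlugGenerator {
    static String slugify(String title) {
        String lowered = title.trim().toLowerCase();           // internal step
        String dashed = lowered.replaceAll("[^a-z0-9]+", "-"); // internal step
        return dashed.replaceAll("(^-|-$)", "");               // internal step
    }

    public static void main(String[] args) {
        // A behaviour assertion: what comes out, not how it was produced
        System.out.println(slugify("Hello, World!")); // prints hello-world
    }
}
```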
Frequently Asked Questions
Does TDD slow down development?
In the short term, on a greenfield feature: slightly. In the medium term, when you are maintaining and extending that feature: significantly faster. The velocity advantage of TDD comes from two sources — the design quality of test-driven code (which is easier to change) and the safety net that allows you to refactor aggressively. Teams that abandon TDD for speed almost always pay the cost back in debugging time and fear of refactoring.
What is the difference between TDD and BDD?
TDD (Test-Driven Development) is a developer practice: write a failing unit test, write minimal code to make it pass, refactor. It operates at the class/method level. BDD (Behaviour-Driven Development) is a collaboration practice: business stakeholders and engineers define expected behaviour in plain language (Gherkin), which then drives automated acceptance tests. BDD sits above TDD in the testing pyramid. You use TDD inside the implementation of scenarios described in BDD.
What is mutation testing and how is it different from code coverage?
Code coverage measures which lines of production code are executed by your tests. Mutation testing measures whether your tests can detect changes to your production code. A test that executes a line but never asserts its output contributes to coverage but provides zero mutation score. Mutation testing (PIT for Java) introduces deliberate code modifications and checks that your tests fail when they should. A mutation score above 75% is a meaningful quality signal; raw line coverage is not.
Should I use Cucumber for all tests or just some?
Use Cucumber only at the acceptance/BDD layer — the outermost ring of your testing pyramid. It should describe business scenarios in business language. Do not use Cucumber for unit tests; JUnit is faster and better suited. A reasonable split: 70% unit tests (JUnit + Mockito), 20% integration tests (Spring slice tests, TestContainers), 10% BDD acceptance tests (Cucumber).
How does Claude Code help with TDD?
Claude Code is particularly effective for generating comprehensive test cases from requirements descriptions. Give it the specification in natural language and ask it to write tests before writing implementation. It generates edge cases (null inputs, boundary values, exception paths) that engineers often miss under deadline pressure. The key is always reviewing and adding domain-specific cases that the AI cannot know from the specification alone.
The Bottom Line
TDD is not about coverage percentages. It is not about Cucumber feature files. It is not about CI gates. It is about a discipline of designing code that is testable because you design it from the test’s perspective first.
Every team I have seen adopt TDD correctly has produced code that is easier to change, easier to onboard new engineers into, and dramatically less likely to produce production incidents from regression. Every team I have seen abandon TDD “for speed” has eventually spent more time debugging and firefighting than they saved by skipping tests.
In 2026, with tools like Claude Code available to dramatically reduce the mechanical overhead of test writing, the remaining barrier to TDD is cultural — not technical. The time argument is weaker than it has ever been. The design argument is as strong as it has ever been.
Write the test first. Read what it tells you about your design. Write the simplest implementation that makes it pass. Refactor. Repeat.