SBDD is the new BDD

SBDD maps the behavioral surface BDD leaves unexplored

Bryan Boyko

Good engineering is about handling the exceptional cases gracefully: malformed inputs, unusual combinations, timing issues, or cases that technically follow the spec but rarely appear during development.

Tests written to validate an implementation work well when the input space is small and predictable, but become harder to rely on once systems interact with larger specifications, external integrations, or inconsistent real-world inputs.

Large language models (LLMs) have already mapped much of this terrain through the specifications, documentation, and edge cases in their training data, and can pull in additional context through retrieval and tooling. Synthetic Behavioral-Driven Development (SBDD) extends traditional BDD by using LLMs to generate additional behavioral scenarios from that corpus, along with specs, docs, and ecosystem context.

How It Works

SBDD pulls in specifications and generates scenarios across happy paths, failures, malformed inputs, and ambiguous cases, returning a behavioral map. That map can then feed into a traditional test harness.

Current                             SBDD
Behavior listed out by devs         Behavior surfaced with help from the model
Inputs handpicked from examples     Synthetic inputs emulating real-world variation
Coverage = paths devs thought of    Coverage = input space explored
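
To make the idea of a behavioral map concrete, here is a minimal sketch of one possible shape for it. Everything in it is an assumption for illustration: the names Category, Scenario, build_behavioral_map, uncovered, and to_test_cases are hypothetical, and the scenarios are hand-written placeholders standing in for what a model would generate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class Category(Enum):
    """The four scenario groups named in the text."""
    HAPPY_PATH = "happy_path"
    FAILURE = "failure"
    MALFORMED_INPUT = "malformed_input"
    AMBIGUOUS = "ambiguous"


@dataclass(frozen=True)
class Scenario:
    """One behavioral scenario in Given/When/Then form (hypothetical shape)."""
    category: Category
    given: str
    when: str
    then: str


BehavioralMap = Dict[Category, List[Scenario]]


def build_behavioral_map(scenarios: Iterable[Scenario]) -> BehavioralMap:
    """Group scenarios by category; every category appears, even if empty."""
    behavioral_map: BehavioralMap = {category: [] for category in Category}
    for scenario in scenarios:
        behavioral_map[scenario.category].append(scenario)
    return behavioral_map


def uncovered(behavioral_map: BehavioralMap) -> List[Category]:
    """Categories with no scenarios yet, i.e. unexplored parts of the surface."""
    return [category for category, items in behavioral_map.items() if not items]


def to_test_cases(behavioral_map: BehavioralMap) -> List[Tuple[str, Scenario]]:
    """Flatten the map into (test_id, scenario) pairs a harness could iterate over."""
    return [
        (f"{category.value}_{index}", scenario)
        for category, items in behavioral_map.items()
        for index, scenario in enumerate(items)
    ]


# Hand-written placeholders; in SBDD these would come from the model.
scenarios = [
    Scenario(Category.HAPPY_PATH, "a well-formed VAST response",
             "the renderer parses it", "the ad is rendered"),
    Scenario(Category.MALFORMED_INPUT, "a truncated payload",
             "the renderer parses it", "parsing fails without crashing"),
    Scenario(Category.FAILURE, "a VAST wrapper chain that loops back on itself",
             "the parser follows the wrappers", "it bails out after the configured depth"),
]

behavioral_map = build_behavioral_map(scenarios)
print(uncovered(behavioral_map))
print([test_id for test_id, _ in to_test_cases(behavioral_map)])
```

Keeping empty categories in the map is a deliberate choice in this sketch: a gap in the map is visible as an empty list rather than a missing key, which makes it easy to ask the model for more scenarios in that area.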

How We’re Applying It

We're using this approach in building a mobile ad renderer. The functional surface is wide: multiple ad formats, VAST and MRAID versions, plus edges like malformed payloads, unusual timing, and unsupported combinations. Creatives from a campaign or two will leave much of the surface untested, especially the edges. Synthetic generation lets us exercise some combinations more efficiently than with real traffic.

One edge case the model surfaced: a VAST wrapper chain that loops back on itself. Wrappers compose the ad-tech supply chain: a DSP wraps an SSP wraps an ad server. The IAB spec, along with broader ecosystem patterns the model has been trained on, implies depth limits and loop detection. The resulting test asserts the parser bails out after a configured depth, something an implementation-anchored pass could have easily missed.
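
A test of this kind might look roughly like the sketch below. It is not our renderer's code: resolve_wrapper_chain, WrapperChainError, the next_wrapper callback, and the default max_depth of 5 are all hypothetical, and the in-memory dictionaries stand in for real VAST fetching and parsing. The sketch applies both guards mentioned above, a visited set for loops and a cap on the number of wrappers followed.

```python
from typing import Callable, List, Optional


class WrapperChainError(Exception):
    """Raised when a wrapper chain loops or exceeds the configured depth."""


def resolve_wrapper_chain(
    start_url: str,
    next_wrapper: Callable[[str], Optional[str]],
    max_depth: int = 5,  # assumed default, not taken from any spec text
) -> List[str]:
    """Follow wrappers from start_url and return the list of URLs visited.

    next_wrapper(url) returns the URL that the wrapper at `url` points to,
    or None when `url` serves an inline ad (the end of the chain).
    """
    chain = [start_url]
    seen = {start_url}
    while (target := next_wrapper(chain[-1])) is not None:
        if target in seen:
            raise WrapperChainError(f"wrapper loop detected at {target}")
        # len(chain) - 1 wrappers have been followed so far.
        if len(chain) > max_depth:
            raise WrapperChainError(f"wrapper depth limit of {max_depth} exceeded")
        chain.append(target)
        seen.add(target)
    return chain


def test_looping_chain_is_rejected() -> None:
    # DSP -> SSP -> ad server -> back to DSP.
    looping = {"dsp": "ssp", "ssp": "adserver", "adserver": "dsp"}
    try:
        resolve_wrapper_chain("dsp", looping.get)
    except WrapperChainError as error:
        assert "loop" in str(error)
    else:
        raise AssertionError("expected the looping chain to be rejected")


def test_deep_chain_hits_depth_limit() -> None:
    # Ten distinct wrappers in a row, with a limit of three.
    deep = {f"w{i}": f"w{i + 1}" for i in range(10)}
    try:
        resolve_wrapper_chain("w0", deep.get, max_depth=3)
    except WrapperChainError as error:
        assert "depth" in str(error)
    else:
        raise AssertionError("expected the deep chain to be rejected")


test_looping_chain_is_rejected()
test_deep_chain_hits_depth_limit()
```

The depth cap matters even with loop detection in place: a chain of distinct URLs never revisits a node, so only the cap bounds how far the parser will follow it.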

The cases were generated without reading the implementation, starting from the behavioral surface the model had already mapped: specs, documentation, and ecosystem context.

SBDD Works Best When

  • The input space is too large to enumerate manually
  • Behavior is defined by external specifications
  • Edge cases matter disproportionately
  • Real-world inputs vary significantly from idealized examples

In Short

SBDD is not a replacement for existing testing practices, and is less useful when the system is tightly constrained and the behavioral surface is small. It is a practical way to broaden coverage in systems where real-world inputs are difficult to anticipate, one that scales as models grow more agentic.
