Understanding A/B Test Results

This guide explains the key metrics and statistical concepts you'll see in your SplitWisp experiment results dashboard.

Dashboard Overview

When you open an experiment in the dashboard, the results page shows:

  • Metric cards — total impressions, total conversions, overall conversion rate, best variant lift, and total revenue
  • Winner banner — appears when a variant achieves statistical significance
  • Conversion rate chart — bar chart with error bars showing 95% confidence intervals
  • Daily time-series chart — line chart showing conversion rate trends over time per variant
  • Statistical summary — significance badge and sample size guidance
  • Detailed results table — per-variant breakdown with conversion rate, 95% CI, lift vs. control, p-value, and significance badges
  • UTM source breakdown — results segmented by traffic source (when UTM parameters are present)
  • Experiment notes — collapsible section for documenting context, learnings, and decisions

Key Metrics

Impressions

The number of unique sessions assigned to a variant. Each visitor is counted once per session. The SDK automatically tracks an impression event when a visitor is assigned to a variant.

Conversions

The number of sessions that triggered a conversion event for this variant. Conversions can come from automatic conversion goals (page visit, element click, form submit, scroll depth, time on page) or manual trackConversion() calls.

Conversion Rate

Conversions divided by impressions, shown as a percentage.

Formula: conversion_rate = conversions / impressions × 100%
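As a minimal sketch, the formula translates directly to code:

```javascript
// Conversion rate as a percentage, per the formula above.
// Guards against division by zero for variants with no traffic yet.
function conversionRate(conversions, impressions) {
  if (impressions === 0) return 0;
  return (conversions / impressions) * 100;
}
```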

Revenue (Total Value)

The sum of all conversion event values for a variant. Pass revenue in cents via trackConversion(experimentId, value) — e.g. 4999 for $49.99. The dashboard displays revenue in dollars.
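A hypothetical checkout handler might look like this. Only the trackConversion(experimentId, value) signature comes from this guide; the helper name and the experiment ID are placeholders:

```javascript
// Convert a dollar amount to integer cents for trackConversion().
function toCents(dollars) {
  return Math.round(dollars * 100); // e.g. 49.99 -> 4999
}

// Usage sketch (the experiment ID below is a made-up example):
// trackConversion("exp_cta_color", toCents(49.99)); // sends 4999 cents
```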

Confidence Interval (CI)

A range around the observed conversion rate that is likely to contain the true conversion rate. SplitWisp uses a 95% Wilson Score Interval, which performs well even with low sample sizes.

Example: A conversion rate of 12.0% with a 95% CI of [10.5%, 13.7%] means we're 95% confident the true rate is between 10.5% and 13.7%.

The detailed results table shows CI values for every variant. Narrower intervals indicate more precise estimates — driven by larger sample sizes.
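For reference, a 95% Wilson score interval can be computed like this. This is a sketch of the standard formula, not SplitWisp's internal code:

```javascript
// 95% Wilson score interval for a conversion rate.
// z = 1.96 corresponds to 95% confidence.
function wilsonInterval(conversions, impressions, z = 1.96) {
  if (impressions === 0) return [0, 0];
  const n = impressions;
  const p = conversions / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const margin = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return [Math.max(0, center - margin), Math.min(1, center + margin)];
}
```

Note how the interval is not centered on the raw rate: the Wilson formula pulls the center slightly toward 50%, which is what makes it well-behaved at low sample sizes.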

Statistical Significance

When the p-value is below 0.05, results are marked as statistically significant — meaning the observed difference is unlikely to be due to random chance alone. The dashboard shows:

  • A "Yes" or "No" significance badge per variant in the detailed results table
  • A winner banner when at least one non-control variant is significant and outperforms control
  • A "Not yet significant" banner when there isn't enough data yet

p-value

The probability of seeing a difference this large (or larger) if there were no real difference between variants. A lower p-value means stronger evidence of a real effect.

  p-value    Interpretation
  < 0.01     Very strong evidence
  < 0.05     Strong evidence (significant)
  < 0.10     Moderate evidence
  ≥ 0.10     Weak or no evidence
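A p-value for comparing two conversion rates is often computed with a two-proportion z-test. This guide doesn't specify SplitWisp's exact test, so treat the following as an illustrative sketch:

```javascript
// Two-sided two-proportion z-test: p-value for the difference
// between a control rate and a variant rate.
function twoProportionPValue(convControl, nControl, convVariant, nVariant) {
  const pControl = convControl / nControl;
  const pVariant = convVariant / nVariant;
  const pPool = (convControl + convVariant) / (nControl + nVariant);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / nControl + 1 / nVariant));
  const z = (pVariant - pControl) / se;
  return 2 * (1 - normalCdf(Math.abs(z)));
}

// Standard normal CDF via the Abramowitz–Stegun erf approximation
// (accurate to about 1.5e-7, plenty for dashboard-style significance).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

function normalCdf(x) {
  return 0.5 * (1 + erf(x / Math.SQRT2));
}
```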

Lift

The percentage improvement of a variant over the control. Shown with a confidence interval in the detailed results table.

Formula: lift = (variant_rate - control_rate) / control_rate × 100%

A lift of +50% means the variant's conversion rate is 50% higher than control. The control row always shows "baseline" in the lift column.
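The lift formula above maps directly to code:

```javascript
// Relative lift of a variant over control, as a percentage,
// matching the formula above.
function lift(variantConv, variantImpr, controlConv, controlImpr) {
  const variantRate = variantConv / variantImpr;
  const controlRate = controlConv / controlImpr;
  return ((variantRate - controlRate) / controlRate) * 100;
}
```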

Minimum Detectable Effect (MDE)

Given your current sample size and baseline conversion rate, the MDE tells you the smallest improvement you'd be able to detect with statistical significance.

  • If MDE is small (e.g. 1%), you can detect subtle improvements
  • If MDE is large (e.g. 20%), only large improvements are currently detectable; collect more data to detect smaller effects

The dashboard displays MDE in the statistical summary panel to help you decide whether to keep running or stop the experiment.
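One common approximation for the relative MDE at 95% confidence and 80% power is shown below. The dashboard's exact formula isn't specified in this guide, so this is a rough sketch for intuition:

```javascript
// Approximate relative MDE given a baseline rate and per-variant sample
// size, using z = 1.96 (95% confidence) and z = 0.84 (80% power).
function relativeMde(baselineRate, impressionsPerVariant, zAlpha = 1.96, zBeta = 0.84) {
  const p = baselineRate;
  const absolute = (zAlpha + zBeta) *
    Math.sqrt((2 * p * (1 - p)) / impressionsPerVariant);
  return (absolute / p) * 100; // percentage of the baseline rate
}
```

For example, at a 10% baseline with 1,000 impressions per variant, this approximation yields an MDE of roughly 38% relative lift: only fairly large wins are detectable at that sample size.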

UTM Source Breakdown

If your traffic includes UTM parameters (utm_source, utm_medium, utm_campaign), the SDK captures them automatically and attaches them to all track events. The dashboard shows a source breakdown table with per-source variant results:

  • Impressions and conversions segmented by traffic source
  • Conversion rate per source — useful for identifying sources where a variant performs especially well or poorly
  • Source filtering — quickly compare performance across paid, organic, social, and email traffic

This helps answer questions like "Does the green CTA button work better for Google Ads traffic than for organic?"

Daily Time-Series Chart

The Conversion Rate Over Time chart shows how each variant's conversion rate has changed day-by-day since the experiment started. This helps you:

  • Verify result stability — see if the winning variant has been consistently better or if early traffic skewed the results
  • Detect anomalies — spot days with unusual traffic patterns or conversion spikes
  • Build confidence — watch the gap between variants grow (or shrink) over time
  • Decide when to stop — if lines have converged and stayed flat for several days, more data may not change the outcome

Each line represents a variant's daily conversion rate. Hover over any point to see the exact impression and conversion counts for that day. The chart automatically updates as new data arrives.

What to look for:

  • Consistent separation — if the winning variant stays above control every day, that's a strong signal
  • Convergence — if lines start apart but merge over time, the early difference may have been noise
  • Volatility — wild day-to-day swings suggest low daily traffic; wait for more data before deciding

Experiment Notes

The Notes section on the experiment detail page lets you document:

  • Why this test was created — hypothesis, business context, stakeholder requests
  • What was learned — insights from the results, unexpected findings
  • Follow-up actions — next experiments to run, implementation tasks

Notes are editable in all experiment statuses and are copied when you duplicate an experiment. Use them to preserve context for team handoff and future reference.

Significance vs. Sample Size

Seeing "Not significant" doesn't mean there's no difference — it may mean you don't have enough data yet.

What to do:

  1. Check the MDE — if it's larger than the effect you'd care about, keep running
  2. Wait until each variant has at least 1,000 impressions
  3. Run for at least one full business cycle (usually 1–2 weeks)
  4. Check the sample size banner in the dashboard for guidance

Understanding Confidence Intervals on Lift

When the dashboard says "Variant B has +15% lift with 95% CI: +8% to +22%", it means:

  • Point estimate: +15% is the best guess based on current data
  • Lower bound: We're 95% confident the true lift is at least +8%
  • Upper bound: We're 95% confident the true lift is no more than +22%

Conservative decision-making: Even in the worst case (+8%), you still win. That's a safe bet.

Traffic Allocation

SplitWisp uses hash-based deterministic assignment. Each session ID is hashed to consistently assign visitors to the same variant on repeat visits. This ensures:

  • Sticky sessions — a visitor always sees the same variant across page loads
  • No database lookups — assignments are computed at the edge, keeping latency low
  • Consistent experience — even if cookies are cleared, the session ID in localStorage preserves the assignment
  • Configurable weights — set any split ratio (50/50, 70/30, etc.) that sums to 100%

Why even splits? Maximum statistical power — you detect differences faster with balanced sample sizes. Weighted allocations (e.g. 80/20) reduce risk but require longer run times.
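A sketch of how hash-based deterministic assignment can work is below. The hash function (FNV-1a here) and the bucketing scheme are illustrative choices, not SplitWisp's actual implementation:

```javascript
// FNV-1a: a fast, stable 32-bit string hash, used here purely as an
// example of deterministic hashing.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// variants: e.g. [{ name: "control", weight: 50 }, { name: "b", weight: 50 }]
// Weights must sum to 100. The same (sessionId, experimentId) pair always
// lands in the same bucket, which is what makes sessions sticky.
function assignVariant(sessionId, experimentId, variants) {
  const bucket = fnv1a(`${experimentId}:${sessionId}`) % 100; // 0–99
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.name;
  }
  return variants[variants.length - 1].name;
}
```

Because assignment is a pure function of the session ID and experiment ID, no storage lookup is needed to keep a returning visitor on the same variant.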

When to Trust Your Results

Trust the results when:

  • p-value is below 0.05
  • You have at least 1,000 impressions per variant
  • The experiment has run for at least 1 week
  • The lift confidence interval excludes zero (its lower bound is above 0%)

⚠️ Be cautious when:

  • Sample size is small (< 500 per variant)
  • The experiment ran for less than 3 days
  • Results look significant but the sample size warning is showing
  • You paused and resumed with visual changes (the changed_while_paused flag was set)

After the Experiment

Once you have a statistically significant winner:

  1. Click Complete to finalize the experiment — see Experiment Lifecycle
  2. Click Promote Winner to serve the winning variant at 100% traffic
  3. Use the Developer Handoff card to implement changes permanently in code
  4. Click Mark as Implemented to archive the experiment

You can also export results as CSV from the experiment detail page for offline analysis.
