Understanding A/B Test Results

This guide explains the key metrics and statistical concepts you'll see in your SplitWisp experiment results dashboard.

Dashboard Overview

When you open an experiment in the dashboard, the results page shows:

  • Metric cards — total impressions, total conversions, overall conversion rate, best variant lift, and total revenue
  • Winner banner — appears when a variant achieves statistical significance
  • Conversion rate chart — bar chart with error bars showing 95% confidence intervals
  • Daily time-series chart — line chart showing conversion rate trends over time per variant
  • Statistical summary — significance badge and sample size guidance
  • Detailed results table — per-variant breakdown with conversion rate, 95% CI, lift vs. control, p-value, and significance badges
  • UTM source breakdown — results segmented by traffic source (when UTM parameters are present)
  • Experiment notes — collapsible section for documenting context, learnings, and decisions

Key Metrics

Impressions

The number of unique sessions assigned to a variant. Each visitor is counted once per session. The SDK automatically tracks an impression event when a visitor is assigned to a variant.

Conversions

The number of sessions that triggered a conversion event for this variant. Conversions can come from automatic conversion goals (page visit, element click, form submit, scroll depth, time on page) or manual trackConversion() calls.

Conversion Rate

Conversions divided by impressions, shown as a percentage.

Formula: conversion_rate = conversions / impressions × 100%
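As a minimal sketch, the formula translates directly to code:

```javascript
// Conversion rate as a percentage, per the formula above.
// Guards against division by zero for variants with no traffic yet.
function conversionRate(conversions, impressions) {
  if (impressions === 0) return 0;
  return (conversions / impressions) * 100;
}
```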

Revenue (Total Value)

The sum of all conversion event values for a variant. Pass revenue in cents via trackConversion(experimentId, value) — e.g. 4999 for $49.99. The dashboard displays revenue in dollars.
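A hypothetical checkout handler might look like this. Only the trackConversion(experimentId, value) signature comes from this guide; the helper name and the experiment ID are placeholders:

```javascript
// Convert a dollar amount to integer cents for trackConversion().
function toCents(dollars) {
  return Math.round(dollars * 100); // e.g. 49.99 -> 4999
}

// Usage sketch (the experiment ID below is a made-up example):
// trackConversion("exp_cta_color", toCents(49.99)); // sends 4999 cents
```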

Confidence Interval (CI)

A range around the observed conversion rate that is likely to contain the true conversion rate. SplitWisp uses a 95% Wilson Score Interval, which performs well even with low sample sizes.

Example: A conversion rate of 12.0% with a 95% CI of [10.5%, 13.7%] means we're 95% confident the true rate is between 10.5% and 13.7%.

The detailed results table shows CI values for every variant. Narrower intervals indicate more precise estimates — driven by larger sample sizes.
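For reference, a 95% Wilson score interval can be computed like this. This is a sketch of the standard formula, not SplitWisp's internal code:

```javascript
// 95% Wilson score interval for a conversion rate.
// z = 1.96 corresponds to 95% confidence.
function wilsonInterval(conversions, impressions, z = 1.96) {
  if (impressions === 0) return [0, 0];
  const n = impressions;
  const p = conversions / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const margin = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return [Math.max(0, center - margin), Math.min(1, center + margin)];
}
```

Note how the interval is not centered on the raw rate: the Wilson formula pulls the center slightly toward 50%, which is what makes it well-behaved at low sample sizes.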

Statistical Significance

When the p-value is below 0.05, results are marked as statistically significant — meaning the observed difference is unlikely to be due to random chance alone. The dashboard shows:

  • A "Yes" or "No" significance badge per variant in the detailed results table
  • A winner banner when at least one non-control variant is significant and outperforms control
  • A "Not yet significant" banner when there isn't enough data yet

p-value

The probability of seeing a difference this large (or larger) if there were no real difference between variants. A lower p-value means stronger evidence of a real effect.

  p-value    Interpretation
  < 0.01     Very strong evidence
  < 0.05     Strong evidence (significant)
  < 0.10     Moderate evidence
  ≥ 0.10     Weak or no evidence
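A p-value for comparing two conversion rates is often computed with a two-proportion z-test. This guide doesn't specify SplitWisp's exact test, so treat the following as an illustrative sketch:

```javascript
// Two-sided two-proportion z-test: p-value for the difference
// between a control rate and a variant rate.
function twoProportionPValue(convControl, nControl, convVariant, nVariant) {
  const pControl = convControl / nControl;
  const pVariant = convVariant / nVariant;
  const pPool = (convControl + convVariant) / (nControl + nVariant);
  const se = Math.sqrt(pPool * (1 - pPool) * (1 / nControl + 1 / nVariant));
  const z = (pVariant - pControl) / se;
  return 2 * (1 - normalCdf(Math.abs(z)));
}

// Standard normal CDF via the Abramowitz–Stegun erf approximation
// (accurate to about 1.5e-7, plenty for dashboard-style significance).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

function normalCdf(x) {
  return 0.5 * (1 + erf(x / Math.SQRT2));
}
```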

Lift

The percentage improvement of a variant over the control. Shown with a confidence interval in the detailed results table.

Formula: lift = (variant_rate - control_rate) / control_rate × 100%

A lift of +50% means the variant's conversion rate is 50% higher than control. The control row always shows "baseline" in the lift column.
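The lift formula above maps directly to code:

```javascript
// Relative lift of a variant over control, as a percentage,
// matching the formula above.
function lift(variantConv, variantImpr, controlConv, controlImpr) {
  const variantRate = variantConv / variantImpr;
  const controlRate = controlConv / controlImpr;
  return ((variantRate - controlRate) / controlRate) * 100;
}
```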

Minimum Detectable Effect (MDE)

Given your current sample size and baseline conversion rate, the MDE tells you the smallest improvement you'd be able to detect with statistical significance.

  • If MDE is small (e.g. 1%), you can detect subtle improvements
  • If MDE is large (e.g. 20%), only large improvements are currently detectable; collect more data to detect smaller effects

The dashboard displays MDE in the statistical summary panel to help you decide whether to keep running or stop the experiment.
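One common approximation for the relative MDE at 95% confidence and 80% power is shown below. The dashboard's exact formula isn't specified in this guide, so this is a rough sketch for intuition:

```javascript
// Approximate relative MDE given a baseline rate and per-variant sample
// size, using z = 1.96 (95% confidence) and z = 0.84 (80% power).
function relativeMde(baselineRate, impressionsPerVariant, zAlpha = 1.96, zBeta = 0.84) {
  const p = baselineRate;
  const absolute = (zAlpha + zBeta) *
    Math.sqrt((2 * p * (1 - p)) / impressionsPerVariant);
  return (absolute / p) * 100; // percentage of the baseline rate
}
```

For example, at a 10% baseline with 1,000 impressions per variant, this approximation yields an MDE of roughly 38% relative lift: only fairly large wins are detectable at that sample size.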

UTM Source Breakdown

If your traffic includes UTM parameters (utm_source, utm_medium, utm_campaign), the SDK captures them automatically and attaches them to all track events. The dashboard shows a source breakdown table with per-source variant results:

  • Impressions and conversions segmented by traffic source
  • Conversion rate per source — useful for identifying sources where a variant performs especially well or poorly
  • Source filtering — quickly compare performance across paid, organic, social, and email traffic

This helps answer questions like "Does the green CTA button work better for Google Ads traffic than for organic?"

Daily Time-Series Chart

The Conversion Rate Over Time chart shows how each variant's conversion rate has changed day-by-day since the experiment started. This helps you:

  • Verify result stability — see if the winning variant has been consistently better or if early traffic skewed the results
  • Detect anomalies — spot days with unusual traffic patterns or conversion spikes
  • Build confidence — watch the gap between variants grow (or shrink) over time
  • Decide when to stop — if lines have converged and stayed flat for several days, more data may not change the outcome

Each line represents a variant's daily conversion rate. Hover over any point to see the exact impression and conversion counts for that day. The chart automatically updates as new data arrives.

What to look for:

  • Consistent separation — if the winning variant stays above control every day, that's a strong signal
  • Convergence — if lines start apart but merge over time, the early difference may have been noise
  • Volatility — wild day-to-day swings suggest low daily traffic; wait for more data before deciding

Experiment Notes

The Notes section on the experiment detail page lets you document:

  • Why this test was created — hypothesis, business context, stakeholder requests
  • What was learned — insights from the results, unexpected findings
  • Follow-up actions — next experiments to run, implementation tasks

Notes are editable in all experiment statuses and are copied when you duplicate an experiment. Use them to preserve context for team handoff and future reference.

Significance vs. Sample Size

Seeing "Not significant" doesn't mean there's no difference — it may mean you don't have enough data yet.

What to do:

  1. Check the MDE — if it's larger than the effect you'd care about, keep running
  2. Wait until each variant has at least 1,000 impressions
  3. Run for at least one full business cycle (usually 1–2 weeks)
  4. Check the sample size banner in the dashboard for guidance

Understanding Confidence Intervals on Lift

When the dashboard says "Variant B has +15% lift with 95% CI: +8% to +22%", it means:

  • Point estimate: +15% is the best guess based on current data
  • Lower bound: We're 95% confident the true lift is at least +8%
  • Upper bound: We're 95% confident the true lift is no more than +22%

Conservative decision-making: Even in the worst case (+8%), you still win. That's a safe bet.

Traffic Allocation

SplitWisp uses hash-based deterministic assignment. Each session ID is hashed to consistently assign visitors to the same variant on repeat visits. This ensures:

  • Sticky sessions — a visitor always sees the same variant across page loads
  • No database lookups — assignments are computed at the edge, keeping latency low
  • Consistent experience — even if cookies are cleared, the session ID in localStorage preserves the assignment
  • Configurable weights — set any split ratio (50/50, 70/30, etc.) that sums to 100%

Why even splits? Maximum statistical power — you detect differences faster with balanced sample sizes. Weighted allocations (e.g. 80/20) reduce risk but require longer run times.
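A sketch of how hash-based deterministic assignment can work is below. The hash function (FNV-1a here) and the bucketing scheme are illustrative choices, not SplitWisp's actual implementation:

```javascript
// FNV-1a: a fast, stable 32-bit string hash, used here purely as an
// example of deterministic hashing.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// variants: e.g. [{ name: "control", weight: 50 }, { name: "b", weight: 50 }]
// Weights must sum to 100. The same (sessionId, experimentId) pair always
// lands in the same bucket, which is what makes sessions sticky.
function assignVariant(sessionId, experimentId, variants) {
  const bucket = fnv1a(`${experimentId}:${sessionId}`) % 100; // 0–99
  let cumulative = 0;
  for (const v of variants) {
    cumulative += v.weight;
    if (bucket < cumulative) return v.name;
  }
  return variants[variants.length - 1].name;
}
```

Because assignment is a pure function of the session ID and experiment ID, no storage lookup is needed to keep a returning visitor on the same variant.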

When to Trust Your Results

Trust the results when:

  • p-value is below 0.05
  • You have at least 1,000 impressions per variant
  • The experiment has run for at least 1 week
  • The lift confidence interval excludes zero (its lower bound is above 0%)

⚠️ Be cautious when:

  • Sample size is small (< 500 per variant)
  • The experiment ran for less than 3 days
  • Results look significant but the sample size warning is showing
  • You paused and resumed with visual changes (the changed_while_paused flag was set)

After the Experiment

Once you have a statistically significant winner:

  1. Click Complete to finalize the experiment — see Experiment Lifecycle
  2. Click Promote Winner to serve the winning variant at 100% traffic
  3. Use the Developer Handoff card to implement changes permanently in code
  4. Click Mark as Implemented to archive the experiment

You can also export results as CSV from the experiment detail page for offline analysis.
