We just shipped Feature Test inside B2Metric CDP. Here's what it does and why we built it.
Most product teams I talk to have the same release problem. You build something, you think it'll work, and then you ship it to everyone at once. If it breaks, everyone sees the breakage. If it works, you still can't prove why — maybe it was the feature, maybe it was a seasonal spike, maybe it was something else entirely. You're guessing with production traffic.
We built the Feature Test Module to fix that.
The Technical Architecture
Three layers, each doing a specific job.
Layer 1: SDK and Data Ingestion. Integration happens on your side of the stack: backend systems, mobile apps, or web. The SDK captures user events and forwards them to the B2Metric CDP pipeline. Onboarding is designed to take hours, not weeks. You're not rebuilding your event tracking; you're plugging into it.
Layer 2: Hash-Based Bucketing Service. This is where user assignment happens. When a user hits an experiment, the bucketing service computes a deterministic hash from the user ID and the experiment key. The result maps to a bucket, which maps to a variant. Because the hash depends only on those two inputs, the same user always lands in the same bucket for the same experiment, across every session and every device, as long as the same user ID is passed each time. Traffic allocation is configurable: you can split randomly (e.g., 20/80) or target a pre-defined segment using your existing CDP audience builder, with no duplication of logic.
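To make the bucketing idea concrete, here is a minimal sketch of deterministic hash-based assignment. It is not the platform's implementation: the choice of SHA-256, the 10,000-bucket granularity, and the names bucket_for and assign_variant are all assumptions made for illustration.

```python
import hashlib

NUM_BUCKETS = 10_000  # assumed granularity: each bucket is 0.01% of traffic


def bucket_for(user_id: str, experiment_key: str) -> int:
    """Map (experiment, user) to a stable bucket in [0, NUM_BUCKETS)."""
    # Hashing the experiment key together with the user ID means the same
    # user gets unrelated buckets in different experiments.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def assign_variant(user_id: str, experiment_key: str, allocation: dict[str, float]) -> str:
    """Pick a variant; allocation maps variant name to traffic share (sums to 1)."""
    if abs(sum(allocation.values()) - 1.0) > 1e-9:
        raise ValueError("allocation shares must sum to 1")
    bucket = bucket_for(user_id, experiment_key)
    upper = 0.0
    for variant, share in allocation.items():
        upper += share * NUM_BUCKETS
        if bucket < upper:
            return variant
    return variant  # guards against float rounding at the last boundary


# Hypothetical 20/80 split, matching the example ratio in the text.
split = {"treatment": 0.2, "control": 0.8}
first = assign_variant("user-42", "checkout-v2", split)
second = assign_variant("user-42", "checkout-v2", split)
```

Nothing is stored per user in this sketch. The assignment is recomputed on every call and comes out the same, which is what makes it consistent across sessions and devices.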
Layer 3: The Statistical Engine. This is the part most homegrown experiment setups skip, and skipping it is where bad decisions come from. The engine runs continuously in the background, processing logged event data through statistical models. For each metric you're tracking, it computes the conversion rate delta between control and treatment groups, a confidence interval around that delta, and statistical significance, all surfaced in the UI. The platform uses two-sided significance testing by default. You define your significance threshold, and the engine tells you when you've crossed it, or when you need more data before making a call. Metrics are reusable across experiments, which is what lets you build institutional knowledge over time.
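The three outputs named above (delta, confidence interval, two-sided significance) can be sketched with a standard two-proportion z-test. This is one common way to compute them, assumed here for illustration; the source does not say which statistical models the engine actually uses. The names ReadOut and compare_conversion are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class ReadOut:
    delta: float      # treatment rate minus control rate
    ci_low: float     # lower bound of the confidence interval on delta
    ci_high: float    # upper bound of the confidence interval on delta
    p_value: float    # two-sided p-value
    significant: bool


def compare_conversion(control_conv: int, control_n: int,
                       treat_conv: int, treat_n: int,
                       alpha: float = 0.05) -> ReadOut:
    """Two-sided two-proportion z-test with a normal-approximation interval."""
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    delta = p_t - p_c

    # Pooled standard error for the hypothesis test (assumes no difference).
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = delta / se_pooled if se_pooled > 0 else 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

    # Unpooled standard error for the interval around the observed delta.
    se = math.sqrt(p_c * (1 - p_c) / control_n + p_t * (1 - p_t) / treat_n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return ReadOut(delta, delta - z_crit * se, delta + z_crit * se,
                   p_value, p_value < alpha)


# Hypothetical counts: 200/2000 conversions in control, 260/2000 in treatment.
result = compare_conversion(200, 2000, 260, 2000)
```

One caveat with this sketch: it is a fixed-sample test. Checking it repeatedly while data is still arriving inflates the false-positive rate, so an engine that evaluates continuously would need a correction for that, which is not shown here.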
Controlled Rollout: The Risk Management Piece
Feature flagging without rollout control is half a solution. The Feature Test Module includes a progressive rollout mechanism.
You start at 10% exposure. You watch your metrics. If there's no performance degradation, no latency spike, no error rate increase, no conversion drop, you step up to 25%, then 50%, then 100%. Each step requires zero redeployment — it's a configuration change on the platform side.
If something looks wrong at 10%, you kill it. The other 90% never see it. That's the rollout contract.
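The step-up and kill behavior described above can be sketched as a small piece of configuration logic. The step values come from the text; the GuardrailSnapshot fields, the function names, and the bucket-threshold exposure check are illustrative assumptions, not the platform's API.

```python
from dataclasses import dataclass

ROLLOUT_STEPS = [10, 25, 50, 100]  # exposure percentages from the text


@dataclass
class GuardrailSnapshot:
    """Hypothetical summary of the metrics watched at each step."""
    latency_regressed: bool = False
    error_rate_increased: bool = False
    conversion_dropped: bool = False

    def healthy(self) -> bool:
        return not (self.latency_regressed
                    or self.error_rate_increased
                    or self.conversion_dropped)


def next_exposure(current: int, guardrails: GuardrailSnapshot) -> int:
    """Return the next exposure percentage; 0 means the feature is killed."""
    if not guardrails.healthy():
        return 0
    for step in ROLLOUT_STEPS:
        if step > current:
            return step
    return ROLLOUT_STEPS[-1]  # already fully rolled out


def is_exposed(bucket: int, exposure_percent: int, num_buckets: int = 10_000) -> bool:
    """Expose every bucket below the threshold for the current percentage."""
    return bucket < num_buckets * exposure_percent // 100
```

Because exposure is a threshold on a stable bucket number, raising the percentage only adds users. Anyone who saw the feature at 10% still sees it at 25%, and changing the percentage is a data change rather than a deploy.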
Who Actually Uses This
Product Managers are the primary decision-makers. They define what the experiment is trying to prove, set the metric, and read the results. The module is built so a PM can own the full experiment lifecycle — from setup to statistical read-out — without pulling in engineering for every step.
Software Developers get the most time back. Building experiment infrastructure in-house — bucketing logic, logging, and result analysis — typically takes weeks and becomes a maintenance burden. With SDK integration, the developer's job is to instrument the feature flag in the code and pass the right user context. The platform handles everything else.
Growth and Data Teams use it for velocity. They're running multiple experiments in parallel, iterating quickly, and need a clean separation between experiments so results don't bleed into each other.
Why This Matters for Fintech and Banking
E-commerce teams A/B test checkout buttons. Fintech teams have spent years arguing about whether they're even allowed to test. The regulatory environment, the risk aversion, the instinct that "we can't experiment on customers' financial lives" — all of it has kept experimentation culture out of most banks and fintech products longer than it should have.
That instinct is understandable. It is also costly. Here's where A/B testing matters most in financial products.
KYC and onboarding flows are where the most value is lost and the least testing gets done. A 5% improvement in onboarding completion at a neobank or lending platform doesn't sound dramatic until you run the numbers against customer acquisition cost. If you're spending €80 to acquire a user who doesn't finish onboarding, fixing the flow is more valuable than optimizing the ad.
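To run those numbers, here is a back-of-the-envelope sketch. The €80 acquisition cost is from the text. The 40% baseline completion rate is a made-up figure, and reading "5% improvement" as a relative lift is an assumption.

```python
def cost_per_onboarded_user(cac_eur: float, completion_rate: float) -> float:
    """Effective acquisition cost per user who actually finishes onboarding."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    return cac_eur / completion_rate


CAC_EUR = 80.0              # from the text
BASELINE_COMPLETION = 0.40  # hypothetical baseline, for illustration only
RELATIVE_LIFT = 0.05        # assumes the 5% is a relative improvement

before = cost_per_onboarded_user(CAC_EUR, BASELINE_COMPLETION)
after = cost_per_onboarded_user(CAC_EUR, BASELINE_COMPLETION * (1 + RELATIVE_LIFT))
saving_per_user = before - after
```

Under these assumptions the effective cost per onboarded user falls from about €200 to about €190. That saving applies to every completed signup, with no change in ad spend.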
Credit application steps are a textbook case for progressive rollout. Rolling out a new step sequence or a simplified income declaration UI to 10% of applicants, while measuring completion rate and downstream loan performance, is not just good product practice. It's defensible to a risk committee.
Feature rollouts in mobile banking carry a specific risk product teams underestimate. A redesigned transaction history screen that confuses 3% of your user base will generate call center volume, app store complaints, and churn among your highest-tenure customers. Testing at 5% exposure first is the correct engineering approach.
Fraud and risk feature exposure is an area most product teams don't think of as an experiment surface, but it is. Rolling out a new step-up authentication flow to a controlled segment first — with metrics on both friction and fraud signal — gives your risk team data they wouldn't otherwise have.
Where This Fits
The module is designed for companies with high release frequency and low tolerance for metric regression: e-commerce, banking, fintech, and SaaS. The pattern is the same across all of them: shipping fast while needing to prove impact, not just assume it.




