
Sanjeev Dhanush Challapalli
Supply Chain Analyst
Sanjeev is a Supply Chain Analyst working across semiconductor, biotech, and pharma. At Thermo Fisher Scientific he supported S&OP demand planning for 700+ SKUs across North America, rebuilt the forecasting process using time-series modeling (ARIMA, moving averages) to close the gap between demand signals and supply decisions, and surfaced $180K in excess and slow-moving inventory through SKU-level consumption and ageing analysis. At ASM International he manages 70-140 weekly spares orders at 95% consignment reconciliation accuracy and recalibrated SAP S/4HANA inventory parameters to cut expedite dependency 20% and late deliveries 12%. At Vestas Pharmaceuticals he built ARIMA-based forecasting models that fed cost budgeting, and cut end-to-end procurement cycle time 20% via Value Stream Mapping. Tools: SAP S/4HANA, Python, SQL, Power BI, Tableau. Six Sigma Green Belt; CSCMP Demand Forecasting; APICS CPIM in progress. MS in Business Analytics, Northeastern University; BTech in Mechanical Engineering, SRM University.
What is S&OP demand forecasting?
S&OP demand forecasting is the cross-functional process of producing a single, reconciled demand number for every SKU at a defined horizon (typically 12-18 months, monthly buckets) that drives the supply, inventory, and financial plans. The forecast is produced statistically, adjusted by sales/marketing intelligence, reconciled across hierarchy levels (SKU, family, region, total), and signed off in a monthly S&OP consensus cycle.
Which forecasting method should I use for my SKUs?
Segment first, model second. Use ADI (average demand interval) and CV² (coefficient of variation squared) to classify each SKU into smooth, intermittent, lumpy, or erratic. Smooth SKUs win with exponential smoothing (ETS) or ARIMA. Intermittent/lumpy SKUs need Croston or its TSB variant. Erratic SKUs benefit from Prophet or regression-based models with external drivers. Running one model across the whole portfolio is the most common reason an S&OP forecast underperforms a hand-tuned spreadsheet.
How do you measure forecast accuracy in S&OP?
Track three metrics together, never one alone: WMAPE (weighted mean absolute percentage error, weighted by SKU value or volume — more meaningful than MAPE on a heterogeneous portfolio), bias (the signed forecast error, to detect systematic over- or under-forecasting), and forecast value-add (whether judgmental overrides actually improved or worsened the statistical baseline). MAPE alone is misleading because it weights every SKU equally, so a 50% miss on a $10 SKU dominates a 5% miss on a $100K SKU.
How long does it take to rebuild a broken S&OP forecast?
On a 700+ SKU portfolio, expect 8-12 weeks for the rebuild itself plus one full S&OP cycle (4-5 weeks) of parallel running before retiring the old forecast. The compressed timeline at Thermo Fisher worked because the rebuild was sequenced — data audit, segmentation, method matching, accuracy benchmarking, consensus integration — each phase gated on the previous one passing. Skipping the data audit and jumping to model selection is the most common reason rebuilds drift to 6+ months.
Most "demand forecasting at scale" content on the internet stops at three sentences about ARIMA and a screenshot of a Power BI line chart. The reality at S&OP scale — 700+ SKUs across North America, multi-tier inventory, monthly consensus cycles where finance, sales, ops, and supply all need the same number — is that the forecasting model is the easy half. The hard half is which model for which SKU, on which data, with which accuracy metric, reviewed in which cadence. Get any one of those wrong and the forecast is technically running but operationally useless.
The Thermo Fisher rebuild closed the gap between demand signals and supply decisions because every layer below the model — the segmentation, the data audit, the accuracy framework, the consensus cycle — was rebuilt with it. The model itself was the last decision, not the first.
- S&OP Demand Forecasting
S&OP demand forecasting is the cross-functional process of producing a single, reconciled demand number for every SKU at a defined horizon (typically 12-18 months in monthly buckets) that drives the supply, inventory, and financial plans. The forecast is produced statistically, adjusted with sales and marketing intelligence, reconciled across hierarchy levels (SKU, product family, region, total), and signed off in a monthly S&OP consensus cycle so that supply, finance, and commercial functions all act on the same number.
A working forecast at 50 SKUs is one engineer with a spreadsheet and a good demand signal. A working forecast at 700+ SKUs is a system. The shift is not a question of more compute — it is a question of structure. Four things silently degrade S&OP forecast quality once a portfolio grows past the spreadsheet limit:
- Demand-pattern heterogeneity. A portfolio of 700 SKUs is rarely 700 instances of the same demand pattern. Some SKUs ship every day. Some ship once a quarter in batches of 100. Some have promotional spikes. A single statistical model averaged across all of them lands somewhere between mediocre and actively wrong on the SKUs that matter most.
- Data-quality drift. SKU masters accumulate dirty entries — discontinued items still flagged active, new SKUs without enough history, hierarchy misclassifications. A statistical model trained on the dirty data inherits the noise and forecasts confidently around it.
- Process drift. The consensus cycle that worked at 50 SKUs ("everyone reviews every line") collapses past 200. Without exception-based review, sales and finance silently stop participating, and the forecast becomes a stat-team output that ops doesn't trust.
- Metric drift. A single accuracy metric — usually MAPE — is reported because it sounds rigorous. On a heterogeneous portfolio, MAPE rewards the wrong things: a 50% error on a $10 part counts the same as a 5% error on a $100K part.
The Thermo Fisher rebuild treated all four of these failure modes as gates the new forecast had to clear, not as nice-to-haves. The data audit closed the data-quality gap before any model ran. The segmentation work prevented one-model-fits-all. The accuracy framework reported WMAPE and bias side by side. And the consensus cadence was redesigned around exception-based review so finance and sales actually participated.
S&OP demand forecasting at scale is not a modeling problem — it is a system problem. The four silent failure modes (demand-pattern heterogeneity, data-quality drift, process drift, metric drift) all degrade forecast quality independently of how good the statistical model is. A rebuild that addresses only the model and ignores the other three layers ships a forecast that runs but is not trusted.
The single most common reason an S&OP forecast underperforms a hand-tuned spreadsheet is that the same model is applied to every SKU. ARIMA on the whole portfolio. Or Prophet on the whole portfolio. Or worse, simple moving average on the whole portfolio. Each of these is the right answer for some SKUs and the wrong answer for the others.
Demand patterns split into four canonical shapes, and each shape favors a different family of methods. The classification originates in the operations-research literature (Syntetos and Boylan, 2005) and has held up in production because it captures the two dimensions that actually matter: how often demand occurs (intermittence) and how variable it is when it does occur (variability).
| Pattern | What it looks like | Typical SKU example | Methods that work |
|---|---|---|---|
| Smooth | Demand occurs in most periods, variability is low | High-volume, fast-moving consumables | Moving average, ETS (exponential smoothing), ARIMA |
| Intermittent | Demand occurs in few periods (many periods are zero), but quantities are stable when demand does occur | Spare parts, low-velocity but predictable items | Croston's method, TSB (Teunter-Syntetos-Babai) variant |
| Erratic | Demand occurs in most periods but with high variability | Promotional or seasonally driven items, externally influenced | Prophet, regression-based models with external drivers, ARIMAX |
| Lumpy | Demand both intermittent AND highly variable when it occurs | Project-driven, capital spares, large-batch items | Croston with bias correction, judgmental overlay, bootstrap simulation |
The mistake is not running ARIMA — ARIMA is excellent on smooth, autocorrelated demand. The mistake is running ARIMA on a lumpy SKU, where the model fits noise as if it were signal and produces a confident-looking forecast that misses every spike.
- Demand Pattern (Syntetos-Boylan Classification)
A demand pattern is a categorical classification of a SKU's demand-time series along two dimensions: ADI (average demand interval — periods between non-zero demand) and CV² (squared coefficient of variation of non-zero demand). The four canonical patterns — smooth (low ADI, low CV²), intermittent (high ADI, low CV²), erratic (low ADI, high CV²), and lumpy (high ADI, high CV²) — predict which family of forecasting methods will outperform on that SKU.
The reason monolithic forecasts persist despite their poor performance is operational: a single model is easy to explain, easy to audit, easy to run in batch. Segmenting the portfolio means maintaining multiple model pipelines, multiple parameter sets, and a routing layer that decides which method runs for which SKU. The complexity is real — but so is the gap between a 30% WMAPE on the whole portfolio and a 12% WMAPE on the same portfolio with segmented methods.
No single forecasting method wins on a heterogeneous 700+ SKU portfolio. Smooth demand favors ETS and ARIMA; intermittent favors Croston; erratic favors Prophet or regression with external drivers; lumpy needs judgment plus simulation. Segmentation is what turns the model selection from a guess into a decision, and it is the single highest-leverage change in any forecast rebuild.
The four-pattern classification is operational only when it can be computed from the demand history without judgment calls. Two metrics are enough:
- ADI (Average Demand Interval) — the average number of periods between non-zero demand. A SKU that ships every period has ADI = 1. A SKU that ships once every 4 months on a monthly bucket has ADI ≈ 4. The threshold separating "smooth/erratic" from "intermittent/lumpy" is conventionally set at ADI = 1.32, derived empirically by Syntetos and Boylan as the inflection point where Croston's method starts beating exponential smoothing.
- CV² (Squared Coefficient of Variation) — the variance of non-zero demand divided by its squared mean. A SKU that ships exactly 100 units every time it ships has CV² = 0; the conventional threshold separating low from high variability is CV² = 0.49.
- ADI / CV² Segmentation
ADI / CV² segmentation is a two-dimensional classification scheme that routes every SKU in a portfolio to one of four demand-pattern categories — smooth, intermittent, erratic, or lumpy — based on the average interval between non-zero demand (ADI) and the squared coefficient of variation of non-zero demand (CV²). Conventional thresholds (ADI = 1.32, CV² = 0.49) come from the Syntetos-Boylan operations-research literature. Each category routes to a different family of forecasting methods.
The four-quadrant matrix:
| Quadrant | ADI | CV² | Routes to |
|---|---|---|---|
| Smooth | ≤ 1.32 (frequent demand) | ≤ 0.49 (low variability) | ETS, ARIMA, simple seasonal models |
| Erratic | ≤ 1.32 (frequent demand) | > 0.49 (high variability) | Prophet, regression with drivers, ARIMAX |
| Intermittent | > 1.32 (sporadic demand) | ≤ 0.49 (stable when occurring) | Croston, TSB (Teunter-Syntetos-Babai) |
| Lumpy | > 1.32 (sporadic demand) | > 0.49 (variable when occurring) | Croston with bias correction + judgmental overlay + bootstrap simulation |
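The quadrant routing above can be computed deterministically from a SKU's demand history. A minimal pure-Python sketch (the function name and the one-number-per-period input shape are illustrative choices; the thresholds are the conventional ones from the table):

```python
from statistics import mean, pstdev

def classify_demand(history, adi_cut=1.32, cv2_cut=0.49):
    """Route one SKU's demand history (one value per period) to its
    Syntetos-Boylan quadrant using ADI and CV² of non-zero demand."""
    nonzero = [d for d in history if d > 0]
    if not nonzero:
        return "no-demand"
    # ADI approximated as total periods / number of periods with demand
    adi = len(history) / len(nonzero)
    mu = mean(nonzero)
    cv2 = (pstdev(nonzero) / mu) ** 2  # squared coefficient of variation
    if adi <= adi_cut:
        return "smooth" if cv2 <= cv2_cut else "erratic"
    return "intermittent" if cv2 <= cv2_cut else "lumpy"
```

Re-running this function over the whole portfolio each quarter is all the "re-classification cadence" mechanically requires.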
Layering ABC (inventory value) and XYZ (quantity variability) on top of the demand pattern yields a treatment matrix that sets the method, the review cadence, and the level of human attention per SKU:
| ABC × XYZ × Pattern | Treatment | Review cadence |
|---|---|---|
| A-class, X (stable), smooth | Auto-forecast with ETS/ARIMA, low-touch monthly review | Monthly exception only |
| A-class, Z (volatile), erratic | Prophet or regression + drivers, full S&OP consensus review | Monthly + weekly hot-list |
| B-class, intermittent | Croston/TSB auto-route, sample-based audit | Quarterly methodology audit |
| C-class, lumpy | Croston + bias correction; consider buy-to-order or vendor-managed | Quarterly review only; flag for portfolio rationalization |
Demand patterns are not static. A SKU that was smooth a year ago can become lumpy after a customer change, a product transition, or a market shift. Re-classifying the portfolio on a fixed cadence (quarterly is typical for S&OP) and routing SKUs to the appropriate method automatically prevents the slow drift where 100 SKUs are silently being forecasted with the wrong method because their pattern shifted but the routing did not.
ADI and CV² turn demand-pattern classification from a judgment call into a deterministic, reproducible step. Layered with ABC (value) and XYZ (variability), the classification routes each SKU to the right method, the right review cadence, and the right level of human attention — which is what makes a 700+ SKU portfolio actually plannable instead of theoretically modeled.
Once SKUs are segmented, method selection collapses to a small, defensible set of choices. The five families that handle 95%+ of S&OP demand are:
Moving Average (and Naive Baselines)
Best for: short-history SKUs, baseline benchmarking, or stable demand with no trend or seasonality. The moving-average forecast for the next period is the average of the last N actuals. The naive baseline (next period equals the last period) is even simpler and is the default benchmark every other method must beat.
The baseline is not a placeholder — it is a critical accuracy check. If ARIMA does not beat naive on a SKU, the model is doing nothing useful and should be replaced with naive.
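The baseline and the beat-the-baseline check can be sketched directly (function names and the rolling one-step backtest are illustrative, not a prescribed implementation):

```python
def naive_forecast(history):
    """Next period = last actual; the benchmark every model must beat."""
    return history[-1]

def moving_average_forecast(history, n=3):
    """Next period = mean of the last n actuals."""
    window = history[-n:]
    return sum(window) / len(window)

def beats_naive(history, forecast_fn, warmup=4):
    """One-step rolling backtest: total absolute error of forecast_fn
    vs the naive baseline on the same SKU history."""
    fn_err = naive_err = 0.0
    for t in range(warmup, len(history)):
        fn_err += abs(history[t] - forecast_fn(history[:t]))
        naive_err += abs(history[t] - naive_forecast(history[:t]))
    return fn_err < naive_err
```

If `beats_naive` returns False for a candidate method on a SKU, route that SKU to naive and move on.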
ETS (Exponential Smoothing State Space)
Best for: smooth demand with optional trend or seasonality. The Hyndman-Khandakar ETS framework automatically selects the right combination of error (additive/multiplicative), trend (none/additive/damped), and seasonality (none/additive/multiplicative) by minimizing AIC. It handles trend and seasonal patterns without manual configuration and runs fast on large portfolios.
ETS is often the right default for smooth, A-class SKUs when the rebuild needs a high-quality auto-routing layer rather than per-SKU hand-tuning.
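The full ETS framework's automatic selection is involved, but the core recursion is small. A minimal sketch of Holt's linear trend method (the additive-trend corner of ETS), with illustrative smoothing constants rather than the maximum-likelihood/AIC-optimized ones a production tool like statsmodels' ETSModel would pick:

```python
def holt_forecast(history, alpha=0.3, beta=0.1, horizon=1):
    """Holt's linear trend method: exponentially smoothed level + trend.
    alpha/beta here are illustrative; production ETS selects them (and the
    error/trend/seasonal structure) automatically."""
    level, trend = history[0], history[1] - history[0]
    for y in history[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend
```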
ARIMA (Auto-Regressive Integrated Moving Average)
Best for: smooth, autocorrelated demand. Automated order selection (the auto.arima algorithm in Hyndman's forecast R package, or pmdarima in Python) searches over the (p, d, q) and seasonal (P, D, Q) terms. ARIMA's strength is also its limitation: it assumes the demand-generating process is stationary or differenceable to stationary. On erratic or lumpy demand it overfits the historical noise and forecasts confidently into the wrong shape.
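The autoregressive core of ARIMA can be illustrated with a toy AR(1) fit by ordinary least squares; a production rebuild would let auto.arima or pmdarima search the full order space instead. This sketch and its function names are illustrative only:

```python
def fit_ar1(history):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t — the
    autoregressive core of ARIMA, as a toy illustration only."""
    x, y = history[:-1], history[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    phi = sxy / sxx
    c = my - phi * mx
    return c, phi

def ar1_forecast(history):
    """One-step-ahead forecast from the fitted AR(1)."""
    c, phi = fit_ar1(history)
    return c + phi * history[-1]
```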
Prophet (Meta's Decomposable Additive Model)
Best for: erratic or seasonal demand, especially where holidays, promotions, and external drivers matter. Prophet decomposes the time series into trend, seasonality, holidays, and external regressors. It tolerates missing data and outliers better than ARIMA.
Prophet's weakness is its strength: the additive decomposition makes it intuitive and explainable, but on intermittent demand the additive trend term can drift in unrealistic directions. Prophet is the wrong tool for low-velocity spare parts; it is often the right tool for promotional consumer goods or capital-equipment demand with seasonality.
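Prophet itself is a library call, but the decomposable-additive idea it rests on can be sketched in a few lines: fit a trend, average the detrended residuals by seasonal position, and add the components back for the forecast. A toy illustration that omits Prophet's changepoints, holidays, and external regressors:

```python
def additive_decompose_forecast(history, period=12, horizon=1):
    """Toy version of the additive decomposition behind Prophet:
    forecast = linear trend + per-position seasonal mean of residuals."""
    n = len(history)
    t = list(range(n))
    mt, my = sum(t) / n, sum(history) / n
    slope = (sum((a - mt) * (b - my) for a, b in zip(t, history))
             / sum((a - mt) ** 2 for a in t))
    intercept = my - slope * mt
    resid = [y - (intercept + slope * i) for i, y in zip(t, history)]
    seasonal = []
    for pos in range(period):
        vals = [r for i, r in enumerate(resid) if i % period == pos]
        seasonal.append(sum(vals) / len(vals) if vals else 0.0)
    future_t = n - 1 + horizon
    return intercept + slope * future_t + seasonal[future_t % period]
```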
Croston's Method (and TSB)
Best for: intermittent and lumpy demand. Croston decomposes the series into two separate components — non-zero demand size and the interval between non-zero demand events — and applies exponential smoothing to each. The TSB (Teunter-Syntetos-Babai) variant corrects a known bias in classical Croston when the demand interval is changing.
Croston is the only one of these methods that explicitly models the "many zeros" pattern of spare-parts and slow-mover demand. Running ARIMA or ETS on intermittent demand routinely produces forecasts of "0.3 units per month" — operationally meaningless. Croston produces an expected demand rate, which is what inventory parameter calculations actually need.
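A minimal sketch of classic Croston follows; the TSB variant instead smooths a per-period demand probability, which corrects Croston's bias when items are heading toward obsolescence. Both are available in most time-series libraries, so this hand-rolled version is for illustration only:

```python
def croston_rate(history, alpha=0.1):
    """Classic Croston: exponentially smooth non-zero demand size (z) and
    the interval between demand events (p) separately; the expected demand
    rate is z / p. Updates happen only in periods where demand occurs."""
    z = p = None
    periods_since = 0
    for y in history:
        periods_since += 1
        if y > 0:
            if z is None:  # initialize on the first observed demand
                z, p = float(y), float(periods_since)
            else:
                z = alpha * y + (1 - alpha) * z
                p = alpha * periods_since + (1 - alpha) * p
            periods_since = 0
    return 0.0 if z is None else z / p
```

The output is a demand rate (e.g. 33.3 units/month for 100 units every third month), which is the quantity inventory parameter calculations actually consume.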
| Method | Best for | Wins when | Loses when |
|---|---|---|---|
| Naive / Moving Average | Baseline benchmark; short-history SKUs | Demand has no trend, no seasonality, no autocorrelation worth modeling | Always; it's a baseline, not a winner |
| ETS | Smooth demand, A-class SKUs, default auto-route | Trend or seasonal patterns are present and consistent | Demand is intermittent or driven by external factors |
| ARIMA (auto) | Smooth, autocorrelated demand | Past values strongly predict next values | Lumpy demand; fits noise as if it were signal |
| Prophet | Erratic, promotional, or holiday-driven demand | External drivers, seasonality, and changepoints matter | Intermittent low-velocity demand; over-smooth trend |
| Croston / TSB | Intermittent and lumpy demand | Many zeros in the demand series | Demand becomes smooth; switch to ETS |
Statistical methods are one layer; judgmental overrides from sales and marketing are another. Forecast Value-Add (FVA) is the discipline of measuring whether each override actually improved forecast accuracy or made it worse. A surprising fraction of overrides — often 30-50% in unaudited environments — degrade accuracy. Tracking FVA per contributor surfaces this and turns the consensus cycle from a debate into a data-driven review.
Method matching is not "pick the most sophisticated model." It is routing each SKU to the family of methods that fits its demand pattern, then benchmarking every choice against a naive baseline. Naive, ETS, ARIMA, Prophet, and Croston cover ~95% of S&OP demand when paired with the right segmentation; the remaining 5% lives in judgmental overlays that must be measured (FVA) to ensure they help rather than hurt.
The compressed timeline at Thermo Fisher worked because the rebuild was sequenced. Each phase gated on the previous one passing, which prevented the most common rebuild failure mode: jumping straight to model selection on dirty data.
Audit the demand history before touching a model
Pull 24-36 months of demand history per SKU. Flag SKUs with less than 12 months of history (separate cold-start treatment). Identify obvious data anomalies: stockout-suppressed demand (where actual demand was capped by supply), one-time customer events that should not be repeated in the forecast, returns that were booked as negative demand. Clean before you classify; classify before you model. Skipping this step is the single most common reason forecast rebuilds drift past 6 months.
Classify every SKU by demand pattern
Compute ADI and CV² for each SKU on the cleaned history. Apply the Syntetos-Boylan thresholds (ADI 1.32, CV² 0.49) to assign each SKU to smooth, intermittent, erratic, or lumpy. Overlay ABC (by inventory value) and XYZ (by quantity variability). The output is a portfolio-level routing table: every SKU has an assigned method family before any model is configured.
Build the baseline forecast first
Run naive and simple moving average across the entire portfolio. Record WMAPE and bias per SKU and per segment. Every subsequent method must beat this baseline on the SKUs it claims to fit. If ETS does not beat naive on smooth A-class SKUs, ETS is not the answer for that segment — go investigate before tuning hyperparameters.
Run the matched method per segment, benchmark, and route
Run ETS / ARIMA on smooth segments, Prophet on erratic, Croston/TSB on intermittent and lumpy. For each SKU, compute WMAPE, bias, and the lift over the baseline. Route each SKU to whichever method shows the highest skill — sometimes the matched-family method wins, sometimes a simpler method wins for SKUs near the segmentation boundary. The routing is data-driven, not rule-driven.
Layer judgmental overrides with FVA tracking
Sales, marketing, and category-management overrides go on top of the statistical baseline as a separate layer — never replacing it. Track Forecast Value-Add per contributor: did their override improve or degrade accuracy? Surface this monthly. Overrides that consistently degrade accuracy are not approved; overrides that consistently improve it are honored. This single discipline prevents the consensus cycle from devolving into political negotiation.
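FVA tracking reduces to comparing each override's absolute error against the baseline it replaced. A sketch, assuming a simple record shape (contributor, baseline, override, actual) that is an illustrative convention rather than a standard schema:

```python
from collections import defaultdict

def fva_report(records):
    """Net Forecast Value-Add per contributor, in units of absolute error
    removed. Positive = their overrides helped; negative = they hurt."""
    gain = defaultdict(float)
    for r in records:
        base_err = abs(r["actual"] - r["baseline"])
        ovr_err = abs(r["actual"] - r["override"])
        gain[r["contributor"]] += base_err - ovr_err
    return dict(gain)
```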
Reconcile the forecast across the hierarchy
Forecasts at SKU level rarely sum cleanly to forecasts at family or region level — and the family-level forecast is usually more accurate (aggregation reduces noise). Use hierarchical reconciliation (top-down, bottom-up, or middle-out depending on the portfolio) so the SKU-level forecast respects the more reliable family-level signal. The MinT (minimum trace) reconciliation method from the Hyndman literature is a defensible default when no business reason favors a specific direction.
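The simplest defensible reconciliation is proportional top-down: scale the SKU forecasts so they sum to the family number while preserving each SKU's share of the mix. A sketch (MinT generalizes this by weighting with the forecast-error covariance instead of raw proportions):

```python
def topdown_reconcile(sku_forecasts, family_forecast):
    """Scale SKU-level forecasts to sum to the (usually more accurate)
    family-level forecast, preserving each SKU's proportion of the mix."""
    total = sum(sku_forecasts.values())
    if total == 0:
        return dict(sku_forecasts)
    factor = family_forecast / total
    return {sku: f * factor for sku, f in sku_forecasts.items()}
```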
Run parallel for one full S&OP cycle before retiring the old forecast
Run the rebuilt forecast alongside the legacy forecast for one complete S&OP cycle (typically 4-5 weeks). Compare WMAPE, bias, and operational outcomes (shortages, expedites, dead stock). Only retire the legacy forecast after the rebuild has demonstrably outperformed on the metrics that matter. The parallel-run period is also when sales and finance build trust in the new number — without it, the rebuild is technically live but operationally rejected.
The single line that separates a rebuild that lands in 8-12 weeks from one that drags to 6+ months is whether the data audit happens first. Modeling on dirty data produces confident-looking forecasts that fail the moment they hit production. The audit is unglamorous, takes 1-2 weeks, and is the highest-leverage activity in the entire rebuild.
The 7-step playbook gates each phase on the previous one passing. Audit before classify, classify before model, baseline before benchmark, statistical before judgmental, SKU-level before hierarchical, parallel-run before retire. Skipping a phase compresses the schedule on paper but extends it in reality, because the work that was skipped surfaces later as a forecast nobody trusts.
The accuracy framework is where good rebuilds quietly fail. Reporting a single number — usually MAPE — is rigorous-sounding and operationally misleading. The discipline that holds up under S&OP review is reporting three metrics together, every cycle.
- MAPE (Mean Absolute Percentage Error)
MAPE is the average of the absolute percentage errors across all forecasts. Formula: MAPE = mean(|actual − forecast| / |actual|) × 100. Strength: intuitive and easy to communicate. Weakness: undefined when actual = 0 (a real problem on intermittent SKUs), and on a heterogeneous portfolio it weights every SKU equally — so a 50% miss on a $10 part counts the same as a 5% miss on a $100K part.
- WMAPE (Weighted Mean Absolute Percentage Error)
WMAPE weights each SKU's absolute error by its actual demand (or inventory value, or revenue). Formula: WMAPE = sum(|actual − forecast|) / sum(|actual|) × 100. Strength: the metric reflects what actually matters to the business — errors on high-volume or high-value SKUs count more. WMAPE is the default S&OP accuracy metric in mature demand-planning organizations precisely because it does not let small SKUs dominate the score.
- Forecast Bias
Bias is the signed (not absolute) average forecast error: bias = mean(forecast − actual). Positive bias means the forecast systematically over-predicts; negative bias means it systematically under-predicts. Bias is critical because absolute-error metrics (MAPE, WMAPE) hide directional patterns: a forecast can have low WMAPE while consistently over-forecasting, which translates directly into excess inventory. Tracking bias separately surfaces this.
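Each metric, plus the tracking signal covered below, is a few lines. A sketch that also illustrates why MAPE and WMAPE diverge on a mixed portfolio:

```python
def mape(actuals, forecasts):
    """Unweighted mean absolute percentage error; undefined if any actual is 0."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)) / len(actuals) * 100

def wmape(actuals, forecasts):
    """Weighted MAPE: total absolute error over total actual demand."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / sum(abs(a) for a in actuals) * 100

def bias(actuals, forecasts):
    """Signed mean error: positive = systematic over-forecasting."""
    return sum(f - a for a, f in zip(actuals, forecasts)) / len(actuals)

def tracking_signal(actuals, forecasts):
    """Cumulative signed error over mean absolute deviation; an absolute
    value beyond roughly 4 is a common drift-alert threshold."""
    errors = [f - a for a, f in zip(actuals, forecasts)]
    mad = sum(abs(e) for e in errors) / len(errors)
    return sum(errors) / mad if mad else 0.0
```

On a two-SKU example where the small SKU misses by 50% and the large one by 5%, MAPE reports 27.5% while WMAPE reports roughly 9% — the divergence the text above describes.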
| Metric | What it measures | When it wins | When it misleads |
|---|---|---|---|
| MAPE | Average of per-SKU percentage errors | Homogeneous portfolios; small datasets where every SKU is equal weight | Heterogeneous portfolios; intermittent demand (undefined when actual = 0); high-value SKUs get drowned out |
| WMAPE | Total absolute error divided by total demand | S&OP at scale; portfolios with mixed value and volume | Almost never — this is the default S&OP metric for a reason |
| Bias | Signed mean error (over- or under-forecasting) | Detecting systematic forecast skew that absolute metrics hide | On its own; bias near zero with high WMAPE is still a bad forecast — both must be tracked |
| Forecast Value-Add (FVA) | Whether each step in the forecasting process improved or degraded accuracy | Validating that judgmental overrides actually help; auditing the consensus cycle | When the baseline is poorly chosen — FVA is meaningful only against a defensible baseline (typically naive) |
| Tracking Signal | Cumulative bias normalized by mean absolute deviation | Real-time alerting when a forecast starts drifting | On its own; needs threshold tuning and pairs with bias for diagnosis |
The four-metric stack — WMAPE, bias, FVA, and a tracking signal — is what separates a forecast that is reportable from a forecast that is operationally trustworthy. WMAPE quantifies the magnitude of error in business terms. Bias surfaces direction. FVA validates the consensus cycle. The tracking signal alerts when a forecast that was working is starting to drift.
A forecast can hit a 15% WMAPE target every cycle and still produce shortages and excess inventory if the bias is consistently negative on A-class SKUs and positive on C-class. Aggregate accuracy hides segment-level skew. The accuracy framework that holds up in production reports WMAPE and bias by segment (A/B/C × demand pattern), not just by portfolio total.
S&OP accuracy is reported as a stack, not a number. WMAPE for magnitude (weighted by what the business actually cares about), bias for direction, FVA to validate that human overrides help rather than hurt, and tracking signals to catch drift in real time. Reporting MAPE alone is the most common reason a forecast looks accurate on the dashboard and produces operational problems on the floor.
A forecast that nobody acts on is an academic exercise. The S&OP consensus process is the cross-functional cadence that turns the demand number into supply, inventory, and financial decisions. The model can be technically excellent and the operational outcomes still poor if this layer is broken.
- S&OP Consensus Process
The S&OP consensus process is the monthly cross-functional cycle that produces a single, signed-off demand and supply plan. It typically runs as a 5-step cadence: data review, demand review (sales/marketing input), supply review (operations capacity check), pre-S&OP (resolve gaps), and executive S&OP (signoff and exception escalation). The output is one number — the consensus forecast — that finance, operations, and commercial all act on.
The conventional five-step cadence (sometimes described as the "5-step S&OP cycle"):
Data Review (Week 1)
Demand planning team publishes the statistical baseline forecast plus actuals from the previous month. WMAPE, bias, and FVA from prior overrides are reported. Sales, marketing, and finance review the data before the demand review meeting — not in it. This pre-work is what makes the rest of the cycle exception-based instead of line-by-line.
Demand Review (Week 2)
Sales, marketing, and category management review the statistical baseline and propose overrides backed by intelligence the model cannot see — promotional plans, pipeline movements, customer-specific events. Each override is logged with rationale and contributor (for FVA tracking). The output is the unconstrained demand forecast.
Supply Review (Week 3)
Operations and supply planning evaluate the unconstrained demand against capacity, lead times, and inventory positions. Gaps are quantified — capacity shortfalls, supplier risks, inventory imbalances. The output is a supply plan that meets the demand or a documented gap to escalate.
Pre-S&OP (Week 3-4)
Cross-functional resolution of gaps before the executive meeting. Trade-offs are quantified: expedite to meet demand vs accept stockout vs reduce demand commitment. Recommendations are prepared for executive signoff. This step prevents the executive S&OP from devolving into a debate.
Executive S&OP (Week 4)
Executive review and signoff of the consensus plan. Only unresolved exceptions and major trade-offs are escalated here. The output is the signed-off demand and supply plan that drives finance, operations, and commercial execution for the next cycle.
At 50 SKUs, line-by-line review works. At 700+, it does not — the meetings stretch to 4 hours and stakeholders silently disengage. The cadence that scales is exception-based: SKUs flagged by accuracy threshold breach, bias drift, or override conflict are reviewed in detail; everything else is signed off on the baseline. Exception-based review is the operational mechanic that lets a stat team and a sales team actually collaborate at portfolio scale.
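Mechanically, exception-based review is threshold flagging. A sketch, with illustrative thresholds and an assumed per-SKU stats shape (WMAPE percent, bias in units) rather than any standard format:

```python
def exception_list(sku_stats, wmape_limit=25.0, bias_limit=10.0):
    """Flag only the SKUs breaching the accuracy or bias thresholds for
    detailed review; everything else is signed off on the baseline.
    sku_stats maps SKU -> (wmape_pct, bias_units) — illustrative shape."""
    return sorted(
        sku for sku, (w, b) in sku_stats.items()
        if w > wmape_limit or abs(b) > bias_limit
    )
```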
The forecast becomes a decision through the S&OP consensus process — the 5-step monthly cadence (data, demand, supply, pre-S&OP, executive) that produces one signed-off number for finance, ops, and commercial to execute against. At scale, the cadence works only when reviews are exception-based: the bottom 80% of SKUs accept the baseline; the top 20% by impact get the cross-functional debate. Without this layer, the rebuilt model produces accuracy that nobody operationalizes.
After three roles' worth of demand-forecasting work across pharma, biotech, and semiconductor — and one full S&OP rebuild on a 700+ SKU portfolio — the mistakes that quietly degrade forecast quality are remarkably consistent across organizations and industries: one-model-fits-all, MAPE-only reporting, skipping the data audit, unaudited overrides, no hierarchical reconciliation, no parallel run, and exhaustive line-by-line review.
Each is preventable with the discipline already documented in the playbook above — segment before model, report WMAPE plus bias plus FVA, audit before classify, track override value-add, reconcile hierarchically, parallel-run before retire, and review by exception. None of these is technically novel; what makes them rare in practice is the discipline of doing all seven, every cycle.
In favor of building the forecasting stack in-house rather than buying a vendor demand-planning suite:
- Full control over segmentation logic, method choice, and accuracy framework — tunable to the actual portfolio
- Lower long-term total cost of ownership once the team is built; no per-seat or per-SKU licensing fees
- The team owning the rebuild becomes the team that operates the forecast, which compresses learning cycles
- Tooling stays modular: ARIMA from statsmodels, Prophet from Meta, Croston from any time-series library, BI in Power BI or Tableau — all can be swapped without vendor lock-in
- Forecast logic is auditable end-to-end, which matters in regulated industries (pharma, biotech) where forecast inputs to financial planning are scrutinized
Against building in-house:
- Higher upfront investment: data engineering, modeling, BI, and S&OP cadence design are non-trivial
- Requires retaining domain talent; turnover in the demand-planning team can stall the operating cadence
- No vendor accountability when accuracy drifts; the team owns the failure mode
- Hierarchical reconciliation, FVA tracking, and exception-based review need to be built rather than configured
- Less defensible to executives unfamiliar with the build vs buy trade-off — vendor solutions are often easier to justify on the budget line even when in-house outperforms operationally
Key takeaways:
1. S&OP forecasting at 700+ SKUs is a system problem, not a modeling problem — demand-pattern heterogeneity, data drift, process drift, and metric drift are the four silent failure modes a rebuild must address
2. Demand-pattern segmentation (smooth, intermittent, erratic, lumpy) via ADI / CV² classification routes each SKU to the family of methods that fits its pattern — the highest-leverage change in any rebuild
3. Method matching: ETS or ARIMA on smooth, Prophet on erratic, Croston/TSB on intermittent and lumpy, with naive as the baseline every method must beat
4. Report WMAPE, bias, and FVA together — not MAPE alone — and report them by segment, not just portfolio total
5. The 7-step rebuild playbook gates each phase on the previous: audit before classify, classify before model, baseline before benchmark, statistical before judgmental, SKU-level before hierarchical, parallel-run before retire
6. The S&OP consensus process turns the forecast into decisions through a 5-step monthly cadence — and it only scales when reviews are exception-based, not line-by-line
7. The seven common mistakes (one-model-fits-all, MAPE only, skipping audit, unaudited overrides, no hierarchical reconciliation, no parallel run, exhaustive review) are all preventable with the discipline of applying every preventive, every cycle
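The ADI / CV² routing described above can be sketched in a few lines. The 1.32 / 0.49 cutoffs are the commonly used Syntetos-Boylan defaults, not values from this rebuild — treat them as assumptions to tune per portfolio:

```python
from statistics import mean, pstdev

# Syntetos-Boylan cutoffs (common defaults; an assumption to tune per portfolio)
ADI_CUT, CV2_CUT = 1.32, 0.49

def classify_sku(demand: list[float]) -> str:
    """Route one SKU's demand history to smooth / intermittent / erratic / lumpy."""
    nonzero = [d for d in demand if d > 0]
    if not nonzero:
        return "no-demand"
    adi = len(demand) / len(nonzero)        # average demand interval
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2  # squared coefficient of variation
    if adi < ADI_CUT:
        return "smooth" if cv2 < CV2_CUT else "erratic"
    return "intermittent" if cv2 < CV2_CUT else "lumpy"
```

Each class then maps to its method family (ETS/ARIMA, Prophet, Croston/TSB); re-running this classifier quarterly is what keeps SKUs from drifting onto the wrong method.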
How big does a portfolio need to be before segmentation matters?
Segmentation starts paying off around 50-100 SKUs and becomes essential past 200. Below 50, the overhead of maintaining multiple model pipelines may exceed the accuracy gain. Past 200, running one model across the portfolio leaves accuracy on the table for the SKUs that matter most. The 700+ SKU rebuild at Thermo Fisher sat squarely in the 'segmentation is essential' range.
Is Prophet better than ARIMA for S&OP demand forecasting?
Neither is universally better — they win on different demand patterns. Prophet excels on erratic, seasonal, or externally driven demand (promotions, holidays, events). ARIMA excels on smooth, autocorrelated demand without strong external drivers. Running them head-to-head on the same SKU and picking the winner is more useful than picking a global default. Both should always be benchmarked against a naive baseline.
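A minimal head-to-head harness looks like the sketch below. The two candidates here are a naive baseline and a hand-rolled simple-exponential-smoothing stand-in (not real Prophet or ARIMA fits); with real libraries each candidate would be a thin wrapper around the library's fit/predict, and the harness itself would not change:

```python
def naive(history, horizon):
    """Naive baseline: repeat the last observed value."""
    return [history[-1]] * horizon

def ses(history, horizon, alpha=0.3):
    """Simple exponential smoothing, flat forecast (stand-in for a real ETS/ARIMA fit)."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

def wmape(actual, forecast):
    """Weighted MAPE: total absolute error over total actual demand."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)

def pick_winner(history, holdout=4, candidates=None):
    """Fit each candidate on all but the last `holdout` periods,
    score on the holdout, and return (winner, scores)."""
    candidates = candidates or {"naive": naive, "ses": ses}
    train, test = history[:-holdout], history[-holdout:]
    scores = {name: wmape(test, f(train, holdout)) for name, f in candidates.items()}
    return min(scores, key=scores.get), scores
```

On a strongly trending series the naive baseline beats the flat smoother — which is exactly why every method has to prove it beats naive before it earns a slot.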
What forecast horizon should S&OP target?
Typical S&OP horizon is 12-18 months in monthly buckets. The first 3 months drive operational decisions (orders, expedites, inventory); months 4-12 drive supply planning, capacity, and procurement contracts; months 13-18 drive financial planning and longer-term capacity decisions. Different horizons may need different methods — short-horizon forecasts often benefit from more reactive models, longer-horizon from more stable seasonal/trend models.
How do you handle new SKUs with no demand history?
Cold-start SKUs need a separate process: analog-based forecasting (use the demand profile of a similar existing SKU as the starting point), parameter overlays from product management's launch plan, and aggressive review cadence (weekly or bi-weekly) until the SKU accumulates 6-12 months of history and can transition to the standard segmented forecast. Forcing a new SKU into the standard pipeline produces forecasts that look statistical but are essentially fabricated.
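A minimal sketch of the analog approach: scale the comparable SKU's launch-period demand shape to the new SKU's planned volume. The planned-volume input and the choice of analog are both judgment calls from product management, which is why the weekly review cadence matters:

```python
def analog_forecast(analog_history: list[float], planned_total: float,
                    horizon: int = 12) -> list[float]:
    """Cold-start forecast: reuse an analog SKU's demand *shape*,
    rescaled to the new SKU's planned volume over the horizon.

    analog_history: first `horizon` periods of a comparable SKU's actuals.
    planned_total: launch-plan volume for the horizon (an assumption,
    revisited at every weekly review until real history accumulates).
    """
    shape = analog_history[:horizon]
    total = sum(shape)
    return [planned_total * (d / total) for d in shape]
```

Once the SKU has 6-12 months of its own history, it gets classified and routed like any other SKU.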
What is the relationship between demand forecasting and inventory parameters (safety stock, reorder points)?
The forecast drives the inventory parameters: safety stock formulas use forecast error variance, reorder points use forecast lead-time demand, and economic order quantity uses forecast demand rate. A bad forecast cascades — the inventory parameters built on it are wrong, and the operational outcomes (stockouts and excess) follow. This is why the forecast rebuild is the upstream lever that needs to land before parameter recalibration is meaningful.
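The cascade can be made concrete with the textbook fixed-lead-time formulas (a simplification that assumes independent per-period errors and a fixed lead time; z is the service-level factor, e.g. roughly 1.65 for 95%):

```python
import math

def safety_stock(z: float, sigma_fcst_err: float, lead_time_periods: float) -> float:
    """Classic fixed-lead-time safety stock: z * sigma * sqrt(LT).
    sigma_fcst_err is the std dev of per-period forecast error, which is
    why a better forecast directly shrinks the buffer."""
    return z * sigma_fcst_err * math.sqrt(lead_time_periods)

def reorder_point(fcst_per_period: float, lead_time_periods: float, ss: float) -> float:
    """Forecast demand over the lead time, plus the safety buffer."""
    return fcst_per_period * lead_time_periods + ss
```

Both inputs — the error variance and the per-period demand rate — come straight from the forecast, so recalibrating parameters before the forecast is fixed just encodes the old errors.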
How often should the demand-pattern classification be refreshed?
Quarterly is the typical cadence for S&OP. Demand patterns shift over time — a SKU that was smooth a year ago can become lumpy after a customer change, product transition, or market event. Re-running ADI / CV² classification quarterly and re-routing SKUs to the appropriate method prevents the slow drift where 100 SKUs are silently being forecasted with the wrong method.
How do you quantify the business value of a forecast accuracy improvement?
Translate the WMAPE improvement into operational outcomes: reduced safety stock (because forecast variance is lower), reduced expedite frequency (because forecast bias is lower), reduced stockouts (because demand spikes are anticipated better), and reduced excess inventory (because forecasts are not systematically over). Each of these has a dollar value the finance team can quantify. The Thermo Fisher rebuild surfaced $180K in excess inventory partly because the rebuilt forecast exposed which SKUs had been chronically over-forecasted.
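The three metrics this translation rests on are cheap to compute; a minimal sketch (demand-weighted, per the WMAPE-not-MAPE guidance above):

```python
def wmape(actual, forecast):
    """Weighted MAPE: total absolute error over total actual demand."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)

def bias(actual, forecast):
    """Signed error as a share of demand; positive = chronic over-forecasting."""
    return sum(f - a for a, f in zip(actual, forecast)) / sum(actual)

def fva(actual, naive_fcst, final_fcst):
    """Forecast value added: WMAPE improvement of the final (touched)
    forecast over the naive baseline, in percentage points of WMAPE.
    Negative FVA means the overrides destroyed accuracy."""
    return wmape(actual, naive_fcst) - wmape(actual, final_fcst)
```

A persistent positive bias is the signal that flags the chronically over-forecasted SKUs feeding the excess-inventory number.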
References:
1. Forecasting: Principles and Practice (3rd edition) — Rob J. Hyndman and George Athanasopoulos
2. The accuracy of intermittent demand estimates — Aris A. Syntetos and John E. Boylan
3. Forecasting and stock control for intermittent demands — J. D. Croston
4. Prophet: forecasting at scale — Sean J. Taylor and Benjamin Letham (Meta)
5. statsmodels: Time Series Analysis (ARIMA, ETS) — statsmodels project
6. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization (MinT) — Shanika L. Wickramasuriya, George Athanasopoulos, Rob J. Hyndman
7. ASCM (Association for Supply Chain Management) S&OP reference — ASCM
8. IBF (Institute of Business Forecasting & Planning) — IBF
9. Forecast Value Added Analysis: Step by Step — Michael Gilliland (SAS)
10. SAP Integrated Business Planning for Demand — SAP