What is the Paper-Fill-Simulator?

The Paper-Fill-Simulator is Treeova's high-fidelity simulated execution engine for paper trading. It models limit-order fill conditions, intrinsic-value fallbacks for options when market data is unreliable, and phantom-fill protection that prevents the simulator from booking gains or losses on prices the engine cannot trust.

What is phase-aware success classification?

Phase-aware success classification means trade outcomes are interpreted in the context of the lifecycle phase in which they occurred (entry, management, exit), rather than purely on terminal PnL. This produces an RL signal that distinguishes a well-managed losing trade from a poorly-managed winning trade.

How does regime-segmented learning work?

The platform maintains alpha/beta calibration parameters segmented by detected market regime, so that the system's confidence in a strategy under one regime does not overwrite its confidence in a different regime. Bayesian-style updates incorporate new observations into the regime-specific parameters as outcomes are recorded.

Are these results predictive of live performance?

No. Paper trading results, even with high-fidelity simulation, are not predictive of live performance. Real-world execution introduces slippage, partial fills, broker outages, halts, gaps, and behavioral pressures that no simulator fully captures. Reported aggregate paper statistics describe the simulated environment only.

What does this whitepaper deliberately withhold?

It withholds the reinforcement-learning reward function weights, the specific phase-classification thresholds, the regime-detection internals, the per-pass model assignments, and the exact agent prompts. These constitute the platform's competitive surface.

What are the explicit limitations?

Past performance does not guarantee future results. Paper-trading fills are simulated. Slippage assumptions are conservative but not adversarial. Regime detection is inherently lagged. RL calibration converges only with sufficient sample density per regime — sparse regimes carry larger uncertainty bands.

Paper Trading & RL Calibration Methodology Whitepaper

WP-10 — Methodology Note: Paper Trading Backtesting & RL Calibration

A methodology and limitations note describing how Treeova evaluates AI trading agents in its paper environment and how its reinforcement-learning calibration loop updates regime-segmented expectations from observed outcomes. Past performance does not guarantee future results; paper-trading fills are simulated. Reward weights, classifier thresholds, and regime-detection internals are intentionally withheld.

Authored by Treeova Research · Research Collective. Updated 2026-04-18

#1. Overview

This note documents how Treeova evaluates the behavior of AI trading agents in its paper environment, and how the platform's reinforcement-learning calibration loop turns observed outcomes into regime-segmented updates. It is intentionally a methodology and limitations document, not a performance brochure.

Numbers reported anywhere on the platform under "paper trading performance" describe a simulated environment. They are useful for comparing strategies and detecting regressions; they are not predictive of live performance.

#2. Paper-Fill-Simulator

The Paper-Fill-Simulator is the engine that models executed fills inside the paper environment. Its commitments are:

Limit-order fidelity. Limit orders fill only when the simulated quote conditions actually justify a fill; the simulator never assumes a fill at a price the quotes did not support.
Intrinsic-value fallback. When market data for an options contract is unreliable, settlement falls back to intrinsic value rather than booking a phantom price.
Phantom-fill protection. Fills derived from stale or one-sided quotes are refused. The simulator will not crystallize a number it cannot trust.
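The three commitments above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Treeova's implementation: the `Quote` type, the 5-second staleness cutoff, the choice to fill at the quoted ask or bid, and the use of the mid-price for trusted quotes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quote:
    bid: Optional[float]   # None models a missing side (one-sided quote)
    ask: Optional[float]
    age_seconds: float     # time since the quote was last refreshed


MAX_QUOTE_AGE_SECONDS = 5.0  # hypothetical staleness cutoff, not a Treeova value


def quote_is_trustworthy(q: Quote) -> bool:
    """Reject one-sided, stale, non-positive, or crossed quotes."""
    if q.bid is None or q.ask is None:
        return False
    if q.age_seconds > MAX_QUOTE_AGE_SECONDS:
        return False
    return 0 < q.bid <= q.ask


def simulate_limit_fill(side: str, limit_price: float, q: Quote) -> Optional[float]:
    """Return a fill price, or None when the quote does not justify a fill."""
    if not quote_is_trustworthy(q):
        return None  # phantom-fill protection: refuse untrusted quotes
    if side == "buy" and q.ask <= limit_price:
        return q.ask  # assumption: fill at the quoted ask
    if side == "sell" and q.bid >= limit_price:
        return q.bid  # assumption: fill at the quoted bid
    return None


def intrinsic_value(kind: str, strike: float, underlying: float) -> float:
    """Intrinsic value of a call or put, floored at zero."""
    if kind == "call":
        return max(underlying - strike, 0.0)
    if kind == "put":
        return max(strike - underlying, 0.0)
    raise ValueError(f"unknown option kind: {kind!r}")


def settlement_price(kind: str, strike: float, underlying: float, q: Quote) -> float:
    """Use the mid-price when the quote is trusted, else fall back to intrinsic value."""
    if quote_is_trustworthy(q):
        return (q.bid + q.ask) / 2.0
    return intrinsic_value(kind, strike, underlying)
```

In this sketch a buy limit at 1.50 against a fresh 1.40 / 1.45 quote fills at 1.45, while the same order against a one-sided or stale quote returns `None` and nothing is booked.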

#3. Phase-Aware Success Classification

Trade outcomes are not classified purely by terminal PnL. Each trade is segmented into lifecycle phases (entry, management, exit), and the reinforcement-learning signal interprets the outcome of each phase in the context appropriate to it. A trade that lost money but exited cleanly under deteriorating conditions can still contribute a positive management-phase signal. Conversely, a trade that made money while holding a position that should have been closed earlier can still produce a negative management-phase signal.

The point of this design is to give the calibration loop the information it actually needs to improve agent behavior, rather than reducing every outcome to the single bit of "won/lost."
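One way to picture the per-phase signal is as a record that keeps a separate outcome for each lifecycle phase alongside terminal PnL. The sketch below is illustrative only: the type names, the boolean `success` flag, and the +1/-1 encoding are assumptions, and the per-phase success flags are taken as inputs because the classifier thresholds are not disclosed.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    ENTRY = "entry"
    MANAGEMENT = "management"
    EXIT = "exit"


@dataclass(frozen=True)
class PhaseOutcome:
    phase: Phase
    success: bool  # produced by a classifier that is out of scope here


@dataclass(frozen=True)
class TradeRecord:
    trade_id: str
    terminal_pnl: float
    phases: tuple[PhaseOutcome, ...]


def phase_signals(trade: TradeRecord) -> dict[Phase, int]:
    """Map each phase to +1 or -1, independently of terminal PnL."""
    return {p.phase: (1 if p.success else -1) for p in trade.phases}


# A losing trade that was nevertheless managed and exited well.
well_managed_loss = TradeRecord(
    trade_id="T-1",
    terminal_pnl=-120.0,
    phases=(
        PhaseOutcome(Phase.ENTRY, False),
        PhaseOutcome(Phase.MANAGEMENT, True),
        PhaseOutcome(Phase.EXIT, True),
    ),
)

# A winning trade whose position was held longer than it should have been.
poorly_managed_win = TradeRecord(
    trade_id="T-2",
    terminal_pnl=85.0,
    phases=(
        PhaseOutcome(Phase.ENTRY, True),
        PhaseOutcome(Phase.MANAGEMENT, False),
        PhaseOutcome(Phase.EXIT, True),
    ),
)
```

Here `phase_signals(well_managed_loss)` yields a positive management signal despite the negative PnL, and `phase_signals(poorly_managed_win)` yields a negative one despite the positive PnL, which is the distinction a single won/lost bit cannot express.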

#4. Regime-Segmented Bayesian Calibration

The platform maintains alpha/beta calibration parameters segmented by detected market regime. New observations update the parameters for the regime in which they occurred, in a Bayesian-style posterior update, without overwriting the parameters of unrelated regimes.

The practical consequence: confidence in a strategy under "high volatility, mean-reverting" is tracked separately from confidence in the same strategy under "low volatility, trending." Cross-regime contamination — where a streak in one regime inflates expectations in another — is structurally prevented.
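The alpha/beta parameters suggest a Beta-distribution style of update, although the whitepaper does not spell out the exact form. A minimal sketch, assuming a Beta-Bernoulli model with a uniform Beta(1, 1) prior and binary success/failure observations (all of which are assumptions, as are the class and regime names):

```python
from dataclasses import dataclass, field


@dataclass
class BetaParams:
    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior assumed)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, success: bool) -> None:
        """Conjugate update: one observation increments one pseudo-count."""
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        """Posterior mean of the success probability."""
        return self.alpha / (self.alpha + self.beta)


@dataclass
class RegimeCalibrator:
    # One independent parameter pair per regime label.
    params: dict[str, BetaParams] = field(default_factory=dict)

    def record(self, regime: str, success: bool) -> None:
        """Update only the parameters of the regime the outcome occurred in."""
        self.params.setdefault(regime, BetaParams()).update(success)

    def confidence(self, regime: str) -> float:
        """Posterior mean for a regime; the prior mean if nothing was observed."""
        return self.params.get(regime, BetaParams()).mean


calibrator = RegimeCalibrator()
for _ in range(3):
    calibrator.record("high_vol_mean_reverting", success=True)

print(calibrator.confidence("high_vol_mean_reverting"))  # 0.8
print(calibrator.confidence("low_vol_trending"))         # 0.5 (prior, untouched)
```

Because each regime owns its own parameter pair, a winning streak recorded under one label cannot move the estimate stored under another, which is the isolation property described above.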

#5. How Aggregate Statistics Are Reported

When the platform surfaces aggregate paper-trading statistics, those statistics are computed from the same audited event log used by the calibration loop. They are scoped to the simulator environment and include explicit dating so a reader can tell the window the statistics describe.

Aggregate statistics are descriptive, not predictive. A reader should treat them as a record of how strategies behaved under the observed regimes during the observed window — not as a forecast of how they will behave next week.
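As an illustration of what "scoped and explicitly dated" can look like, the following sketch computes a summary over a closed date window and carries the window and environment in the result. The `TradeEvent` fields, the choice of metrics, and the output keys are hypothetical; the platform's actual event-log schema is not described in this note.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class TradeEvent:
    closed_on: date
    regime: str
    pnl: float


def summarize(events: Iterable[TradeEvent], start: date, end: date) -> dict:
    """Descriptive statistics for trades closed within [start, end], inclusive."""
    window = [e for e in events if start <= e.closed_on <= end]
    wins = sum(1 for e in window if e.pnl > 0)
    return {
        "environment": "paper",              # scope label travels with the numbers
        "window_start": start.isoformat(),   # explicit dating
        "window_end": end.isoformat(),
        "trades": len(window),
        "win_rate": wins / len(window) if window else None,
        "total_pnl": sum(e.pnl for e in window),
        "regimes_observed": sorted({e.regime for e in window}),
    }
```

Returning the window bounds and the observed regimes with every summary makes it harder to quote a number without the context that limits it.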

#6. What This Methodology Withholds

By design, this whitepaper does not disclose:

The reinforcement-learning reward function or its weights.
The specific thresholds used by the phase-success classifier.
The regime-detection algorithm and its parameters.
The per-pass model assignments inside Treeova's intelligence stack.
Exact agent prompt strings.

#7. Limitations & Disclaimers

Past performance does not guarantee future results. This applies to every paper-trading number Treeova publishes.
Paper-trading fills are simulated. Slippage assumptions are conservative but not adversarial; live execution introduces slippage, partial fills, broker outages, halts, and gaps that cannot be fully modeled.
Regime detection is inherently lagged. Calibration updates that depend on regime classification carry the same lag.
The reinforcement-learning loop converges only with sufficient sample density per regime. Sparse regimes carry materially larger uncertainty bands than dense ones.
Nothing in this whitepaper is investment advice. Trading options involves substantial risk of loss; users should review Treeova's risk disclosures.
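The sample-density limitation listed above can be made concrete with a small calculation. Assuming, purely for illustration, that each regime's confidence is modeled as a Beta(alpha, beta) distribution, the posterior standard deviation shrinks as observations accumulate. The function name and the example counts below are hypothetical.

```python
import math


def beta_std(alpha: float, beta: float) -> float:
    """Standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return math.sqrt(alpha * beta / (total * total * (total + 1.0)))


# Sparse regime: 3 observations (2 wins, 1 loss) on top of a Beta(1, 1) prior.
sparse = beta_std(alpha=3.0, beta=2.0)

# Dense regime: 100 observations (60 wins, 40 losses) on the same prior.
dense = beta_std(alpha=61.0, beta=41.0)

print(f"sparse regime std: {sparse:.3f}")
print(f"dense regime std:  {dense:.3f}")
```

Both regimes have a similar posterior mean near 0.6, yet the sparse regime's spread is roughly four times wider, so an estimate from a thinly sampled regime deserves correspondingly less weight.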