Hermes upgrade plan
Status: working design document
Date: 2026-04-19
This document captures the intended operating philosophy for Hermes, the upgrade phases, and the concrete control points we want to improve. It is meant to be a stable reference while tuning and refactoring the current decision logic.
Core operating philosophy
Strategy roles
Grid trader
- Grid is the default operating mode when it makes sense.
- Grid should stay active when the market offers harvestable noise.
- Grid should not stay active in strong persistent trends that can run through the ladder and damage the wallet.
- Breakouts are not automatically bad for grid. Grid can still profit during breakout conditions if pullbacks remain larger than the effective grid size.
- The key question is not "is there movement?" but "is there enough two-way movement relative to grid spacing to harvest safely?"
Trend follower
- Trend following should take over as early as reasonably possible when a real emerging trend is likely to persist and pullbacks are too small for grid harvesting.
- Earlier trend handoff is better when the move is directional, persistent, and not offering sufficient pullback depth for the grid.
- Trend detection should begin on the smallest timeframe first and then be confirmed upward across timeframes.
- Timeframe hierarchy matters: 1m first, then 5m, then 15m, then higher.
Rebalancer
- Rebalancer has one purpose: restore the wallet into a state where grid can operate again.
- In practical terms, that means bringing the wallet back toward a composition that supports orders on both sides.
- Rebalancer is not an independent profit engine and should not linger longer than necessary.
- Rebalancer should become active when the trend has eased enough that inventory repair is preferable to continued trend following.
- Rebalancer should hand back to grid once the wallet is sufficiently usable and the market is again harvestable.
Key market distinctions Hermes must learn to make
1. Harvestable noise vs destructive trend
Hermes must distinguish between:
- noise large enough for grid harvesting
- emerging trend that may still be noisy enough for grid to survive
- sustained persistent trend that will damage grid if not handed off early
This distinction should explicitly depend on:
- pullback depth relative to grid size
- persistence across timeframes
- local auction behavior and easing
- wallet composition and side capacity
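A minimal sketch of the first input, pullback depth relative to grid size. The function names, the use of a median, and the default threshold of 1.0 are illustrative assumptions; the actual ratio that should count as harvestable is still an open question in this plan.

```python
from statistics import median


def pullback_to_grid_ratio(pullback_depths: list[float], grid_step: float) -> float:
    """Typical pullback depth expressed in multiples of the grid step.

    pullback_depths: recent counter-trend retracements, in price units.
    grid_step: effective distance between adjacent grid orders, same units.
    """
    if grid_step <= 0:
        raise ValueError("grid_step must be positive")
    if not pullback_depths:
        return 0.0  # no observed pullbacks: nothing to harvest
    return median(pullback_depths) / grid_step


def is_harvestable(pullback_depths: list[float], grid_step: float,
                   ratio_threshold: float = 1.0) -> bool:
    """True when typical pullbacks are at least ratio_threshold grid steps deep."""
    return pullback_to_grid_ratio(pullback_depths, grid_step) >= ratio_threshold
```

A ratio below the threshold means price is moving, but not retracing far enough to fill and close grid orders, which is the destructive-trend case described above.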
2. Structural trend vs tactical behavior
Hermes should separate:
- structural regime: what the market is broadly doing
- tactical state: what price is doing right now
- execution feasibility: what the wallet and ladder can safely support
Example:
- The meso structure may still be bearish.
- The 1m can nonetheless already be ranging or easing.
- That alone should not flip the structural view.
- It can still justify releasing from trend or rebalancer into grid if local noise is harvestable again.
3. Timescale sequencing
Trend emergence should be treated as a sequence:
- first visible on 1m
- then seen on 5m
- then 15m
- then higher scopes
Trend easing should also be treated as a sequence:
- first the 1m loses directional drive
- then 5m shows reduced continuation quality or increased overlap
- then broader scopes cool
Hermes should use that sequencing explicitly, not just indirectly through blended scores.
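One way to make the sequencing explicit is to count how far a signal has propagated upward from 1m. The sketch below is an assumption about shape only: `sequence_depth`, the `TIMEFRAMES` tuple, and the use of "1h" as a stand-in for "higher scopes" are hypothetical names, not existing Hermes code.

```python
# Ordered from smallest to largest; "1h" stands in for "higher scopes".
TIMEFRAMES = ("1m", "5m", "15m", "1h")


def sequence_depth(flags: dict[str, bool]) -> int:
    """Count consecutive timeframes, starting at 1m, whose flag is set.

    Stops at the first timeframe that is False or missing, so a 15m signal
    without 1m and 5m agreement counts as depth 0.
    """
    depth = 0
    for timeframe in TIMEFRAMES:
        if not flags.get(timeframe, False):
            break
        depth += 1
    return depth
```

The same function can serve both directions: pass per-timeframe "trending" flags to measure emergence, or per-timeframe "eased" flags to measure how far easing has spread.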
4. Harvestability depends on wallet state
Market structure alone is not enough.
Harvestability also depends on:
- wallet composition
- whether both trade sides can be quoted
- whether remaining ladder geometry is usable
- whether current noise exceeds effective grid size
- whether inventory skew makes one-sided fills too risky
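The wallet-side conditions could be reduced to a score and a hard check, roughly as sketched here. The `Wallet` type, the 50/50 target split, the linear falloff, and the default tolerance are all assumptions made for illustration; the plan does not yet define what wallet state is good enough.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wallet:
    base_value: float   # base asset holdings, valued in quote currency
    quote_value: float  # quote asset holdings


def wallet_grid_usability(wallet: Wallet, tolerance: float = 0.3) -> float:
    """Score in [0, 1]: 1.0 at an even split, 0.0 once skew reaches tolerance.

    Skew is the distance of the base share from 0.5. Both the 50/50 target
    and the linear falloff are illustrative assumptions.
    """
    total = wallet.base_value + wallet.quote_value
    if total <= 0 or tolerance <= 0:
        return 0.0
    skew = abs(wallet.base_value / total - 0.5)
    return max(0.0, 1.0 - skew / tolerance)


def can_quote_both_sides(wallet: Wallet, min_order_value: float) -> bool:
    """True when each side holds enough value to place at least one order."""
    return (wallet.base_value >= min_order_value
            and wallet.quote_value >= min_order_value)
```

Keeping the hard check separate from the score lets a transition guard refuse grid outright when one side cannot be quoted, regardless of how attractive the price action looks.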
Current upgrade phases
Phase 1, stabilize intent and document expected behavior
Purpose: define the intended behavior clearly enough to tune against it.
Phase 1 deliverables
- Written operating philosophy for grid, trend, and rebalancer.
- Clear description of harvestable noise vs destructive trend.
- Clear statement that rebalancer exists only to repair wallet usability for grid.
- Explicit acknowledgement that trend detection and trend easing begin on lower timeframes first.
- Scenario list for later tests.
Phase 1 status
Mostly captured. The core philosophy is now documented in this file.
Open questions still worth formalizing:
- What exact pullback-to-grid-size ratio should count as harvestable?
- How much weight should 1m have relative to 5m for early trend detection?
- What exact wallet state is "good enough" for grid resumption?
- What should count as sufficient easing for trend -> rebalancer and rebalancer -> grid handoff?
Phase 2, instrument the current system
Purpose: expose why Hermes is making each decision before deeper refactoring.
Goals
- Emit compact decision diagnostics for every cycle.
- Show structural trend state, tactical state, harvestability estimate, wallet usability, and final transition reason.
- Make it easy to compare what Hermes saw versus what Lukas expected.
Proposed diagnostics
- structural_directionality
- tactical_directionality
- tactical_easing_state
- breakout_persistence
- pullback_vs_grid_size
- grid_harvestability
- wallet_grid_usability
- rebalance_urgency
- chosen_transition
- blocked_transitions
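As a sketch, the proposed diagnostics could travel as one record per decision cycle and be logged as a single JSON line. The field names come from the list above; the field types, the class name `DecisionDiagnostics`, and the `to_log_line` method are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DecisionDiagnostics:
    """One record per decision cycle. Field types are assumed, not specified."""
    structural_directionality: float
    tactical_directionality: float
    tactical_easing_state: str
    breakout_persistence: float
    pullback_vs_grid_size: float
    grid_harvestability: float
    wallet_grid_usability: float
    rebalance_urgency: float
    chosen_transition: str
    # Maps each rejected transition to the reason its guard failed.
    blocked_transitions: dict[str, str] = field(default_factory=dict)

    def to_log_line(self) -> str:
        """Serialize to one compact JSON line with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

A stable key order keeps consecutive log lines easy to diff when comparing what Hermes saw against what was expected.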
Phase 3, separate the market model into explicit layers
Purpose: stop using blended strategy preference scores as if they were pure market truth.
Target layers
Structural layer
Answers:
- Is the market directionally bullish, bearish, or rotational overall?
- Is the regime persistent or fragile?
Inputs should mostly come from:
- 15m+
- 1h+
- higher-level snapshot persistence
Tactical layer
Answers:
- Is price currently impulsing, overlapping, easing, stretching, or reverting?
- Is a breakout accelerating or fading?
- Is local movement range-like enough to harvest?
Inputs should mostly come from:
- 1m
- 5m
- internal short-horizon snapshots
Execution layer
Answers:
- Can grid operate safely now?
- Is the wallet usable on both sides?
- Is trend continuation still operationally better than releasing?
Inputs should include:
- wallet composition
- side capacity
- open order geometry
- grid step size
- pullback depth relative to step size
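The three layers could be represented as separate typed reads that are carried side by side rather than folded into one score. All class and field names below are hypothetical; the enum members mirror the questions each layer is meant to answer.

```python
from dataclasses import dataclass
from enum import Enum


class StructuralRegime(Enum):
    BULLISH = "bullish"
    BEARISH = "bearish"
    ROTATIONAL = "rotational"


class TacticalState(Enum):
    IMPULSING = "impulsing"
    OVERLAPPING = "overlapping"
    EASING = "easing"
    STRETCHING = "stretching"
    REVERTING = "reverting"


@dataclass(frozen=True)
class StructuralRead:
    regime: StructuralRegime
    persistent: bool  # persistent vs fragile regime


@dataclass(frozen=True)
class TacticalRead:
    state: TacticalState
    breakout_accelerating: bool


@dataclass(frozen=True)
class ExecutionRead:
    wallet_usable_both_sides: bool
    pullback_to_grid_ratio: float


@dataclass(frozen=True)
class MarketRead:
    """Three independent answers, kept side by side instead of blended."""
    structural: StructuralRead
    tactical: TacticalRead
    execution: ExecutionRead
```

With this shape, a bearish structural regime and an easing tactical state can coexist in one `MarketRead` without either overwriting the other.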
Phase 4, rewrite the transition logic as an explicit state machine
Purpose: make handoffs interpretable and tunable.
Desired transition model
- GRID -> TREND
- GRID -> REBALANCER
- TREND -> REBALANCER
- REBALANCER -> GRID
Optional but stricter:
- avoid direct REBALANCER -> TREND unless there is a very specific reason
- prefer TREND -> REBALANCER -> GRID as the normal unwind path
Transition design principle
Each transition should have explicit guards based on:
- structural regime
- tactical state
- wallet usability
- harvestability
A single blended strategy score is not sufficient as a guard.
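A minimal sketch of such a state machine, with one guard per allowed transition. The transition table follows the desired model above and deliberately omits REBALANCER -> TREND; the `Inputs` fields and every guard condition are placeholder assumptions meant to show the structure, not proposed tuning.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Mode(Enum):
    GRID = "GRID"
    TREND = "TREND"
    REBALANCER = "REBALANCER"


@dataclass(frozen=True)
class Inputs:
    structural_trending: bool  # structural regime
    tactical_easing: bool      # tactical state
    wallet_usable: bool        # wallet usability
    harvestable: bool          # harvestability


Guard = Callable[[Inputs], bool]

# Only listed transitions are possible. Guard bodies are placeholders.
TRANSITIONS: dict[tuple[Mode, Mode], Guard] = {
    (Mode.GRID, Mode.TREND):
        lambda i: i.structural_trending and not i.tactical_easing and not i.harvestable,
    (Mode.GRID, Mode.REBALANCER):
        lambda i: not i.wallet_usable and not i.structural_trending,
    (Mode.TREND, Mode.REBALANCER):
        lambda i: i.tactical_easing and not i.wallet_usable,
    (Mode.REBALANCER, Mode.GRID):
        lambda i: i.wallet_usable and i.harvestable,
}


def step(current: Mode, inputs: Inputs) -> tuple[Mode, list[str]]:
    """Return the next mode and the transitions that were tried but blocked."""
    blocked: list[str] = []
    for (src, dst), guard in TRANSITIONS.items():
        if src is not current:
            continue
        if guard(inputs):
            return dst, blocked
        blocked.append(f"{src.value}->{dst.value}")
    return current, blocked
```

Returning the blocked transitions alongside the chosen mode makes each handoff explainable and could feed a `blocked_transitions` diagnostic directly.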
Phase 5, expose tuning screws explicitly
Purpose: make behavior adjustable without rewriting logic.
Example tuning screws
- micro_early_trend_weight
- micro_easing_weight
- five_min_confirmation_weight
- meso_persistence_weight
- breakout_confirmation_threshold
- breakout_release_threshold
- grid_harvestability_threshold
- pullback_to_grid_ratio_threshold
- wallet_grid_resume_tolerance
- rebalance_release_threshold
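The screws could live in one immutable config object so that a tuning session changes values, not logic. The field names are taken from the list above; every default value is a placeholder assumption, and `with_overrides` is a hypothetical helper.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TuningScrews:
    """All defaults are placeholders, not recommended values."""
    micro_early_trend_weight: float = 1.0
    micro_easing_weight: float = 1.0
    five_min_confirmation_weight: float = 1.0
    meso_persistence_weight: float = 1.0
    breakout_confirmation_threshold: float = 0.5
    breakout_release_threshold: float = 0.5
    grid_harvestability_threshold: float = 0.5
    pullback_to_grid_ratio_threshold: float = 1.0
    wallet_grid_resume_tolerance: float = 0.5
    rebalance_release_threshold: float = 0.5


def with_overrides(base: TuningScrews, overrides: dict[str, float]) -> TuningScrews:
    """Return a copy of base with overrides applied; reject unknown names."""
    known = {f.name for f in fields(base)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown tuning screws: {sorted(unknown)}")
    return replace(base, **overrides)
```

Rejecting unknown names guards against a misspelled screw silently leaving the default in place during a replay run.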
Phase 6, scenario-based test suite and replay tuning
Purpose: verify Hermes against real trading situations instead of only abstract scores.
Scenario families to test
- clean trend emergence from 1m to 5m to 15m
- false breakout that remains grid-harvestable
- persistent trend with too-shallow pullbacks for grid
- trend easing after asymmetric wallet buildup
- rebalancer handing back too early
- rebalancer handing back too late
- local bottom or top where grid can resume harvesting after shift
- one-sided wallet where harvestability is limited despite attractive price action
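Each scenario family could be expressed as an expected sequence of modes and checked against what a replay produced. This is a sketch under assumptions: `Scenario`, `collapse`, and `matches` are hypothetical names, and the mode strings are assumed to match the GRID, TREND, and REBALANCER states named in Phase 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    expected_path: tuple[str, ...]  # distinct modes in the order they should occur


def collapse(modes: list[str]) -> tuple[str, ...]:
    """Drop consecutive repeats so per-cycle output becomes a mode path."""
    path: list[str] = []
    for mode in modes:
        if not path or path[-1] != mode:
            path.append(mode)
    return tuple(path)


def matches(scenario: Scenario, observed_modes: list[str]) -> bool:
    """True when the replayed per-cycle modes follow the expected path."""
    return collapse(observed_modes) == scenario.expected_path
```

For example, the "persistent trend with too-shallow pullbacks for grid" family might expect the path GRID, TREND, REBALANCER, GRID, while the "false breakout that remains grid-harvestable" family might expect GRID alone. Comparing paths rather than per-cycle modes checks which handoffs happened, not exactly when; timing scenarios such as "rebalancer handing back too early" would need an additional cycle-count check.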
Architectural observations from the current code
Current opportunity_map origin
Today opportunity_map is generated in src/hermes_mcp/narrative_engine.py from coarse semantic labels such as:
- stance
- meso_structure
- friction
- micro_reversal_risk
This means it is already a compressed interpretation, not a direct market measure.
Current issue
Hermes currently risks confusing market truth with a strategy preference score.
That makes tuning harder and can cause delayed or sticky handoffs.
Immediate next recommended actions
- Repair the currently failing Hermes tests.
- Add decision diagnostics so current behavior is visible.
- Introduce explicit metrics for:
  - structural trend strength
  - tactical easing
  - pullback depth vs grid size
  - grid harvestability
  - wallet grid usability
- Rewrite rebalancer release logic to depend on those explicit metrics.
- Then rewrite grid-to-trend handoff using the same model.
Notes for future tuning sessions
When evaluating a bad Hermes decision, ask these questions in order:
- Did Hermes correctly read the structural regime?
- Did Hermes correctly read the tactical short-term behavior?
- Did Hermes correctly estimate whether noise was harvestable relative to grid size?
- Did Hermes correctly assess wallet usability?
- Did Hermes choose the right transition for that combination?
If an earlier question in this chain fails, tuning thresholds for the later ones will not fix the decision.
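The ordered review can be captured as a small helper that reports the first question that failed. The check names and the function `first_failure` are hypothetical labels for the five questions above.

```python
from typing import Optional

# Review order mirrors the question list: earlier layers are checked first.
CHECKS = (
    "structural_regime",
    "tactical_behavior",
    "harvestability",
    "wallet_usability",
    "transition_choice",
)


def first_failure(review: dict[str, bool]) -> Optional[str]:
    """Return the earliest check marked False, or None if all passed.

    A check missing from the review is treated as not yet confirmed,
    so it is reported as the failure point.
    """
    for name in CHECKS:
        if not review.get(name, False):
            return name
    return None
```

If `first_failure` returns "tactical_behavior", for instance, the fix belongs in the tactical read, and the harvestability or transition thresholds should be left alone until that is repaired.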