Hermes upgrade plan

Status: working design document
Date: 2026-04-19

This document captures the intended operating philosophy for Hermes, the upgrade phases, and the concrete control points we want to improve. It is meant to be a stable reference while tuning and refactoring the current decision logic.

Core operating philosophy

Strategy roles

Grid trader

Grid is the default operating mode when it makes sense.
Grid should stay active when the market offers harvestable noise.
Grid should not stay active in strong persistent trends that can run through the ladder and damage the wallet.
Breakouts are not automatically bad for grid. Grid can still profit during breakout conditions if pullbacks remain larger than the effective grid size.
The key question is not "is there movement?" but "is there enough two-way movement relative to grid spacing to harvest safely?"
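One way to make that question concrete is to compare typical pullback depth with grid spacing. The following is a minimal sketch, not Hermes code: the function names, the use of a median, and the `min_ratio` default are all assumptions for illustration, and the exact ratio that should count as harvestable is still an open question.

```python
from statistics import median


def pullback_to_grid_ratio(pullback_depths, grid_step):
    """Median pullback depth divided by grid step (same price units for both)."""
    if grid_step <= 0:
        raise ValueError("grid_step must be positive")
    if not pullback_depths:
        return 0.0  # no observed pullbacks: nothing to harvest
    return median(pullback_depths) / grid_step


def is_harvestable(pullback_depths, grid_step, min_ratio=1.5):
    # min_ratio is a hypothetical placeholder, not a tuned value.
    return pullback_to_grid_ratio(pullback_depths, grid_step) >= min_ratio
```

A breakout whose pullbacks stay well above the grid step would still pass this check, which matches the point that breakouts are not automatically bad for grid.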

Trend follower

Trend following should take over as early as reasonably possible when a real emerging trend is likely to persist and pullbacks are too small for grid harvesting.
Earlier trend handoff is better when the move is directional, persistent, and not offering sufficient pullback depth for the grid.
Trend detection should begin on the smallest timeframe first and then be confirmed upward across timeframes.
Time hierarchy matters: 1m first, then 5m, then 15m, then higher.

Rebalancer

Rebalancer has one purpose: restore the wallet into a state where grid can operate again.
In practical terms, that means bringing the wallet back toward a composition that supports orders on both sides.
Rebalancer is not an independent profit engine and should not linger longer than necessary.
Rebalancer should become active when the trend has eased enough that inventory repair is preferable to continued trend following.
Rebalancer should hand back to grid once the wallet is sufficiently usable and the market is again harvestable.

Key market distinctions Hermes must learn to make

1. Harvestable noise vs destructive trend

Hermes must distinguish between:

noise large enough for grid harvesting
emerging trend that may still be noisy enough for grid to survive
sustained persistent trend that will damage grid if not handed off early

This distinction should explicitly depend on:

pullback depth relative to grid size
persistence across timeframes
local auction behavior and easing
wallet composition and side capacity

2. Structural trend vs tactical behavior

Hermes should separate:

structural regime: what the market is broadly doing
tactical state: what price is doing right now
execution feasibility: what the wallet and ladder can safely support

Example:

The meso structure may still be bearish while 1m is already ranging or easing.
That should not necessarily flip the structural view.
It can still justify releasing from trend or rebalancer into grid if local noise is again harvestable.

3. Timescale sequencing

Trend emergence should be treated as a sequence:

first visible on 1m
then seen on 5m
then 15m
then higher scopes

Trend easing should also be treated as a sequence:

first the 1m loses directional drive
then 5m shows reduced continuation quality or increased overlap
then broader scopes cool

Hermes should use that sequencing explicitly, not just indirectly through blended scores.
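Explicit sequencing could be expressed as a count of consecutive confirming timeframes, starting from the smallest. This is a sketch under assumptions: the per-timeframe boolean signals, the `1h` entry standing in for "higher scopes", and the stage labels are all hypothetical.

```python
TIMEFRAMES = ("1m", "5m", "15m", "1h")  # ordered smallest to largest

# Hypothetical stage labels, indexed by confirmation depth 0..4.
STAGES = ("none", "early", "confirming", "established", "broad")


def confirmed_depth(signals, timeframes=TIMEFRAMES):
    """Count consecutive timeframes, from the smallest upward, that show the signal."""
    depth = 0
    for tf in timeframes:
        if not signals.get(tf, False):
            break  # a gap stops the sequence, even if a larger timeframe agrees
        depth += 1
    return depth


def emergence_stage(signals):
    return STAGES[confirmed_depth(signals)]
```

The same helper can be applied to easing signals, since easing is described as following the same small-to-large order.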

4. Harvestability depends on wallet state

Market structure alone is not enough. Harvestability also depends on:

wallet composition
whether both trade sides can be quoted
whether remaining ladder geometry is usable
whether current noise exceeds effective grid size
whether inventory skew makes one-sided fills too risky
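The wallet side of harvestability can be sketched as a single score. This example assumes a two-asset wallet valued in a common currency and treats an even split as the most usable state; both the function names and that assumption are illustrative, not taken from the current code.

```python
def side_fractions(base_value, quote_value):
    """Fraction of total wallet value held on each side."""
    total = base_value + quote_value
    if total <= 0:
        raise ValueError("wallet is empty")
    return base_value / total, quote_value / total


def wallet_grid_usability(base_value, quote_value):
    """0.0 when the wallet is fully one-sided, 1.0 at an even split by value."""
    base_frac, quote_frac = side_fractions(base_value, quote_value)
    return 2.0 * min(base_frac, quote_frac)
```

A real score would also need ladder geometry and side capacity, which this sketch leaves out.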

Current upgrade phases

Phase 1, stabilize intent and document expected behavior

Purpose: define the intended behavior clearly enough to tune against it.

Phase 1 deliverables

Written operating philosophy for grid, trend, and rebalancer.
Clear description of harvestable noise vs destructive trend.
Clear statement that rebalancer exists only to repair wallet usability for grid.
Explicit acknowledgement that trend detection and trend easing begin on lower timeframes first.
Scenario list for later tests.

Phase 1 status

Mostly captured. The core philosophy is now documented in this file.

Open questions still worth formalizing:

What exact pullback-to-grid-size ratio should count as harvestable?
How much weight should 1m have relative to 5m for early trend detection?
What exact wallet state is "good enough" for grid resumption?
What should count as sufficient easing for trend -> rebalancer and rebalancer -> grid handoff?

Phase 2, instrument the current system

Purpose: expose why Hermes is making each decision before deeper refactoring.

Goals

Emit compact decision diagnostics for every cycle.
Show structural trend state, tactical state, harvestability estimate, wallet usability, and final transition reason.
Make it easy to compare what Hermes saw versus what Lukas expected.

Proposed diagnostics

structural_directionality
tactical_directionality
tactical_easing_state
breakout_persistence
pullback_vs_grid_size
grid_harvestability
wallet_grid_usability
rebalance_urgency
chosen_transition
blocked_transitions
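The proposed fields could be emitted as one compact record per cycle. The sketch below uses the field names listed above; the types, the JSON encoding, and the `to_log_line` method are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DecisionDiagnostics:
    # Field names follow the proposed diagnostics; types are assumed.
    structural_directionality: str
    tactical_directionality: str
    tactical_easing_state: str
    breakout_persistence: float
    pullback_vs_grid_size: float
    grid_harvestability: float
    wallet_grid_usability: float
    rebalance_urgency: float
    chosen_transition: str
    # Maps each blocked transition to the reason it was blocked.
    blocked_transitions: dict = field(default_factory=dict)

    def to_log_line(self):
        """Serialize to a single JSON line with stable key order."""
        return json.dumps(asdict(self), sort_keys=True)
```

One JSON line per cycle keeps the output easy to grep and to diff against expected behavior.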

Phase 3, separate the market model into explicit layers

Purpose: stop using blended strategy preference scores as if they were pure market truth.

Target layers

Structural layer

Answers:

Is the market directionally bullish, bearish, or rotational overall?
Is the regime persistent or fragile?

Inputs should mostly come from:

15m+
1h+
higher-level snapshot persistence

Tactical layer

Answers:

Is price currently impulsing, overlapping, easing, stretching, or reverting?
Is a breakout accelerating or fading?
Is local movement range-like enough to harvest?

Inputs should mostly come from:

1m
5m
internal short-horizon snapshots

Execution layer

Answers:

Can grid operate safely now?
Is the wallet usable on both sides?
Is trend continuation still operationally better than releasing?

Inputs should include:

wallet composition
side capacity
open order geometry
grid step size
pullback depth relative to step size

Phase 4, rewrite the transition logic as an explicit state machine

Purpose: make handoffs interpretable and tunable.

Desired transition model

GRID -> TREND
GRID -> REBALANCER
TREND -> REBALANCER
REBALANCER -> GRID

Optional but stricter:

avoid direct REBALANCER -> TREND unless there is a very specific reason
prefer TREND -> REBALANCER -> GRID as the normal unwind path

Transition design principle

Each transition should have explicit guards based on:

structural regime
tactical state
wallet usability
harvestability

Not just a single blended strategy score.
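A guarded state machine along these lines might look like the sketch below. The four boolean inputs stand in for the structural, tactical, wallet, and harvestability layers. Their names and the specific guard conditions are assumptions, in particular the GRID -> REBALANCER guard, which this document does not define.

```python
from enum import Enum


class Mode(Enum):
    GRID = "GRID"
    TREND = "TREND"
    REBALANCER = "REBALANCER"


# Transitions from the desired model; REBALANCER -> TREND is deliberately absent.
ALLOWED = {
    (Mode.GRID, Mode.TREND),
    (Mode.GRID, Mode.REBALANCER),
    (Mode.TREND, Mode.REBALANCER),
    (Mode.REBALANCER, Mode.GRID),
}


def next_mode(current, *, trend_persistent, harvestable, wallet_usable, trend_eased):
    """Return (mode, reason). Stays in the current mode when no guard fires."""
    target, reason = current, "no guard fired"
    if current is Mode.GRID:
        if trend_persistent and not harvestable:
            target, reason = Mode.TREND, "persistent trend, pullbacks too shallow for grid"
        elif not wallet_usable:  # assumed guard, not specified in the plan
            target, reason = Mode.REBALANCER, "wallet cannot support both sides"
    elif current is Mode.TREND:
        if trend_eased:
            target, reason = Mode.REBALANCER, "trend eased, inventory repair preferred"
    elif current is Mode.REBALANCER:
        if wallet_usable and harvestable:
            target, reason = Mode.GRID, "wallet usable and noise harvestable"
    if target is not current and (current, target) not in ALLOWED:
        raise RuntimeError(f"transition {current.name} -> {target.name} is not allowed")
    return target, reason
```

Returning the reason alongside the mode gives the diagnostics a transition explanation without a separate lookup.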

Phase 5, expose tuning screws explicitly

Purpose: make behavior adjustable without rewriting logic.

Example tuning screws

micro_early_trend_weight
micro_easing_weight
five_min_confirmation_weight
meso_persistence_weight
breakout_confirmation_threshold
breakout_release_threshold
grid_harvestability_threshold
pullback_to_grid_ratio_threshold
wallet_grid_resume_tolerance
rebalance_release_threshold
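These screws could live in one immutable settings object so a tuning session changes values, not logic. In this sketch the field names come from the list above, while every default value is a neutral placeholder and not a recommendation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TuningScrews:
    # All defaults are placeholders, not tuned values.
    micro_early_trend_weight: float = 1.0
    micro_easing_weight: float = 1.0
    five_min_confirmation_weight: float = 1.0
    meso_persistence_weight: float = 1.0
    breakout_confirmation_threshold: float = 0.5
    breakout_release_threshold: float = 0.5
    grid_harvestability_threshold: float = 0.5
    pullback_to_grid_ratio_threshold: float = 1.0
    wallet_grid_resume_tolerance: float = 0.5
    rebalance_release_threshold: float = 0.5


def with_overrides(screws, **overrides):
    """Return a copy with selected screws changed; unknown names raise TypeError."""
    return replace(screws, **overrides)
```

Because the dataclass is frozen, each experiment gets its own copy and the baseline cannot be mutated by accident.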

Phase 6, scenario-based test suite and replay tuning

Purpose: verify Hermes against real trading situations instead of only abstract scores.

Scenario families to test

clean trend emergence from 1m to 5m to 15m
false breakout that remains grid-harvestable
persistent trend with too-shallow pullbacks for grid
trend easing after asymmetric wallet buildup
rebalancer handing back too early
rebalancer handing back too late
local bottom or top where grid can resume harvesting after shift
one-sided wallet where harvestability is limited despite attractive price action
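Scenario families like these lend themselves to a table-driven harness. The sketch below is hypothetical throughout: `Scenario`, `run_scenarios`, and `toy_decide` are illustrative names, and `toy_decide` is a stand-in for the real decision function, not a description of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    inputs: dict
    expected_mode: str


def run_scenarios(decide, scenarios):
    """Return the names of scenarios where decide(**inputs) differs from expected_mode."""
    return [s.name for s in scenarios if decide(**s.inputs) != s.expected_mode]


def toy_decide(trend_persistent, harvestable):
    # Stand-in decision rule, used only to exercise the harness.
    return "TREND" if trend_persistent and not harvestable else "GRID"


SCENARIOS = [
    Scenario("false breakout that remains grid-harvestable",
             {"trend_persistent": False, "harvestable": True}, "GRID"),
    Scenario("persistent trend with too-shallow pullbacks for grid",
             {"trend_persistent": True, "harvestable": False}, "TREND"),
]

failures = run_scenarios(toy_decide, SCENARIOS)
```

In replay tuning, the inputs would come from recorded market and wallet snapshots instead of hand-written flags.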

Architectural observations from the current code

Current opportunity_map origin

Today opportunity_map is generated in src/hermes_mcp/narrative_engine.py from coarse semantic labels such as:

stance
meso_structure
friction
micro_reversal_risk

This means it is already a compressed interpretation, not a direct market measure.

Current issue

Hermes currently risks confusing market truth with a strategy preference score.

That makes tuning harder and can cause delayed or sticky handoffs.

Immediate next recommended actions

Repair the currently failing Hermes tests.
Add decision diagnostics so current behavior is visible.
Introduce explicit metrics for:
- structural trend strength
- tactical easing
- pullback depth vs grid size
- grid harvestability
- wallet grid usability
Rewrite rebalancer release logic to depend on those explicit metrics.
Then rewrite grid-to-trend handoff using the same model.

Notes for future tuning sessions

When evaluating a bad Hermes decision, ask these questions in order:

Did Hermes correctly read the structural regime?
Did Hermes correctly read the tactical short-term behavior?
Did Hermes correctly estimate whether noise was harvestable relative to grid size?
Did Hermes correctly assess wallet usability?
Did Hermes choose the right transition for that combination?

If the answer fails earlier in the chain, later threshold tuning will not solve it.
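The ordered review can be captured as a small helper that reports the earliest failing step. The step keys are hypothetical labels for the five questions above, and treating an unanswered step as a failure is an assumption of this sketch.

```python
REVIEW_ORDER = (
    "structural_regime",
    "tactical_behavior",
    "harvestability",
    "wallet_usability",
    "transition_choice",
)


def first_failure(review):
    """Return the earliest step judged incorrect, or None if every step passed.

    `review` maps step name -> True (Hermes got it right) or False.
    A missing step counts as a failure so that gaps are not silently skipped.
    """
    for step in REVIEW_ORDER:
        if not review.get(step, False):
            return step
    return None
```

Fixing the step this returns before touching later thresholds follows the rule that an early misread cannot be tuned away downstream.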
