Research Roadmap — Phase 2

Phase 1 done (12 directions) → Phase 2: execution infra, new timeframes, ML, anti-decay, scaling research

Phase 1 — Completed

1. Scoring v2 (missing factors)
imb_against=-2, options skew=+2, analyst trap=-3, counter-trend+regime. Est +$8K/Q
2. 2-Minute Rule (live MAE exit)
Post-open position management. CLEAN=hold, SQUEEZE=cut. Est +$15K/Q
3. Futures PM → MOO
ES/NQ/RTY/VX drift as standalone + booster. Est +$5K/Q
4. 4yr Intraday Backfill
N expanded 24K→44K. Confidence gain ±2-3pp.
5-9. Medium priority
VIX term, AH levels, auction multi-day, options expansion, imb 9:28 delta
10-12. Low priority
Foreign signals, TTN news, sector rotation
Phase 2 starts here
Baseline: ~$65K/Q from Phase 1 research + scoring v2
Phase 2 — Estimated Quarterly Impact by Direction

Phase 2 — New Directions

A. Execution & Infrastructure

1. Auto-Basket Execution
HIGH | CODE
Current: The algo generates the list and you copy-paste it into the Datum window. This takes ~60 seconds. Typical errors: wrong ticker, missed paste, wrong window.
Goal: A script writes the basket automatically via Named Pipes / Excel export → Datum import → one click. Time: 5 seconds.
How: basket_builder.py already generates the Excel file. Still needed: auto-save → Datum auto-import trigger → keyboard shortcut for Send All.
Bonus: With auto-execution, 80+ positions can be traded without delay. Today 20-30 is the physical limit of manual pasting.
More trades captured: +$5K/Q
Effort: 3-5 days
New data needed: 0
Execution errors/month: 3→0
2. Live P&L Dashboard + Smart Exit
HIGH | CODE
Current: The 2-min rule is only a concept. There is no live monitoring of positions post-open.
Goal: Named Pipes reads live P&L every 10 seconds. The dashboard shows CLEAN/SQUEEZE/DEEP for each position. Alert: "position X = SQUEEZE → reduce".
Next step: Auto-reduce: if MAE < -0.75% after 2 min → automatically reduce size by 50%. If MAE < -1.5% → close 100%.
Saved from squeezes: +$15K/Q
Effort: 5-7 days
Dependency: Named Pipes
Avg loss reduction: -30%
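The auto-reduce step above can be sketched as a small decision function. This is a minimal illustration, not the production logic: the two thresholds come from the text, while the `ExitAction` type, the function name and the choice of strict inequalities at the boundaries are assumptions.

```python
from dataclasses import dataclass

# Thresholds from the roadmap text; evaluated 2 minutes after the open.
REDUCE_MAE_PCT = -0.75   # MAE below this -> cut size by 50%
CLOSE_MAE_PCT = -1.5     # MAE below this -> close the whole position


@dataclass
class ExitAction:
    """Hypothetical container for the decision (not an existing project type)."""
    action: str              # "HOLD", "REDUCE" or "CLOSE"
    fraction_to_sell: float  # share of the current position to exit


def smart_exit(mae_pct: float) -> ExitAction:
    """Map the 2-minute MAE (percent, negative = adverse) to an exit action."""
    if mae_pct < CLOSE_MAE_PCT:
        return ExitAction("CLOSE", 1.0)
    if mae_pct < REDUCE_MAE_PCT:
        return ExitAction("REDUCE", 0.5)
    return ExitAction("HOLD", 0.0)
```

The deeper threshold is checked first so that a position at -2% is closed rather than only halved.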
3. Strategy Decay Monitor
MEDIUM | CODE
What: Rolling 30/60/90-day WR for each scoring rule. Alert if WR deviates >5pp from backtest. Automatic downgrade flag (↓) for degrading strategies.
Why: The model assumes -3pp/year decay. In reality decay is uneven: some strategies die quickly, others stay stable for years. Each one needs to be tracked individually.
How: Daily log: ticker, score, rules_fired, direction, ret_955. Pandas rolling WR + z-test vs historical WR.
WR from timely cuts: +2-3pp
Effort: 3 days
Dependency: Live logging
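The WR check described above can be sketched with a one-sample proportion z-test. This version uses only the standard library and a window counted in trades rather than calendar days, which is a simplification; in the real pipeline the same numbers would more likely come from a pandas rolling window over the daily log. The function names, the 60-trade default and the rule of requiring both the 5pp gap and a significant z-score are assumptions.

```python
import math


def win_rate_z(wins: int, n: int, backtest_wr: float) -> float:
    """One-sample z-statistic of the live win rate against the backtest win rate."""
    if n <= 0 or not 0.0 < backtest_wr < 1.0:
        raise ValueError("need n > 0 and 0 < backtest_wr < 1")
    std_err = math.sqrt(backtest_wr * (1.0 - backtest_wr) / n)
    return (wins / n - backtest_wr) / std_err


def decay_flag(outcomes, backtest_wr, window=60, max_gap_pp=5.0, z_cut=-1.645):
    """Flag a rule whose last `window` trades trail the backtest win rate.

    outcomes: list of 1 (win) / 0 (loss) for one scoring rule, oldest first.
    z_cut=-1.645 is the one-sided 5% critical value of the normal distribution.
    """
    recent = outcomes[-window:]
    if not recent:
        raise ValueError("no trades logged for this rule")
    n, wins = len(recent), sum(recent)
    gap_pp = (wins / n - backtest_wr) * 100.0
    z = win_rate_z(wins, n, backtest_wr)
    return {
        "n": n,
        "live_wr": wins / n,
        "gap_pp": gap_pp,
        "z": z,
        # Assumed rule: flag only when the gap is both large and significant.
        "degrading": gap_pp < -max_gap_pp and z < z_cut,
    }
```

Requiring both conditions keeps small samples from triggering a flag on noise alone; dropping the z condition reproduces the plain ">5pp" alert from the text.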

B. New Timeframes & Strategies

4. Swing / Overnight (MOC Entry)
HIGH | NEW STRATEGY
What: Red-day bounce (MOC entry, exit next morning). A separate track that does not conflict with MOO→9:55.
Research done: Broad selloff + >50B = WR 70.8%. TLT+SPY both down = WR 97.1% (N=274). VIX PM spike = WR 89%.
Needed: Live scoring engine for MOC signals (15:50 scan). Separate basket. Overnight risk management.
Capacity: ~5-15 trades/day (fewer than intraday), but the overnight hold means higher $ per trade (bigger moves).
NEW income stream: +$20K/Q
Effort (scorer + basket): 7-10 days
Risk isolation: Separate acct
Expected signals: 5-15/day
5. AH Drift 16:00→19:00 Strategy
MEDIUM | NEW STRATEGY
Research done: AH down < -1% = WR 67.2% for 19:00→04:00. Excess vs ETF + AH drift = WR 71.0%. Move vs ATR = WR 73.6%.
Needed: 16:00-19:00 data collection pipeline. Scoring engine for overnight entry at 19:00.
Risk: Overnight = gap risk. Needs a separate risk budget.
Est. P&L: +$10K/Q
Effort: 5 days
Dependency: AH data pipe
6. 9:55→Close Continuation
MEDIUM | NEW TIMEFRAME
What: Exit is currently strictly at 9:55, but CLEAN trades with momentum often keep moving until 10:30-11:00. Potential: hold winners longer.
Research needed: Analysis of ret_955 vs ret_1030 vs ret_close for CLEAN + high-score trades. When is holding profitable?
Risk: Requires position management logic. Applies to some positions, not all. Selective hold based on 2-min + 5-min confirmation.
From extended hold: +$8K/Q
Research + code: 5-7 days
Need ret_1030, ret_close: 1-min data
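A first pass at the "when is holding profitable?" question could simply compare the later marks against the 9:55 exit on CLEAN trades. The sketch below assumes each trade is a dict with the ret_955, ret_1030 and ret_close fields named above, that returns are already signed in the trade's direction, and that a `label` field carries the CLEAN/SQUEEZE classification; the `label` field and the function name are hypothetical.

```python
from statistics import mean


def hold_uplift(trades, horizon):
    """Mean extra return from holding to `horizon` instead of exiting at 9:55.

    Only CLEAN trades are considered. Returns None if there are none.
    """
    extra = [t[horizon] - t["ret_955"] for t in trades if t["label"] == "CLEAN"]
    return mean(extra) if extra else None


# Made-up illustrative rows, not real results.
sample = [
    {"label": "CLEAN", "ret_955": 0.5, "ret_1030": 1.0, "ret_close": 0.25},
    {"label": "SQUEEZE", "ret_955": -1.0, "ret_1030": -2.0, "ret_close": -3.0},
    {"label": "CLEAN", "ret_955": 1.0, "ret_1030": 1.0, "ret_close": 2.0},
]
uplift_1030 = hold_uplift(sample, "ret_1030")
uplift_close = hold_uplift(sample, "ret_close")
```

A positive average alone would not justify the extended hold; the same split would need repeating by score bucket and by the 2-min + 5-min confirmation mentioned under Risk.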

C. ML & Advanced Scoring

7. XGBoost / LightGBM Score Model
MEDIUM | CODE | 3YR LIVE DATA
What: Replace the linear scoring (SUM of rules) with an ML model. Features: all enrichment columns + regime + drift + imbalance + options + events.
Why not now: ML needs >1 year of live data for a train/test split. By the end of Y1 there will be 252 days × 60 trades/day ≈ 15K live-validated samples.
Advantage: ML finds nonlinear interactions (3-4 way combos) that linear scoring misses. Example: drift_aligned + put_skew + overextended + earnings_season = interaction effect beyond sum of parts.
Risk: Overfitting. Needs strict walk-forward validation. Train on 9 months, test on 3. Retrain monthly.
WR uplift potential: +5-10pp
Effort (serious): 2-3 weeks
Minimum for train: 12mo live data
If validated: +$25K/Q
8. Real-Time News NLP Scoring
MEDIUM | TTN + LLM
What: TTN stories arrive live. Current: manual read. Goal: an LLM (Claude Haiku) parses the headline → score: POSITIVE/NEGATIVE/NEUTRAL + magnitude. Feed into scoring at 9:25-9:28.
Data: By then there will be 6-12 months of TTN snapshots for backtest. guidance SHORT = WR 75%, confirmed on N of 200+.
How: ttn_client.py poll → Claude Haiku API classify → +1.5/-2 score modifier. Latency: <2 seconds.
On news-affected trades: +3-5pp
Effort: 5-7 days
Claude API: $0.01/classify
9. Cleanliness Prediction Model
SPECULATIVE | ML
What: Predict P(CLEAN) pre-open for each trade. Features: gap_size, regime, sector, prevol, imbalance, options_skew, time_of_PM_extreme.
Why: Cleanliness = #1 factor (59pp). If it can be predicted before entry (not only via the post-open 2-min check) → size accordingly. P(CLEAN)>50% = full size, P(CLEAN)<30% = reduce/skip.
Data: 228K trades with MAE labels. Enough for ML. But predictability is limited, since MAE depends on post-open order flow.
If predictable: +3-8pp
Research + model: 1-2 weeks
Training data: 228K labels
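The sizing rule above can be written as a small mapping from P(CLEAN) to a size multiplier. The 50% and 30% cut-offs come from the text. The treatment of the band in between (half size) and the choice of "skip" rather than "reduce" below 30% are assumptions made only for this sketch.

```python
def size_multiplier(p_clean: float,
                    full_above: float = 0.50,
                    skip_below: float = 0.30,
                    reduced_size: float = 0.5) -> float:
    """Map a predicted P(CLEAN) to a fraction of the normal position size."""
    if not 0.0 <= p_clean <= 1.0:
        raise ValueError("p_clean must be a probability in [0, 1]")
    if p_clean > full_above:
        return 1.0           # full size
    if p_clean < skip_below:
        return 0.0           # assumed: skip the trade entirely
    return reduced_size      # assumed: half size in the unspecified middle band
```

If the model turns out to be weakly predictive, widening the middle band is a cheap way to limit how much sizing depends on it.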

D. Scaling & New Markets

10. Price-Based Slippage Model
MEDIUM | DATA EXISTS
What: Current slippage assumption = flat 5bps. Reality: $5 stock = 10-20bps, $200 stock = 1-2bps. Expensive stocks can take bigger BP without extra slippage.
Research: Collect bid-ask spread at the open for the universe (Datum: Bid/Ask fields). Build spread model = f(price, ADV, gap_size).
Application: Smart sizing: expensive liquid stock → bigger position. Cheap illiquid stock → smaller. MoneyTrader/VWAP for Y2+ large positions.
Smarter sizing: +$3K/Q
Data + model: 3-5 days
Bid/Ask data: Datum Named Pipes
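Until real bid/ask data is collected, a placeholder for the price dependence can be built from the two reference points quoted above. The sketch fits a power law through the midpoints of those ranges ($5 → 15bps, $200 → 1.5bps). Using midpoints, the power-law shape and price as the only input are all assumptions; the intended model f(price, ADV, gap_size) would replace this once spreads are measured.

```python
import math

# Anchor points from the roadmap ranges; midpoints are an assumption.
P_LOW, BPS_LOW = 5.0, 15.0      # $5 stock: 10-20 bps
P_HIGH, BPS_HIGH = 200.0, 1.5   # $200 stock: 1-2 bps

# slippage_bps = A * price ** B, solved so the curve passes through both anchors.
B = math.log(BPS_HIGH / BPS_LOW) / math.log(P_HIGH / P_LOW)
A = BPS_LOW / P_LOW ** B


def est_slippage_bps(price: float) -> float:
    """Rough slippage estimate in basis points as a function of share price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return A * price ** B
```

The exponent comes out negative, so estimated slippage falls as price rises, which is the direction the text describes.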
11. Canada/TSX Gap Plays
SPECULATIVE | NEW MARKET
What: TSX opens at 9:30 ET (same as US). Might the same gap strategies work there? Mining/energy stocks have large gaps on commodity moves.
Data needed: TSX OHLCV + gaps (not from Datum; needs IB/Polygon Canada data).
Risk: Different market microstructure. Lower liquidity. Higher commissions. May not translate.
Completely unknown: ???
Data + research: 2-4 weeks
New data src: IB/Polygon Canada
12. Multi-Account Risk Aggregation
Y2-Y3 | INFRA
What: On reaching $100K+/day risk → split across accounts (personal + prop). Aggregate risk monitoring: total exposure, correlation, max DD across all accounts.
When: Y2 Phase 3+ (after $50K/day risk reached).
How: Dashboard: per-account + aggregate P&L, positions, risk. Named Pipes read multiple accounts.
Risk mgmt: safety, not P&L
Dashboard + logic: 1-2 weeks
Multi-acct: Datum setup

E. Anti-Decay & Continuous Alpha

13. Continuous Research Pipeline
HIGH | PROCESS
What: Automate the research cycle: every week, auto-run all backtest scripts on fresh data → detect new findings (WR changes, new combos emerging, old ones dying).
Why: Alpha decays -3pp/year. The only way to compensate is to keep finding new edge. This needs a pipeline, not manual research once a quarter.
How: Cron job: weekly_research.py → runs enrichment + scoring → compares to historical WR → flags ΔWR > 5pp → report. Claude summarizes findings.
Offset -3pp/yr decay: Sustain WR
Setup: 5 days, then auto
Run frequency: Weekly
Long-term survival: Critical
14. Regime Shift Detection
MEDIUM | ML
What: Hidden Markov Model for detecting regime shifts in real time. Current: binary ARKK>QQQ+2 / IWM
Application: In a crisis regime → auto-reduce all positions to 30% sizing. In strong risk-on → increase to 120%. Dynamic, not binary.
WR in transitions: +3-5pp
HMM research: 1-2 weeks
Real-time data: VIX + breadth
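One way to make sizing "dynamic, not binary" is to weight the per-regime size by the regime probabilities an HMM would output. The sketch below covers only that last step and takes the probabilities as given. The 30% and 120% levels come from the text; the three regime names, the 100% level for a neutral regime and the function name are assumptions.

```python
# Size per regime: crisis and risk-on levels from the text, neutral assumed.
SIZE_BY_REGIME = {"crisis": 0.30, "neutral": 1.00, "risk_on": 1.20}


def dynamic_size(regime_probs: dict) -> float:
    """Probability-weighted size multiplier from regime posteriors.

    regime_probs: e.g. {"crisis": 0.2, "neutral": 0.7, "risk_on": 0.1}.
    Probabilities are renormalised in case they do not sum to exactly 1.
    """
    unknown = set(regime_probs) - set(SIZE_BY_REGIME)
    if unknown:
        raise ValueError(f"unknown regimes: {sorted(unknown)}")
    total = sum(regime_probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive number")
    weighted = sum(SIZE_BY_REGIME[r] * p for r, p in regime_probs.items())
    return weighted / total
```

With this weighting, size moves gradually as the crisis probability rises instead of jumping from 100% to 30% at a single threshold.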
15. Walk-Forward Validation Framework
MEDIUM | CODE
What: All current backtests are in-sample. A strict walk-forward is needed: train on N months, test on the next 1 month, roll forward. This is the ONLY way to get unbiased WR estimates.
Why: Bayesian shrinkage = approximation. Walk-forward = ground truth. If WR drops >10pp in walk-forward → the setup was overfitted.
Output: "True" WR for each setup. May kill some current A+ setups (N<50). But surviving ones = rock solid.
Unbiased estimates: "True" WR
Framework + run: 1 week
Need backfill done: 4yr data
For all future ML: Foundation
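The roll-forward scheme above (train on N months, test on the next 1, step forward) can be sketched as a window generator over an ordered list of month labels. The function name and the month-label representation are assumptions; the same call with train_months=9 and test_months=3 would produce the split mentioned for the XGBoost model in item 7.

```python
def walk_forward_windows(months, train_months, test_months=1, step=1):
    """Return a list of (train, test) month-label lists for walk-forward testing.

    months: chronologically ordered labels, e.g. ["2021-01", "2021-02", ...].
    Each test block starts right after its train block, so no test month
    ever appears in the training data of the same window.
    """
    if train_months <= 0 or test_months <= 0 or step <= 0:
        raise ValueError("train_months, test_months and step must be positive")
    windows = []
    start = 0
    while start + train_months + test_months <= len(months):
        split = start + train_months
        windows.append((months[start:split], months[split:split + test_months]))
        start += step
    return windows
```

Each setup's walk-forward WR would then be computed only on the pooled test blocks and compared with its in-sample WR against the 10pp overfit threshold.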
Cumulative P&L Impact — Phase 1 + Phase 2 ($/Quarter)
Effort vs Impact Matrix (bubble = WR uplift)

Suggested Timeline

Phase 2 Sequence:
Month 1-2: Auto-basket + Live P&L dashboard (execution first — capture more trades, cut losses faster)
Month 2-3: Swing/overnight + AH drift (new income stream, doesn't conflict with intraday)
Month 3-4: Walk-forward validation + Decay monitor (know your true WR before scaling further)
Month 4-6: 9:55→close extension + Price-based slippage (optimize existing trades)
Month 6-9: XGBoost scoring + NLP news (requires 6+ months live data)
Month 9-12: Regime HMM + Continuous pipeline + Multi-account (mature infrastructure)
Year 2+: Canada/TSX + Cleanliness prediction (speculative, low priority)

Phase 1 baseline: ~$65K/Q → Phase 2 target: ~$155K/Q (+$90K/Q from new strategies + optimization + ML)