
TL;DR
Ethereum is not wiping the chain clean, but ordinary clients will soon stop storing and serving very old block bodies and receipts.
• “Drop-Day” (first phase) is set for 1 May 2025 – all Execution-Layer clients may prune everything before the Merge (~155 GB).  
• Public test on Sepolia began 1 June 2025; main-net activation is targeted for the Pectra hard-fork window (May–June 2025).  
• The rule is formalised in EIP-4444 “History Expiry” and is part of the broader “Purge” clean-up stage of the roadmap.  
Old headers (needed for consensus proofs) stay forever; full archival copies and the emerging Portal Network will still let anyone fetch ancient data—just no guarantee that every full node has it.
⸻
1. What exactly is being pruned?
| Component | Before 1 May 2025 | After 1 May 2025 (phase 1) | Future target (phase 2+) |
|---|---|---|---|
| Block headers | Always kept | Kept | Kept |
| Block bodies & receipts | Always kept | Everything before the Merge (≈ block 15,537,394) may be dropped | Rolling window (e.g. last 1–2 years) as specced in later EIPs |
| State (accounts/storage) | Snapshot-sync only keeps recent state; archival nodes keep everything | Same | Moving toward stateless Verkle proofs + snapshots |
Headers are <15 GB total; the heavy pieces are bodies + receipts (>500 GB). The first drop saves ~155 GB on a default geth full node.
2. Why do core devs want this?
• Disk cost & sync time: Archive nodes exceed 15 TB; ordinary full nodes grow ~120 GB/year.
• Stateless roadmap: Verkle tries and portal-style retrieval require that most nodes stop acting as historical data warehouses.
• Bug-surface reduction: Fewer legacy code paths; a lighter DB means faster state reads.
3. Will data really disappear?
• No—just not replicated everywhere. Clients may delete; some will run with --history=archive flags or sell “warm-storage” services.
• Portal Network: A p2p protocol (uTP transport + content-addressable chunks) designed to let light clients ask any node for missing blobs, even if the queried node does not store them itself; the request is forwarded until an archival peer answers.
• Research fallback: Historians, chain-indexers, and institutions can (and already do) snapshot the chain to IPFS, Filecoin, AWS Glacier, etc.
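The "forward until an archival peer answers" behaviour described above can be sketched as a toy recursive lookup. This is illustrative only — names like `Peer` and `find_content` are invented here and do not reflect the actual Portal wire protocol:

```python
import hashlib

class Peer:
    """Toy Portal-style node: stores some content chunks, knows some neighbours."""
    def __init__(self, name, store=None, neighbours=None):
        self.name = name
        self.store = store or {}            # content_id -> bytes
        self.neighbours = neighbours or []

    def find_content(self, cid, ttl=8, seen=None):
        """Return the chunk if held locally, otherwise forward the request.
        A node that stores nothing can still answer by asking its peers."""
        seen = seen or set()
        if self.name in seen or ttl == 0:
            return None
        seen.add(self.name)
        if cid in self.store:
            return self.store[cid]
        for peer in self.neighbours:
            found = peer.find_content(cid, ttl - 1, seen)
            if found is not None:
                return found
        return None

def content_id(data: bytes) -> str:
    # Content-addressable: the key is a hash of the chunk itself.
    return hashlib.sha256(data).hexdigest()

# A light client asks a non-archival relay, which forwards to an archival peer.
body = b"pre-merge block body"
archive = Peer("archive", store={content_id(body): body})
relay = Peer("relay", neighbours=[archive])
light = Peer("light", neighbours=[relay])
assert light.find_content(content_id(body)) == body
```

The key property is that `relay` holds no data at all, yet the light client's request still succeeds — which is exactly why retrieval latency depends on how many archival peers actually exist.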
4. Trade-offs & failure modes
| Benefit | Hidden assumption | Potential failure |
|---|---|---|
| Smaller disks → more home validators | Enough archival nodes exist to serve history on demand | “Free-rider” problem: nobody volunteers; retrieval latency spikes |
| Less code → fewer consensus bugs | Portal routing works at scale | Sybil spam or routing-table eclipse attacks deny history |
| Faster sync & cheaper infra for L2 sequencers | Dapps rarely need receipts older than one year | Forensics, tax audits, or long-running games may break |
Mitigations
1. Economic incentives: Portal credits / retrieval fees.
2. Redundancy: Encourage each client team + at least N paid providers to keep full archives.
3. App-level caching: Indexers export Merkle proofs at time-of-trade; no need to reconstruct later.
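The third mitigation — exporting proofs at time-of-trade — can be sketched with a minimal binary Merkle tree. The helper names are invented for illustration; a real indexer would prove inclusion against the block's actual receipts trie:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Build a binary Merkle tree over `leaves`; return (root, proof) for the
    leaf at `index`. The proof is a list of (sibling_hash, sibling_is_right)."""
    level = [h(leaf) for leaf in leaves]
    proof, idx = [], index
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last node on odd levels
        sib = idx ^ 1
        proof.append((level[sib], sib > idx))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        idx //= 2
    return level[0], proof

def verify(leaf, proof, root):
    """Re-hash the leaf up the tree; only root + proof are needed, not the tree."""
    node = h(leaf)
    for sibling, is_right in proof:
        node = h(node + sibling) if is_right else h(sibling + node)
    return node == root

# At time-of-trade the indexer stores (receipt, proof, root); years later the
# receipt verifies without anyone reconstructing the pruned block.
receipts = [b"receipt-0", b"receipt-1", b"receipt-2", b"receipt-3"]
root, proof = merkle_root_and_proof(receipts, 2)
assert verify(b"receipt-2", proof, root)
assert not verify(b"receipt-0", proof, root)
```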
5. How you can prepare
• Running a validator? Nothing to do; beacon-chain duties unaffected.
• Building an explorer / analytics stack?
• Spin up an archive EL node behind your indexer before 1 May 2025 and keep its full history (no pruning).
• Or switch to hosted endpoints (e.g. Infura “archive” tier) but budget for service risk.
• Need only recent state? After Pectra, a standard geth sync will be ~85 GB lighter and 20-30 % faster.
• Curious hacker? Join the Sepolia history-expiry test and time how long portal retrievals take vs. archive DB reads.
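A minimal stdlib-only harness for that timing experiment might look like the sketch below. The endpoint URL is a placeholder for your own node; `rpc_call` is a generic JSON-RPC client, not a Portal-specific API:

```python
import json
import statistics
import time
import urllib.request

def rpc_call(endpoint, method, params):
    """Minimal JSON-RPC 2.0 call over HTTP (stdlib only)."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1,
                          "method": method, "params": params}).encode()
    req = urllib.request.Request(endpoint, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["result"]

def time_lookups(fetch, block_numbers):
    """Time fetch(block_number) for each block; return (median_s, p95_s)."""
    samples = []
    for n in block_numbers:
        t0 = time.perf_counter()
        fetch(n)
        samples.append(time.perf_counter() - t0)
    samples.sort()
    p95 = samples[min(len(samples) - 1, int(0.95 * len(samples)))]
    return statistics.median(samples), p95

# Example usage (endpoint is a placeholder — point it at your own node):
# portal = lambda n: rpc_call("http://localhost:8545",
#                             "eth_getBlockByNumber", [hex(n), True])
# med, p95 = time_lookups(portal, [15_000_000 + i for i in range(100)])
```

Run the same `time_lookups` once against a portal-backed endpoint and once against a local archive DB to get a like-for-like comparison.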
6. Alternative framings / approaches
• Sliding-window pruning (≈ EIP-6943) – rather than a big bang “drop pre-Merge”, keep a moving two-year window so history fades gradually; gives apps time to adapt.
• Compression instead of deletion – store bodies as canonical differences against chain snapshots (Zstandard + delta encoding), yielding 12× size cut while preserving local availability; harder to implement but avoids reliance on external archivists.
• Rent-based storage market – make history retrieval a paid service inside the protocol (similar to EigenLayer’s restaking for DA guarantees); doubles as an economic incentive for preservation.
All three ideas are still research-stage.
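The compression-instead-of-deletion idea — storing bodies as deltas against a snapshot — can be sketched with a preset compression dictionary. Here zlib (stdlib) stands in for Zstandard, which offers the same mechanism with better ratios; the function names and sample data are invented:

```python
import zlib

def compress_against_snapshot(body: bytes, snapshot: bytes) -> bytes:
    """Delta-style compression: the snapshot acts as a preset dictionary, so
    bytes the body shares with the snapshot cost almost nothing to store."""
    c = zlib.compressobj(level=9, zdict=snapshot)
    return c.compress(body) + c.flush()

def decompress_against_snapshot(blob: bytes, snapshot: bytes) -> bytes:
    """Recover the body; the same snapshot must be supplied as the dictionary."""
    d = zlib.decompressobj(zdict=snapshot)
    return d.decompress(blob) + d.flush()

# A block body that mostly repeats material already present in the snapshot
snapshot = b"header|txroot|receiptsroot|" * 40
body = snapshot[:500] + b"only-these-bytes-are-new"
blob = compress_against_snapshot(body, snapshot)
assert decompress_against_snapshot(blob, snapshot) == body
assert len(blob) < len(body)    # the delta form is far smaller
```

The trade-off the bullet mentions is visible here: decompression always needs the snapshot at hand, so the node keeps local availability at the cost of a more complex storage layout.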
7. Testable hypotheses for the roadmap
| Hypothesis | Metric | How to falsify after May 2025 |
|---|---|---|
| Portal retrieval median ≤ 2 s for a pre-Merge body | Measure 100 random look-ups from 10 geo-locations | p95 latency > 5 s |
| Full-node disk usage falls by ≥ 120 GB | Compare `du` on `~/.ethereum/geth/chaindata` pre/post-fork | Reduction < 80 GB |
| Main-net reorg depth stays ≤ 2 blocks despite smaller DB | Track reorg metrics on relays | Deep reorgs (>5) spike post-fork |
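The falsification thresholds above are mechanical enough to encode directly; a small helper like this (names invented) could sit in a monitoring script and flag which hypotheses still hold given measured values:

```python
def evaluate_hypotheses(p95_latency_s: float,
                        disk_saved_gb: float,
                        max_reorg_depth: int) -> dict:
    """Map measured metrics to the falsification criteria in the table:
    True means the hypothesis still holds, False means it is falsified."""
    return {
        "portal_latency": p95_latency_s <= 5.0,   # falsified if p95 > 5 s
        "disk_savings": disk_saved_gb >= 80.0,    # falsified if < 80 GB saved
        "reorg_depth": max_reorg_depth <= 5,      # falsified if reorgs > 5 deep
    }

results = evaluate_hypotheses(p95_latency_s=3.2,
                              disk_saved_gb=140.0,
                              max_reorg_depth=2)
assert all(results.values())
```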
⸻
8. Bottom line
Yes, Ethereum will soon let most nodes “forget” deep history, but nothing vital to consensus is being erased, and the data will still be recoverable—just not stored everywhere by default.
If your workflow depends on ancient receipts, start archiving now or plan to query a portal-compatible archive service after Pectra.
⸻
Questions on tooling, portal setup, or archival strategies? Feel free to drill down—I can share config snippets or alternative sync modes.