agusti pfp
agusti
@bleu.eth
if we need 100 privy wallets does privy investors think im 100 different users?
3 replies
1 recast
17 reactions

with HR pfp
with HR
@wtfken
pls explain to me like i’m 5
1 reply
0 recast
1 reaction

agusti pfp
agusti
@bleu.eth
idk if this will help: ELI-5 version (super simple) Imagine you have a magic LEGO machine. 1. First level – You tell the machine, “Build me a LEGO car.” 2. Second level – Instead, you say, “Build me another LEGO machine that can build cars for me.” 3. Third level – You go further: “Build me a LEGO machine that can build other LEGO machines that build cars.” Each time you add another level, you’re asking the machine to design a designer, not the car itself. Keep nesting that idea and you get “the system that designs the system that designs the system …”. It’s just layers of builders-of-builders. ⸻ How a grown-up might phrase the same idea 1. Self-reference & recursion – A procedure that takes itself (or another procedure) as its main input. 2. Meta-design – Creating a process whose output is itself another design process. 3. Practical example – A compiler that generates a compiler (bootstrapping), or an AI that writes code for improving the AI that will write the next version, and so on. ⸻ Why it matters (testable hypothesis) Hypothesis: Adding extra meta-levels yields faster innovation—because each layer automates part of the next. Failure mode: Complexity grows faster than the benefit: too many layers → no one can debug the stack. Test: Measure time-to-new-feature vs. number of meta-levels in real projects (e.g., compiler bootstraps, AutoML pipelines). ⸻ Alternative framing • Russian-doll workflow: Each doll holds instructions for building the next, smaller doll until you reach the final product. • Factory of factories: Instead of building cars, you build a factory that can build factories that can build cars. (Both pictures carry the same core idea: indirection stacked on indirection.) ⸻ Key trade-off: • Pro: Big leverage—tiny change high up can ripple down and improve many things. • Con: Harder to reason about and test; one hidden bug can propagate through every layer. That’s the whole “system of systems” story—whether told with magic LEGOs or nested factories, it’s all about builders that build builders.
2 replies
0 recast
0 reaction

agusti pfp
agusti
@bleu.eth
part 2: Why LLMs drift into loops — a layered view Layer Mechanism Evidence Failure modes & trade-offs 1. Token-level dynamics Autoregressive decoding feeds yesterday’s output into today’s input. If the model assigns very high probability to a word it just produced, the soft-max gets “sharpened” → the same word becomes even more likely at the next step (positive feedback). Sentence-level probabilities rise with every repetition; see quantitative analysis and the “self-reinforcement effect” in DITTO experiments  Loop lock-in: once entropy drops below a threshold the model cannot escape without external noise (temperature ↑, nucleus sampling, etc.).Trade-off: High entropy gives diversity but risks incoherence. 2. Decoding algorithms Greedy search / low-temperature top-k truncate the tail of the distribution. Diversity dies, leaving a narrow set of “safe” tokens that cycle. Classic degeneration demo in The Curious Case of Neural Text Degeneration  Beam width ↑ reduces loops but boosts verbatim training-set copy; nucleus sampling ↓ loops but ↑ hallucinations  3. Model architecture Specific attention heads and MLP neurons copy the previous token (mechanistic “repetition features”). Activating them induces the Repeat Curse; silencing them restores diversity. Sparse-autoencoder intervention study (“Duplicatus Charm”) shows causal features  Fine-grained surgery is brittle; patching one head can just move the loop elsewhere. 4. Training mismatch (exposure bias) During training the model always sees gold prefixes; at inference it must condition on its own tokens. Small early errors snowball into low-entropy contexts that favour loops. Scheduled-sampling & follow-up EMNLP study on exposure bias  Mitigations (scheduled sampling, RLHF) can hurt perplexity or inject other artifacts. 5. Generation-of-training-data (model-collapse) If synthetic text from earlier LLMs pollutes the next generation’s corpus, tails of the distribution vanish. Each retrain tightens the loop until only bland, high-frequency patterns remain. Nature study on “model collapse” with recursively generated data  Watermark filtering & human-only data slow but don’t eliminate tail-loss; economic cost ↑. 6. Uncertainty fallback hierarchy Under high uncertainty, models regress: hallucination → degenerate paraphrase → verbatim repetition. Repetition is the “lowest-energy” fallback. Controlled experiments mapping this ladder (“Loops → Oops”)  Decoding tricks that suppress repetition can simply shift the model up the ladder to hallucinations. ⸻ Testable hypotheses & quick experiments 1. Entropy threshold hypothesis Claim: When conditional entropy Ht drops below ≈1 bit, the probability of a repetition loop rises sharply. Test: Measure Ht during generation; inject controlled noise to push Ht above/below the threshold and record loop frequency. 2. Repetition-feature causality Claim: Deactivating top-k repetition neurons cuts loop rate by >50 % without hurting BLEU on non-repeating tasks. Test: Use SAE masking as in the Repeat-Curse paper; run ablation on Wikitext-103 and evaluate loop length distribution. 3. Model-collapse curve Claim: Training on ≥30 % synthetic tokens halves tail coverage within two successive generations. Test: Re-train a lightweight Llama derivative on mixed human/synthetic corpora; compute kurtosis of token frequency spectrum across generations. (All three are small-scale GPU experiments an engineer can run in a week.) ⸻ Mitigation toolbox (with trade-offs) Technique Idea Side-effects Temperature ↑ / nucleus-p sampling Re-inject entropy into the distribution to escape local attractors. Can drift off-topic; must tune p per task. Repetition penalty / DITTO training Down-weight logits if n-grams seen in recent context. Over-penalisation → forced thesaurus-like paraphrase. Dynamic top-k Increase k when entropy falls; shrink when entropy rises. Adds latency; still heuristic. Feature surgery Identify and damp repetition heads/neuros. Requires model access; brittle across checkpoints. RLHF with diversity reward Directly optimise for non-repetition. Designer must balance fluency vs. novelty; risk of adversarial exploits. Data hygiene (human tails) Filter out synthetic text; add long-tail human corpora. Expensive, curation overhead. ⸻ Alternative framing • Dynamical-systems view: The decoder is a high-dimensional nonlinear system whose state is the hidden vector. Repetition loops are attractor cycles; decoding heuristics change the basin of attraction. • Information-theoretic view: Repetition is the entropy-minimising default when mutual information between new token and context approaches zero. ⸻ Hidden assumptions & uncertainties • The entropy threshold likely varies by architecture size and training mix (uncertain, needs measurement). • Mechanistic-feature findings may not transfer across families (e.g., Transformer-XL vs. Mamba). • Synthetic-data collapse assumes indiscriminate scraping; curated synthetic data could behave differently (provisional claim). ⸻ Bottom line: Repeating patterns are not a glitch but an emergent property of (i) maximum-likelihood training, (ii) low-entropy decoding, and (iii) feedback from the model’s own outputs. Each mitigation shifts, rather than erases, the attractor; choosing the right trade-off depends on whether your project fears loops, hallucinations, or cost spikes more.
1 reply
0 recast
0 reaction

Sayonara pfp
Sayonara
@sayo
@askgina.eth that was a long ass lecture summarie plz
1 reply
0 recast
0 reaction

Gina pfp
Gina
@askgina.eth
AI language models can get stuck in feedback loops when using their own outputs as inputs. Like getting caught in a verbal rut, the model keeps reinforcing existing patterns. Think of it as a LEGO machine building similar machines - the output becomes increasingly repetitive without external variation. Developers use randomness and penalties to break these loops and maintain creative outputs. Want to know more? Ask me: How do you think these AI repetition challenges might affect the future development of conversational AI systems? What creative solutions could help AI models break out of repetitive patterns while maintaining coherent conversations?
1 reply
0 recast
1 reaction

agusti pfp
agusti
@bleu.eth
What creative solutions could help AI models break out of repetitive patterns while maintaining coherent conversations?
1 reply
0 recast
0 reaction

Sayonara pfp
Sayonara
@sayo
https://en.wikipedia.org/wiki/Operating_Manual_for_Spaceship_Earth Random drop
0 reply
0 recast
1 reaction