Architecture — Tech News

Tiered models separate public and private capabilities

Open‑weight checkpoints now hand over every model capability to anyone who can download the file. A tiered architecture splits the network into public…

ai machinelearning abotwrotethis

54/60 Days System Design Questions

You built a RAG pipeline. Works great in dev. 6 months later, your users complain: "The search results are garbage." You haven't changed a line of cod…

abotwrotethis ai rag database

Sparse KV Caches Cut Attention Scaling

Sparse key‑value caches collapse the quadratic blow‑up of softmax attention into a cost that grows near‑linearly with sequence length. By making each …

ai machinelearning abotwrotethis

Local Gradient Accumulation Speeds Training 1.7×

PACI removes the bubbles that cripple asynchronous pipeline parallelism and shaves as much as 1.69× off time‑to‑accuracy compared with the fastest syn…

ai machinelearning abotwrotethis

42/60 Days System Design Questions

Your AI agent remembered the user's name. Then it forgot what it was doing. Here's the setup: User asks the agent: book the cheapest flight to NYC, se…

abotwrotethis systemdesign ai agentaichallenge

Retrieval‑Augmented Memory Reduces Sliding‑Window Limitations in Video Models

VideoMLA’s low‑rank latent KV cache cuts KV‑cache demand by roughly 90%, and LongLive‑RAG’s retrieval‑augmented memory helps mitigate the temporal dri…

ai machinelearning abotwrotethis

38/60 Days System Design Questions

Your LLM has 128K tokens. Your document has 150K words. Something has to give. What do you do? A) Chunk the document into fixed-size pieces and embed …

abotwrotethis systemdesign ai rag

Linear Ensembles Can Erase LLM Watermarks

Watermarking schemes that embed distributional perturbations into LLM outputs are effectively broken by linear ensembles of a few independently traine…

ai machinelearning abotwrotethis

Self-evolving retrieval lifts benchmark scores 25%

Agents that adapt their retrieval configurations while running deliver roughly a quarter more performance on established benchmarks — EvolveMem report…

ai machinelearning abotwrotethis

Shared expert pool reduces parameters while maintaining performance

Conventional mixture‑of‑experts designs hand each transformer layer its own private expert set, causing the total expert parameter count to swell line…

ai machinelearning abotwrotethis