Web — Tech News

EN

Fine-Tuning Llama 3.2 3B on Medical QA: Week 4 - When Lower Loss Meant a Worse Model

What Happened This Week: Week 3 produced a working fine-tuned model: one epoch, one dataset, a clear improvement over the base model. This week 4 was s…

ai deeplearning machinelearning finetuning

AI vs ML vs DL — the Nested Circles That Finally Make It Click

"AI", "machine learning", "deep learning", "GenAI" — used interchangeably every day, and it's wrong. Here's the single picture that fixes it forever. …

ai machinelearning beginners deeplearning

I published two IEEE Access papers as an undergrad, one on IoT security, one on cancer detection

I'm a 6th-semester CS student at COMSATS University Islamabad. Over the past few months I've been doing deep learning research alongside my coursework…

computerscience cybersecurity deeplearning iot

Flash Attention: what it does and why it matters

Flash Attention: what it does and why it matters Your training job is paying for an A100 at $3/hour. The loss is going down, gradients are flowing, an…

llm ai deeplearning gpu

How to Become a Data Scientist in 2026

How I got here: On principle, you will never catch me parading myself as some sort of expert data scientist. Technically, that's what I do in my day …

ai datascience machinelearning deeplearning

Understanding Attention in Transformers — Intuition Before Equations

When people first hear about Transformers, they often encounter words like Query, Key, Value, and Attention Heads and feel confused. But the main idea…

beginners deeplearning machinelearning nlp

PyTorch from Scratch — Part 1: Tensors, Gradients & Activations

Most people use PyTorch without really knowing what's happening underneath. This series breaks the foundations down into the simplest possible explana…

pytorch python deeplearning beginners

A11: A Structural Answer to AI Collapse

Modern AI models are becoming increasingly powerful, but their growing capabilities come with rising risks of degradation: the loss of rare patterns, …

ai machinelearning deeplearning systemdesign

NVIDIA Cosmos 3: Unifying Physical AI Reasoning and Generation with Two-Tower Architecture

Training a robot to pick up an object sounds simple until you realize how many separate systems are involved: a vision model to understand the scene, …

ai machinelearning deeplearning robotics

NVIDIA Cosmos 3: How a Two-Tower Architecture Unifies Physical AI Reasoning and Generation

Training a robot to pick up an object sounds simple until you realize how many separate systems are involved: a vision model to understand the scene, …

ai machinelearning deeplearning robotics

I distilled a 7B vision model into a 2B one for screenshots — and the 7B teacher scored worse

Code: https://github.com/P0rt/vlm-distill-screenshots Model: https://huggingface.co/p00rt/qwen2-vl-2b-screenshots-distill There's a question I keep co…

machinelearning python llm deeplearning

ai, deepseek, machinelearning

title: The Rise of China's LLMs: A Complete History from 2017 to 2026 published: true description: From Wu Dao 2.0 (1.75T params) to DeepSeek V3 ($5.6…

ai deeplearning llm machinelearning

Time When More Layers Meant Worse Model ... Birth Of Residual

class TinyTransformer(nn.Module): def __init__(self): super().__init__() # setting the constructor for the initial values that we are ever gonna need…

ai deeplearning firstprinciple

The Machine Learning Engineering Series

Part 1: From Scratch to Systems. This machine learning series will be a real ride. It’s an interactive journey where I’ll be sharing and raising lots…

ai machinelearning deeplearning software

VLA or IL? A Controlled Dataset for Testing Whether Finetuning Turns Your VLA into a Fancy Imitation Learner

Motivation: Robot manipulation is the ability of a robot to interact with and manipulate objects in the physical world, such as grasping objects, movin…

ai data deeplearning machinelearning

The Math Behind Neural Networks — Explained Like Nobody Did for Me 🧨

How does a neural network actually learn to be less wrong? Not the hand-wavy version. The real one. The one with the derivative, the chain rule, and t…

ai machinelearning datascience deeplearning

AlphaEvolve: Google DeepMind's Gemini-Powered Evolutionary Coding Agent

Inside AlphaEvolve: How Neural Networks and Evolutionary Algorithms Are Self-Optimizing Software. For several years, the role of Artificial Intelligenc…

ai machinelearning deeplearning programming

Handling Non-Stationary Time Series: Building a Probabilistic Engine with XGBoost & Python

If you have ever tried to apply Machine Learning to financial time series, you know the heartbreak of the "perfect backtest." You build a model, train…

ai python datascience deeplearning

SANA-WM in 5 quick facts

SANA-WM is worth watching for one reason: it combines longer video generation with explicit camera control. Five quick facts: It is an open-source 2.6…

ai deeplearning opensource performance

When Chaos Wins: Adding Noise Improved My Snake AI's Stability

Greetings all! Continuing the series where I build Rainbow DQN one component at a time on Snake. The first post covered encoding, the second covered m…

machinelearning deeplearning ai chaos

Chain-of-Thought and Beyond: How LLMs Actually Learn to Reason

"The ability to reason step-by-step is not just a feature. It might be the difference between a language model that sounds intelligent and one that ac…

ai llm machinelearning deeplearning

XGBoost: When Gradient Boosting Meets Regularization

1. The Problem It Solves: Imagine you’re a loan officer at a bank. You have thousands of past loan applications with features like income, credit score…

machinelearning ai deeplearning datascience

How I Built a Free Voice AI Pipeline Using Whisper, LLaMA 3.1 & Groq

I recently built VoiceIQ — a complete Voice AI pipeline that listens to your voice, thinks using an LLM, and speaks the answer back. The best part? It…

ai python deeplearning machinelearning

Why does paying more make your LLM reply faster?

Why does Claude respond faster when you pay more? And why does a longer conversation cost disproportionately more than a short one? For the longest ti…

ai discuss llm deeplearning

Stop Guessing Which Weights Your Neural Network Actually Learned: Deterministic Initialization That Tracks Every Change

The Problem Nobody Talks About: You've spent hours training your neural network. The loss converged, metrics look good, and you're ready to deploy. But…

machinelearning python neuralnetworks deeplearning

Generation 1 — Standalone Models (2018–2022)

The Foundation of Modern AI Systems: When people think of tools like ChatGPT, they often assume the intelligence comes from a single powerful system th…

ai deeplearning llm nlp

The Paper That Taught Neural Networks to Learn Backwards

Last week I read the 1958 Rosenblatt paper. The one that started everything. The Perceptron, the first learning machine, the idea that memory lives in…

ai machinelearning deeplearning backpropogation

How Deep Learning Architectures Evolved — From DNNs to Transformers

Deep learning architectures are not random model names. DNN, CNN, RNN, and Transformer each appeared because data has different structure. Images need…

machinelearning ai deeplearning neuralnetworks

Removing PER From Rainbow DQN Set a New Snake AI World Record

Greetings all! Quick context: this is part of an ongoing series where I'm building Rainbow DQN one component at a time on Snake and measuring what eac…

ai deeplearning machinelearning cnn

🔬 AI for Scientific Discovery in the Real World: What Gemma 4 Changes The Moment AI Leaves the Chat Window

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

devchallenge gemmachallenge gemma deeplearning