Fine-Tuning Llama 3.2 3B on Medical QA: Week 4 - When Lower Loss Meant a Worse Model
What Happened This Week. Week 3 produced a working fine-tuned model: one epoch, one dataset, a clear improvement over the base model. Week 4 was s…
Latest Web news from Tech News
"AI", "machine learning", "deep learning", "GenAI" — used interchangeably every day, and it's wrong. Here's the single picture that fixes it forever. …
I'm a 6th semester CS student at COMSATS University Islamabad. Over the past few months I've been doing deep learning research alongside my coursework…
Flash Attention: what it does and why it matters. Your training job is paying for an A100 at $3/hour. The loss is going down, gradients are flowing, an…
How I got here: On principle, you will never catch me parading myself as some sort of expert data scientist. Technically, that's what I do in my day …
When people first hear about Transformers, they often encounter words like Query, Key, Value, and Attention Heads and feel confused. But the main idea…
Most people use PyTorch without really knowing what's happening underneath. This series breaks the foundations down into the simplest possible explana…
Modern AI models are becoming increasingly powerful, but their growing capabilities come with rising risks of degradation: the loss of rare patterns, …
Training a robot to pick up an object sounds simple until you realize how many separate systems are involved: a vision model to understand the scene, …
Code: https://github.com/P0rt/vlm-distill-screenshots Model: https://huggingface.co/p00rt/qwen2-vl-2b-screenshots-distill There's a question I keep co…
The Rise of China's LLMs: A Complete History from 2017 to 2026. From Wu Dao 2.0 (1.75T params) to DeepSeek V3 ($5.6…
class TinyTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        # setting the constructor for the initial values that we are ever gonna need…
Part 1: From Scratch to Systems. This machine learning series will be a real ride. It’s an interactive journey where I’ll be sharing and raising lots…
Motivation: Robot manipulation is the ability of a robot to interact with and manipulate objects in the physical world, such as grasping objects, movin…
How does a neural network actually learn to be less wrong? Not the hand-wavy version. The real one. The one with the derivative, the chain rule, and t…
Inside AlphaEvolve: How Neural Networks and Evolutionary Algorithms Are Self-Optimizing Software. For several years, the role of Artificial Intelligenc…
If you have ever tried to apply Machine Learning to financial time series, you know the heartbreak of the "perfect backtest." You build a model, train…
SANA-WM is worth watching for one reason: it combines longer video generation with explicit camera control. Five quick facts: It is an open-source 2.6…
Greetings all! Continuing the series where I build Rainbow DQN one component at a time on Snake. The first post covered encoding, the second covered m…
"The ability to reason step-by-step is not just a feature. It might be the difference between a language model that sounds intelligent and one that ac…
1. The Problem It Solves: Imagine you’re a loan officer at a bank. You have thousands of past loan applications with features like income, credit score…
I recently built VoiceIQ — a complete Voice AI pipeline that listens to your voice, thinks using an LLM, and speaks the answer back. The best part? It…
Why does Claude respond faster when you pay more? And why does a longer conversation cost disproportionately more than a short one? For the longest ti…
The Problem Nobody Talks About: You've spent hours training your neural network. The loss converged, metrics look good, and you're ready to deploy. But…
The Foundation of Modern AI Systems: When people think of tools like ChatGPT, they often assume the intelligence comes from a single powerful system th…
Last week I read the 1958 Rosenblatt paper. The one that started everything. The Perceptron, the first learning machine, the idea that memory lives in…
Deep learning architectures are not random model names. DNN, CNN, RNN, and Transformer each appeared because data has different structure. Images need…
Greetings all! Quick context: this is part of an ongoing series where I'm building Rainbow DQN one component at a time on Snake and measuring what eac…
This is a submission for the Gemma 4 Challenge: Write About Gemma 4. 🔬 AI for Scientific Discovery in the Real World: What Gemma 4 Changes. The Moment A…