AI & ML — Tech News

EN

Tokenization under the hood: BPE, WordPiece, SentencePiece, and Unigram compared

You deploy a chatbot. English queries average 42 tokens each. Then a …

tokenization llm ai nlp

EN

Samiksha AI: Universal Review & comment Analyzer

Hey DEV Community! I recently participated in a hackathon and built Samiksha AI, a universal review and comment analyzer designed to turn messy custo…

ai nlp python showdev

EN

Translating 'I missed you' so it doesn't land like a form letter

I was trying to tell someone something real in her first language — not "I missed you" from a dropdown, but the version that sounds like a person said…

nlp node showdev sideprojects

EN

Is Siri AI? How Apple's Voice Assistant Really Works

Apple finally gave Siri the kind of upgrade people have been asking for, on and off, for years. The new Siri AI is not just better speech recognition …

ai ios news nlp

EN

The hard part of an AI quiz generator isn't the questions — it's the wrong answers

If you wire an LLM up to "write me 10 multiple-choice questions about photosynthesis," you'll get something that looks great in the demo and falls apa…

ai llm nlp edtech

EN

CareerPilot AI: AI Resume Analyzer

In the modern job market, hiring managers and talent acquisition teams face an overwhelming influx of job applications. For a single opening, hundreds…

machinelearning python nlp webdev

EN

Can LLMs save themselves from verbosity?

« Je n'ai fait celle-ci plus longue que parce que je n'ai pas eu le loisir de la faire plus courte. » — Blaise Pascal, Lettres provinciales, Lettre X…

ai nlp

RU

Python the Passov way: how NLP helped make Python understandable to humanities students

On why an introductory Python course for humanities students is better started not with calculators and abstract loops, but with text, frequency, and meaningful resea…

nlp python

EN

I Built the Resume-vs-JD Scorer Every ATS Uses — In 30 Lines of JavaScript

🌐 Live demo: https://dev48v.infy.uk/solve/day1-resume-jd-match.html Day 1 of SolveFromZero — pick a real hackathon problem, ship the working solution.…

javascript nlp beginners hackathon

EN

Understanding Attention in Transformers — Intuition Before Equations

When people first hear about Transformers, they often encounter words like Query, Key, Value, and Attention Heads and feel confused. But the main idea…

beginners deeplearning machinelearning nlp

EN

The Context Compression Pattern

Pattern Defined. Precise Definition: Context Compression is an inference pattern that uses a specialized "selector" model or a ranker to distill la…

ai architecture rag nlp

EN

Bridging the Rigidity Gap: Deploying Secure Agentic RAG in Healthcare Governance

In the healthcare industry, data is both an organization's most valuable asset and its most heavily guarded liability. While industries like e-commerc…

ai rag nlp automation

EN

The Macro Failure of "One-Size-Fits-None" Reporting: Why Healthcare Providers Fail to Act on Patient Feedback - Part I

Every month, healthcare jurisdictions pool millions of dollars into collecting Patient-Reported Experience Measures (PREMs). Millions of text files an…

ai nlp visualization automation

RU

Why WER is not enough: semantic decomposition of ASR errors

In products built on top of speech recognition models (Automatic Speech Recognition models, ASR), recognition quality directly affects …

wer asr ner nlp speech technologies speech recognition whisper machine learning model quality evaluation speech-to-text

RU

The evolution of 'More Like This'

In many search scenarios, the user starts not from an empty query box but from an existing result. The user opens an article and wants to …

nlp natural language processing vector search performance optimization full-text search semantic search search ranking tf-idf bm25

EN

Prompting styles - Basic

The query we ask an LLM is referred to as a prompt. The way we provide the prompt to the LLM makes a difference, and there are different ways to p…

ai beginners llm nlp

EN

I built an AI contract review and reader tool for plain-language contract understanding

I recently launched SpotClause, a small AI contract review and reader tool. The idea came from a simple problem: contracts are often difficult to read…

ai nlp productivity showdev

RU

Linguistics + statistics = NLP

It just so happened that I am an NLP engineer who graduated from Moscow State Linguistic University. I enjoy tinkering with code and …

nlp nlp text processing nlp history

EN

Dual Encoder vs Cross-Encoder: Why Your RAG Pipeline Needs Both

My RAG pipeline looked fine on paper. Fast retrieval. Decent cosine scores. But when I tested it with real queries, the top results were always a litt…

nlp python rag tutorial

EN

How I Built Semantic Discussion Clustering Without Embeddings (and Why It Was Good Enough)

The Problem: I wanted to monitor discussions around products, bugs and trends across communities. Examples: Reddit, Hacker News, GitHub Issues, Forums. Mos…

machinelearning nlp showdev sideprojects

EN

I Built a Multilingual Spam Detection Dataset with 149K+ Messages Across 23 Languages

Spam detection datasets are surprisingly bad once you move outside English. Most public datasets are: tiny, outdated, English-only, SMS-only, or missi…

data machinelearning nlp showdev

RU

Modern morphological analyzers for Russian: from dictionaries to neural networks

In the article "Extracting and processing requirements from documents with NLP tools" I already showed how moving from LLMs to NLP libraries helps …

morphological analysis morphological dictionary nlp

EN

Day 6 - Embedding - RAG

In the previous post, we saw what chunking is and the various methodologies of chunking. In this post, we are going to see the next stage of the RAG pi…

ai nlp rag tutorial

RU

Ask.com has shut down. What was this service? Looking back at the '90s

A short notice recently appeared on the Ask.com home page: the service officially ceased operation on May 1, 2026. The owning company decided to clos…

selectel ask.com internet history search engines nlp web archaeology ask jeeves

RU

A custom BERTopic pipeline: how to cluster texts and get interpretable topics with an LLM

Hello, Habr! My name is Anton, and I work on NLP tasks at Rostelecom Information Technologies. If you have ever had to sort through large …

clustering bertopic llm hdbscan nlp umap

RU

How we are trying to reduce shelter animal returns with NLP

For four years I was a volunteer at a shelter. The hardest part is seeing the "returned" animals. Just yesterday they had a home, and today it is a cage again. In Russia, 3.6 m…

nlp llm animals shelters startup machine learning social projects volunteering recommender systems

EN

Hybrid Search Blueprint Series: Semantic Boosting

This article was written by Erik Hatcher. This is the third and final article of this hybrid search series. First, we surveyed the (hybrid) search la…

ai algorithms mongodb nlp

RU

Extracting and processing requirements from documents with NLP tools

Greetings to all Habr readers. I think many will recognize this scenario: a task comes up, and the first thought is: "I'll feed everything to an LLM, it will figure it out." At first …

nlp spacy

EN

Day 4 - Chunking continued - RAG

Semantic Chunking: Let's consider two paragraphs, A and B, focusing on strings in Python. Para A focuses on typecasting and para B focuses on accessing char…

ai nlp python rag

EN

Generation 1 — Standalone Models (2018–2022)

The Foundation of Modern AI Systems. When people think of tools like ChatGPT, they often assume the intelligence comes from a single powerful system th…

ai deeplearning llm nlp