How to use SmolLM3?
So, Hugging Face just dropped SmolLM3. And no, the name isn’t ironic. It’s a 3-billion-parameter model, but it punches well above its weight, beating some 7B models.
What’s New?
Training Setup
- SmolLM3 saw 11.2 trillion tokens during pretraining. That’s serious volume. The data mix was layered: web content, codebases, math problems, reasoning tasks.
- Then they did what most skip: mid-training, 140B tokens purely on reasoning. After that came supervised fine-tuning and APO (Anchored Preference Optimization) for alignment.
This matters because the model doesn’t just repeat patterns. It can follow logic, work through multi-step problems, and stay on topic longer than usual at this size.
Extended Thinking Mode
You can switch on a “thinking” mode. It makes the model reason step by step instead of jumping to an answer. It actually affects performance on tasks like math or graduate-level QA.
If you want fast answers: leave it off.
If you want clarity on how the answer came to be: turn it on.
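As a minimal sketch of how you'd wire up that toggle: the chat template exposes it as `enable_thinking`, and the model card also describes `/think` and `/no_think` system-prompt flags as an equivalent switch (treat the exact flag spelling as an assumption and double-check the card):

```python
# Minimal sketch: build a message list that toggles SmolLM3's thinking mode
# via a system-prompt flag. The /think and /no_think flags are described on
# the model card as an alternative to enable_thinking; verify the exact
# spelling there before relying on it.
def make_messages(question: str, thinking: bool) -> list[dict]:
    flag = "/think" if thinking else "/no_think"
    return [
        {"role": "system", "content": flag},
        {"role": "user", "content": question},
    ]

msgs = make_messages("What is 17 * 23?", thinking=True)
```

These messages then go through `tokenizer.apply_chat_template` as usual.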
Multilingual by Design
Six languages are officially supported: English, French, Spanish, German, Italian, Portuguese. The model was trained on real data in each, not just “token-aligned” like many others.
It also saw smaller amounts of Arabic, Chinese, and Russian. Performance drops a bit there, but still better than most at this size.
Long, Very Long Context
This thing handles 64k tokens natively. And with YaRN extrapolation, it stretches to 128k without breaking. That’s enough for long documents, transcripts, even entire books.
Llama 3 8B still struggles with long-context consistency. SmolLM3, at 3B, doesn’t.
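As a rough sketch of what that YaRN stretch looks like in practice: recent transformers releases accept a `rope_scaling` argument at load time. The key names below follow that convention, but the exact values are assumptions; check the SmolLM3 model card before using them.

```python
# Hedged sketch: YaRN context extension via transformers' rope_scaling
# argument. Key names follow the current rope_scaling convention; the
# exact values here are assumptions -- check the SmolLM3 model card.
yarn_rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,  # 2x the native window: 64k -> 128k tokens
    "original_max_position_embeddings": 65536,  # native 64k window
}

# Passed at load time, e.g.:
# model = AutoModelForCausalLM.from_pretrained(
#     "HuggingFaceTB/SmolLM3-3B", rope_scaling=yarn_rope_scaling)
extended_window = int(
    yarn_rope_scaling["factor"]
    * yarn_rope_scaling["original_max_position_embeddings"]
)
```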
Tool Calling Support
SmolLM3 can call tools. You describe tools either as JSON schemas or in code style. It can invoke functions, handle parameters, and return structured responses.
That makes it usable in agent workflows, basic automation, retrieval systems, and API-calling bots without wrappers or third-party glue code.
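To make that concrete, here’s a minimal sketch of the receiving end: parsing a tool call out of the model’s output and dispatching it locally. The `<tool_call>` tag wrapping a JSON payload is an assumption about the output format (it’s the common XML-style convention; verify against the model card), and `get_weather` and the registry are hypothetical.

```python
import json
import re

# Hedged sketch: parse a <tool_call> block (assumed output format) and
# dispatch it against a local function registry. get_weather is a
# hypothetical tool, not part of SmolLM3 itself.
def dispatch(model_output: str, registry: dict):
    match = re.search(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL)
    if match is None:
        return None  # plain-text answer, no tool was called
    call = json.loads(match.group(1))
    return registry[call["name"]](**call["arguments"])

registry = {"get_weather": lambda city: f"Sunny in {city}"}
reply = dispatch(
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Copenhagen"}}'
    "</tool_call>",
    registry,
)
```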
Benchmarks
It holds its own on the benchmarks, comfortably beating some models double its size:
- GSM-Plus (math): 83.4
- GPQA (graduate-level reasoning): 41.7
- BFCL (tool calling): 88.8
- Global MMLU (multilingual QA): mid-60s in the top languages
- LiveCodeBench v4 (programming): 30.0
- AIME 2025 (olympiad-level math): 36.7 with reasoning on
In a lot of these, it holds up against Qwen 4B and Llama 3 8B. That’s not marketing; the raw numbers are in the model card.
Deployment Options
SmolLM3 runs on:
- transformers
- vLLM
- SGLang
- llama.cpp, ONNX, and MLC for local/edge
Quantized versions are up on Hugging Face, so you don’t need 80GB of VRAM to run it. Good for laptops, edge devices, or single-GPU setups.
What It’s Good For
Many things:
- Building agents
- Summarization at scale
- Long-document QA
- Code reasoning
- Any use case that needs a small model with actual depth
Not for You If
- You need GPT-4-level depth across everything
- You want flashy conversation or polished small talk
- You’re doing hardcore multilingual generation beyond the supported six
How to use SmolLM3?
The weights are open and available on Hugging Face. Basic usage, including tool calling, looks like this:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Tool definitions passed to the chat template
tools = [
    {
        "name": "get_weather",
        "description": "Get the weather in a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to get the weather for",
                }
            },
        },
    }
]

messages = [
    {
        "role": "user",
        "content": "Hello! How is the weather today in Copenhagen?",
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    enable_thinking=False,  # True works as well, your choice!
    xml_tools=tools,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
Wrap Up
SmolLM3 is a real small model. No games. Full training stack open. Real reasoning. Long context. Tool calling.
No API lock. No paywall. No fake “open-weight” nonsense.
If you’re looking for a model that just does the work, it’s one of the few in this range that actually delivers.
Source URL: https://medium.com/data-science-in-your-pocket/smollm3-the-best-small-llm-for-everything-3fa53713ebb7

