How to use SmolLM3?
So, Hugging Face just dropped SmolLM3. And no, the name isn’t ironic. It’s a 3-billion-parameter model, but it punches well above its weight, beating some 7B models.
What’s New?
Training Setup
- SmolLM3 saw 11.2 trillion tokens during pretraining. That’s serious volume. The data mix was layered: web content, codebases, math problems, reasoning tasks.
- Then they did what most skip: mid-training, 140B tokens purely on reasoning. After that came supervised fine-tuning and APO (Anchored Preference Optimization) for alignment.
This matters because the model doesn’t just repeat patterns. It can follow logic, work through multi-step problems, and stay on topic longer than usual at this size.
Extended Thinking Mode
You can switch on a “thinking” mode. It makes the model reason step by step instead of jumping to an answer. It actually affects performance on tasks like math or graduate-level QA.
If you want fast answers: leave it off.
If you want clarity on how the answer came to be: turn it on.
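As a minimal sketch of how you'd wire up that toggle: the chat template exposes it as `enable_thinking`, and the model card also describes `/think` and `/no_think` system-prompt flags as an equivalent switch (treat the exact flag spelling as an assumption and double-check the card):

```python
# Minimal sketch: build a message list that toggles SmolLM3's thinking mode
# via a system-prompt flag. The /think and /no_think flags are described on
# the model card as an alternative to enable_thinking; verify the exact
# spelling there before relying on it.
def make_messages(question: str, thinking: bool) -> list[dict]:
    flag = "/think" if thinking else "/no_think"
    return [
        {"role": "system", "content": flag},
        {"role": "user", "content": question},
    ]

msgs = make_messages("What is 17 * 23?", thinking=True)
```

These messages then go through `tokenizer.apply_chat_template` as usual.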
Multilingual by Design
Six languages are officially supported: English, French, Spanish, German, Italian, Portuguese. The model was trained on real data in each, not just “token-aligned” like many others.
It also saw smaller amounts of Arabic, Chinese, and Russian. Performance drops a bit there, but still better than most at this size.
Long, Very Long Context
This thing handles 64k tokens natively. And with YaRN extrapolation, it stretches to 128k without breaking. That’s enough for long documents, transcripts, even entire books.
Llama 3 8B still struggles with long-context consistency. SmolLM3, at 3B, doesn’t.
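As a rough sketch of what that YaRN stretch looks like in practice: recent transformers releases accept a `rope_scaling` argument at load time. The key names below follow that convention, but the exact values are assumptions; check the SmolLM3 model card before using them.

```python
# Hedged sketch: YaRN context extension via transformers' rope_scaling
# argument. Key names follow the current rope_scaling convention; the
# exact values here are assumptions -- check the SmolLM3 model card.
yarn_rope_scaling = {
    "rope_type": "yarn",
    "factor": 2.0,  # 2x the native window: 64k -> 128k tokens
    "original_max_position_embeddings": 65536,  # native 64k window
}

# Passed at load time, e.g.:
# model = AutoModelForCausalLM.from_pretrained(
#     "HuggingFaceTB/SmolLM3-3B", rope_scaling=yarn_rope_scaling)
extended_window = int(
    yarn_rope_scaling["factor"]
    * yarn_rope_scaling["original_max_position_embeddings"]
)
```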
Tool Calling Support
SmolLM3 can call tools. You describe tools either as JSON schemas or in code style. It can invoke functions, handle parameters, and return structured responses.
That makes it usable in agent workflows, basic automation, retrieval systems, and API-calling bots without wrappers or third-party glue code.
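To make that concrete, here’s a minimal sketch of the receiving end: parsing a tool call out of the model’s output and dispatching it locally. The `<tool_call>` tag wrapping a JSON payload is an assumption about the output format (it’s the common XML-style convention; verify against the model card), and `get_weather` and the registry are hypothetical.

```python
import json
import re

# Hedged sketch: parse a <tool_call> block (assumed output format) and
# dispatch it against a local function registry. get_weather is a
# hypothetical tool, not part of SmolLM3 itself.
def dispatch(model_output: str, registry: dict):
    match = re.search(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL)
    if match is None:
        return None  # plain-text answer, no tool was called
    call = json.loads(match.group(1))
    return registry[call["name"]](**call["arguments"])

registry = {"get_weather": lambda city: f"Sunny in {city}"}
reply = dispatch(
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Copenhagen"}}'
    "</tool_call>",
    registry,
)
```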
Benchmarks
It holds its own on the benchmarks, comfortably beating some models double its size:
- GSM-Plus (math): 83.4
- GPQA (graduate-level reasoning): 41.7
- BFCL (tool calling): 88.8
- Global MMLU (multilingual QA): mid-60s in the top languages
- LiveCodeBench v4 (programming): 30.0
- AIME 2025 (olympiad-level math): 36.7 with reasoning on
In a lot of these, it holds up against Qwen 4B and Llama 3 8B. That’s not marketing; the raw numbers are in the model card.
Deployment Options
SmolLM3 runs on:
- transformers
- vLLM
- SGLang
- llama.cpp, ONNX, and MLC for local/edge
Quantized versions are up on Hugging Face, so you don’t need 80GB of VRAM to run it. Good for laptops, edge devices, or single-GPU setups.
What It’s Good For
Many things:
- Building agents
- Summarization at scale
- Long-document QA
- Code reasoning
- Any use case that needs a small model with actual depth
Not for You If
- You need GPT-4-level depth across everything
- You want flashy conversation or polished small talk
- You’re doing hardcore multilingual generation beyond the supported six
How to use SmolLM3?
The weights are open and available on Hugging Face. Basic usage, including tool calling, looks like this:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceTB/SmolLM3-3B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Tool definitions passed to the chat template
tools = [
    {
        "name": "get_weather",
        "description": "Get the weather in a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to get the weather for",
                }
            },
        },
    }
]

messages = [
    {
        "role": "user",
        "content": "Hello! How is the weather today in Copenhagen?",
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    enable_thinking=False,  # True works as well, your choice!
    xml_tools=tools,
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
Wrap Up
SmolLM3 is a real small model. No games. Full training stack open. Real reasoning. Long context. Tool calling.
No API lock. No paywall. No fake “open-weight” nonsense.
If you’re looking for a model that just does the work, it’s one of the few in this range that actually delivers.
Source URL: https://medium.com/data-science-in-your-pocket/smollm3-the-best-small-llm-for-everything-3fa53713ebb7

