Choosing open-source models and checking commercial licenses for internal AI

Choosing a model for internal AI is a three-way problem: capability (does it do your job well), hardware (does it fit your RAM/VRAM) and license (are you allowed to use it commercially). The pragmatic rule: pick the smallest size that works, prefer a multilingual model with good Vietnamese, and always read the latest license on the model card yourself before going to production. This is Part 3/8 in the Build Internal AI series.

Quick summary

  • Popular families: Qwen, Gemma, Llama, Mistral, DeepSeek — each with multiple sizes and variants.
  • Licenses differ: some are Apache 2.0/MIT (permissive), some use the publisher's own license with attached conditions. Don't guess — read the model card.
  • Model size: pick the smallest that works; larger sizes need more memory and run slower.
  • Vietnamese: prefer multilingual models, but test on your real questions — don't trust marketing.
  • How to choose: download a few candidates, run them on a real question set, compare quality and speed.

Popular open-source model families

The open-model market moves fast, but a few "families" show up repeatedly across internal-AI deployments. Here are short descriptions — no benchmark scores, because those numbers depend on version and time and go stale quickly:

  • Qwen (Alibaba): a multilingual family, often rated well for Vietnamese and other Asian languages. Available in instruct and coder variants at multiple sizes.
  • Gemma (Google): Google's "lightweight" open family, several sizes, good docs and tooling. Uses its own license ("Gemma Terms") — read it carefully.
  • Llama (Meta): one of the oldest and most popular open families, with a very large tooling ecosystem. Uses its own community license with some conditions.
  • Mistral (Mistral AI): a group of compact, efficient models from France. Some releases are under Apache 2.0; others have their own terms — check each release.
  • DeepSeek: a family strong at reasoning and coding that has drawn a lot of attention recently. License and terms vary by release — read the model card before use.

This list is not exhaustive and will keep changing. Your job isn't to "pick the one perfect model" but to shortlist candidates that fit your hardware and license, then test them for real.

Licenses — make-or-break for commercial use

This is the most overlooked part but carries the highest legal risk. "Open source" does not automatically mean "free to use commercially however you like." Each model family has a different license, and even within the same family, different versions may carry different terms.

  • Apache 2.0 / MIT: permissive licenses that generally allow commercial use, modification and redistribution with few conditions. Some Mistral releases ship under Apache 2.0.
  • Publisher's own license: Gemma uses Google's "Gemma Terms of Use"; Llama uses Meta's "Llama Community License." These may allow commercial use but come with conditions (e.g. use restrictions, scale thresholds, attribution requirements…). The exact conditions change by version.

Important: verify it yourself

Model licenses change over time and per version. This article describes only the general picture; it is not legal advice and makes no claim about the specific license of any particular model release. Before going to production, you must read the latest model card and license text on the publisher's official page, and seek legal advice if needed.

| Model family | License (described cautiously) | Notes |
|---|---|---|
| Qwen | Varies by release: many are permissive, some have specific terms. Check the model card. | Multilingual, many sizes, often good for Vietnamese |
| Gemma | Google's "Gemma Terms of Use", with use conditions. Check the model card. | Lightweight, good docs & tooling |
| Llama | Meta's "Llama Community License": commercial use with conditions. Check the model card. | Very large tooling ecosystem |
| Mistral | Some releases Apache 2.0; others have their own terms. Check the model card. | Compact, efficient |
| DeepSeek | Varies by release. Check the model card. | Strong at reasoning & coding |

The table deliberately does not state an absolute license for each release, because they change. Treat the "License" column as a reminder to go read the original, not as a conclusion.

Model size vs capability vs hardware

Models usually come in several sizes, measured by parameter count (e.g. 4B, 7B, 14B, 32B, 70B…). The general rule: bigger sizes are usually smarter but need more memory and run slower. For internal AI, the goal isn't "the biggest model" but the smallest size that works for your task.

  • Start with a mid size (say the 7B–14B range) — often enough for document Q&A, drafting, summarizing.
  • Only go bigger when truly needed — when testing shows the small size falls short and the hardware has room.
  • Quantization (compressing weights) lets a larger model fit smaller memory, at a possible slight quality cost — details in the Serving article.

Model size must match the hardware you chose in the Hardware article. Choosing a model too big for the machine will make the system slow or unable to run.
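
A back-of-the-envelope check can help before downloading anything. The sketch below estimates the memory taken by the weights alone as parameter count times bytes per parameter. It is a rough approximation under stated assumptions: it ignores the context cache and runtime overhead, and the `usable_fraction` safety margin of 0.8 is an illustrative guess, not a measured value. The function names are hypothetical.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   available_gb: float, usable_fraction: float = 0.8) -> bool:
    """True if the weights fit within the usable share of RAM/VRAM.

    usable_fraction is an assumed margin for context cache and runtime
    overhead; tune it from measurements on your own machine.
    """
    needed = estimate_weight_memory_gb(params_billion, bits_per_weight)
    return needed <= available_gb * usable_fraction


if __name__ == "__main__":
    # Print a small size-versus-precision table for common model sizes.
    for size in (7, 14, 32, 70):
        row = ", ".join(
            f"{bits}-bit ~{estimate_weight_memory_gb(size, bits):.1f} GB"
            for bits in (16, 8, 4)
        )
        print(f"{size}B: {row}")
```

By this estimate a 7B model needs about 14 GB for its weights at 16-bit precision and about 3.5 GB at 4-bit, which is why quantization changes what fits on a given machine. Treat the result as a first filter only; actual usage depends on the runtime and context length.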

Vietnamese support

If your users mostly ask in Vietnamese, prefer a multilingual model with good Vietnamese. Many modern families (like Qwen, Gemma, Llama) claim multilingual support, but real-world Vietnamese quality varies quite a lot across versions and sizes.

The only reliable way is to test on your own questions: take 20–50 real questions your staff actually ask, run them through a few candidate models, and judge the answers yourself (correct grammar, correct meaning, correct business context). Don't pick a model just because of a general leaderboard — it may not reflect your organization's kind of questions.
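
One way to run such a review is to collect every candidate's answer to every question in a single spreadsheet for staff to score. The sketch below assumes Ollama is serving on its default local address and uses its `/api/generate` endpoint with streaming turned off. The function names, the CSV columns, and the empty `score` column are illustrative choices, not part of any tool.

```python
import csv
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes Ollama's default port


def ask_ollama(model: str, prompt: str) -> dict:
    """Send one prompt to a locally running Ollama server and return its JSON reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)


def collect_answers(models, questions, ask=ask_ollama):
    """Ask every model every question; return one row per (question, model) pair."""
    rows = []
    for question in questions:
        for model in models:
            reply = ask(model, question)
            rows.append({
                "question": question,
                "model": model,
                "answer": reply["response"].strip(),
                "score": "",  # left blank for a human reviewer to fill in
            })
    return rows


def save_for_review(rows, path):
    """Write rows to CSV; the BOM helps spreadsheet apps display Vietnamese text."""
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=["question", "model", "answer", "score"])
        writer.writeheader()
        writer.writerows(rows)
```

A typical call would be `save_for_review(collect_answers(["qwen2.5:7b", "gemma2:9b"], questions), "review.csv")`, where `questions` is your own list of real staff questions. Grouping rows by question puts the candidates' answers side by side, which makes the human comparison easier.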

How to test & compare candidates

A tidy, repeatable model-selection process:

  1. Filter by license first: immediately drop releases whose license doesn't fit your commercial needs.
  2. Filter by size: keep only sizes that fit the hardware you already have.
  3. Prepare a real question set: 20–50 questions/tasks representative of actual work.
  4. Download a few candidates (e.g. via Ollama) and run the same question set.
  5. Compare: score answer quality + measure speed (tokens/sec, latency) on your hardware.
  6. Pick the smallest size that clears the quality bar — that's the candidate for a wider pilot.
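
The six steps above can be sketched as a small filter. This is a minimal illustration under assumptions: the `Candidate` fields, the 0 to 1 quality scale, and the example figures are hypothetical placeholders for results you record yourself. The speed helper relies on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns in its generate response.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    params_billion: float
    license_ok: bool       # outcome of your own license review (step 1)
    fits_hardware: bool    # outcome of the size check (step 2)
    quality: float         # mean reviewer score on your question set, 0 to 1
    tokens_per_sec: float  # measured on your own hardware


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Generation speed from a token count and a duration in nanoseconds."""
    return eval_count / (eval_duration_ns / 1e9)


def pick_smallest(candidates, quality_bar: float, min_tokens_per_sec: float = 0.0):
    """Return the smallest candidate that passes every filter, or None."""
    eligible = [
        c for c in candidates
        if c.license_ok
        and c.fits_hardware
        and c.quality >= quality_bar
        and c.tokens_per_sec >= min_tokens_per_sec
    ]
    return min(eligible, key=lambda c: c.params_billion, default=None)


if __name__ == "__main__":
    # Placeholder names and made-up figures, for illustration only.
    shortlist = [
        Candidate("model-a-4b", 4, True, True, 0.62, 55.0),
        Candidate("model-b-8b", 8, True, True, 0.81, 30.0),
        Candidate("model-c-14b", 14, True, True, 0.86, 17.0),
        Candidate("model-d-32b", 32, False, True, 0.90, 8.0),
    ]
    choice = pick_smallest(shortlist, quality_bar=0.80, min_tokens_per_sec=15.0)
    print(choice.name if choice else "No candidate clears the bar")
```

With these placeholder figures the filter selects `model-b-8b`: the 4B option misses the quality bar, the 32B option fails the license review, and the 14B option passes but is larger than necessary.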

For the IT team

Downloading and trying a model takes only minutes with Ollama:

# download an open-source model to the machine
ollama pull qwen2.5:7b
ollama pull gemma2:9b # pull another candidate to compare

# run it right in the terminal, 100% offline
ollama run qwen2.5:7b

View the model card and license on Hugging Face: open the model page (e.g. huggingface.co/<org>/<model>) and read the model card for supported languages, sizes and usage. Check the "License" tag at the top of the page, then open the "Files and versions" tab to read the LICENSE file and any attached terms. Always read the latest version before production.
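
For a shortlist of several models, the license tag can also be read programmatically as a first pass. The sketch below assumes the Hugging Face Hub's public metadata endpoint returns a `tags` list in which the license appears as an entry prefixed with `license:`; that matches the tag shown on model pages, but confirm it against the Hub documentation before relying on it. The helper names are hypothetical, and the repository ID is left as a placeholder.

```python
import json
import urllib.request

HUB_API = "https://huggingface.co/api/models/"  # public Hub metadata endpoint (assumed stable)


def fetch_model_tags(repo_id: str) -> list:
    """Fetch the tag list for a model repository, e.g. "<org>/<model>"."""
    with urllib.request.urlopen(HUB_API + repo_id, timeout=30) as response:
        return json.load(response).get("tags", [])


def license_from_tags(tags) -> list:
    """Extract license identifiers from tags such as "license:apache-2.0"."""
    prefix = "license:"
    return [tag[len(prefix):] for tag in tags if tag.startswith(prefix)]


# Example (needs network access; replace the placeholder with a real repository ID):
# print(license_from_tags(fetch_model_tags("<org>/<model>")))
```

The tag is only a pointer. A value such as `other`, or a publisher-specific name, signals custom terms, and in every case the LICENSE file and model card remain the text to read before production.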

[Figure: Model selection flow for internal AI. Need (task, language) → fits hardware? (within RAM/VRAM) → commercial license? (read the model card) → good Vietnamese? (test real questions) → choose the model (smallest that works). Diagram: Namtech.]

The Namtech view

In internal-AI deployments, Namtech favors commercially-safe open-source families such as Qwen and Gemma — good Vietnamese quality plus a range of sizes to balance against Apple Silicon hardware. But the final choice always rests on testing on the client's real questions and re-reading the latest license at deployment time — because both model capability and license terms keep changing. We don't "lock in" to a single model; the approach is to pick the smallest size that clears the quality bar, then scale as needs grow.

Frequently asked questions

Does "open source" mean I can use it commercially however I like?

Not automatically. Each model has its own license: some are very permissive (Apache 2.0/MIT), some are the publisher's own license with attached conditions. You must read the latest model card and license text yourself before commercial use — and seek legal advice if needed.

Which model is best for Vietnamese?

There's no fixed answer — it depends on the version and your kind of questions. Many multilingual families (Qwen, Gemma, Llama…) support Vietnamese to varying degrees. The reliable way is to take 20–50 real questions from your organization, run them through a few candidates and judge them yourself.

What model size should I pick?

Pick the smallest size that clears the quality bar for your task. Bigger sizes are usually smarter but need more memory and run slower. Match the model size to the hardware you chose in the Hardware article, and see the Serving article for how quantization can fit a larger model into less memory.

How do I check a model's license?

On Hugging Face, open the model page and check the "License" tag at the top, then open the "Files and versions" tab to read the LICENSE file and any attached terms, and read the model card carefully. For Gemma, see Google's official page; for Llama, see Meta's license page. Always check the latest version.

Not sure which model to choose?

Namtech helps you pick a commercially-safe open-source model that fits your hardware and handles Vietnamese well — running 100% on your own infrastructure, data never leaving the organization.

Note: This is a general guide, updated 02/07/2026; it is not legal advice. Licenses and models change fast — read the latest model card and license when you deploy.
