Arabic open models: when a Saudi business should self-host

The GCC now has capable, openly-licensed Arabic models. The real question is not which one is best, but whether to self-host at all.

Field notePOS, invoicing, and compliance workflows

Takeaway

ALLaM has downloadable open weights, while Falcon-H1-Arabic shows where regional Arabic models are heading. A practical test for when self-hosting an Arabic model beats a cloud API.

For the first time, a Saudi operator who wants an Arabic-capable AI feature inside their software has a real choice: call a cloud API, or run an Arabic-first model on infrastructure they control. Two GCC models make that choice concrete. ALLaM, built in Saudi Arabia, has open weights on Hugging Face and is also distributed through HUMAIN and the major clouds. Falcon-H1-Arabic, from the UAE, was released in January 2026 as an open Arabic model family with smaller variants, but teams should confirm access and deployment terms before treating it as self-hostable. The decision this raises is not academic. After PDPL enforcement turned cross-border data transfer into a live compliance question, where your Arabic model runs is now an operational and legal decision, not just a technical one.

What happened

ALLaM is a family of Arabic-first language models developed by the National Center for Artificial Intelligence at SDAIA. SDAIA listed the 7-billion-parameter instruction model on Hugging Face on 6 March 2025, making the weights downloadable, after earlier integrating it into IBM watsonx and Microsoft Azure. The model card lists it under the Apache-2.0 license, which permits commercial use. It was pretrained on 5.2 trillion tokens of Arabic and English, has a 4,096-token context window, and runs on the standard open-source serving stack: vLLM, SGLang, llama.cpp, Ollama, and LM Studio, with community 4-bit quantizations available. It had roughly 15,900 downloads in the last month.

The model's commercial side now sits with HUMAIN, the AI company wholly owned by the Public Investment Fund and launched in May 2025. HUMAIN Chat went live in the Kingdom on 26 August 2025, powered by the larger ALLaM 34B, which the company says was trained on one of the largest Arabic datasets collected and refined by more than 600 domain experts. ALLaM is also offered through Microsoft Azure AI Foundry and IBM watsonx for teams that prefer a managed API.

The regional picture filled in on 5 January 2026, when the UAE's Technology Innovation Institute released Falcon-H1-Arabic. It uses a hybrid Mamba-Transformer architecture, comes in 3B, 7B, and 34B sizes, handles context windows up to 256,000 tokens, and was trained on native Arabic rather than machine-translated text, covering Modern Standard Arabic plus Gulf, Levantine, North African, and Egyptian dialects. TII reports it leads the Open Arabic LLM Leaderboard, with the 34B scoring 75.36% and beating some 70B-plus models. It ships under the TII Falcon License, an Apache-2.0-based license that allows commercial use subject to an acceptable use policy.

Why it matters for operators

The benchmarks are not the point for a business. Two practical things changed. First, Arabic quality from open models is now good enough for real internal work, on ALLaM's own evaluations it clears the older Jais generation and competes with much larger general-purpose models on Arabic tasks, and Falcon-H1-Arabic was built for dialects from the start. Second, and more important, ALLaM's 7B open weights are licensed and sized so you can run them yourself, while Falcon-H1-Arabic gives teams another regional Arabic model family to evaluate once access, license, and deployment terms are clear.

That second point is what connects to compliance. Under PDPL, sending a customer record or employee file to an external model is processing, and if that model runs outside the Kingdom it is a regulated cross-border transfer. A self-hosted Arabic model changes the data path: the text never leaves your server. For a workflow like summarizing Arabic support tickets, drafting replies, tagging CRM notes, or reading uploaded documents, that can be the difference between a feature you can defend to SDAIA and one you cannot.

Where it helps

Self-hosting earns its place in a few specific operations. A restaurant or recreation venue handling Arabic reviews and WhatsApp messages can run sentiment tagging and reply drafts locally, keeping guest data in-house. An HR team can summarize Arabic CVs or policy documents without shipping employee data to a third party. A retail or services operator can add an Arabic summary layer over dashboard and POS exports, so a manager reads a short Arabic brief instead of scrolling raw rows. For these jobs, a 7B model is often enough, and a 7B model is small: at full precision its weights are around 14 GB, which fits on a single 24 GB GPU, and a 4-bit version drops to roughly 4-5 GB, enough to run on modest hardware or even a CPU for low volumes.

Where to be careful

Self-hosting is not free, it trades a per-call API bill for the work of running a service: a GPU, monitoring, updates, and someone accountable for uptime. For low or spiky volume, a managed API like ALLaM on Azure or watsonx is usually cheaper and simpler, and keeping data in-Kingdom can still be addressed through region settings and contracts rather than your own servers. The benchmark scores are vendor-reported, so test on your own Arabic data and dialects before trusting them; a leaderboard number does not predict how a model handles your menu items, ticket categories, or local terms. Licenses need a real read, not a glance: ALLaM-7B shows Apache-2.0 on its model card, but the card also points to a LICENSE file, and Falcon's license carries an acceptable use policy, so confirm the exact terms for your use before production. And the freely downloadable open weight is the 7B ALLaM preview; the 34B that powers HUMAIN Chat is reached through HUMAIN and the clouds, not the same as self-hosting it.

Cicada Solutions view

The honest answer for most operators is that open weights matter even if you never host them yourself. They give you leverage: a credible in-Kingdom, Arabic-first option that makes the build-vs-buy conversation real instead of theoretical. Our default is to start with the workflow, not the model. If the Arabic task is high-volume, touches personal data, or needs to stay in the Kingdom, prototype with a self-hosted 7B model such as ALLaM on a single GPU and measure quality on your own data; evaluate Falcon-H1-Arabic separately once access, license, and deployment terms are clear. If the volume is low and the data is not sensitive, a managed Arabic API is the cheaper path, and you can revisit later. Either way, decide where the data goes before you decide which model is cleverest. The models are ready for that conversation now in a way they were not a year ago. None of this is legal advice, confirm your PDPL obligations with SDAIA or qualified counsel, but the engineering reality has shifted: running Arabic AI on your own terms is now a normal option, not a research project.

Sources

Hugging Face model card: ALLaM-7B-Instruct-preview (NCAI / SDAIA) - accessed 2026-06-22.
Saudi Press Agency: SDAIA Lists ALLaM 7B Arabic Language Model on Hugging Face - published 2025-03-06, accessed 2026-06-22.
Arab News: Saudi-owned Arabic app HUMAIN Chat launches in Kingdom - published 2025-08-26, accessed 2026-06-22.
TII / Falcon LLM: Falcon-H1-Arabic - accessed 2026-06-22.
TII: Abu Dhabi's TII Launches Falcon-H1 Arabic - published 2026-01-05, accessed 2026-06-22.
SDAIA: Laws and Regulations (Personal Data Protection Law) - accessed 2026-06-22.