Teaching Gemma-2-2B to Actually Speak Turkish

Machine Learning & AI

Emincan Tetik

June 11, 2026

Around 100 million people speak Turkish. So you'd think the open-source language models everyone's building on would handle it reasonably well. They don't. Ask most of them a question in Turkish and you get back something that's almost right but clearly off: a suffix in the wrong place, a verb that doesn't agree, and every so often the model just gives up and answers half in English.

We got tired of working around that, so we decided to fix it ourselves.

What follows is the story of how we took Google's Gemma-2-2B model and trained it to handle Turkish properly: why it was broken to begin with, what we did about it, and how much it actually improved. There's a technical layer here, but the short version is that you don't need a giant model or a giant budget to make a real difference for a language like Turkish.

Why Turkish Trips Models Up

Turkish is agglutinative: you take a root word and keep stacking suffixes onto it, each one adding tense, possession, case, plurality, and so on. Arkadaşlarımızınkilerin (very roughly, "of those belonging to our friends") is one word, and nobody who speaks Turkish would blink at it. A model that learned language mostly from English text has never really had to deal with this kind of thing.
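To make the suffix stacking concrete, here is a small sketch that builds the example word one morpheme at a time. The segmentation and glosses are rough hand annotations added for illustration, not the output of a morphological analyzer.

```python
# Morpheme-by-morpheme build-up of the example word.
# Glosses are rough hand annotations, added here for illustration only.
MORPHEMES = [
    ("arkadaş", "friend (root)"),
    ("lar", "plural"),
    ("ımız", "our (first person plural possessive)"),
    ("ın", "genitive: of"),
    ("ki", "the one(s) belonging to"),
    ("ler", "plural"),
    ("in", "genitive: of"),
]


def build_up(morphemes):
    """Return every intermediate word form as suffixes are attached."""
    forms, word = [], ""
    for piece, _gloss in morphemes:
        word += piece
        forms.append(word)
    return forms


forms = build_up(MORPHEMES)
for form, (piece, gloss) in zip(forms, MORPHEMES):
    print(f"{form:<24} +{piece:<8} {gloss}")
```

Every intermediate form is itself a valid Turkish word, which is why a vocabulary learned mostly from English text tends to cut words like this into pieces that don't line up with the suffixes.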

When we ran the base Gemma-2-2B model before doing anything to it, the problems were consistent. It botched verb and noun endings constantly. It would fall back on English word order (Turkish puts the verb at the end; English doesn't). Almost a quarter of the time, it switched into English when it should have stayed in Turkish. It lost the thread in longer conversations, and it read idioms literally instead of understanding them.

On ARC-TR, the Turkish version of a standard reasoning benchmark that tests comprehension rather than just grammar, the base model scored 0.188. That number told us what we already felt from poking at it: technically it spoke Turkish, but you wouldn't ship it.

What We Did, Without the Jargon

Training a model from scratch was never on the table. Too expensive, too slow, and unnecessary for what we were trying to do. Instead we used an approach called LoRA, which lets you take an existing model and adapt it without retraining the whole thing. In plain terms: we froze the original model's weights and trained only a small set of added adapter weights that teach it Turkish. Compared with training from scratch, this is far faster and more resource-efficient, and the original model is preserved as-is.

For the technically curious: we used LoRA at rank 128 with 4-bit quantization, applied across both the attention and MLP layers, trained in BF16 mixed precision on 8×NVIDIA H200 GPUs. Full configs are in the open-source release.
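For a rough sense of why LoRA is cheap, the sketch below counts the weights it trains for a single matrix at rank 128. The layer shape is a hypothetical one picked for illustration, not a figure from our configs; only the rank comes from the setup above.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights LoRA adds to one d_out x d_in weight matrix.

    LoRA learns the update as B @ A, with A of shape (rank, d_in)
    and B of shape (d_out, rank). The original matrix stays frozen.
    """
    return rank * d_in + d_out * rank


# Hypothetical layer shape, chosen only for illustration.
D_IN, D_OUT = 2304, 9216
RANK = 128  # the rank quoted above

full = D_IN * D_OUT
adapter = lora_param_count(D_IN, D_OUT, RANK)

print(f"full matrix:   {full:,} weights (frozen)")
print(f"LoRA adapter:  {adapter:,} weights (trained)")
print(f"trained share: {adapter / full:.1%}")
```

Under these assumed dimensions the adapter is about 7% of the size of the matrix it adapts. The real share depends on the actual layer shapes and on which layers get adapters.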

The Data

We trained on an open Turkish dataset from TÜBİTAK BİLGEM. We used a slice of it (36 of 95 files) that gave us enough variety without dragging training on forever.

The mix is roughly: news (30%), encyclopedic content (25%), literary text (20%), social media and forums (15%), and academic or technical writing (10%). That spread matters because it's what stops the model from sounding like it only ever read one kind of text.

None of it went in raw. We cleaned out HTML and links, filtered anything that wasn't clearly Turkish, removed duplicates, dropped low-quality text, and masked personal information. After all that we were left with about 3.19 million training examples.
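The cleaning steps above can be sketched roughly as follows. This is a simplified illustration, not the pipeline from the release: the regular expressions, the thresholds, and the character-based Turkish check are all assumptions standing in for real language identification, quality scoring, and PII detection.

```python
import hashlib
import html
import re

# Crude patterns, assumed for illustration only.
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{8,}\d")
TURKISH_CHARS = set("çğıöşüÇĞİÖŞÜ")


def clean(text):
    """Strip HTML tags and links, mask emails and phone-like numbers."""
    text = html.unescape(TAG_RE.sub(" ", text))
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return " ".join(text.split())


def looks_turkish(text, min_share=0.03):
    """Heuristic stand-in for a language-ID model: share of Turkish-specific letters."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    share = sum(c in TURKISH_CHARS for c in letters) / len(letters)
    return share >= min_share


def prepare(docs, min_words=5):
    """Clean, filter short or non-Turkish text, and drop exact duplicates."""
    seen, kept = set(), []
    for raw in docs:
        text = clean(raw)
        if len(text.split()) < min_words or not looks_turkish(text):
            continue
        key = hashlib.sha1(text.casefold().encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


docs = [
    "<p>Bugün hava çok güzel, ayrıntılar için https://example.com adresine bakın.</p>",
    "<p>Bugün hava çok güzel, ayrıntılar için https://example.com adresine bakın.</p>",
    "The weather is very nice today, see the site for details.",
    "Bana ulaşmak için ornek@example.com adresine yazın lütfen.",
    "Çok kısa.",
]
result = prepare(docs)
for line in result:
    print(line)
```

Of the five sample inputs, the duplicate, the English sentence, and the two-word fragment are dropped, and the email address in the remaining text is masked. A character heuristic like this would misjudge Turkish text that happens to avoid the special letters, so a real pipeline would more likely use a trained language identifier.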

What Changed

Here's the headline comparison on the ARC-TR reasoning benchmark:

Metric	Base	Fine-tuned	Relative change
Accuracy	0.188	0.224	+19.1%
Normalized accuracy	0.244	0.277	+13.5%
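The percentages in the table are relative to the base score, not percentage-point differences. A quick check of the arithmetic, using only the scores from the table:

```python
def relative_change(before, after):
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before


scores = {
    "Accuracy": (0.188, 0.224),
    "Normalized accuracy": (0.244, 0.277),
}

for name, (base, tuned) in scores.items():
    absolute = tuned - base
    relative = relative_change(base, tuned)
    print(f"{name}: {absolute:+.3f} absolute, {relative:+.1%} relative")
```

So the accuracy gain is 0.036 in absolute terms, which is 19.1% of the base score of 0.188.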

The benchmark moved, but the more telling improvements showed up in the model's actual output: grammar accuracy went from 67.2% to 92.4% on our Turkish grammar test, the English code-switching problem fell from 23.7% to 4.2%, Turkish paraphrasing quality more than doubled on standard scoring, and the model stayed coherent over longer passages far more often (71.5% to 87.3%).

What that adds up to in practice: the model finishes Turkish sentences the way a person would, stops bailing into English, writes summaries that hold together, and handles the kind of suffix gymnastics that used to break it.

What This Says About Smaller Languages

Turkish sits in an awkward middle ground. There's a decent amount of data out there, but nothing close to what English or Chinese have. What our results suggest is that you don't need a giant model or a giant budget to close the gap. A compact model, a sensible approach, and a dataset that's been cleaned up and covers a range of topics will get you a long way.

We think the same recipe would work for plenty of other languages that are underserved — where the distance between "the model technically works" and "the model sounds native" is still wide.

Where We're Taking It

This is a prototype. It's good, but it's not done.

We only used part of the dataset, so the obvious next step is training on the full thing. We'd also like to do instruction tuning, which is what turns a raw model into something that can follow instructions and hold a proper conversation. On the product side, we want to adapt it specifically for the analytics, customer data, and BI use cases that matter for the B2Metric platform.

Open Source

The training code, configs, and benchmark results are going out publicly. The point is to give the Turkish NLP community something to build on and to leave a reference for anyone tackling a similar low-resource language problem.

If you're working on Turkish NLP, want to extend any of this, or just have questions about how we did it, get in touch — we'd genuinely like to hear from you.

B2Metric is a product analytics and customer data platform. We took this on as part of building AI infrastructure that can actually make sense of Turkish-language data.
