Languages Report in Google Analytics


The Languages report is among the most cryptic of the reports in Google Analytics. It looks like you need a secret decoder ring to figure out what it’s telling you. Here’s some guidance on what those codes are and what they mean.


What are they?

Each of these codes represents a language and an optional country variant. (We’ll look at where GA gets them in a moment.)


The codes aren’t specific to Google Analytics; in fact, they’re based on two ISO standards: ISO 639 for language codes and ISO 3166 for country codes.

In most cases, languages in the Google Analytics report have a 2-letter language code (for example, “en” for English).

That may be all you see. Optionally, though, there may also be a 2-letter country code, with a hyphen separating the two parts (for example, “en-us” for US English).

Where does GA get them?

Google Analytics takes these values from the web browsers of visitors. Language is actually a user-selectable setting in most web browsers, generally defaulting to the language of the operating system. However, users can change the setting to reflect their preferences. Here’s an example of the setting in Chrome:

Chrome Language Preference

You can actually set several languages and rank them in order of preference. This preference can be used by sites to automatically choose a localized version through a process called content negotiation, giving you a translated version of the site in your most preferred language. Google Analytics simply reports the first preference in the browser.
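To make the idea of ranked preferences concrete, here is a minimal Python sketch of how a site might read a ranked language list from an Accept-Language request header and pick the top choice. The function name ranked_languages and the sample header are assumptions made up for illustration; this is not how Google Analytics itself collects the value.

```python
def ranked_languages(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, best first."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without a q parameter has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable weight as lowest priority
        ranked.append((-quality, position, tag.strip().lower()))
    # Sort by weight (highest first), then by original order for ties.
    return [tag for _, _, tag in sorted(ranked)]


# Hypothetical header: Canadian French first, then French, then English.
header = "fr-CA, fr;q=0.9, en;q=0.8"
preferences = ranked_languages(header)
first_choice = preferences[0]
```

With this sample header, first_choice is fr-ca, which corresponds to the single "first preference" value a report would show. The q values are the same kind of parameter that occasionally leaks into a language label, as in en;q=1.0.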

Usage of country codes

In our experience, usage of country codes in the Languages report varies widely. To quantify this, we took a sample from a recent 30-day period for a global organization with a website localized into several languages, covering approximately 1.4 million visits. Although every site will differ based on its audience, this sample gives us a broad data set from which to generalize.

We found that usage of country codes does indeed vary by language. Among the top ten languages for this site, English and Portuguese usually used a country designator like en-us or pt-br (95.0% and 92.5% of the time, respectively), and Chinese virtually always used one (99.9%). Other languages rarely used one, appearing simply as fr or es (French only 12.7% of the time, and Spanish, Arabic, Russian, and Japanese all below a third of the time).

Usage of Country Codes by Language

Across languages, the median rate of country code usage was only 14.8%. In general, less populous languages rarely used country codes (with some exceptions). The overall average across all visits, however, was 71.8% because of the heavy representation of English and Chinese in this sample. The proportion of labels that include a country code on any given site will vary with the language composition of its visitors.
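The gap between the median and the overall average can be reproduced with a small Python sketch. The visit counts below are invented purely for illustration and are not the sample described in this post; they only show how a few high-traffic languages pull the overall figure up while the per-language median stays low.

```python
from statistics import median

# Hypothetical data: language -> (visits, visits whose label had a country code).
# These numbers are made up for illustration only.
sample = {
    "en": (1000, 950),
    "zh": (300, 299),
    "fr": (100, 13),
    "es": (100, 20),
    "ja": (50, 5),
}

# Rate of country code usage within each language.
rates = {lang: with_country / visits for lang, (visits, with_country) in sample.items()}

# The median treats every language equally, however many visits it has.
median_rate = median(rates.values())

# The overall average weights each language by its number of visits.
total_visits = sum(visits for visits, _ in sample.values())
total_with_country = sum(with_country for _, with_country in sample.values())
overall_rate = total_with_country / total_visits
```

With these made-up numbers the median rate is 20%, while the overall rate is about 83%, because English and Chinese account for most of the visits.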

Data issues

If you page through your Languages report, you may find unusual language labels that don’t match the patterns above. Here’s what we found in our sample.

Missing data: (not set). Data for language was missing in very few visits (0.02%). The primary culprits were Opera Mini and certain BlackBerry browsers.

Strange long hex labels. In 0.01% of cases, we saw strange long hexadecimal numbers: *30775594307752e1307755a430775578307753f0, for example. These came from certain BlackBerry browsers.

Misformatted labels. Some labels were misformatted in a variety of ways, including the following:

  • using a separating hyphen even when no country code is included: en-
  • using a separating underscore instead of a hyphen: en_us
  • accidental capture of additional parameters: en;q=1.0
  • inclusion of a script code or other information: sr-latn-rs

The underscore was the most common of these, occurring in 0.14% of visits, and came from certain BlackBerry browsers. The others are rare.
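If you export language data for analysis outside of Google Analytics, the misformatted variants listed above can be folded back into their standard form. The following Python sketch is one possible approach, not a GA feature; the function names normalize_language and looks_standard are hypothetical.

```python
import re


def normalize_language(label: str) -> str:
    """Clean up a raw language label so variants of one code compare equal."""
    cleaned = label.strip().lower()
    cleaned = cleaned.split(";", 1)[0]   # drop parameters such as ";q=1.0"
    cleaned = cleaned.replace("_", "-")  # en_us -> en-us
    cleaned = cleaned.strip("-")         # "en-" -> "en"
    return cleaned


def looks_standard(label: str) -> bool:
    """True for a 2- or 3-letter language with an optional 2-letter country."""
    return re.fullmatch(r"[a-z]{2,3}(-[a-z]{2})?", label) is not None


# Examples taken from the list of misformatted labels.
cleaned_labels = [normalize_language(raw) for raw in ["en-", "en_us", "en;q=1.0"]]
```

Here cleaned_labels comes out as en, en-us, and en. Labels that still fail looks_standard after cleaning, such as sr-latn-rs, are candidates for a closer look rather than automatic removal.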

Invalid codes. We found a handful of codes that just don’t seem to exist: c, qcv, etc.

Non-country region code. There was one code whose second part is not a country: es-419. Although countries can be specified by a 3-digit numerical code under ISO 3166, 419 doesn’t correspond to any country. It is the UN M.49 code for the Latin America and the Caribbean region, so es-419 denotes Latin American Spanish. This accounted for a significant number of visits (0.95%), and in our sample it came from particular versions of Chrome.

Three-letter language codes. The vast majority of languages we found in this sample were represented by 2-letter codes. However, ISO 639 allows for an expanded set of 3-letter codes for languages that do not have 2-letter representations. The only 3-letter language code we saw with any frequency was fil (Filipino) at 0.14%. Still, it’s a reminder to be careful when filtering the language field, because 3-letter codes do occur and their frequency will vary with the site’s audience.

Filtering and segmenting language data

In most cases, if we’re interested in language data, we’re interested only in the language code and don’t care so much about the country. After all, we have the Locations report to tell us where a visitor is physically located, and in most cases the differences between variants of a language in two countries are small (en-us vs. en-gb, for example). (That’s not always the case, however; Chinese variants may not be mutually intelligible.)

To create filters or advanced segments, you can use “begins with” as a matching criterion:

Begins with filter

(Note that you don’t want to include the hyphen, because sometimes it may not be there (no country code), and sometimes it may be an underscore instead.)

However, based on our findings above, we recommend instead that you filter on two-letter languages where the string ends or continues with a hyphen or underscore using a regular expression:

Filter by RegEx

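Copy-and-paste-able: ^en($|[-_])

(The leading ^ anchors the match to the start of the value, so the pattern can’t match in the middle of a longer label.)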

This ensures that you don’t run into any problems with 3-letter language codes. (For example, begins with “fi” for Finnish would also include “fil” for Filipino. The regular expression excludes that.)
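The difference between the two approaches can be checked quickly outside of GA. This Python sketch compares a "begins with" test against the regular expression for Finnish; the sample labels are illustrative, and the sketch adds a leading ^ so the match always starts at the beginning of the label.

```python
import re

# Illustrative labels: Finnish variants, Filipino variants, and French.
labels = ["fi", "fi-fi", "fi_fi", "fil", "fil-ph", "fr"]

# "Begins with fi" also catches the Filipino labels.
begins_with = [label for label in labels if label.startswith("fi")]

# The regular expression requires the code to end, or to continue with - or _.
pattern = re.compile(r"^fi($|[-_])")
regex_match = [label for label in labels if pattern.search(label)]
```

Here begins_with contains fil and fil-ph along with the Finnish labels, while regex_match contains only fi, fi-fi, and fi_fi.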


Jonathan Weber is our Data Evangelist, focusing on bringing the strategic value of data analysis to our customers. He spreads the principles of analytics through our training seminars and even wrote a book on Google Analytics & Tag Manager. Before he caught the analytics bug, he worked in information architecture. Away from the computer, you can find him as a flower farmer and plant geek.

  • Great post! One of the things I do is create a “simplified language” advanced filter as follows:
    Field A: Language Settings, ^(.+)[-_]
    Output: Language Settings: $A1

    This way, I only get the first part of the locale (like EN, FR, etc.) instead of EN-US, EN-CA, EN-UK, etc. This gives me a better view of the language distribution without having to use advanced segments or filters, and I can always use the demographic location to segment.

  • Jonathan Weber

    Stephane – yes, even better to consolidate the data before you see it in the report. Thanks!

  • NicoDavila

    I thought that es-419 is about “Latin America”. It’s not about any country, but about Latin American speakers. Am I right?

  • Jonathan Weber

    Thanks for the catch on “419” — it’s not in the list of country codes because it’s… not a country! Nevertheless, the usage seems to be unusual; only certain versions of Chrome seem to make use of it.

  • Is there a way to find out what the language code “C” designates? You mention in your post that it is an invalid code. The “C” language is coming in from a (not set) location (city or country), and I can’t determine anything else. Thanks!

  • andrew


    I have a few sites targeting Italy and France.

    For one site targeting Italy, when I check its Google Analytics, it shows the traffic as follows


    Same thing with the French sites


    So I want to know what the difference is between it and it-it, and between fr and fr-fr.

    Please help me. I have bookmarked your post and will check it again for your reply.


  • Interesting stuff. So I am in Scotland on business with an American computer, but I live in France and am originally from the UK. My Firefox browser shows en-gb as its preference. I wonder when this was set. But if I fill in a form, they will think I am a UK user, and I am not.
    I am interested in targeting expats, so they will probably appear as users from where they were originally from.

  • Andrew: that was covered in the original post. The country parts are optional; some browsers include them and some don’t. There is no meaningful difference between it and it-it, or fr and fr-fr (except you can be sure fr-fr is not fr-ca).

    Rumble: keep in mind we also have the Locations report, which reports where you are. So the combination of language and location may be helpful to you in sorting things out. If a user travels about, their location will update (whereas their language most likely will not).

  • Alex

    Hi Jonathan,

    Thanks for this article, it’s very timely.
    This regular expression is very handy.

    I could create 3 filters for my 3 main languages: Est, Eng and Rus.

    However, that gives me results that are separate from each other.

    Is there any way to create a custom report that would display, e.g., a pie chart calculated only from the totals of each language filter?

    What I want to achieve is to be able to see the ratio, or better said the relationship, between these three languages.
    Hope that makes sense 🙂
    Could you suggest where I could learn about it?

    Looking forward to hearing from you


  • Bill Schlack

    I have many client accounts set up that are showing in the neighborhood of 90% (not set) for language. 1. Am I doing something wrong? 2. Is there anything I can do to better capture this info? 3. Is there anything further this may tell them?

  • Ravi ST

    Hello Jonathan,
    I tried the suggestion given in the above article to compile the language data, but when I apply the filter, the data disappears instead of being compiled.
    When the filter is removed, all of the data comes up as it was originally shown.
    Looking forward to your valuable suggestions on this.

  • Ravi

    Hello Mr. Hamel,

    May I ask for help with compiling the data shown in Google Analytics for Language, like en-us, en-gb, en-in, fr, it, etc.?

    I wanted to combine all English data, Chinese, French, etc.
    I didn’t understand where to apply the codes you gave. If you could walk through the steps, it would be a great help.

    Looking forward to your valuable feedback.


  • Pablo Canedo Lema

    Hi Jonathan,

    Great article!
    What would you suggest for those who want to separate the data coming from different languages of the site? Like (English) and (German).


  • help

    I actually have text in the language field. The visitors are coming from Russia and the text says “Vote for Trump”. What in the world?

  • Adam Davies

    I’ve had 5,610 visits from es-419. Shame as it’s a significant number and makes analysis a bit tricky.

  • Clarisse Cespedes

    Exactly the same happened to me!
