# More on GA Visitor Loyalty (and unique visitors)


That was ultimately the question that faced me when I sat down at my computer today. A few weeks ago, I wrote about GA Visitor Loyalty, and today, read this comment/question from a reader:

I looked at the following:

Unique Visitors (39811 for the last month on my site)
minus
# of 1 times visits (37037)

Wouldn’t this mean that about 2800 different people visited my site more than once?

But no …

For this year, I had 306,810 unique visitors and 305,025 # of 1 times visits. Following my above logic, only 1800 visited my site more than once for the year, but 2800 did so for the month.

Any explanations?

So here is my answer, for this reader, and for anyone else who is interested:

Let’s start with your first question: If you take unique visitors and subtract the one time visits from that, won’t the difference be about equal to the number of people who visited your site more than once in the period?

Answer: No, it won’t. In fact, to make this easier, let’s reformulate the question: Why aren’t unique visitors in the period equivalent to one time visits?

“Unique visitors” measures how many unique browsers (and we will just call them “people”) visited in a specific time period. But Visitor Loyalty does something different. It says, “For every visit during this period, tell me the visit history for ALL TIME.” So the first time I did the testing in this post (the testing that ensured I was the only visitor), I visited once, went back the next day, and was shocked to see that the Visitor Loyalty chart showed only one visit (no surprise there), and that it was in the 201+ visits category (that was the shock).

A unique visitor’s visits for any time period are sprinkled throughout the chart. Maybe they visited once this month, but last year they visited 35 times, so this one visit, unique this month or unique this year, is their 36th.
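The behavior can be sketched in a few lines (a toy model of how the report appears to bucket visits, not GA’s actual implementation):

```python
from collections import Counter

def loyalty_buckets(visits):
    """Bucket each in-period visit by the visitor's LIFETIME visit
    count at the moment of that visit (how the report behaves)."""
    lifetime = Counter()   # visits ever, per visitor
    buckets = Counter()    # the Visitor Loyalty bars
    uniques = set()        # unique visitors in the period
    for visitor, in_period in visits:   # chronological order
        lifetime[visitor] += 1
        if in_period:
            buckets[lifetime[visitor]] += 1
            uniques.add(visitor)
    return buckets, len(uniques)

# 35 visits last year, one visit this month:
buckets, n = loyalty_buckets([("me", False)] * 35 + [("me", True)])
print(buckets, n)   # Counter({36: 1}) 1 -- one unique visitor this
                    # month, but the visit lands in the 36-visit
                    # bucket, not in "1 times"
```

One unique visitor in the period, zero visits in the “1 times” bar: which is why unique visitors minus one-time visits does not equal repeat visitors.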

Ultimately, the Loyalty chart is the wrong place to do this work. You need a filter to include just returning visitors, or a filter to exclude new visitors. You might create a new profile with an exclude filter on “Visitor Type” that matches the word “new”; that way, you will only get returning visitors in your profile. And then you can start to learn more. (Need to learn more about creating custom filters?)

– Robbin

Our founder, Robbin Steif, started LunaMetrics in 2004. She is a graduate of Harvard College and the Harvard Business School, and has served on the Board of Directors for the Digital Analytics Association. Robbin is a winner of a BusinessWomen First award, as well as a Diamond Award for business leadership. In 2017, Robbin sold her company to HS2 Solutions and has since retired from LunaMetrics.

• Not sure if I agree with you. Consider the # of 1 times visits. Presumably each one of those visits is from a unique visitor, and it should represent the number of people who visited the site for their first time in the relevant period. Subtracting that number from the total number of unique visitors should leave you with the number of unique visitors who had been on the site before the relevant period.

Does that make sense?

That might explain why the number for the full year is lower than for the past month. My traffic court site has been growing and didn’t have nearly as much traffic in 2006 as it has in 2007. The full year “1 times” number includes all the people who found the site in 2007. Perhaps the leftover 1800 represents users who had been on the site in 2006, so their first visits in 2007 were recorded as 2nd or higher visits.

One thing about our traffic court site is that it’s not designed for repeat visitors. Someone gets a speeding ticket, searches for information on the court, and that leads them to our site. Normally they get the information they need and should not need to return unless they get another ticket. Lawyers are repeat users, but they’re a small share.

• “Consider the # of 1 times visits. Presumably each one of those visits is from a unique visitor, and it should represent the number of people who visited the site for their first time in the relevant period.”

Hmm. Let me illustrate why I disagree with this. If I have no visit history with your site, I can visit five times in the period and show up as a 1 visit, a 2 visit, a 3 visit, a 4 visit and a 5 visit. Like in this picture. So I was not a unique visitor in the #1 bucket, because I had four other visits in the period, but that first visit did represent the first time I visited your site (ever, not just in the relevant time period).
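That five-visit scenario, as a quick sketch (the same toy model of the bucketing, not GA’s actual code):

```python
from collections import Counter

# A brand-new visitor makes five visits inside the reporting period.
buckets = Counter()
lifetime = 0
for _ in range(5):
    lifetime += 1            # lifetime visit count ticks up each visit
    buckets[lifetime] += 1   # so each visit lands in a DIFFERENT bar

print(buckets)   # Counter({1: 1, 2: 1, 3: 1, 4: 1, 5: 1})
# One unique visitor, five loyalty entries: one visit apiece in the
# 1, 2, 3, 4 and 5 visit bars.
```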

Having said that, the rest of your comments are interesting about your site in general. And I think it is awesome that you push the data through and think about the implications.

• David

Hi,
thank you for the clear explanation on how Visitor Loyalty works.
I am producing some stats on the visits to a web site and I ended up in a dead end regarding the accuracy of this report.
The stats I am compiling relate to the year 2007, and one of the tables I included was the visitor loyalty. After analysing this stat, we decided it was somewhat irrelevant for our purposes. What we really needed was a stat of visitors and not a stat of visits.
The way the table is designed did not allow us to produce the visitor statistic. So what we tried to do was create an estimate by collecting the data for the whole period in which data was collected by Google Analytics (from 20/06/2006 till yesterday). With this data we could get an approximation of the number of visitors who visited the site X number of times, by dividing the number of visits by the number of times the visitor visited the site (I hope I am not being confusing, but it’s kind of hard to explain…). For the intervals, we used the average number in the interval to get an estimate.
We got these results (the visitors and % of visitors columns were calculated according to the rationale explained above):
| Number of visits | Visits | % of visits | Visitors | % of visitors |
| --- | --- | --- | --- | --- |
| 1 times | 30.234 | 29,79% | 30.234 | 73,9% |
| 2 times | 9.962 | 9,82% | 4.981 | 12,2% |
| 3 times | 5.772 | 5,69% | 1.924 | 4,7% |
| 4 times | 3.985 | 3,93% | 996 | 2,4% |
| 5 times | 2.943 | 2,90% | 589 | 1,4% |
| 6 times | 2.337 | 2,30% | 390 | 1,0% |
| 7 times | 1.987 | 1,96% | 284 | 0,7% |
| 8 times | 1.737 | 1,71% | 217 | 0,5% |
| 9-14 times | 6.942 | 6,84% | 604 | 1,5% |
| 15-25 times | 6.673 | 6,58% | 334 | 0,8% |
| 26-50 times | 7.292 | 7,18% | 192 | 0,5% |
| 51-100 times | 6.072 | 5,98% | 80 | 0,2% |
| 101-200 times | 4.582 | 4,51% | 30 | 0,1% |
| 201+ times | 10.972 | 10,81% | 55 | 0,1% |
| Total | | | 40.909 | 100,0% |

The problem is that when we look at the Number of Unique Visitors on the visitor overview tab, we get 30.661 unique visitors. Why do the calculations arrive at 40.909? I’m not saying this is an exact number, because it is an estimate, but where did those extra 10.000 visitors come from?
Could you please give me a hint on how this works, or on how I can obtain this statistic?
David

• David – notice how you can add the visits in your chart from just 1 times and 2 times, and you already have more than the actual unique visitors.

This is because if a visitor visits only twice, he is in the one time bar (once) and in the two time bar (once). It doesn’t make sense to take all your two-bar visits, divide by two, and call those unique visitors. You might want to go back to the very original article, http://www.lunametrics.com/blog/2007/12/08/reading-reports-in-ga-loyalty/ and find this bold headline in it (sorry that there is no anchor)

GA does not aggregate an individual visitor’s visits.

… and read the stuff right after that. Then you will understand why your math doesn’t work….

Your question was incredibly interesting to me and made me think some more about the aggregation problem. I don’t think I am done working on this….
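The double counting shows up in the numbers from David’s own table:

```python
bar_1_visits = 30234     # "1 times" bar
bar_2_visits = 9962      # "2 times" bar
unique_visitors = 30661  # from the Visitor Overview tab

# Just the first two bars already hold more visits than there are
# unique visitors, because a two-visit visitor sits in BOTH bars.
print(bar_1_visits + bar_2_visits)   # 40196, already > 30661
```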

• David

Oooops, I guess I misunderstood that part… 🙂

But there is one more doubt then: if the 1 time row accounts for the number of visits by 1 time visitors, how is it possible that the 1 time visits column returns a number that is smaller than the number of unique visitors? The idea is that every visitor must have a first visit, so even if GA is not perfect, the number of 1 time visits should be bigger than the number of unique visitors during the period, right? Or am I missing something here?

“Your question was incredibly interesting to me and made me think some more about the aggregation problem. I don’t think I am done working on this….”
Well, I’m glad I could raise an interesting question. This statistic could be much improved and much more interesting if data on visitors was added. I hope GA addresses this issue in future improvements.

David

• steve

Robbin, David,
Is it possible to look at a “simple” fudge factor? Externally, that is, not within GA itself.

Robbin, I know you’re aware I’ve written my own tools for some of these numbers – mainly the “visitors” stat, as that is the one that senior management and above want.
My reporting of this “number” has a fudge factor built in when I’d aggregate over longer periods of time.

Because the tools would be run over a weekly period, you couldn’t simply add # of visitors over 52 entries to get # of visitors for a year. It’d be too high, as the individuals would get counted multiple times. Once this week, once last week and so on.

But this problem is more around the Repeating visitors.
And I also track how many of those we get.
So I do a simple fudge downwards to account for them.

eg. 20% Repeat Visitors? Fudge the figures for a yearly report down by 20%.
Sure, it isn’t perfect. But when I have run a full year’s worth of logs through the same tool, the numbers are close enough (by luck or accuracy, I don’t know… 😉 ) that I’ve lived with the simple fudging for several years now.

The bonus with the fudge is that it’s *fast*. And my time can be at a premium. Shrug. YMMV. 🙂

HTH?
Cheers!
– Steve

David – the first part of your latest comment and the second part are the same thing. You made me start to wonder about the same issue.

Now, I definitely caught a version of that in my screen shot on the original post, where I showed how my 16 visits were all in the 201+ category. But your questions made me start to think about this issue. In fact, GA only caught that snapshot because my cookies had me at lots of visits already, and my profile was new. (So the best it could do was say: she visited 16 times during the life of this week-old profile, and they were her 255th, 256th, etc. visits, or wherever they actually were in the 201+ neighborhood.) But that really isn’t the problem for you, because you captured the entire life of your Google Analytics account.

So I checked out a whole bunch of sites, using their best, all-inclusive profile, for the life of their experience with GA. And over and over again, I find the same thing: their one-time visits and their absolute unique visitors are almost identical. Just like you would expect them to be if everyone has to be a one-time visitor the first time they visit. In fact, yours are not that different – your absolute unique visitor count is only 400 visitors higher than your one-time visits, less than a 2% deviation.

Anyway, thanks for your thoughts. They certainly helped my thinking. STEVE, I will have to go back and reread your stuff when my eyes are more wide open.

• David

Robbin:

Your explanation related to the cookies is correct and could explain this problem if there were more visits than unique visitors. But that is not what’s happening. There are more unique visitors than visits, which is not coherent (30.661 > 30.234). I believe there is a bug in the collection or presentation of this data.

Anyway, I thought of another idea to get an estimated number of visitors for each row in the table. Suppose the number of visits is “inversely cumulative,” meaning that the “1 time” row includes all visitors throughout the period, the “2 times” row includes all the visitors who visited the site at least 2 times, and so on…
So I have collected the data for the entire life of my Google Analytics account, which was shown in the table above. The difference is in the calculation of the visitors figures: now I have displayed a column which calculates the “net visits” on each row. These “net visits” are calculated by subtracting the number of visits on the row below from the number of visits on the row above.
This method is not perfect, firstly because of the “interval problem.” As we have intervals above 9 times, it is difficult to calculate the number of “net visits” above the “8 times” row. I still have to figure out a way around this problem with the intervals; an immediate solution would be to create an interval of 8+ times. Anyway, these are the calculations for the non-interval rows:
| Number of visits | Visits | % of visits | Net visits |
| --- | --- | --- | --- |
| 1 times | 30.234 | 29,79% | 20.272 |
| 2 times | 9.962 | 9,82% | 4.190 |
| 3 times | 5.772 | 5,69% | 1.787 |
| 4 times | 3.985 | 3,93% | 1.042 |
| 5 times | 2.943 | 2,90% | 606 |
| 6 times | 2.337 | 2,30% | 350 |
| 7 times | 1.987 | 1,96% | 250 |

As you can notice, the calculations are made only on the rows before the 8th one, because of that “interval problem.” With this data on “net visits” I can estimate the number of visitors the same way I was calculating above, by dividing the number of visits by the number of times visited. This way I’ll get an estimate of the visitors for the whole GA statistics period, and I can calculate a % of visitors for each row, getting a profile of use for the whole period.
Now I have a second problem: how can I extrapolate these results to the period I am studying (the whole of 2007)?
I came up with a solution that, despite not being perfect, can give an approximation. Using the absolute unique visitors number for the whole year displayed on the visitor overview tab, you can extrapolate the results for each row by multiplying that number by the percentage obtained previously for each row.
And that’s basically it. I still have to find a way to go over the intervals, but these will be the figures I will use.
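David’s “net visits” arithmetic, reproduced with the figures from his table (assuming, as he proposes, that every visitor with at least n visits appears once in bar n):

```python
visits = {1: 30234, 2: 9962, 3: 5772, 4: 3985,
          5: 2943, 6: 2337, 7: 1987, 8: 1737}

# Net visits for row n: visits in bar n minus visits in bar n+1.
# Under the "inversely cumulative" reading, this leaves the
# visitors who made EXACTLY n visits.
net = {n: visits[n] - visits[n + 1] for n in range(1, 8)}
print(net)
# {1: 20272, 2: 4190, 3: 1787, 4: 1042, 5: 606, 6: 350, 7: 250}
```

The output matches the Net visits column above; the interval rows (9-14 and up) can’t be differenced this way, which is the “interval problem” David describes.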

Steve:
I didn’t get what you meant by fudge factor, nor how you calculate the 20% number. Could you give some explanations, please?

Thanks.

• David – you have more unique visitors than visits. Just look at the first line, because everyone has to visit the first time.

Well, OK, maybe they don’t visit the first time (that is the point of my earlier comment – look at how I didn’t visit the first time in my original post, where my visits showed up as 201+). But that is the issue: how are a very small number of people able to become visitors, and not visits, the first time?

• steve

Assuming nothing… 🙂
Fudge Factor? A made up number to give an answer closer to reality than that otherwise calculated.
http://en.wikipedia.org/wiki/Fudge_factor

As I understand, your original request is to get a measure of the number of VISITORS who returned X (1, 2, 3 etc) times to the site over the 2007 period?
vs the number of VISITS which is what the Loyalty report gives.

Your solution is exactly the sort of thing I was talking about. Pick a method that gets close to the answer, knowing it’s not perfect, and use that.
And try to validate it via some other method.

As for the 20%? In our case we have “83.21% New Visits”, it goes up and down marginally on a month by month basis, but is around 82-84% consistently.
Which when rephrased is 16-18% repeat visitors, or rounding for simplicity, 20%.

Currently we only have GA data from mid 2007. But we still need to report to seniormost management, and higher, visitor numbers back to 2003. So we rely on the tool I wrote for getting visitor numbers out of … funkier… log files. (Use Apache’s mod_usertrack module, and log the unique id cookie. Some additional funky algorithms around that fwiw.)

I automatically produce weekly numbers. Say we had 10 visitors last week, and 10 this week. Well, we know 20%, or 2 visitors THIS week, were included in last week’s numbers – giving 18 visitors for the combined two-week period, vs a simple addition *ONLY* giving 20.
ie: 10 + (10 – 2).
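Steve’s two-week example, worked through with his toy numbers (the 20% is his rounded repeat-visitor rate):

```python
repeat_rate = 0.20            # ~16-18% repeat visitors, rounded to 20%
weekly_visitors = [10, 10]    # visitors counted in each weekly report

# A naive sum double-counts the repeaters who already appeared in an
# earlier week, so every week after the first gets discounted by 20%.
naive = sum(weekly_visitors)
fudged = weekly_visitors[0] + sum(
    w * (1 - repeat_rate) for w in weekly_visitors[1:])
print(naive, fudged)          # 20 18.0 -- i.e. 10 + (10 - 2)
```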

So in the interests of *SPEED*, I simply add up all 52+ weeks of VISITORS in my master spreadsheet and subtract 20% to give a NUMBER for each year. Note this can also include parts of Jan “next year” (eg 2008), or miss parts of Dec “this year” (eg 2007) – weeks rarely start or end on a year boundary. Shrug – it’s a fudge.
Which having *just* compared with the same period GA numbers is so close as to make no significant difference. 🙂

ie. The fudge is, once again, validated. 😀
To put in perspective on the Speed issue, reprocessing the logs from scratch would take around 1-2 hours on the hardware we currently have, for each year. Vs my fudge which is pre-calculated, ie 5-20 seconds, to open the spreadsheet. 🙂

On the Uniques vs 1st Time Visits difference? Same here.
In our case (~ 1.4M for 6 months), the diff was ~ 920. Or ~0.07%, if I’ve done the math right. 🙂

Make sense?
Cheers!
– Steve

• David

Makes perfect sense, indeed!
I used the method I explained above which, despite not being perfect, might give us an approximate number of visitors.
It is a fudge factor that can solve the problem until GA decides to present a statistic of loyalty of visitors and not visits.


• Thanks for this information – when you have very strong bi- or tri-modal peaks in your loyalty graphs, trying to interpret them can drive you bonkers.

• I was doing research for an article that I am preparing. I found your article and posts quite interesting. Thanks so much for sharing.
