Are Strava heatmaps reliably indicative of general cycling activity?

geomannie
Posts: 1099
Joined: 13 May 2009, 6:07pm

Are Strava heatmaps reliably indicative of general cycling activity?

Post by geomannie »

Hi All

Tomorrow I am giving a short presentation at the Cycling UK Scotland AGM on how Strava cycle usage correlates with cycling surveys. In short, I have found the relationship to be good, and that Strava heatmaps are an excellent proxy for cycle activity. While I think what I want to say is uncontroversial, I often get a lot of pushback when I mention it.

What do you all think? I would be interested in comments, supportive or not, before going public with this in front of a live audience. Here is the link to the PDF: https://glasgowcycleman.files.wordpress ... final3.pdf

If you are in Scotland, come and hear me (and everyone else). Registration at https://www.eventbrite.co.uk/e/cycling- ... 0739330704


Thanks
geomannie
meic
Posts: 19355
Joined: 1 Feb 2007, 9:37pm
Location: Caerfyrddin (Carmarthen)

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by meic »

It ignores cyclists that don't use Strava.
You can assume that the users don't really differ from non-users, so it all balances out.
Or you can assume that Strava users are a very biased sample and completely misrepresent cycling activity.
If you look at Cambridge city centre you will assume the latter case.

Edit: I typed this before seeing that your study was about using Strava data rather than just you using it in something else.
So my comment is stating the obvious to you on a subject that you have clearly looked into much more deeply than I have.
Yma o Hyd
Vorpal
Moderator
Posts: 20717
Joined: 19 Jan 2009, 3:34pm
Location: Not there ;)

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by Vorpal »

I have some questions, and would also like to point out some possible issues.

Who performed the cycle surveys? DfT and some local authorities do not include cyclists who are not in the main carriageway.
Were any segregated facilities included in the comparison? I certainly don't believe that roadside surveys are necessarily the best measure of cyclist activity. They need to be done right, and many local authorities do not do so. DfT are even worse.


Potential problems...
1) Comparing averages to averages is always going to be problematic. All of the other statistical parameters need to be included. Without understanding the statistical distribution, you could be comparing apples to oranges without knowing it. Also, women may be more likely to be utility cyclists, in which case the bias toward men on Strava may be more problematic than you have indicated.

2) I think that the correlation for a road like the A77 is always going to be good between Strava and surveys. The cyclists who use a main road are more likely to be Strava users. Does the correlation also hold on a route through a park? On a segregated path to a school? Were these compared separately? I saw only one 'local path' on the included information. I'm not convinced that the correlation will hold for all heavily cycled routes, but it is likely to hold for many. This should be considered in the presentation.

3) You don't include any statistical error.

4) There may be other sources of possible error that are not discussed?
“In some ways, it is easier to be a dissident, for then one is without responsibility.”
― Nelson Mandela, Long Walk to Freedom
mjr
Posts: 20334
Joined: 20 Jun 2011, 7:06pm
Location: Norfolk or Somerset, mostly

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by mjr »

Many good points are made above and I'd suggest they undermine the conclusion - Strava may be a cheaper proxy for roadside cycle counts, but we've been pointing out for years that those counts are flawed in several ways.

I'd also point out that page 7 refers to the National Travel Survey figures for cycling "trips". I suspect it would be better to compare with cycling "stages" in this context, if it's available.

It would be interesting to compare Strava with the Sustrans route point counts, if you can get hold of them. From looking at heatmaps of Cambridge and King's Lynn, I suspect Strava will seriously underrepresent those routes. I think it's completely untrue that "most cyclists use major roads" in mass cycling areas, except where those are the desire line.

It may also be interesting to compare the maps with what http://PCT.bike suggests for improvement.
MJR, mostly pedalling 3-speed roadsters. KL+West Norfolk BUG incl social easy rides http://www.klwnbug.co.uk
All the above is CC-By-SA and no other implied copyright license to Cycle magazine.
geomannie
Posts: 1099
Joined: 13 May 2009, 6:07pm

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by geomannie »

Vorpal wrote: I have some questions, and would also like to point out some possible issues.

Who performed the cycle surveys? DfT and some local authorities do not include cyclists who are not in the main carriageway.
Were any segregated facilities included in the comparison? I certainly don't believe that roadside surveys are necessarily the best measure of cyclist activity. They need to be done right, and many local authorities do not do so. DfT are even worse.


Potential problems...
1) Comparing averages to averages is always going to be problematic. All of the other statistical parameters need to be included. Without understanding the statistical distribution, you could be comparing apples to oranges without knowing it. Also, women may be more likely to be utility cyclists, in which case the bias toward men on Strava may be more problematic than you have indicated.

2) I think that the correlation for a road like the A77 is always going to be good between Strava and surveys. The cyclists who use a main road are more likely to be Strava users. Does the correlation also hold on a route through a park? On a segregated path to a school? Were these compared separately? I saw only one 'local path' on the included information. I'm not convinced that the correlation will hold for all heavily cycled routes, but it is likely to hold for many. This should be considered in the presentation.

3) You don't include any statistical error.

4) There may be other sources of possible error that are not discussed?


Hi Vorpal

You make many good points.
1) I accept the point about comparing disparate datasets. What I think I have shown is that at a gross level the datasets correlate well. In themselves they tell us nothing about who is cycling, men or women, but as far as I can tell, the Strava demographics and the demographics of the cycling population as a whole are similar. I can go further: the cycling officer in East Renfrewshire tells me that their own data show that men comprise 87% of the cycling population, women 13%. This is very close to the Strava demographic of 88:12. On that evidence, the sex bias looks like less of a problem than feared.

2) You might say that the correlation for a road like the A77 is always going to be good, but without evidence that would be a bold statement. In a park, conversely, when cycling traffic is a few 10's of cyclists per day, the randomness of cycling activity will render it difficult to be sure if a true "average" has been captured over 2-3 days of monitoring. I would argue that Strava might be the better metric there, although I can't yet demonstrate that.

3) Statistical error: fair comment, and I will be looking at this once I can get my pet statistician (my son) to look at it.

4) Other sources of error. Of course. This is the real world :D
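
On the statistical error in point 3, one common way to put an uncertainty on a correlation coefficient is the Fisher z-transformation. What follows is only a minimal sketch: the function name `correlation_ci`, the value r = 0.90 and the site counts are all hypothetical, not figures from the presentation. The method also assumes roughly bivariate-normal data, which skewed cycle counts may well violate.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r (Fisher z method).

    r: observed correlation, n: number of paired survey/Strava sites.
    Assumes roughly bivariate-normal data, which skewed counts may not satisfy.
    """
    if n <= 3:
        raise ValueError("need more than 3 paired observations")
    z = math.atanh(r)                # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical example: the same r = 0.90 observed at 11 sites and at 40 sites
for n_sites in (11, 40):
    low, high = correlation_ci(0.90, n_sites)
    print(f"n = {n_sites}: r = 0.90, 95% CI roughly {low:.2f} to {high:.2f}")
```

The interval narrows as the number of survey points grows, which is one reason extra years of count data should help.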
geomannie
mjr
Posts: 20334
Joined: 20 Jun 2011, 7:06pm
Location: Norfolk or Somerset, mostly

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by mjr »

geomannie wrote: 2) You might say that the correlation for a road like the A77 is always going to be good, but without evidence that would be a bold statement. In a park, conversely, when cycling traffic is a few 10's of cyclists per day, the randomness of cycling activity will render it difficult to be sure if a true "average" has been captured over 2-3 days of monitoring. I would argue that Strava might be the better metric there, although I can't yet demonstrate that.

Where did that assumption that park route traffic will only be "a few 10's of cyclists per day" come from? I don't know your geography, but in my local area, cycle traffic on National Route 1 (550 per day, automatic counter) through a park is more than double that of the nearest DfT count point on the parallel section of A148 (180, manual count reported by DfT).
MJR, mostly pedalling 3-speed roadsters. KL+West Norfolk BUG incl social easy rides http://www.klwnbug.co.uk
All the above is CC-By-SA and no other implied copyright license to Cycle magazine.
meic
Posts: 19355
Joined: 1 Feb 2007, 9:37pm
Location: Caerfyrddin (Carmarthen)

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by meic »

To what end will this data be used?
The Strava users in many ways mirror the predominant group of cyclists.
This will amplify their importance and render the other users invisible. This could be used to justify making cycle provision that fortifies this highly skewed use of cycles, rather than making provision for other groups of cyclists who are both struggling to cycle on the roads and having their existence ignored by the planners.

Are the runners, swimmers and other Strava users on different heat maps?
To the audience members who are not statistically literate the obvious flaws like that (and average speed of cyclists in Ceredigion being 20mph!) will carry more weight than the statistics which they have to take on trust.
I see local roads highlighted on Strava which I (despite being pretty hardcore) do not frequent; I know they are used by local race clubs for training rides.
These obvious biases may be dwarfed by a much larger area of overlap between Strava users and non-users. Yet without any statistical literacy to fall back on, I would be among the people in the audience who are not happy to accept Strava users as being representative.
I would be willing to accept Strava data being used on a "we have nothing better" basis.
Yma o Hyd
Wanlock Dod
Posts: 577
Joined: 28 Sep 2016, 5:48pm

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by Wanlock Dod »

I think that it is quite a convincing analysis, and I am reasonably persuaded to believe it. I have a couple of comments.

I think that your data sets are not normally distributed, and that this compromises the (presumably) linear regression analysis to some extent, because a lot of the data tends to cluster close to the origin and the few samples with big numbers really drive the results. This would probably be most important for the Glasgow 2014 analysis. You should be aware of how this affects the analysis, and could check it quite easily by trying to fit a model to log-transformed datasets. If you can still fit an OK model, then you can probably ignore it as not having an important effect on your analysis. The model with log-transformed data simply puts more emphasis on the average or typical situation than on the extremes. Perhaps the important message there is that the very small numbers of cyclists at most sampling sites make the analysis more uncertain than it would be if there were generally higher levels of cycling.
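
The log-transform check described above can be sketched in a few lines. This is only an illustration: the counts are invented to mimic many quiet sites plus two busy ones, the helper names `pearson_r` and `log_transform` are hypothetical, and `log(1 + count)` is just one reasonable choice of transform that keeps zero counts defined.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_transform(counts):
    """Return log(1 + count) for each site, so zero counts stay defined."""
    return [math.log1p(c) for c in counts]

# Invented daily counts at eight sites (NOT the presentation's data):
# six quiet sites near the origin and two busy main-road sites.
survey = [4, 6, 9, 12, 15, 22, 180, 420]
strava = [1, 1, 2, 2, 4, 3, 30, 75]

r_raw = pearson_r(survey, strava)
r_log = pearson_r(log_transform(survey), log_transform(strava))
print(f"raw counts: r = {r_raw:.3f}")
print(f"log counts: r = {r_log:.3f}")
```

If the correlation stays respectable after the transform, the relationship is probably not just an artefact of the couple of busy sites; if it collapses, those sites were doing most of the work.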

I'm a keen cyclist myself, and an enthusiastic Stravaist, and I reckon that these days virtually all of my rides by distance are logged on Strava, but that fewer than half of my rides by number of journeys are logged on Strava. This is because when I am just hopping on my bike to the shop I just don't bother, even if I have a race with the kids at the BMX track, but I will record my recreational rides pretty religiously. That makes me wonder whether the Strava-derived estimates should be treated with the expectation that, at least in some cases, there will be additional activity. I would hesitate to suggest that the Strava-predicted levels are a lower limit, but the original data from which they are derived will never be higher than the actual number of journeys. This must impart a kind of one-sided error to your model, such that the upper limit of actual cycle journeys will probably never be definable.
JimL
Posts: 200
Joined: 5 Nov 2013, 11:42am

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by JimL »

Strava surely misses too many cyclists (and cycle use: utility and some commuting) to be a replacement for cycling surveys (if that is the question), but it obviously provides information on certain cycle usage (the fitness-oriented, and maybe some commuting and touring). I suppose if there were large numbers of cyclists things may average out, but the numbers are so few that the details are important. In short, I don't know.

The cycle traffic on the A77, I'm guessing, refers (mainly) to the cycle traffic on the segregated cycle path that runs alongside, used by racers, families and long-distance commuters, so the correlation will be good.
Vorpal
Moderator
Posts: 20717
Joined: 19 Jan 2009, 3:34pm
Location: Not there ;)

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by Vorpal »

geomannie wrote: 1) I accept the point about comparing disparate datasets. What I think I have shown is that at a gross level the datasets correlate well. In themselves they tell us nothing about who is cycling, men or women, but as far as I can tell, the Strava demographics and the demographics of the cycling population as a whole are similar. I can go further: the cycling officer in East Renfrewshire tells me that their own data show that men comprise 87% of the cycling population, women 13%. This is very close to the Strava demographic of 88:12. On that evidence, the sex bias looks like less of a problem than feared.
My point was that you cannot tell (or at least cannot demonstrate) that Strava demographics and cycling demographics are similar. All you can show is that they have similar averages. You would need other data to demonstrate that they are similar. (Ask your friendly local statistician.)
geomannie wrote: 2) You might say that the correlation for a road like the A77 is always going to be good, but without evidence that would be a bold statement. In a park, conversely, when cycling traffic is a few 10's of cyclists per day, the randomness of cycling activity will render it difficult to be sure if a true "average" has been captured over 2-3 days of monitoring. I would argue that Strava might be the better metric there, although I can't yet demonstrate that.
I don't have any evidence other than what you have posted, but it certainly seems likely, and furthermore, I think that survey data are more likely to be correct for an A road. I can discuss in detail the problems with how survey data are taken, and how usage levels are extrapolated, but I don't think that's necessary here. But the biggest problem, again, is that we need to know what 'average' is, because numbers of cyclists fluctuate a great deal according to the time of year. Strava captures this. Data available from DfT and local authorities estimate differences in a model that is not... error free.

IMO, a few 10s of cyclists here and there can add up to a lot. We don't have any way to know how much, and using Strava data will miss most or all of them. I'm not suggesting that you shouldn't do it, or use the correlation, only that you should do so with caution, especially when it comes to local routes to shops and schools, and I think it important to note this.
“In some ways, it is easier to be a dissident, for then one is without responsibility.”
― Nelson Mandela, Long Walk to Freedom
Richard Fairhurst
Posts: 2035
Joined: 2 Mar 2008, 4:57pm
Location: Charlbury, Oxfordshire

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by Richard Fairhurst »

Fascinated to read this as I've followed a few of your earlier postings on the subject, and traffic stats interest me greatly for cycle.travel's route-planning purposes.

The Strava heatmap has never particularly passed the "smell test" for me. There are rural roads round here that are remarkably popular with leisure cyclists, yet they don't show up any stronger than other surrounding routes. (Adlestrop to Moreton-in-Marsh is one example.)

The same applies in urban areas. Oxford is an interesting example. Cornmarket Street shows up much stronger than New Inn Hall Street, just to the west. That might seem unremarkable (Cornmarket is, after all, the main north-south street) except that Cornmarket is closed to bikes for most of each day, including the peak evening commute period. Queen Street is stronger still, and that too is closed during the day. You can find similar smell-test anomalies in Cambridge.

So in order to properly ground-truth the data, I think you need to look at figures for somewhere where everyday cycling is more than a trace activity. In Glasgow 1.6% of commuting is by bike; in Oxford it's ten times that. Where everyday cycling is marginalised, as in Glasgow (and, I'm guessing, East Renfrewshire still more so), then the few commuters are more likely to be confident cyclists who are happy on main roads and may well use Strava. In Oxford, however, cyclists will be on average less blasé about main road dangers, and less likely on average to use Strava.

On a technicality:

I wouldn't refer to "Strava and Other Public Domain Data Sources". In a machine-readable data context, "public domain" usually takes its US meaning of "free of copyright", and Strava Metro very definitely isn't that.
cycle.travel - maps, journey-planner, route guides and city guides
geomannie
Posts: 1099
Joined: 13 May 2009, 6:07pm

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by geomannie »

Thanks to everyone for their thoughts. You have made me think.

I very much take on board Vorpal's caution about statistical analysis. I certainly need to look at this. I am in the process of getting an additional 2 years of Strava and survey count data, so these will certainly help in reducing uncertainty. Once I get these, I will entice my pet statistician to get involved.

I also take on board Vorpal's questioning of the cycle surveys. I have tracked down the reports specifying the methodology for the Glasgow data and am clear what is included. Reports for the East Renfrewshire surveys seem to be missing. I keep hunting.

With respect to Vorpal's comment that a few 10's of cyclists here and there add up to a lot, if you look at my 12th slide, I show that the 11 survey points on the main roads account for 80-85% of the cycling activity, both on Strava and on survey. In my dataset the survey points with <50 cyclists a day account for only about 15-20% of the cycle activity. Not insignificant, but relatively minor. Whether this is true more generally is hard to be sure. I can in fact do stats on the raw Strava data. I think I need to dig into this.
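
The share-of-activity figure above is simple arithmetic, and the same calculation can be run on any set of site counts. A minimal sketch follows, with invented counts and a hypothetical helper `share_below`; the 50-a-day threshold is the one mentioned in the post, and nothing else here comes from the slides.

```python
def share_below(counts, threshold):
    """Fraction of total counted activity that comes from sites below threshold."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no activity recorded")
    return sum(c for c in counts if c < threshold) / total

# Invented daily counts per survey site (NOT the slide data)
site_counts = [5, 8, 12, 20, 30, 45, 60, 150, 240, 400]

quiet_share = share_below(site_counts, 50)
print(f"Sites with <50 cyclists/day carry {quiet_share:.0%} of counted activity")
```

Running it separately on the survey counts and on the Strava counts for the same sites would show whether the two sources agree on how much the quiet sites contribute.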

Wanlock Dod's point about lack of normal distribution is very valid. I had the Glasgow data a couple of years ago and noted that while the correlation was positive, the survey points were very skewed towards quieter routes. I had parked the correlation as interesting but inconclusive. I received the East Renfrewshire data much more recently and was pleased to note that while it is somewhat skewed towards quieter locations, it includes survey points over a wider range of cycling activity. I plan to look at this further.

On mjr's cavil about park routes only having 10's of cyclists a day, I am well aware that some are much busier. Having said that, these are very much the exception.

meic raises an interesting point. Firstly, swimmers and other Strava users are on different heat maps; I am only interested in cyclists. I am not suggesting that these heatmaps necessarily be used to justify making cycle provision solely along these routes, but if a local authority is to build hard infrastructure, it is imperative that they start with good models of the cycling status quo. These data will help planners to better answer the questions of where cyclists are starting from and where they are going to.

Thanks again and I look forward to continuing this study.

Cheers
geomannie
geomannie
Posts: 1099
Joined: 13 May 2009, 6:07pm

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by geomannie »

Richard Fairhurst wrote: I wouldn't refer to "Strava and Other Public Domain Data Sources". In a machine-readable data context, "public domain" usually takes its US meaning of "free of copyright", and Strava Metro very definitely isn't that.


Strava Metro heatmap is public domain in that Strava put it in the public domain. "Public-facing source" may be a better description.
geomannie
Si
Moderator
Posts: 15191
Joined: 5 Jan 2007, 7:37pm

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by Si »

I will repeat my oft used anecdote.

In Birmingham we used to have lots of BC Sky/HSBC rides - perhaps typically the sort of people likely to use Strava?

We also have lots of rides put on by Community Cycle Clubs and BC's programme for getting people riding in deprived areas. And we gave away 5000 free bikes, which we could track, to people in deprived areas.

We put all of the above on a map... guess what: very little overlap between the two groups. This would tend to suggest that Strava is a bit middle-class oriented.
Mick F
Spambuster
Posts: 56366
Joined: 7 Jan 2007, 11:24am
Location: Tamar Valley, Cornwall

Re: Are Strava heatmaps reliably indicative of general cycling activity?

Post by Mick F »

meic wrote:I would be willing to accept Strava data being used on a "we have nothing better" basis.
True, but it's not very good anyway.
It cannot tell the whole truth, because there must be a million cyclists - like me - who have no intention of using Strava. Maybe most of those have never heard of it and I wouldn't have heard of it had I not been on an internet cycling forum.

I tried Strava once and didn't see that it gave me anything that I didn't already know, and it seems to me that it's aimed at the sporting competitive types of cyclist ... and not me in the slightest, so you won't see me on a Strava heat map ... and a million others as well.
Mick F. Cornwall