Both of the underlying distributions (i.e. of Strava users and cycle count survey locations) are inherently log-normal.
The sampling approach applied to both is considerably skewed toward the upper tail (as per Vorpal's figure).
Whilst this might not be good statistics, does it actually matter for the purpose of the analysis? I suspect not, because we are only really interested in the places where people cycle. The message from the Strava analysis is loud and clear, and it comes across in all of the different cities:
Strava data wrote:Build high quality routes along main roads
The uncertainty issue can probably be addressed easily, though, because the Strava result is simply a minimum:
Total Cyclists = Strava Cyclists + Other Cyclists
The thing that is of interest once you have the Strava data (and have already built high quality routes along main roads) is how the value of Other Cyclists varies with things like traffic density and the number of Strava Cyclists.
So as well as using the Strava data to tell you to "Build high quality routes along main roads", you could also use it to identify the best locations for installing automatic cycle counters, both to validate the Strava data and to quantify the actual values of Other Cyclists at those locations.
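As a rough sketch of the arithmetic, assuming you had an automatic counter total and a Strava count for the same location and period, Other Cyclists just falls out by subtraction. The function names and the example numbers below are made up purely for illustration; they are not real counts from any city.

```python
def other_cyclists(counter_total, strava_count):
    """Other Cyclists = Total Cyclists - Strava Cyclists.

    Assumes the counter total covers the same place and time window as
    the Strava count, so the Strava figure is a lower bound on the total.
    """
    if strava_count > counter_total:
        raise ValueError("Strava count exceeds counter total; check the data")
    return counter_total - strava_count


def strava_share(counter_total, strava_count):
    """Fraction of all counted cyclists who showed up in Strava."""
    if counter_total <= 0:
        raise ValueError("counter total must be positive")
    return strava_count / counter_total


# Hypothetical counter sites with invented numbers, for illustration only.
sites = [
    {"name": "Main road site", "counter_total": 1200, "strava": 90},
    {"name": "Back street site", "counter_total": 300, "strava": 12},
]

for site in sites:
    others = other_cyclists(site["counter_total"], site["strava"])
    share = strava_share(site["counter_total"], site["strava"])
    print(f'{site["name"]}: other cyclists = {others}, Strava share = {share:.1%}')
```

With a handful of counter sites you could then look at how the Strava share changes with traffic density, which is the bit that actually needs validating.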
There is probably a good reason why all the cars use the main road; I can't help wondering if it might be similar for cyclists.