Today, I spent all day “Hacking with Cloudera on CDH3” at the Cloudera Hackathon, mapping people’s location-based activities using data from Yelp, Foursquare, and Twitter. By analyzing the data algorithmically, I got strikingly similar results to those shown on Going.com, which are based on user-generated, hand-moderated content. I was also able to get finer-grained data on users’ whereabouts: by day, and even by hour. Here are my results:
As expected, most people still go to the park over the weekend. Does this mean that unemployment isn't that bad, considering that no one has time for nature Monday through Friday?
People generally like the park during the middle of the day. Once again, as expected.
Aha! People start the week at the gym... and then they collectively get lazier.
Surprisingly enough, Thursday is the most popular day for eating out.
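One way to check whether a Thursday restaurant peak is more than Thursday simply having more check-ins overall is to look at each category's share of that day's check-ins. This is a sketch of that normalization, not necessarily what the author plotted; the `(category, weekday) -> count` input format, the function name, and the toy numbers below are assumptions.

```python
def daily_share(
    counts: dict[tuple[str, str], int], category: str, days: list[str]
) -> dict[str, float]:
    """For each day, the fraction of all check-ins that day (every category)
    that went to `category`. Days with no check-ins at all get 0.0."""
    shares: dict[str, float] = {}
    for day in days:
        # Total check-ins on this day across every category.
        day_total = sum(n for (_, d), n in counts.items() if d == day)
        shares[day] = counts.get((category, day), 0) / day_total if day_total else 0.0
    return shares
```

With toy counts of 10 restaurant and 30 gym check-ins on Monday and 40 restaurant and 10 gym check-ins on Thursday, restaurants take 25% of Monday's check-ins and 80% of Thursday's, so the Thursday peak would survive normalization in that (made-up) case.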
More people travel on Friday, Sunday, and Monday than on Wednesday and Saturday.
While some people hit the bars as early as noon, most go at night... and then move on to a lounge.
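The hour-of-day charts come down to a per-category histogram of check-in hours. A minimal sketch, assuming check-ins have already been reduced to a local hour (0 to 23) per venue category; the function names and the toy bar and lounge hours in the note below are hypothetical, not the author's data.

```python
from collections import Counter


def hourly_profile(hours: list[int]) -> list[float]:
    """Fraction of a category's check-ins that fall in each hour 0..23."""
    counts = Counter(hours)
    total = len(hours)
    return [counts[h] / total if total else 0.0 for h in range(24)]


def peak_hour(hours: list[int]) -> int:
    """Busiest hour of the day (0-23); ties go to the earliest hour."""
    profile = hourly_profile(hours)
    return max(range(24), key=lambda h: profile[h])
```

With toy inputs `[12, 12, 21, 22, 22, 22, 23]` for bars and `[22, 23, 23, 0]` for lounges, the peaks are 22 and 23. Hours wrap at midnight, so a 0:00 check-in is numerically smaller but actually later in the night; a real "bars, then lounges" comparison would need to account for that.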
Remember all those people who gave up on the gym towards the end of the week? They're at the bars.
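The post doesn't show the Hadoop job behind these charts. As a rough illustration only, here is the kind of aggregation they imply, written as a MapReduce-style mapper and reducer in plain Python rather than as an actual CDH3 job: the mapper emits a `((category, weekday), 1)` pair per check-in and the reducer sums them. The `(local ISO timestamp, category)` record format and the function names are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Iterator

# Hypothetical record format: (local ISO 8601 timestamp, venue category).
CheckIn = tuple[str, str]
Key = tuple[str, str]  # (category, weekday name)


def map_checkin(record: CheckIn) -> Iterator[tuple[Key, int]]:
    """Mapper: emit ((category, weekday), 1) for a single check-in."""
    timestamp, category = record
    weekday = datetime.fromisoformat(timestamp).strftime("%A")
    yield (category, weekday), 1


def reduce_counts(pairs: Iterable[tuple[Key, int]]) -> dict[Key, int]:
    """Reducer: sum the emitted counts for each (category, weekday) key."""
    totals: dict[Key, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_by_weekday(records: Iterable[CheckIn]) -> dict[Key, int]:
    """Run the mapper over every record, then reduce. In Hadoop the shuffle
    would group pairs by key between the phases; here one process sees all."""
    return reduce_counts(pair for record in records for pair in map_checkin(record))
```

For example, two gym check-ins on April 19, 2010 and one restaurant check-in on April 22, 2010 reduce to `{("gym", "Monday"): 2, ("restaurant", "Thursday"): 1}`. The same pattern with `strftime("%H")` in place of `"%A"` gives the hourly charts.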
Note (reply to drc1912): The data covers March 15th to May 1st (~45 days). It is a noisy estimate based on Foursquare and Twitter location check-ins (samples from a population of roughly 100k users in San Francisco, CA, with a technology bias). I haven’t analyzed user-specific data, such as who went to both venues. I am open to suggestions for future analysis.
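Because the window runs about seven weeks, not every weekday occurs the same number of times, so raw per-weekday totals slightly favor the days that occur more often. A small sketch of the correction, assuming the year was 2010 (the post doesn't say) and that both end dates are inclusive; the helper names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def weekday_occurrences(start: date, end: date) -> Counter:
    """How many times each weekday name occurs between start and end, inclusive."""
    n_days = (end - start).days + 1
    return Counter((start + timedelta(days=i)).strftime("%A") for i in range(n_days))


def per_day_average(totals: dict[str, int], occurrences: Counter) -> dict[str, float]:
    """Average check-ins per occurrence of each weekday (skips unseen weekdays)."""
    return {day: n / occurrences[day] for day, n in totals.items() if occurrences[day]}
```

Under those assumptions, `weekday_occurrences(date(2010, 3, 15), date(2010, 5, 1))` spans 48 days: Sunday occurs six times and every other weekday seven times, so Sunday totals should be divided by 6 rather than 7.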
Wooha, this is amazing. Forgive my ignorance, but how is it possible that I have never heard of a business that uses this kind of information to suggest less crowded places to people, or that charges food chains (Starbucks, McDonald’s, etc.) for intelligence about places where crowds gather at lunch or dinner time but restaurant density is low? I think there could be a lot of money in this kind of data…
Very interesting. I would really like to see what times people go to the gym. I’m thinking about hitting the gym again, but I hate having to wait in line for equipment. Knowing which times are least busy would really help. Thanks.
Regarding the unemployment comment based on park check-ins, I’d like to remind the author that the sample is basically the subset of the population that owns a smartphone and is actively paying for a contract.
[…] User-mining: Start at the gym, end up at a bar? Instead of sitting at my desk, I spent all day “Hacking with Cloudera on CDH3” at the Cloudera Hackathon […]
Hey Ryan,
Pretty interesting. Are you thinking of open-sourcing the code for this? I would love to see the implementation.
I’m working on something similar for the Japan market.
Cheers for the cool post!
Wooha, this is amazing. Forgive my ignorance, but how is it possible that I have never heard of a business that uses this kind of information to suggest less crowded places to people, or that charges food chains (Starbucks, McDonald’s, etc.) for intelligence about places where crowds gather at lunch or dinner time but restaurant density is low? I think there could be a lot of money in this kind of data…
@ノートパソコン購入: That’s what Business Intelligence (BI) is all about. It’s a huge field, and this kind of analysis is pretty much what it does (and much more).
Great post though.
Is the underlying data publicly available? I’d love to try some queries against the raw data.
Very interesting work.
Very interesting. I would really like to see what times people go to the gym. I’m thinking about hitting the gym again, but I hate having to wait in line for equipment. Knowing which times are least busy would really help. Thanks.
Hi Todd,
I just added the graph, and it’s pretty interesting: https://simplyryan.files.wordpress.com/2010/07/gymhour.png
-R
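For Todd's question, one rough way to read an hourly gym chart is to count check-ins per hour and pick the quietest hours the gym is open. A minimal sketch; the opening hours, `k`, and the sample hours in the note below are hypothetical. Check-ins only record arrivals, so this is a proxy for crowding, not a measure of how many people are on the floor at once.

```python
from collections import Counter


def quietest_hours(checkin_hours: list[int], open_hours: range, k: int = 3) -> list[int]:
    """The k opening hours with the fewest check-ins; ties go to the earlier hour."""
    counts = Counter(checkin_hours)
    return sorted(open_hours, key=lambda h: (counts[h], h))[:k]
```

For check-in hours `[6, 7, 7, 12, 17, 17, 18, 18, 18, 19]` and a gym open from 6:00 to 22:00 (`range(6, 22)`), this returns `[8, 9, 10]`, the earliest of the hours with no check-ins at all.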
Regarding the unemployment comment based on park check-ins, I’d like to remind the author that the sample is basically the subset of the population that owns a smartphone and is actively paying for a contract.
So, very true.
-R
Very good point
Is this data publicly available somewhere? I’d like to do some digging myself.
Hi John,
I aggregated the data myself by periodically scanning Twitter and matching the check-ins against Foursquare and Yelp.
-R
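The reply doesn't say how the scanning and matching were done. Here is one possible shape for it, under several assumptions: tweets have already been fetched as `(tweet_id, timestamp, text)` tuples (no Twitter API calls are shown), foursquare check-in tweets read like "I'm at Venue Name (address). http://4sq.com/...", and a hypothetical `venue_categories` dict stands in for the Yelp category matching. Because periodic scans overlap, tweets are deduplicated by id.

```python
import re
from typing import Iterable, Optional

# Assumed shape of a foursquare check-in tweet, e.g.
# "I'm at Corner Cafe (123 Main St). http://4sq.com/..."
CHECKIN_RE = re.compile(r"I'm at (?P<venue>.+?)(?:\s*\(|\s+https?://|\.?$)")


def venue_from_tweet(text: str) -> Optional[str]:
    """Extract the venue name from a check-in tweet, or None if it isn't one."""
    match = CHECKIN_RE.search(text)
    return match.group("venue").strip() if match else None


def collect_checkins(
    scans: Iterable[Iterable[tuple[int, str, str]]],
    venue_categories: dict[str, str],
) -> list[tuple[str, str]]:
    """Merge periodic scans of (tweet_id, timestamp, text), dropping tweets
    seen in an earlier scan, and keep only venues with a known category."""
    seen: set[int] = set()
    checkins: list[tuple[str, str]] = []
    for scan in scans:
        for tweet_id, timestamp, text in scan:
            if tweet_id in seen:
                continue  # overlapping scans return the same tweet again
            seen.add(tweet_id)
            venue = venue_from_tweet(text)
            # Hypothetical lookup keyed by lower-cased venue name.
            category = venue_categories.get(venue.lower()) if venue else None
            if category:
                checkins.append((timestamp, category))
    return checkins
```

The output is a list of `(timestamp, category)` pairs, which is the shape a per-weekday or per-hour count would consume. Real tweets would also need handling for curly apostrophes, retweets, and venue names that don't match exactly.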
Can you provide the dataset and the code?
Thanks for compiling, found the restaurant one the most surprising.
Cheers
I too was surprised by the restaurant data…
I think the Thursday restaurant peak reflects people tweeting about their weekend plans, not so much people actually eating out on Thursday.
True?
The data shows that they are actually checking in on Thursday night.
-R
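Whether a check-in counts as "Thursday night" also depends on bucketing by local time. Twitter timestamps are in UTC, so an 8:30 pm Thursday check-in in San Francisco is already Friday in UTC. The post doesn't say how timestamps were bucketed; this sketch assumes a fixed Pacific Daylight Time offset, which holds for the whole March 15 to May 1 window if the year was 2010, because US daylight saving time began on March 14, 2010. For dates outside that window, `zoneinfo.ZoneInfo("America/Los_Angeles")` handles the transitions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: every check-in falls between March 15 and May 1, 2010, when
# San Francisco was on Pacific Daylight Time (UTC-7) the whole time.
PDT = timezone(timedelta(hours=-7), "PDT")


def local_weekday(utc_timestamp: datetime) -> str:
    """Weekday name of a timezone-aware UTC check-in time, as seen in San Francisco."""
    return utc_timestamp.astimezone(PDT).strftime("%A")
```

A check-in stamped 03:30 UTC on Friday, April 23, 2010 maps to 20:30 on Thursday, April 22 in San Francisco, so `local_weekday` returns `"Thursday"` where a naive UTC bucket would say Friday.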
Very nice analyses.
Wow, nice analysis. Great feedback, too.