Tuesday, March 4, 2014

Current design strategy

Determine the rooting interest of the person who tweeted, i.e. which team they support.

Get the team names, player names, etc. that the person references most frequently.
Those are likely to be strong indicators of passion for a team.

Word associations
Lakers - Kobe Bryant
Lakers - Lakeshow
Lakers - lake show
Lakers - showtime
etc.

Some of these are basically synonyms (e.g. "Lakeshow" and "lake show"), but handling synonyms specially might not be worth exploring.
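A minimal sketch of how the associations above could be stored and matched. The `TEAM_ALIASES` table and the function names are hypothetical; the entries are just the ones listed above. Plain substring matching is an assumption and will produce false positives (for example "showtime" used in a non-basketball sense).

```python
# Hypothetical alias table built from the word associations listed above.
TEAM_ALIASES = {
    "Lakers": ["lakers", "kobe bryant", "lakeshow", "lake show", "showtime"],
}

def normalize(text):
    """Lowercase and collapse whitespace so 'Lake  Show' matches 'lake show'."""
    return " ".join(text.lower().split())

def teams_mentioned(tweet, aliases=TEAM_ALIASES):
    """Return the set of teams with at least one alias appearing in the tweet."""
    text = normalize(tweet)
    return {
        team
        for team, terms in aliases.items()
        if any(term in text for term in terms)
    }
```

Listing "lakeshow" and "lake show" as separate entries sidesteps the synonym question: both simply point at the same team.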

---
Another way to get a list of good users is to take the followers of certain Twitter accounts like @NFL, @NBA, @Lakers, etc.

---
Algorithm:

For each team:
   Find the keywords that are related to the team
      Team name, acronym, nicknames
      Also get the most commonly associated keywords
      (which might include player names)
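One possible way to find the "most commonly associated keywords" is to count the words that co-occur with the seed terms (team name, acronym, nicknames). This is only a sketch: `associated_keywords`, the tiny `STOPWORDS` set, and the single-word tokenization are all assumptions, so multi-word names like "Kobe Bryant" would show up as separate words.

```python
from collections import Counter
import re

# Tiny illustrative stopword list (an assumption; a real one would be larger).
STOPWORDS = {"the", "a", "to", "and", "of", "in", "is", "for", "on", "at"}

def associated_keywords(tweets, seed_terms, top_n=10):
    """Return the top_n words that co-occur with any single-word seed term."""
    seeds = {s.lower() for s in seed_terms}
    counts = Counter()
    for tweet in tweets:
        words = re.findall(r"[a-z0-9']+", tweet.lower())
        # Only count words from tweets that mention at least one seed term.
        if seeds.intersection(words):
            counts.update(
                w for w in words if w not in seeds and w not in STOPWORDS
            )
    return [word for word, _ in counts.most_common(top_n)]
```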

For the followers of each team (like @Lakers, @Clippers, etc.)
   Find the followers who mention the keywords the most (maybe around 100)
   For each follower
      Get a collection of tweets
         Perhaps a certain number
         Perhaps a collection of tweets around the time of a game event
      See how many of the keywords associated with each team the tweets contain
      Compute a confidence score for each team that this person might root for.
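The notes do not say how the confidence score would be computed. One simple option, shown here purely as an assumption, is each team's share of all keyword hits in the user's tweets. `team_confidence` and the `team_keywords` mapping are hypothetical names.

```python
def team_confidence(tweets, team_keywords):
    """Score each team by its share of keyword hits across a user's tweets.

    team_keywords maps a team name to a list of keywords for that team.
    Returns a dict of team -> score in [0, 1]; all zeros if nothing matched.
    """
    hits = {team: 0 for team in team_keywords}
    for tweet in tweets:
        text = tweet.lower()
        for team, keywords in team_keywords.items():
            # Substring counting is an assumption; it is crude but simple.
            hits[team] += sum(text.count(k.lower()) for k in keywords)
    total = sum(hits.values())
    if total == 0:
        return {team: 0.0 for team in hits}
    return {team: n / total for team, n in hits.items()}
```

A share-based score treats a user with three hits the same as one with three hundred, so the raw hit count may be worth keeping alongside it.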

Collect the highest scoring individuals for a certain team
   Manually read their Twitter feeds, determine if they are a fan of the team or not.
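Picking the highest scoring individuals could look something like this. It assumes a hypothetical `scores_by_user` dict that maps each user to their per-team confidence scores; the cutoff `n` is arbitrary.

```python
import heapq

def top_candidates(scores_by_user, team, n=100):
    """Return up to n (user, score) pairs with the highest score for team."""
    ranked = (
        (user, scores.get(team, 0.0))
        for user, scores in scores_by_user.items()
    )
    # nlargest returns the pairs sorted from highest to lowest score.
    return heapq.nlargest(n, ranked, key=lambda pair: pair[1])
```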

This generates a statistic: how many of the people our system found were actually fans of the given team?

Then, to compare:
From the followers of a certain team, randomly pick users.
Manually inspect these users, determine if they are a fan of the team they follow.
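For the comparison set, the random pick might be done as below. `sample_followers` is a hypothetical helper; the optional seed is there only so the same sample can be drawn again later.

```python
import random

def sample_followers(followers, k, seed=None):
    """Pick up to k followers uniformly at random, without replacement."""
    rng = random.Random(seed)
    followers = list(followers)
    return rng.sample(followers, min(k, len(followers)))
```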

Get the same statistic for this set: how many of the randomly picked followers were actually fans?

Compare the two statistics.
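A sketch of the comparison, assuming the manual inspection produces one True/False label per user. The labels below are made up for illustration and are not results.

```python
def fan_rate(labels):
    """Fraction of manually inspected users judged to be fans (True)."""
    if not labels:
        raise ValueError("need at least one labeled user")
    return sum(labels) / len(labels)

# Made-up labels, for illustration only.
system_labels = [True, True, True, False]    # users our system found
random_labels = [True, False, False, False]  # randomly picked followers

# Positive improvement means the system beat the random baseline.
improvement = fan_rate(system_labels) - fan_rate(random_labels)
```

With samples this small the difference would not mean much; the sets need to be large enough for the comparison to be convincing.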
