We switched from twitter-python to twython. It worked out of the box, and Tomer quickly generated a sample script to use the API. This was around Super Bowl time, so it was fun to look at tweets making fun of the Broncos.
Writing a word frequency counter was fairly trivial. At the moment it isn't especially useful.
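For reference, a counter like this takes only a few lines of Python. This is a minimal sketch, not the code in the repository: the function name, the tokenizing regex, and the sample tweets are assumptions made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(tweets, top_n=10):
    """Return the top_n most common lowercase tokens across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Tokens are runs of word characters or apostrophes, optionally prefixed
        # with '#' or '@' so hashtags and mentions stay intact.
        counts.update(re.findall(r"[#@]?[\w']+", text.lower()))
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up sample tweets, for illustration only.
    sample = [
        "Broncos fans are not having a good night #SuperBowl",
        "Still waiting for the Broncos offense to show up",
    ]
    print(word_frequencies(sample, top_n=3))
```

Lowercasing first means "Broncos" and "broncos" count as the same word, which is usually what you want for tweets.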
We also have a text file listing every NBA team's name, city, three-letter abbreviation, and so on. It's a good starting point for searching for relevant tweets.
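One way to use that file is to flag tweets that mention a team by name, city, or abbreviation. The sketch below assumes a comma-separated layout of city,name,abbreviation per line; the real file's format may differ, and the function names are hypothetical.

```python
import csv
import io
import re


def load_teams(fileobj):
    """Parse lines of the form city,name,abbreviation (an assumed layout)."""
    teams = []
    for row in csv.reader(fileobj):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        city, name, abbreviation = (field.strip() for field in row[:3])
        teams.append({"city": city, "name": name, "abbreviation": abbreviation})
    return teams


def teams_mentioned(tweet, teams):
    """Return abbreviations of teams referenced by name, city, or abbreviation."""
    found = []
    for team in teams:
        # Names and cities match case-insensitively as whole words.
        by_words = any(
            re.search(r"\b" + re.escape(term) + r"\b", tweet, re.IGNORECASE)
            for term in (team["name"], team["city"])
        )
        # Abbreviations must match exactly (upper case); otherwise ordinary
        # words like "was" would match an abbreviation such as "WAS".
        by_abbrev = re.search(r"\b" + re.escape(team["abbreviation"]) + r"\b", tweet)
        if by_words or by_abbrev:
            found.append(team["abbreviation"])
    return found


if __name__ == "__main__":
    sample_file = io.StringIO("Los Angeles,Lakers,LAL\nMiami,Heat,MIA\n")
    teams = load_teams(sample_file)
    print(teams_mentioned("The lakers were down 10 to MIA at the half", teams))
```

Matching on city alone is ambiguous for cities with two teams (Los Angeles, for instance), so city hits are probably better treated as weaker evidence than name or abbreviation hits.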
We've put all the relevant code on my GitHub page:
https://github.com/lawrencechang/twitter-events
I'm a bit worried because my Twitter API keys are in those files as plain text. I might move them into a separate file and keep that file out of the repository.
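A minimal sketch of the separate-file approach, assuming a hypothetical credentials.json that is listed in .gitignore. It also lets environment variables (named TWITTER_APP_KEY and so on, again a made-up convention) override the file, so the keys never have to sit in the working directory at all.

```python
import json
import os

KEY_NAMES = ("APP_KEY", "APP_SECRET", "OAUTH_TOKEN", "OAUTH_TOKEN_SECRET")


def load_twitter_credentials(path="credentials.json", environ=os.environ):
    """Load Twitter API keys without hard-coding them in source files.

    Keys come from a JSON file (listed in .gitignore so it is never committed);
    environment variables named TWITTER_<KEY> override the file if set.
    """
    creds = {}
    if os.path.exists(path):
        with open(path) as f:
            creds.update(json.load(f))
    for name in KEY_NAMES:
        value = environ.get("TWITTER_" + name)
        if value:
            creds[name] = value
    missing = [name for name in KEY_NAMES if name not in creds]
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return creds
```

The scripts would then build the client from the loaded values, e.g. Twython(creds["APP_KEY"], creds["APP_SECRET"], creds["OAUTH_TOKEN"], creds["OAUTH_TOKEN_SECRET"]). Keys that were already pushed remain in the repository's history, so regenerating them on Twitter's side is worth considering too.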
Presently I'm working on a database schema (using Python and SQLite) and an API to store the putative facts from the tweets. For example, if a tweet says the Lakers won, that claim gets an entry in the table. Later analysis will compare these "facts" against real truth data. By verifying or refuting them, we hope to glean some information about the author of each tweet.
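As an illustration of what such a table could look like, here is a sketch using Python's built-in sqlite3 module. The Facts table, its columns, and record_fact are hypothetical, not the schema in the repository; the verified column is left NULL until a fact has been checked against truth data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS Facts (
    id        INTEGER PRIMARY KEY,
    tweet_id  TEXT NOT NULL,   -- id of the tweet the claim came from
    author    TEXT NOT NULL,   -- screen name, so claims can be traced back to a user
    team      TEXT NOT NULL,   -- team abbreviation, e.g. 'LAL'
    claim     TEXT NOT NULL,   -- normalized claim, e.g. 'won'
    verified  INTEGER          -- NULL until checked; 1 = confirmed, 0 = refuted
)
"""


def record_fact(conn, tweet_id, author, team, claim):
    """Insert one putative fact and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO Facts (tweet_id, author, team, claim) VALUES (?, ?, ?, ?)",
            (tweet_id, author, team, claim),
        )
    return cur.lastrowid


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    record_fact(conn, "1234", "some_fan", "LAL", "won")
    print(conn.execute("SELECT team, claim, verified FROM Facts").fetchall())
```

Keeping the author on every row is what makes the later step possible: once facts are marked confirmed or refuted, a simple GROUP BY author gives each user's track record.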
In terms of creating a middleware-style API, I'm having difficulty figuring out the right way to do it. I want to make manipulating the database as brainless as possible. I have create and delete functions, which create or delete a predefined table (called Tweets) in a database of your choice (by default default.db, a file saved to your working directory). However, once you've created and started working with your database and table, I'm not sure how best to "connect" to it again through my API layer. My next blog post will probably talk about what I did.
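One common pattern, sketched here with hypothetical names (TweetStore and its methods are not from the repository, and the Tweets columns are assumed), is to hold the connection in an object. Since sqlite3.connect creates the file when it is missing and simply opens it when it exists, reconnecting later is the same call as connecting the first time; the create step just needs to be safe to repeat.

```python
import sqlite3


class TweetStore:
    """Hypothetical wrapper that opens (or creates) a database file and reuses it."""

    def __init__(self, path="default.db"):
        # sqlite3.connect creates the file if it is missing and opens it otherwise,
        # so reconnecting in a later session is the same call as the first connect.
        self.conn = sqlite3.connect(path)

    def create(self):
        """Create the Tweets table if it is not already there."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS Tweets (id TEXT PRIMARY KEY, text TEXT NOT NULL)"
            )

    def delete(self):
        """Drop the Tweets table if it exists."""
        with self.conn:
            self.conn.execute("DROP TABLE IF EXISTS Tweets")

    def exists(self):
        """Report whether the Tweets table is present in this database."""
        row = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name = 'Tweets'"
        ).fetchone()
        return row is not None

    def add(self, tweet_id, text):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO Tweets (id, text) VALUES (?, ?)", (tweet_id, text)
            )

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM Tweets").fetchone()[0]

    def close(self):
        self.conn.close()

    # Context-manager support so "with TweetStore() as store:" closes the file.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


if __name__ == "__main__":
    with TweetStore("example.db") as store:  # first session
        store.create()
        store.add("1", "Lakers win!")
    with TweetStore("example.db") as store:  # later session: same call reconnects
        print(store.exists(), store.count())
```

Because CREATE TABLE IF NOT EXISTS is idempotent, calling create() at the start of every session is harmless, which removes the need to remember whether the table was already set up.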
I imagine truth data being gathered in two ways. The first, and most obvious, is to scrape a reputable source such as ESPN, the NBA, or Yahoo. The other idea is to run a sort of popularity regression on the facts table: the more popular a fact is, the more likely it is to be true. Hopefully.
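Before anything like a regression, the popularity idea can be tried with a plain majority count, which is a simpler, different technique: for each team, take the most common claim and its share of all claims about that team. This sketch assumes a hypothetical Facts table with team and claim columns; in practice the grouping would probably need to be per game as well, not just per team.

```python
import sqlite3
from collections import Counter, defaultdict


def most_popular_claims(conn):
    """Return {team: (claim, share)}, where share is the fraction of that team's facts making the claim."""
    per_team = defaultdict(Counter)
    for team, claim in conn.execute("SELECT team, claim FROM Facts"):
        per_team[team][claim] += 1
    result = {}
    for team, counts in per_team.items():
        claim, n = counts.most_common(1)[0]
        result[team] = (claim, n / sum(counts.values()))
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Facts (team TEXT, claim TEXT)")
    conn.executemany(
        "INSERT INTO Facts VALUES (?, ?)",
        [("LAL", "won"), ("LAL", "won"), ("LAL", "lost"), ("MIA", "won")],
    )
    print(most_popular_claims(conn))
```

A low share signals a contested fact, which is exactly where scraped truth data would be most useful as a tiebreaker.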