Tuesday, February 25, 2014

More database updates, researching other papers

I updated some of the code for the database API. It now supports getting tweets based on the "tweetId" field and the team name fields.

Actually, I may not have completely finished it, but it's about 99% there. I still have to make sure the function actually returns the object.
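
For illustration, here's a minimal sketch of what those two lookups might look like. It is not the project's actual code: it assumes SQLite, and the table name `tweets`, the columns `homeTeam` and `awayTeam`, and the function names are all hypothetical. Only the "tweetId" field comes from the real API. The point is mostly the last line of each function, which hands the result back to the caller.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical tweets table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets us turn rows into dicts by column name
    conn.execute(
        "CREATE TABLE tweets ("
        "tweetId TEXT PRIMARY KEY, text TEXT, homeTeam TEXT, awayTeam TEXT)"
    )
    return conn


def get_tweet_by_id(conn, tweet_id):
    """Return the tweet with this tweetId as a dict, or None if it is missing."""
    row = conn.execute(
        "SELECT * FROM tweets WHERE tweetId = ?", (tweet_id,)
    ).fetchone()
    # Explicitly return the object; forgetting this would silently yield None.
    return dict(row) if row is not None else None


def get_tweets_by_team(conn, team):
    """Return every tweet where the team appears in either team field."""
    rows = conn.execute(
        "SELECT * FROM tweets WHERE homeTeam = ? OR awayTeam = ? ORDER BY tweetId",
        (team, team),
    ).fetchall()
    return [dict(r) for r in rows]
```

Parameterized queries (the `?` placeholders) keep tweet text and team names from being interpreted as SQL.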

I wanted to start working on our truth data for scores. Apparently ESPN has APIs now, but they don't expose the one for scores, presumably because of cost and licensing issues. I looked for other sources but didn't find anything free. There were some paid options, but that's not happening.

I found some interesting and somewhat on-topic papers regarding Twitter analysis and live sports events. In particular, I've started reading a paper from 2011 by some people at Rice University. One of their basic mechanisms involves using a sliding window of Tweets. They detect when a new event occurs by measuring the rate at which the Tweets are coming in and comparing it to the rate at the beginning of the sliding window. I haven't finished reading the paper, but I think they also do some lexicon analysis. The same group also attempted some sentiment analysis as a follow-on. This could all be useful.
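
To make the sliding-window idea concrete, here's a rough sketch based only on my reading so far. It is not the paper's algorithm: the function name, the window and slice lengths, the ratio threshold, the minimum count, and the cooldown are all assumptions I picked for illustration. It compares the number of Tweets in the newest slice of the window against the number in the oldest slice, and flags an event when the newest slice is much busier.

```python
from collections import deque


def detect_bursts(timestamps, window=60.0, slice_len=10.0, ratio=3.0, min_count=5):
    """Return timestamps (seconds) at which the Tweet rate looks like a new event.

    timestamps must be sorted in ascending order. All thresholds are
    illustrative guesses, not values taken from the paper.
    """
    recent = deque()      # timestamps currently inside the sliding window
    events = []
    start = None
    last_event = None
    for t in timestamps:
        if start is None:
            start = t
        recent.append(t)
        # Drop Tweets that have fallen out of the window.
        while recent and recent[0] < t - window:
            recent.popleft()
        # Wait until a full window of history exists before comparing.
        if t - start < window:
            continue
        head = sum(1 for x in recent if x < t - window + slice_len)  # oldest slice
        tail = sum(1 for x in recent if x > t - slice_len)           # newest slice
        if tail >= min_count and tail >= ratio * max(head, 1):
            # Cooldown: report one event per window length at most.
            if last_event is None or t - last_event >= window:
                events.append(t)
                last_event = t
    return events
```

A steady trickle of Tweets should produce no events, while a sudden flurry after a quiet stretch should produce one. The thresholds would need tuning against real game data.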

Some links:
http://arxiv.org/pdf/1106.4300v1.pdf
http://ceur-ws.org/Vol-720/Zhao.pdf

As an aside, there doesn't seem to be anything out there that tries to turn Tweets into actual score information. Of course, there's no guarantee such information would be accurate, but it would probably be pretty good if the system were designed with the right flexibility.

---
Tomer got a lot of the mechanics of the Tweet parser tool he found working. He's ready to start shoving stuff into the database. This led to some discussion about the database schema. Things like:

What if someone tweets about more than two teams?
No teams, only players?
etc
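
We haven't settled on anything, but one way a schema could handle both questions is to stop storing teams as fixed fields on the tweet and keep a separate table of mentions. The sketch below is just one possibility under that assumption: the `mentions` table, its columns, and the helper names are all hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema: a tweet can have any number of team or player mentions.
SCHEMA = """
CREATE TABLE tweets (
    tweetId TEXT PRIMARY KEY,
    text    TEXT NOT NULL
);
CREATE TABLE mentions (
    tweetId TEXT NOT NULL REFERENCES tweets(tweetId),
    kind    TEXT NOT NULL CHECK (kind IN ('team', 'player')),
    name    TEXT NOT NULL,
    PRIMARY KEY (tweetId, kind, name)
);
"""


def make_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_tweet(conn, tweet_id, text, teams=(), players=()):
    """Store a tweet plus one mention row per team or player it names."""
    conn.execute("INSERT INTO tweets VALUES (?, ?)", (tweet_id, text))
    rows = [(tweet_id, "team", t) for t in teams]
    rows += [(tweet_id, "player", p) for p in players]
    conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)", rows)


def tweet_ids_mentioning(conn, kind, name):
    """Return the IDs of tweets that mention the given team or player."""
    cur = conn.execute(
        "SELECT tweetId FROM mentions WHERE kind = ? AND name = ? ORDER BY tweetId",
        (kind, name),
    )
    return [r[0] for r in cur.fetchall()]
```

With this layout, a tweet about three teams is just three mention rows, and a tweet that names only players has no team rows at all. The trade-off is an extra join when looking tweets up by team.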

---
It looks like we'll do our presentations on Thursday.
