Thursday, February 27, 2014

We presented today

It went pretty well. I didn't get asked any tough questions regarding the Watson presentation.

Dyer was less happy with our project presentation, though. He was especially upset when we mentioned "sentiment analysis," saying it was low-hanging fruit: easy and not interesting. Generally I tend to agree. I'd figured we'd use sentiment analysis as just one aspect of our system, along these lines (rough sketch after the list):

Sentiment: happy. Likely rooting interest: the winning team.
Increases evidence of rooting interest.

Sentiment: happy. Likely rooting interest: the losing team.
Sarcasm? Team is tanking? (losing on purpose for better draft picks next year)
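None of this is implemented yet, but a first pass might look like the following. The function name and the weights are all made up:

def rooting_evidence(sentiment, team_is_winning):
    # Hypothetical scoring of "author roots for this team", given the
    # tweet's sentiment and whether the team is currently winning.
    # The weights are placeholders, not tuned values.
    if sentiment == "happy" and team_is_winning:
        return 1.0    # the easy case: happy fan, winning team
    if sentiment == "happy" and not team_is_winning:
        # Ambiguous: sarcasm, or the fan wants the team to tank for
        # draft position, so count it weakly and flag for later.
        return 0.2
    if sentiment == "sad" and team_is_winning:
        return -0.5   # possibly a fan of the other team
    return 0.5        # sad about a loss still suggests attachment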


Tuesday, February 25, 2014

More database updates, researching other papers

I updated some of the code for the database API; it now supports fetching tweets by the "tweetId" field and by the team-name fields.

Actually, it may not be completely done, but it's 99% there. I still have to make sure I'm returning the object from the function.

I wanted to start working on our truth data for scores. Apparently ESPN has APIs now, but they don't expose the one for scores, presumably because it costs them money and there are licensing issues and whatnot. Looking for other sources, I didn't find anything usable. There were some paid options, but that's not happening.

I found some interesting and somewhat on-topic papers on Twitter analysis of live sports events. In particular, I've started reading a 2011 paper by a group at Rice University. One of their basic mechanisms is a sliding window over Tweets: they detect when a new event occurs by measuring the rate at which Tweets are coming in and comparing it to the rate at the beginning of the window (rough sketch after the links below). I haven't finished the paper, but I think they also do some lexicon analysis. The same group attempted sentiment analysis as a follow-on. This could all be useful.

Some links:
http://arxiv.org/pdf/1106.4300v1.pdf
http://ceur-ws.org/Vol-720/Zhao.pdf
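Here's my paraphrase of the sliding-window mechanism, not the paper's actual algorithm; the window length, spike factor, and minimum count are all values I made up:

from collections import deque

# Keep recent tweet timestamps in a window and flag an event when the
# arrival rate at the newest edge spikes relative to the oldest edge.

WINDOW_SECONDS = 60.0   # window length; a made-up value
SPIKE_FACTOR = 3.0      # rate increase that counts as an event; made up

window = deque()        # timestamps (in seconds) of tweets in the window

def observe(timestamp):
    """Record one tweet; return True if it looks like a new event."""
    window.append(timestamp)
    # Drop timestamps that have fallen out of the window.
    while window and window[0] < timestamp - WINDOW_SECONDS:
        window.popleft()
    if len(window) < 10:          # too little data to compare rates
        return False
    # Compare the count in the newest quarter of the window to the
    # count in the oldest quarter.
    quarter = WINDOW_SECONDS / 4.0
    oldest = sum(1 for t in window if t < window[0] + quarter)
    newest = sum(1 for t in window if t > timestamp - quarter)
    return oldest > 0 and float(newest) / oldest >= SPIKE_FACTOR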

As an aside, there doesn't seem to be anything out there that tries to turn Tweets into actual score information. Of course there's no guarantee such information would be accurate, but it could probably be made pretty good if the system were designed with the right flexibility.

---
Tomer got a lot of the mechanics of the Tweet parser tool he found working. He's ready to start shoving stuff into the database. This led to some discussion about the database schema. Things like:

What if someone tweets about more than two teams?
What if a tweet mentions no teams, only players?
etc. (One possible answer to the first question is sketched below.)
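For the more-than-two-teams case, a join table would avoid fixed team columns entirely. A sketch, with placeholder table and column names rather than our final schema:

import sqlite3

# A join table means a tweet can reference zero, one, or many teams.
conn = sqlite3.connect("default.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS Tweets (
        tweetId TEXT PRIMARY KEY,
        author  TEXT,
        text    TEXT
    );
    -- One row per (tweet, team) mention; a tweet about three teams
    -- gets three rows, and a tweet about none gets no rows at all.
    CREATE TABLE IF NOT EXISTS TweetTeams (
        tweetId TEXT REFERENCES Tweets(tweetId),
        team    TEXT
    );
""")
conn.commit()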

---
It looks like we'll do our presentations on Thursday.

Friday, February 21, 2014

Lots of news to cover

We switched from python-twitter to twython. It worked out of the box, and Tomer quickly put together a sample script that uses the API. This was around Super Bowl time, so it was fun to look at tweets making fun of the Broncos.
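The sample script was along these lines (reconstructed from memory; the search term and the key values are placeholders):

from twython import Twython

# Keys are placeholders; for now they sit in plain text (see the note
# below about moving them out of the repo).
APP_KEY = "YOUR_APP_KEY"
APP_SECRET = "YOUR_APP_SECRET"
OAUTH_TOKEN = "YOUR_OAUTH_TOKEN"
OAUTH_TOKEN_SECRET = "YOUR_OAUTH_TOKEN_SECRET"

twitter = Twython(APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

# Twitter's standard search endpoint, exposed by twython as search().
results = twitter.search(q="Broncos", count=20)
for status in results["statuses"]:
    print(status["text"])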

I created a word frequency counter fairly trivially. At the moment it might not be the most useful thing.
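It's essentially collections.Counter over lowercased, whitespace-split tweet text:

from collections import Counter

def word_frequencies(tweet_texts):
    # Count word occurrences across a list of tweet texts. No
    # punctuation stripping or stop-word handling yet.
    counter = Counter()
    for text in tweet_texts:
        counter.update(text.lower().split())
    return counter

# e.g. word_frequencies(["Go Lakers!", "lakers win again"]).most_common(3)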

We also have a text file with all the NBA team names, their cities, their three-letter abbreviations, and so on. It's a good start for searching for relevant tweets.

We've put all relevant code on my github page:
https://github.com/lawrencechang/twitter-events

I'm a little worried because my Twitter API keys are sitting in those files in plain text. I'll probably move them to a separate file and not upload that file.
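The plan would be something like this; the file name keys.py is just my placeholder choice, nothing official:

# keys.py -- never committed; add "keys.py" to .gitignore
APP_KEY = "..."
APP_SECRET = "..."
OAUTH_TOKEN = "..."
OAUTH_TOKEN_SECRET = "..."

# everywhere else, import instead of hard-coding:
# from keys import APP_KEY, APP_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET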

Presently I'm working on a database schema and API (using Python and sqlite) to store the putative facts from the tweets. For example, if a tweet talks about the Lakers winning, that claim gets an entry in the table. Later analysis will compare these "facts" to real truth data. By verifying or refuting them, we hope to glean some information about the author of the tweet.
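A provisional shape for that table (column names are placeholders, not the final schema):

import sqlite3

conn = sqlite3.connect("default.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS Facts (
        tweetId  TEXT,     -- tweet the claim came from
        team     TEXT,     -- e.g. 'Lakers'
        claim    TEXT,     -- e.g. 'winning'
        verified INTEGER   -- NULL until checked against truth data
    )
""")
conn.commit()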

In terms of creating a middleware-style API, I'm having difficulty figuring out the right way to do it. I want to make manipulating the database as brainless as possible. I have create and delete functions, which create or drop a pre-defined table (called Tweets) in a database of your choice (by default, default.db, a file saved to your working directory). However, once you've created and started working with your database and table, I'm not sure how best to "connect" to it again at my API level. My next blog post will probably talk about what I did.
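One option I'm leaning toward is wrapping the connection in a small class, so create, delete, and reconnect all go through the same object. This is a sketch of the idea, not the final API:

import sqlite3

class TweetDB(object):
    # Thin wrapper so callers never touch sqlite3 directly. Connecting
    # to an existing file is the "reconnect"; sqlite creates the file
    # if it doesn't exist yet.

    def __init__(self, path="default.db"):
        self.conn = sqlite3.connect(path)

    def create(self):
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS Tweets (tweetId TEXT, text TEXT)")
        self.conn.commit()

    def delete(self):
        self.conn.execute("DROP TABLE IF EXISTS Tweets")
        self.conn.commit()

    def get_by_id(self, tweet_id):
        cur = self.conn.execute(
            "SELECT * FROM Tweets WHERE tweetId = ?", (tweet_id,))
        return cur.fetchone()  # don't forget to actually return the row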

I imagine truth data being gathered in two ways. The first, and most obvious, is to scrape a reputable source like ESPN, NBA.com, or Yahoo. The other idea is to do a sort of popularity regression on the facts table: the more popular a fact is, the more likely it is to be true. Hopefully.
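In its simplest form, the popularity idea is just majority voting over the claims. A toy version:

from collections import Counter

def most_popular_fact(claims):
    # Toy version of the popularity idea: the claim asserted by the
    # most tweets wins, and the support ratio is a crude confidence.
    claim, votes = Counter(claims).most_common(1)[0]
    return claim, float(votes) / len(claims)

# most_popular_fact(["Lakers win", "Lakers win", "Celtics win"])
# -> ("Lakers win", 0.666...)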


Sunday, February 2, 2014

Starting off

Tomer and I are working on a project that will take Twitter feeds and extract event information from them.

We decided to use one of the many existing Twitter API wrappers for Python, this one called python-twitter: https://github.com/bear/python-twitter

The instructions described installation using pip, which I'd heard of before. I decided to suck it up, go through Linux-installing-stuff hell, and see how it went. As a side note, I'm using UCLA's SEAS Linux server, so I don't have admin privileges to install whatever I want wherever I want. This is a good restriction and a nice learning opportunity about using Linux systems the right way.

I downloaded pip from here: https://pypi.python.org/pypi/pip#downloads . Because I couldn't install pip into the global site-packages location, I had to install it under my local user account. This article helped explain it, particularly the section about the --user option: http://docs.python.org/2/install/

I think the command I ran was something like this (in the pip folder):

python setup.py install --user

(I might have had to build first, python setup.py build)

It put itself in my home directory, under .local/bin. I added this to my path in my .profile file.

export PATH=~/.local/bin/:$PATH

I was then able to follow the instructions on python-twitter's page, which describe installing the necessary dependencies from a requirements.txt file. When doing this, I also added the --user option.
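If I remember right, that amounted to something like:

pip install --user -r requirements.txt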

Finally, I built and installed python-twitter, using the --user option of course.

I ran both test scripts. Most of the tests output an ERROR or FAIL; only one seems to pass, namely GetStatus. The run also takes over 6 minutes.

In addition, I'm unable to get the simple API calls working. I'm getting errors that read "AttributeError: 'Api' object has no attribute '_Api__auth'". Do I need to enter some credentials before using this?
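My guess is yes: python-twitter's Api constructor takes OAuth credentials, so presumably something like this is needed (the key values are placeholders):

import twitter

api = twitter.Api(consumer_key="...",
                  consumer_secret="...",
                  access_token_key="...",
                  access_token_secret="...")

# Quick sanity check that the credentials are accepted.
print(api.VerifyCredentials())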