Planet Banshee: from the great minds of our community

October 13, 2016

Importing Team Data into NFLPool

Last weekend I figured out how to pretty print the five JSON files I get from MySportsFeeds. This was helpful for understanding just how much data is nested within each file. I also spent a good chunk of the weekend writing in a notebook, mostly doing some data modeling on what each table in the database should store and what its primary key would be. I also captured things I need to research and started breaking the project into chunks, as I tweeted over the weekend.

Monday was a holiday, so I worked through the first four apps of Python Jumpstart. I took a break and went back to the JSON files I had worked with. My goal was to start with what should be the easiest table and pull the team data out. This is a dictionary that includes the team name (Texans), city (Houston), abbreviation (HOU), and ID (64). The ID number is supplied in the JSON feed and is unique, so I will use it as the primary key. There will be two more columns in the table for conference and division, but I wanted to deal with those later.

I wrote a for loop to try and pull out each team’s information. I quickly got stuck and nothing was working. At one point, the loop I had written worked, but only pulled out the data for the first ranked team. I showed my wife my code and she pointed out that it wasn’t iterating in a loop.

I was stuck for two nights working on this after dinner. I finally stepped back and modified my pretty print Python program and started breaking down all of the information in the JSON file again. I figured out what was a list and what was a dictionary and what was nested where. (It looks like I didn’t commit this to the git repo, oops! Will have to fix that.)

After doing this last night, I found the list I needed to work with. I then re-wrote my for loop and I was able to iterate through all 16 teams in the AFC:

x = 0  # index into the AFC team entries
for afc_team_list in teamlist:
    afc_team_name = data["conferenceteamstandings"]["conference"][0]["teamentry"][x]["team"]["Name"]
    afc_team_city = data["conferenceteamstandings"]["conference"][0]["teamentry"][x]["team"]["City"]
    afc_team_id = data["conferenceteamstandings"]["conference"][0]["teamentry"][x]["team"]["ID"]
    afc_team_abbr = data["conferenceteamstandings"]["conference"][0]["teamentry"][x]["team"]["Abbreviation"]
    x = x + 1  # move on to the next team entry

I then copied and pasted and did it again for the NFC. I did try, unsuccessfully, to modify the conference index – ["conference"][0] – so I could write just one for loop instead of one for each of the two conferences. But it was working, so I'll leave it for now. (I'm sure my code is ugly, but hey, I'm just starting.)
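
For my future self, here's a rough sketch of how that single loop over both conferences could look, assuming the same nested structure my working code above relies on (data is the parsed conference standings JSON; the variable names are just my own):

# Sketch only: iterate over both conferences (AFC and NFC) in one pass,
# assuming data is the parsed conference-team-standings JSON used above.
all_teams = []
for conference in data["conferenceteamstandings"]["conference"]:   # AFC first, then NFC
    for entry in conference["teamentry"]:                          # each team in that conference
        team = entry["team"]
        all_teams.append((team["ID"], team["Name"], team["City"], team["Abbreviation"]))

print(len(all_teams))   # should be 32 if both conferences were read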

After that it was all about writing the SQL insert statements to put this into a SQLite3 database (for now; later it will go into MySQL). That took me an hour, but at the end I got it working and was even able to add the conference name to each row.
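
For my notes, the insert step looked roughly like this. The table and column names are just my current working choices, so treat this as a sketch rather than the final schema:

import sqlite3

# Sketch of the SQLite insert, assuming the afc_team_* variables from the loop above.
# Table and column names are my own working choices, not final.
conn = sqlite3.connect('nflpool.sqlite')
cur = conn.cursor()

cur.execute('''CREATE TABLE IF NOT EXISTS Teams
               (team_id INTEGER PRIMARY KEY, name TEXT, city TEXT,
                abbr TEXT, conference TEXT, division TEXT)''')

cur.execute('INSERT OR REPLACE INTO Teams (team_id, name, city, abbr, conference) '
            'VALUES (?, ?, ?, ?, ?)',
            (afc_team_id, afc_team_name, afc_team_city, afc_team_abbr, 'AFC'))

conn.commit()
conn.close()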

Next up, I need to pull the data from the division standings JSON file. It stores the division name for each division in a conference, in the form AFC/AFC-East. I'll need to write a for loop to grab it, slice it to remove the "AFC/" prefix, and then stick the result in the Division field for each team in the Teams table. I'll also need to stop dropping and re-creating the table each time I insert data, but for now it's working.
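
Here's a rough sketch of what I have in mind for that division step, assuming data is the parsed division standings JSON. I haven't confirmed exactly which key holds the "AFC/AFC-East" string, so the "@name" key below is a guess; the slicing and the UPDATE are the parts that matter:

import sqlite3

# Sketch only: the "@name" key is a guess at where the division label lives;
# the Teams table matches the working schema from the insert sketch above.
conn = sqlite3.connect('nflpool.sqlite')
cur = conn.cursor()

for division in data["divisionteamstandings"]["division"]:
    full_name = division["@name"]              # e.g. "AFC/AFC-East" (key name assumed)
    division_name = full_name.split('/')[1]    # drop the "AFC/" prefix -> "AFC-East"
    for entry in division["teamentry"]:
        team_id = entry["team"]["ID"]
        cur.execute('UPDATE Teams SET division = ? WHERE team_id = ?',
                    (division_name, team_id))

conn.commit()
conn.close()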


October 10, 2016

Building the NFLPool webapp – Starting with JSON

I’m glad I started with the Python for Everybody specialization on Coursera before jumping into Python Jumpstart by Building 10 Apps by Michael Kennedy. Mr. Kennedy moves fast. I’ve completed the first four apps, and it’s been good to get a refresher on the information I learned in Python for Everybody.

I also spent part of the weekend sketching in a notebook, brainstorming about the database design I’ll need for NFLPool. I learned that one of the bigger differences between MySQL and PostgreSQL is that MySQL doesn’t have the same support for foreign keys, but MySQL is much faster. The lack of foreign keys may make the design a bit tougher, but more on that in a later blog post.

I also sketched out some ideas for the functions I’m going to need to write so I’m not writing the same bit of code over and over again. From there, I created a to-do list of things to start working through. I find this whole process of building an app overwhelming. I never thought I’d be using paper and pencil so much, but I’ve found it helpful to break this into smaller chunks and attack them one at a time.

Then I started working on the import process for the JSON. This quickly derailed as I realized just how many stats MySportsFeeds captures from an NFL game. That turned into writing a JSON pretty print program so I could see how the five different JSON files nested their dictionaries.

I currently download five JSON files with all the statistics every Tuesday via a cron job. I know my app won’t be ready for the 2016 season, but my hope is that by having 17 weeks of data, I can re-create the season to test my app and make sure it scores each player correctly as we move through the season week by week. When I download the JSON via curl, the saved file includes all the web headers, such as:

HTTP/1.1 200 OK
Date: Wed, 21 Sep 2016 12:16:07 GMT
Server: Apache-Coyote/1.1
Cache-Control: must-revalidate, no-store, s-maxage=0, max-age=0, private
Access-Control-Allow-Headers: Origin, Content-Type, Accept, Accept-Encoding, Accept-Language, Authorization
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Content-Encoding: gzip
Access-Control-Allow-Methods: GET, OPTIONS
Content-Type: application/json
Set-Cookie: JSESSIONID=B7548F2309747418749B5421282A5E08; Path=/leaguemanager-web/; HttpOnly
Vary: User-Agent
Connection: close
Transfer-Encoding: chunked

And then the JSON starts right after that with a curly brace. I was proud of myself as I wrote an if statement to open the file, read the lines, and load the JSON once it finds the opening curly brace. Then I wrote code to first print out all the statistics categories (commented out below) and pretty print all the JSON:

import json
import pprint
import os

#Open the JSON file that includes headers

#Change the name of the file to open to match the query below:
with open('json/20160921-division-team-standings.json') as file:
    alltext = file.readlines()  #Put each line into a list

# division-team-standings.json
for lines in alltext:
    if lines.startswith('{'):
        rawdata = lines
        data = json.loads(rawdata)
#        for stat_categories in data["divisionteamstandings"]["division"][0]["teamentry"][0]["stats"]:
#            pprint.pprint(stat_categories)   #Print all the categories in "stats"
        pprint.pprint(data)  #Print the JSON

I had five files to review, and I just manually changed the filename to the one I wanted and had a code block for each of the files. I know I probably should have just written a function, but I was in the zone. (My code probably isn’t very Pythonic either, but I have to start somewhere on this journey.) I also know that when it comes time to build the real app I’ll be loading the JSON across the network and not from a local file, but future Paul gets to deal with that.
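
If I do go back and clean it up, a small helper along these lines would probably cover all five files (the same logic as the code above, just wrapped in a function):

import json
import pprint

def pretty_print_json(filename):
    # Skip the curl headers and pretty print the JSON body of a saved response.
    with open(filename) as file:
        for line in file.readlines():
            if line.startswith('{'):            # the JSON body starts at the first curly brace
                pprint.pprint(json.loads(line))

# One call per downloaded file instead of five copied code blocks:
pretty_print_json('json/20160921-division-team-standings.json')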

I also spent some time playing around with the nflgame and mlbgame Python modules. I need to spend some more time with them and I’ll share some thoughts on those in another blog post.

October 06, 2016

Next class up: Python Jumpstart by Building 10 Apps

I’ve completed the Python For Everybody course taught by Dr. Charles Severance at the University of Michigan on Coursera. All that’s left is the capstone project to put into practice what I’ve learned, but as I’m doing this to learn Python and not for the official certificate, I’m going to skip it. The course is taught in Python 2.7 and I want to shift to Python 3.x.

Python for Everybody was great. The pace and the exercises were perfect for the class. I wish I had realized sooner that there were additional exercises in the textbook that were not part of the required Coursera class. The fourth class, Python and Databases, was intense. The pace picked up as it taught you SQL and how Python connects to databases (SQLite specifically). The homework was much simpler in Python and Databases compared to the first three classes; you usually only had to make some minor changes to the SQL syntax to earn the grade.

The two things I’m going to need to focus on to have success in building the two apps I want to build are dictionaries (from importing statistics via JSON) and databases. If I walked away with one thing from the databases class, it’s that I’m going to need to spend some time with paper and pencil and plan my information architecture and database models if I’m going to be successful.

Next I’m going to start Python Jumpstart by Building 10 Apps by Michael Kennedy of the Talk Python podcast. I supported the Kickstarter earlier this year, and I’m excited because I now hope I have enough of a base understanding of Python to tackle it. It’s taught in Python 3.x (yay!), and I’m hoping that with that base knowledge, building these apps along with the included tutorials will give me the practice I need to later build a real app. It’s also going to go into a little more detail than what I’ve learned so far on list comprehensions (which make my head hurt), BeautifulSoup for web scraping, and classes.
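
To give myself a first taste of what’s coming, here’s the same tiny bit of work written as a plain loop and as a list comprehension (just a made-up example):

teams = ['Texans', 'Patriots', 'Packers', 'Vikings']

# The plain loop version...
abbreviations_loop = []
for team in teams:
    abbreviations_loop.append(team[:3].upper())

# ...and the same thing as a list comprehension.
abbreviations_comp = [team[:3].upper() for team in teams]

print(abbreviations_loop == abbreviations_comp)   # True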

I also supported Mr. Kennedy’s next Kickstarter, Python for Entrepreneurs. This also has me excited as the second phase of building my fantasy sports app will be deploying it on the web. The description looks perfect for what I’ll need, in addition to learning the web framework Pyramid:

You will learn to build and design your web app

This course will teach you how to build a data-driven web application in Python.

We will:

• Build our web app with the Pyramid web framework, "the Python web framework that supports your decisions, by artisans for artisans."

• Create and connect to our database using SQLAlchemy, the most popular data access layer in Python.

• Learn the core elements of web design including CSS and front-end frameworks such as Bootstrap.

Time to get to work.

October 03, 2016

Modernizing blam's autotools (or shaving the yak to move out from GoogleReader...)

Before focusing my spare time completely on GSoC* (as I have mentoring responsibilities this year \o/), I wanted to solve a problem that cannot wait until after July...

Yes, I've been a victim of Google's cuts too... And I was wondering: where should I move? Feedly? ThingyBob? Well, I shouldn't make the same mistake twice, right?

Actually, some time ago I was using a desktop app to avoid relying on software that I cannot control (yes, vendor lock-in, the most important thing that open source tries to solve, right?): Thunderbird. But somehow the convenience of a web app (that I can access from any computer) and the hassle of using my mail client for RSS reading made me move to the web.

I should be able to find a replacement that no company or individual can "take down", and which feels less clunky than Thunderbird for reading RSS. So, enter blam (in the future I'll figure out how to sync its state between computers, maybe using SparkleShare, to get the same convenience a web app provides), that GNOME app that has strangely managed not to catch my eye until now...

Well, maybe because when I install it from Debian sid and try to import the very first RSS feed from my GoogleReader list, it doesn't work? Apparently it's a bug that is already fixed upstream, thanks to Carlos, who modernized the way the program deals with XML and serialization.

Then I went ahead and tried to compile master myself... and guess what, it fails. Here the yak shaving begins: trying to fix the autotools stuff.

Fortunately, after some tinkering (and some copy & paste from banshee's build scripts), I managed to fix the problem, and also modernized a few things (like using the ".ac" extension instead of ".in" for the configure script, and properly using the AC_INIT and AM_INIT_AUTOMAKE macros).

Anyway, the real thing to highlight here is that while I was fixing this stuff and pushing to the repository...

... I saw some really good stuff committed by Carlos: using the new .NET 4.5 C# async patterns to get rid of those ugly callbacks! Kudos to him.

And if you're willing to help more with our autotools housekeeping, please do, I still feel this is way too long and needs some ironing.

* And if you're wondering what's up with GSoC (aka Google Summer of Code):

  • I had Nicholas Little lined up to work on Rygel+Banshee integration, but sadly he couldn't apply due to work commitments (hopefully he will still work with me on it in his spare time).
  • I had Rashid Khan lined up to work on Cydin+Banshee integration, but sadly there were not enough GSoC spots for him :( (fortunately he told me he still wanted to work on it with me in his spare time).
  • I had Tomasz Maczyński lined up to work on Banshee integration with more REST APIs, and fortunately he was selected! So expect some nice FanArt.TV and SongKick plugins soon!

September 30, 2016

Web Scraping and Python

I’m flying along in the Coursera specialization Python for Everybody, from the University of Michigan, taught by Dr. Charles Severance. I’ve completed the first two of the four courses, which give you an introduction to Python.

I’m now on the third course, Using Python to Access Web Data. This and the fourth course, focused on databases, are the two key foundations for the web app I want to build. I just finished Chapter 12, which introduces the BeautifulSoup library for scraping web pages. This is going to be huge – I’ll be able to scrape ESPN to find which MLB or NFL teams lead their divisions or are leading the wild card races.

Being on vacation this week, I’ve been able to complete a few chapters and am now a couple of weeks ahead of schedule. I’m tempted to pause and see if I can take what I’ve learned with BeautifulSoup and write some small Python programs to actually scrape and print results. It might be good practice to reinforce what I’ve learned.
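
If I do pause for some practice, something along these lines is what I have in mind: fetch a page, parse it with BeautifulSoup, and print the links. (The URL is just a placeholder, not a real target.)

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Practice sketch: fetch a page and print every link's URL and text.
# The address below is only a placeholder.
html = urlopen('http://example.com/standings').read()
soup = BeautifulSoup(html, 'html.parser')

for anchor in soup.find_all('a'):
    print(anchor.get('href'), anchor.get_text(strip=True))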

The next two chapters are key as well: XML, and then the one I’m most looking forward to, JSON. I’ve already signed up for a developer account with MySportsFeeds and am receiving JSON data for player stats, teams, and conference standings. I’ve spoken in the past with one of their lead developers, and they don’t currently keep statistics for wild card or playoff standings, so I’m going to need to use BeautifulSoup in my app to get those. I’ll also need to decide whether to use that JSON data for player stats and query against it myself, or just use the nflgame or nfldb libraries that have already been built. The biggest challenge there is that both of those libraries are written in Python 2.7, and I really want to write my apps in Python 3.x.

I know I’m getting ahead of myself. Every time I learn something that will be applicable to the app I want to build and I talk to my wife about it, she tells me to slow down. My mind is always racing with how I can apply what I’m learning and how it will affect the architecture of the app. Some people say the best way to learn a programming language is to build something and learn as you go. I can’t wait to put all this Python learning to practice.