iNZight into Baseball Nov. 10th, 1:00-2:00pm, TLC
Joshua Korenblat
Assistant Professor of Graphic Design

Want to celebrate the World Series? Interested in learning how to make graphs and charts?

Come learn how to use iNZight Lite and Google Sheets to analyze baseball statistics, including batting and pitching data, from 1871 to 2015.


 

 

What we’ll learn today

  • Today we’ll learn how to survey, sift, summarize, sort, filter, and visualize your data.
  • We’ll learn what to look out for during visual analysis of histograms, boxplots, and scatterplots/bubbleplots: visualization types that rely upon position in space to help us compare distributions and variables with greater nuance and clarity.
  • We’ll learn how to leverage the power of free tools: Google Sheets and iNZight.
  • We’ll learn how to export your visualizations into art and design applications, such as Adobe Illustrator, for further crafting, creating a delightfully informative read.

Why Baseball?

Recently, I ran across a news article listing the world’s largest recorded human gatherings.

A pilgrimage in Northern India summoned ~30 million people, and Mecca is always a destination.

Looking around, I noticed a few surprise entries, notably a Rod Stewart concert in Rio (~4 million people) and the 2016 World Series celebration for the Chicago Cubs.

Here are the breakdowns. Each person = 1 million people.

[Figure: breakdown of the gatherings, where each person shown represents 1 million people]

Why was the Chicago Cubs World Series such a big deal? Well, this was the last Cubs World Series Championship team:

[Photo: 1908 Chicago Cubs (Wikimedia, George R. Lawrence)]

The Cubs hadn’t won a World Series in 108 years. To quote a stage direction from Shakespeare, “Exit, pursued by a bear.”

Baseball thrives on statistics. 108 years parallels the 108 stitches in a baseball. There’s a poem in that idea.

Although we’re using baseball data here, this simply serves as an analogy for your data: any collection of information with many observations, and many variables (categories) within those observations.

Let’s take a look at data from baseball history, find out more about that 1908 season, and create some visualizations using Google Sheets and iNZight, a free package developed at the University of Auckland that runs through R, the world’s most popular open source stats application. No worries: no coding required!


 

Resources for this workshop

iNZight

iNZight Lite

Lahman Baseball Database

Baseball Reference 

Retrosheet


To follow along with the data, use Google Chrome for your browser. Go here and make sure to log into Google Sheets —>
iNZight into Baseball

 

Go to File —> Make A Copy to bring this into your own Google Drive.


TeamsFranchises: First, let’s explore the first worksheet. Check out the Explore button in the lower right corner. We can find a lot of information about our data simply by clicking in a cell on a worksheet.

To get more specific data, let’s write a COUNTIF formula in some empty cells.
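
As a rough sketch of what a COUNTIF formula does, here is the same idea written in R, the language iNZight runs on. The tiny table and its column names (franchName, active) are made up for illustration and are not the workshop sheet. In Google Sheets, the equivalent would look something like =COUNTIF(C2:C, "Y"), assuming the flag you want to count sits in column C.

```r
# Hypothetical stand-in for the TeamsFranchises worksheet (not the real data).
franchises <- data.frame(
  franchName = c("Team A", "Team B", "Team C", "Team D"),
  active     = c("Y", "N", "Y", "Y"),
  stringsAsFactors = FALSE
)

# COUNTIF(range, criterion) counts the cells in a range that match a criterion.
# In R, comparing a column to a value gives TRUE/FALSE, and sum() counts the TRUEs.
count_if <- function(values, criterion) {
  sum(values == criterion, na.rm = TRUE)
}

active_count   <- count_if(franchises$active, "Y")
inactive_count <- count_if(franchises$active, "N")

print(active_count)
print(inactive_count)
```

Changing the criterion (here "Y" or "N") is all it takes to count a different category in the same column.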

Teams: Let’s make a Pivot Table. Click on the corner between A & 1. Then go to Data —> Pivot Table

For Rows, select name. For Values, select W and L. Filter the year to 1908.
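
For anyone curious what the pivot table is doing behind the scenes, the same three choices (rows, values, filter) can be sketched in R. This is only an illustration: the rows below are invented, and the column names yearID, name, W, and L are assumptions about how the Teams worksheet is laid out.

```r
# Hypothetical rows in the shape of the Teams worksheet (values are made up).
teams <- data.frame(
  yearID = c(1907, 1908, 1908, 1908),
  name   = c("Team A", "Team A", "Team B", "Team C"),
  W      = c(90, 95, 80, 60),
  L      = c(60, 55, 70, 90),
  stringsAsFactors = FALSE
)

# Filter: keep only the 1908 season.
season <- teams[teams$yearID == 1908, ]

# Rows = name, Values = sum of W and sum of L, like the pivot table.
pivot <- aggregate(cbind(W, L) ~ name, data = season, FUN = sum)

print(pivot)
```

The result is a small table with one row per team, which is the same shape you copy out of the pivot table before charting it.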

Now, select the data frame this creates. Copy. Create a new worksheet. Go to Paste —> Paste Special —> Paste Values Only.

Let’s make a chart from this data. Highlight the data you want to chart. Then go to Insert Chart.

Now, here’s a trick to download your chart for further customization in Adobe Illustrator or Inkscape.

Make sure you are using Google Chrome for your browser. Go here: http://nytimes.github.io/svg-crowbar/

Once SVG Crowbar is installed, you can download your chart and open it in your favorite art and design application.

Feel free to experiment with the other 1908 tabs. See if you can explore them and create meaningful charts.


Hall of Fame Batters

Go to File —> Make a Copy

Download the worksheet as a CSV file.

Go to iNZight Lite: http://lite.docker.stat.auckland.ac.nz/

Go to File —> Import Dataset

*Histogram: See distribution of a single variable: Count of players per era.
*Compare distributions of Home Runs per Era: this changes to a dotplot, with a boxplot underneath

*Code more variables (colors and style)
*Advanced —> Explore: Let iNZight give you a tour!
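
To make the first two steps above concrete, here is a small sketch in base R of the numbers behind those plots: the count of players per era, and the five-number summary that a boxplot draws. The data and the column names era and HR are invented for illustration, and this is plain base R rather than iNZight's own plotting code.

```r
# Hypothetical Hall of Fame batters (made-up values, assumed column names).
batters <- data.frame(
  era = c("Early", "Early", "Early", "Modern", "Modern", "Modern", "Modern"),
  HR  = c(10, 20, 30, 100, 200, 300, 400),
  stringsAsFactors = FALSE
)

# Distribution of a single variable: how many players fall into each era.
players_per_era <- table(batters$era)

# Boxplot idea: the five-number summary (min, lower hinge, median,
# upper hinge, max) of home runs within each era.
hr_summary <- lapply(split(batters$HR, batters$era), fivenum)

print(players_per_era)
print(hr_summary)

# The same statistics as base R's boxplot() computes them.
# Remove plot = FALSE to draw the boxplots instead of only computing them.
bp <- boxplot(HR ~ era, data = batters, plot = FALSE)
print(bp$stats)
```

Comparing the medians and the spread between hinges across eras is the same reading you do by eye when iNZight stacks a dotplot over a boxplot.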


What can’t iNZight Lite do? It’s just not quite as robust as running iNZight as a package through R. I use R Studio, an IDE for the R language.

First, download iNZight to your desktop: https://www.stat.auckland.ac.nz/~wild/iNZight/getinzight.php

Follow the installation instructions.

Next, open up R Studio. Create a new script. Copy-and-paste this text into the script editor:


 

# Install iNZight and its companion packages (straight quotes are required in R)
install.packages(c("vit", "iNZightMR", "iNZightTS", "iNZightModules", "iNZightRegression", "iNZightPlots", "iNZight", "iNZightTools"),
                 repos = c("http://r.docker.stat.auckland.ac.nz/R",
                           "http://cran.stat.auckland.ac.nz"))    # or your preferred CRAN mirror
library(vit)    # load the vit package
iNZightVIT()    # launch iNZight


Today we learned how to survey, sift, summarize, sort, filter, and visualize your data.

We learned what to look out for during visual analysis of boxplots and scatterplots/bubbleplots.

We learned how to leverage the power of free tools: Google Sheets and iNZight.

We learned how to export your visualizations into art and design applications, such as Adobe Illustrator, for further crafting, creating a delightfully informative read.

Thank you!


Questions?