Joshua Korenblat Lightning Talk
Read more about the Python library we used.
Read more about the Python library we used.
iNZight into Baseball Nov. 10th, 1:00-2:00pm, TLC
Assistant Professor of Graphic Design
Want to celebrate the World Series? Interested in learning how to make graphs and charts?
Come learn how to use iNZight lite and Google Sheets to analyze baseball statistics, including batting and pitching data, from 1871-2015.
Recently, I ran across a news article listing the world’s largest recorded human gatherings.
A pilgrimage in Northern India summoned ~30 million people, and Mecca is always a destination.
Looking around, I noticed a few surprise entries, notably a Rod Stewart Concert in Rio (~4 million people)
and the 2016 World Series Celebration for the Chicago Cubs.
Here are the breakdowns. Each person = 1 million people.
Why was the Chicago Cubs World Series such a big deal? Well, this was the last Cubs World Series Championship team:
The Cubs hadn’t won a World Series in 108 years. To quote some stage direction from Shakespeare, “Exit, pursued by bear…”
Baseball thrives on statistics. 108 years parallels the 108 stitches in a baseball. There’s a poem in that idea.
Although we’re using baseball data here, this simply serves as an analogy for your data: any collection of information with many observations, and many variables (categories) within those observations.
Let’s take a look at data from baseball history, find out more about that 1908 season, and create some visualizations using Google Sheets and iNZight, a free package developed at the University of Auckland that runs through R, the world’s most popular open source stats application. No worries: no coding required!
To follow along with the data, use Google Chrome for your browser. Go here and make sure to log into Google Sheets —>
iNZight into Baseball
Go to File —> Make A Copy to bring this into your own Google Drive.
TeamsFranchises: First, let’s explore the first worksheet. Check out the Explore button in the lower right corner. We can find a lot of information about our data simply by clicking in a cell on a worksheet.
To get more specific data, let’s write a COUNTIF formula in some empty cells.
Teams: Let’s make a Pivot Table. Click on the corner between A & 1. Then go to Data —> Pivot Table
For Rows, select name. For values, select W and L. Filter the year to 1908.
Now, select the data frame this creates. Copy. Create a new worksheet. Go to Paste —> Paste Special —> Paste Values Only.
Let’s make a chart from this data. Highlight the data you want to chart. Then go to Insert Chart.
Now, here’s a trick to download your chart for further customization in Adobe Illustrator or Inkscape.
Make sure you are using Google Chrome for your browser. Go here: http://nytimes.github.io/svg-crowbar/
Once SVG Crowbar is installed. You can download your chart to your favorite art and design application.
Feel free to experiment with the other 1908 tabs. See if you can explore them and create meaningful charts.
Go to File —> Make a Copy
Download the worksheet as a CSV file.
Go to Inzight Lite: http://lite.docker.stat.auckland.ac.nz/
Go to File —> Import Dataset
*Histogram: See distribution of a single variable: Count of players per era.
*Compare distributions of Home Runs per Era: this changes to a dotplot, with a boxplot underneath
*Code more variables (colors and style)
*Advanced —> Explore: Let iNZight give you a tour!
What can’t iNZight Lite do? It’s just not quite as robust as running it as package through R. I use R Studio, an IDE for the R language.
First, download iNZight to your desktop: https://www.stat.auckland.ac.nz/~wild/iNZight/getinzight.php
Follow the installation instructions.
Next, open up R Studio. Create a new script. Copy-and-paste this text into the script editor:
Today we learned how to survey, sift, summarize, sort, filter, and visualize your data.
We learned how to leverage the power of free tools: Google Sheets and iNZight.
We learned how to export your visualizations into art and design applications, such as Adobe Illustrator, for further crafting, creating a delightfully informative read.
1. go to drive.google.com
2. Go to the red NEW button
3. Click on it, go down to More > Connect to Apps
4. Search for “Fusion Tables”
5. Then follow the same steps, but this time you will see Fusion Tables added to your list of apps.
6. You can search public datasets here and export them to Google Sheets.
7. Search for “Cherry blossoms Japan”
8. The first table The Bloom of Cherry Blossoms 2016 that comes up looks good. These tables are often Wikipedia, so of course, you’ll need to verify what you scrape is okay to use for academic work.
9. Export this table to Google Sheets.
10. To do this, go to the row number and right-click on it, and select Hide row.
11. Next, control-click on row 2 and insert 1 row above it. This will be the header row.
12. Label your headers: City; First bloom; Full bloom
13. Next, we need to Split our City column, which also has in it the Prefecture for Japan.
14. Control-click on Column A and Insert three Columns to the right of it.
15. In cell C3, type in “=split(A2,“(“, TRUE)”
16.Select this cell. Grab the blue handle in the lower corner. Double click on it or drag down to copy and apply the formula conditionally.
17. Next, go to Edit > Find and Replace and find all of the )’s. Replace with nothing.
18. Copy-and-paste Cols. B through C into Col. I and Paste Special > Paste Values Only. Delete Col. A.
19. Select the pasted elements and move them into B through C.
20. Now we need to add a Column that tells us the time span for the peak bloom, and another one that gives us the day of the first bloom, from 1 to 365, so we can measure that against latitude (the question is, do flowers first bloom later in northern climates)?
21. Hide rows that don’t have values in them for our key variables. You can always fill these in later with more research.
20. Select the City Col. Go to Add-Ons > Get Add Ons > Search for Awesome Geocode. Use this Add-on to get Latitude and Longtitude data. Select the City Col. and run the Awesome Geocode.
These steps are getting pretty detailed. Fortunately, you can find the formulas pre-written for you on the spreadsheet. You can try writing your own formula and copying it on adjacent empty columns. Follow along with me for the rest as make a chart, merge it with more data about the Japanese prefectures in Fusion Tables, and then bring this into Tableau to do things we can’t do in Google Sheets.
Tableau workbook: http://tabsoft.co/1VX51rG
Tableau workbook: http://tabsoft.co/1SOgBSo
Mohonk Preserve Weather
Highest Grossing Films