### Joshua Korenblat Lightning Talk

Example of a Victorian poem from the collection

Example of a Modernist poem from the collection

Sources:

Oxford Book of English Verse, Project Gutenberg

Some Imagist Poets, Project Gutenberg

iNZight into Baseball Nov. 10th, 1:00-2:00pm, TLC
Joshua Korenblat
Assistant Professor of Graphic Design

Want to celebrate the World Series? Interested in learning how to make graphs and charts?

Come learn how to use iNZight Lite and Google Sheets to analyze baseball statistics, including batting and pitching data, from 1871 to 2015.

### What we’ll learn today

• Today we’ll learn how to survey, sift, summarize, sort, filter, and visualize your data.
• We’ll learn what to look out for when visually analyzing histograms, boxplots, and scatterplots/bubbleplots: visualization types that rely on position in space to help us compare distributions and variables with greater nuance and clarity.
• We’ll learn how to leverage the power of free tools: Google Sheets and iNZight.
• We’ll learn how to export your visualizations into art and design applications, such as Adobe Illustrator, for further crafting, creating a delightfully informative read.

### Why Baseball?

Recently, I ran across a news article listing the world’s largest recorded human gatherings.

A pilgrimage in Northern India summoned ~30 million people, and Mecca is always a destination.

Looking around, I noticed a few surprise entries, notably a Rod Stewart concert in Rio (~4 million people) and the 2016 World Series celebration for the Chicago Cubs.

Here are the breakdowns. Each person = 1 million people.

Why was the Chicago Cubs World Series such a big deal? Well, this was the last Cubs World Series Championship team:

The Cubs hadn’t won a World Series in 108 years. To quote a stage direction from Shakespeare, “Exit, pursued by a bear.”

Baseball thrives on statistics. 108 years parallels the 108 stitches in a baseball. There’s a poem in that idea.

Although we’re using baseball data here, this simply serves as an analogy for your data: any collection of information with many observations, and many variables (categories) within those observations.

Let’s take a look at data from baseball history, find out more about that 1908 season, and create some visualizations using Google Sheets and iNZight, a free package developed at the University of Auckland that runs through R, the world’s most popular open source stats application. No worries: no coding required!

### Resources for this workshop

iNZight

iNZight Lite

Lahman Baseball Database

Baseball Reference

Retrosheet

iNZight into Baseball

Go to File > Make a Copy to bring this into your own Google Drive.

TeamsFranchises: First, let’s explore the first worksheet. Check out the Explore button in the lower right corner. We can find a lot of information about our data simply by clicking in a cell on a worksheet.

To get more specific data, let’s write a COUNTIF formula in some empty cells.
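In Sheets, COUNTIF takes a range and a criterion, for example `=COUNTIF(D2:D, "Y")`. The logic can be sketched in Python as below. This is a minimal illustration only: the column names (`franchName`, `active`), the range `D2:D`, and the row values are placeholders assumed for the example, not taken from the workshop spreadsheet.

```python
# Placeholder rows standing in for the TeamsFranchises worksheet.
# Column names and values are assumptions for illustration only.
franchises = [
    {"franchName": "Team A", "active": "Y"},
    {"franchName": "Team B", "active": "N"},
    {"franchName": "Team C", "active": "Y"},
]


def count_if(rows, column, criterion):
    """Count rows whose value in `column` equals `criterion`,
    mirroring what COUNTIF(range, criterion) does for an exact match."""
    return sum(1 for row in rows if row[column] == criterion)


active_count = count_if(franchises, "active", "Y")
print(active_count)
```

Changing the criterion (for example to "N") answers a different question from the same column, which is the point of writing the formula in an empty cell rather than editing the data.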

Teams: Let’s make a Pivot Table. Click on the corner between Column A and Row 1 to select the entire sheet. Then go to Data > Pivot Table.

For Rows, select name. For Values, select W and L. Filter the year to 1908.

Now, select the table this creates and copy it. Create a new worksheet. Go to Edit > Paste Special > Paste Values Only.
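For readers curious about what the pivot table is doing behind the scenes, here is a rough Python sketch: keep only the rows for one year, group them by team name, and total wins and losses. The field names `name`, `W`, and `L` follow the steps above; the key `yearID` and every value in the sample rows are assumptions made up for the example.

```python
from collections import defaultdict

# Placeholder rows standing in for the Teams worksheet.
# "yearID" and all numbers are illustrative assumptions, not real records.
teams = [
    {"yearID": 1907, "name": "Team A", "W": 80, "L": 70},
    {"yearID": 1908, "name": "Team A", "W": 90, "L": 60},
    {"yearID": 1908, "name": "Team B", "W": 65, "L": 85},
]


def pivot_wins_losses(rows, year):
    """Filter to one year, then sum W and L per team name."""
    totals = defaultdict(lambda: {"W": 0, "L": 0})
    for row in rows:
        if row["yearID"] != year:
            continue  # the pivot table's year filter
        totals[row["name"]]["W"] += row["W"]
        totals[row["name"]]["L"] += row["L"]
    return dict(totals)


pivot_1908 = pivot_wins_losses(teams, 1908)
for name, record in sorted(pivot_1908.items()):
    print(name, record["W"], record["L"])
```

The result is a plain table of values, much like what you get after Paste Values Only: it no longer depends on the original worksheet.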

Let’s make a chart from this data. Highlight the data you want to chart. Then go to Insert Chart.

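Make sure you are using Google Chrome for your browser. Go here to get SVG Crowbar, a bookmarklet that saves SVG graphics from a web page so you can open them in Adobe Illustrator: http://nytimes.github.io/svg-crowbar/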

baseball-teams

Feel free to experiment with the other 1908 tabs. See if you can explore them and create meaningful charts.

Hall of Fame Batters

Go to File > Make a Copy

Go to iNZight Lite: http://lite.docker.stat.auckland.ac.nz/

Go to File > Import Dataset

• Histogram: See the distribution of a single variable, such as the count of players per era.
• Compare distributions of Home Runs per Era: the plot changes to a dotplot with a boxplot underneath.
• Code more variables (colors and style).
• Advanced > Explore: Let iNZight give you a tour!
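The boxplot drawn under each dotplot summarizes a group with five numbers: minimum, first quartile, median, third quartile, and maximum. A minimal Python sketch of that summary, grouped by era, follows. The field names `era` and `HR` and all values are placeholders assumed for illustration, and the quartile method used here ("inclusive") may not match the one iNZight uses.

```python
import statistics
from collections import defaultdict

# Placeholder rows standing in for the Hall of Fame Batters sheet.
# Field names and numbers are assumptions for illustration only.
players = [
    {"era": "Era 1", "HR": 10}, {"era": "Era 1", "HR": 20},
    {"era": "Era 1", "HR": 30}, {"era": "Era 1", "HR": 40},
    {"era": "Era 1", "HR": 50},
    {"era": "Era 2", "HR": 100}, {"era": "Era 2", "HR": 300},
    {"era": "Era 2", "HR": 200}, {"era": "Era 2", "HR": 500},
    {"era": "Era 2", "HR": 400},
]


def five_number_summary(values):
    """Return the five numbers a boxplot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


# Group home-run totals by era, then summarize each group.
home_runs_by_era = defaultdict(list)
for player in players:
    home_runs_by_era[player["era"]].append(player["HR"])

summaries = {era: five_number_summary(values)
             for era, values in home_runs_by_era.items()}

for era, summary in summaries.items():
    print(era, summary)
```

Comparing the medians and the distance between the quartiles across eras is the numeric counterpart of comparing the boxes by eye.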

What can’t iNZight Lite do? It’s just not quite as robust as running it as a package through R. I use RStudio, an IDE for the R language.

Next, open up RStudio. Create a new script. Copy-and-paste this text into the script editor:

# Install iNZight and its companion packages
install.packages(c("vit", "iNZightMR", "iNZightTS", "iNZightModules", "iNZightRegression", "iNZightPlots", "iNZight", "iNZightTools"),
repos = c("http://r.docker.stat.auckland.ac.nz/R",
"http://cran.stat.auckland.ac.nz"))    # or your preferred CRAN mirror
# Load the package and open the iNZightVIT window
library(vit)
iNZightVIT()

Today we learned how to survey, sift, summarize, sort, filter, and visualize your data.

We learned what to look out for when visually analyzing boxplots and scatterplots/bubbleplots.

We learned how to leverage the power of free tools: Google Sheets and iNZight.

We learned how to export your visualizations into art and design applications, such as Adobe Illustrator, for further crafting, creating a delightfully informative read.

Thank you!

Questions?

### Please note these steps below are not always completely rote instructions. Instead, they should offer you the broad contours to acclimate to new terrain.

1. Go to Google Sheets and File > Make a Copy (make sure you have a Google account and are signed in).
3. Go to Overpass Turbo: https://overpass-turbo.eu/
4. Search for Copenhagen, Denmark.
5. Go to Wizard. Search for cafe. Run. Export as a KML file.
6. Go to Wizard. Search for park. Run. Export as a KML file.
7. Go to carto.db (create an account).
8. Go to Datasets > Connect to Dataset > Import the CSV you downloaded.
9. In Map Wizard, change marker type to IMG > Maki icons > choose the bicycle icon.
10. Add a layer, connect to dataset, and choose the cafe KML export. In Map Wizard, change marker type to IMG > Maki icons > choose the coffee cup.
11. Add a layer, connect to dataset, and choose the park KML export. This is a polygon layer instead of a point layer. Make the polygons light green to show how green Copenhagen is!
12. Other possible Overpass searches: trees, flower shops, and other springtime favorites.

### Spreadsheet Two: Birds of New York, April 2016

1. Go to Google Sheets and File > Make a Copy (make sure you have a Google account and are signed in).
2. Select Row 1. Go to View > Freeze > 1 row.
3. Try out some conditional formatting. Select a column with values in it. Go to Format > Conditional Formatting. Use Format Cells If... for text or other conditional matches, or Color Scale for sequential or diverging numerical highlighting.
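A sequential color scale works, roughly, by placing each number between the column's lowest and highest values and blending two endpoint colors by that proportion. The Python sketch below shows the idea. The endpoint colors, the function name `color_scale`, and the sample counts are all assumptions for illustration; Sheets may compute its colors differently.

```python
def color_scale(value, low, high, start=(255, 255, 255), end=(0, 128, 0)):
    """Blend from `start` (at `low`) to `end` (at `high`) and return a hex color."""
    if high == low:
        t = 0.0  # avoid dividing by zero when every value is the same
    else:
        t = (value - low) / (high - low)
    t = max(0.0, min(1.0, t))  # clamp values outside the range
    rgb = tuple(round(s + (e - s) * t) for s, e in zip(start, end))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Placeholder counts standing in for one numeric column of the sheet.
counts = [0, 5, 10]
colors = [color_scale(c, min(counts), max(counts)) for c in counts]
print(colors)
```

A diverging scale follows the same logic with three anchor colors: one blend from the minimum to a midpoint, and a second blend from the midpoint to the maximum.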

### Spreadsheet Three: Summer Olympics Medal Winners

1. Go to Google Sheets and File > Make a Copy (make sure you have a Google account and are signed in).
2. Select the space between Column A and Row 1 to select the entire spreadsheet.
3. Go to Data > Filter. Filters now appear along each column in your header row. Click on the downward blue arrow indicating Filter. Clear the selection and choose an item to filter on, such as a sport or gender.
4. Once you have your filter applied, go back to the space between Column A and Row 1, click on it to select your entire spreadsheet, and go back to Data > Sort Range > Click on “Data has header row” > and choose a column that has numerical values in it. Sort Z to A for descending order.
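The filter-then-sort sequence above can be sketched in Python as two small operations: keep the rows that match one value, then order what remains from largest to smallest. The column names (`athlete`, `sport`, `medals`) and all rows are placeholders assumed for the example, not the contents of the workshop spreadsheet.

```python
# Placeholder rows standing in for the Olympics worksheet.
# Column names and values are assumptions for illustration only.
medal_rows = [
    {"athlete": "Athlete A", "sport": "Swimming", "medals": 3},
    {"athlete": "Athlete B", "sport": "Rowing", "medals": 5},
    {"athlete": "Athlete C", "sport": "Swimming", "medals": 7},
    {"athlete": "Athlete D", "sport": "Swimming", "medals": 1},
]


def filter_and_sort(rows, column, value, sort_column):
    """Keep rows where `column` equals `value`, then sort Z to A
    (descending) on the numeric `sort_column`."""
    kept = [row for row in rows if row[column] == value]
    return sorted(kept, key=lambda row: row[sort_column], reverse=True)


swimmers = filter_and_sort(medal_rows, "sport", "Swimming", "medals")
for row in swimmers:
    print(row["athlete"], row["medals"])
```

As in Sheets, the order matters: filtering first means the sort only has to rank the rows you actually care about.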

### Spreadsheet Four: Cherry Blossoms in Japan

2. Go to the red NEW button
3. Click on it, go down to More > Connect to Apps
4. Search for “Fusion Tables”
5. Then follow the same steps, but this time you will see Fusion Tables added to your list of apps.
6. You can search public datasets here and export them to Google Sheets.
7. Search for “Cherry blossoms Japan”
8. The first table that comes up, The Bloom of Cherry Blossoms 2016, looks good. These tables are often scraped from Wikipedia, so of course, you’ll need to verify that what you scrape is okay to use for academic work.
9. Export this table to Google Sheets.
10. To do this, go to the row number and right-click on it, and select Hide row.
11. Next, control-click on row 2 and insert 1 row above it. This will be the header row.
13. Next, we need to split our City column, which also contains the Japanese prefecture.
14. Control-click on Column A and insert three columns to the right of it.
15. In cell C3, type in =SPLIT(A2, "(", TRUE)
16. Select this cell. Grab the blue handle in the lower corner. Double-click on it or drag down to copy the formula to the rest of the column.
17. Next, go to Edit > Find and Replace and find every closing parenthesis, ")". Replace with nothing.
18. Copy-and-paste Cols. B through C into Col. I and Paste Special > Paste Values Only. Delete Col. A.
19. Select the pasted elements and move them into B through C.
20. Now we need to add a column that tells us the time span for the peak bloom, and another one that gives us the day of the first bloom as a day of the year, from 1 to 365, so we can measure that against latitude. The question is: do flowers first bloom later in northern climates?
21. Hide rows that don’t have values in them for our key variables. You can always fill these in later with more research.
22. Select the City Col. Go to Add-Ons > Get Add-Ons > Search for Awesome Geocode. Run this add-on on the City Col. to get Latitude and Longitude data.
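Two of the cleaning steps above, splitting "City (Prefecture)" labels and turning a first-bloom date into a day of the year, can be sketched in Python as follows. The sample label and the dates are placeholders assumed for illustration. Note that 2016 was a leap year, so day numbers for that year run from 1 to 366.

```python
from datetime import date


def split_city(label):
    """Split a label like 'City (Prefecture)' at the opening parenthesis
    and drop the closing one, as the SPLIT and Find and Replace steps do."""
    city, _, rest = label.partition("(")
    return city.strip(), rest.replace(")", "").strip()


def day_of_year(d):
    """Return the day number within the year (January 1 is day 1)."""
    return d.timetuple().tm_yday


# Placeholder label and dates, for illustration only.
city, prefecture = split_city("City A (Prefecture A)")
print(city, "|", prefecture)
print(day_of_year(date(2016, 1, 1)))
print(day_of_year(date(2016, 3, 1)))
```

With a day-of-year column in place, first bloom can be plotted against latitude on a scatterplot to explore the question of whether northern cities bloom later.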

These steps are getting pretty detailed. Fortunately, you can find the formulas pre-written for you on the spreadsheet. You can try writing your own formula and copying it into adjacent empty columns. Follow along with me for the rest as we make a chart, merge it with more data about the Japanese prefectures in Fusion Tables, and then bring this into Tableau to do things we can’t do in Google Sheets.

Link to cherry blossom image (Search for My Tableau Repository/ Shapes/ ..):

Tableau workbook: http://tabsoft.co/1VX51rG

Tableau workbook: http://tabsoft.co/1SOgBSo

Data-Visualization-Resources

## Video of Workshop:

### Poster for Workshop

Data Vis.dig.sign

### Files for Workshop

Mohonk Preserve Weather

2. Make font condensed
3. Observe data types
4. Apply conditional formatting: categorical and diverging

********************************

Highest Grossing Films

1. How does Titanic stack up?
3. Make font condensed
4. Observe data types
5. Apply conditional formatting: categorical and diverging (optional)
6. Data > Filter > genre > Drama
7. Data > Sort > Worldwide gross
8. Make pivot table: genre for rows; budget for values
9. Copy and paste in new worksheet > paste values only
10. Make a chart of this data

********************************

Titanic passengers

1. Who was most likely to survive the Titanic? Who was least likely to survive the Titanic?
3. Make font condensed
4. Observe data types
5. Apply conditional formatting: categorical and diverging (optional)
6. Data > Filter > column attribute > ?
7. Data > Sort > column attribute > ?
8. Make pivot table: 1, 2, and 3 class for rows (count); survived (sum) for values; filter by demographic category
9. Duplicate worksheet and make a new pivot table
10. Copy into new worksheet and paste values only. Sort and clean data. Transform into percents if need be (normalize data).
11. Make a chart of this data
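The pivot-and-normalize steps for the Titanic data can be sketched in Python like this: count passengers per class, sum the survivors per class, and divide to get a percent. The field names `pclass` and `survived` (1 for survived, 0 for not) and all rows are placeholders assumed for the example, not the real passenger list.

```python
from collections import defaultdict

# Placeholder rows standing in for the Titanic passengers sheet.
# Field names and values are assumptions for illustration only.
passengers = [
    {"pclass": 1, "survived": 1}, {"pclass": 1, "survived": 0},
    {"pclass": 2, "survived": 1}, {"pclass": 2, "survived": 0},
    {"pclass": 2, "survived": 0}, {"pclass": 2, "survived": 0},
    {"pclass": 3, "survived": 0},
]


def survival_percent_by_class(rows):
    """Pivot: count of passengers and sum of survivors per class,
    then normalize the sum to a percent of the count."""
    counts = defaultdict(int)
    survivors = defaultdict(int)
    for row in rows:
        counts[row["pclass"]] += 1
        survivors[row["pclass"]] += row["survived"]
    return {pclass: 100 * survivors[pclass] / counts[pclass]
            for pclass in sorted(counts)}


rates = survival_percent_by_class(passengers)
for pclass, percent in rates.items():
    print(pclass, percent)
```

Normalizing matters because the classes differ in size: raw survivor counts would favor whichever class had the most passengers, while percents make the groups comparable.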