Build a table and explore it

We will use our class survey dataset (class-survey.csv) as an example.

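# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0),
# which matches the str() output shown below.
class <- read.csv("http://math.slu.edu/~clair/stat1300/data/class-survey.csv", stringsAsFactors = TRUE)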
str(class)
## 'data.frame':    25 obs. of  12 variables:
##  $ Timestamp                                   : Factor w/ 24 levels "2016/08/24 10:10:41 AM EST",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Year                                        : Factor w/ 3 levels "Junior","Senior",..: 1 3 3 3 1 3 1 3 3 3 ...
##  $ Gender                                      : Factor w/ 2 levels "Female","Male": 2 2 1 1 2 1 1 1 1 2 ...
##  $ Height..inches.                             : num  67 68 66 65 74 63 70 59 60 68 ...
##  $ Shoe.size                                   : num  10 11.5 8 6.5 11 7 8.5 3 6 8 ...
##  $ Eye.Color                                   : Factor w/ 5 levels "Black","Blue",..: 2 3 5 3 4 1 3 3 3 3 ...
##  $ How.many.first.cousins.do.you.have.         : int  11 13 1 8 30 8 5 48 8 16 ...
##  $ Can.you...Roll.your.tongue..                : Factor w/ 2 levels "No","Yes": 1 2 1 1 2 2 2 2 1 2 ...
##  $ Can.you...Touch.your.nose.with.your.tongue..: Factor w/ 2 levels "No","Yes": 2 1 2 1 1 1 1 1 1 1 ...
##  $ Can.you...Wiggle.your.ears..                : Factor w/ 2 levels "No","Yes": 2 1 1 1 1 2 1 1 1 1 ...
##  $ Can.you...Raise.one.eyebrow..               : Factor w/ 2 levels "No","Yes": 2 2 1 1 1 2 1 2 2 1 ...
##  $ How.long.is.your.piece.of.string..in.inches.: num  19 2.3 9.1 10.2 10 ...
yg <- table(class$Year,class$Gender)
yg
##            
##             Female Male
##   Junior         5    5
##   Senior         3    1
##   Sophomore      7    4
plot(yg)

barplot(yg,beside=TRUE,legend=TRUE)

You can use t to transpose the table (switch rows with columns):

t(yg)
##         
##          Junior Senior Sophomore
##   Female      5      3         7
##   Male        5      1         4
plot(t(yg))

barplot(t(yg),beside=TRUE,legend=TRUE)

margin.table computes marginal totals: the sum of the entire table (no margin argument), of each row (margin 1), or of each column (margin 2):

margin.table(yg)
## [1] 25
margin.table(yg,1)
## 
##    Junior    Senior Sophomore 
##        10         4        11
margin.table(yg,2)
## 
## Female   Male 
##     15     10
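
As a cross-check, the same totals can be obtained with base R's sum, rowSums, and colSums, and addmargins attaches them to the table. The sketch below rebuilds the table by hand from the counts printed above. The name yg_manual is ours, used only so the example runs without downloading the survey file.

```r
# Rebuild the Year-by-Gender counts shown above so the example is self-contained.
# yg_manual is a hypothetical stand-in for the yg table built from the survey data.
yg_manual <- as.table(matrix(
  c(5, 3, 7,    # Female column: Junior, Senior, Sophomore
    5, 1, 4),   # Male column:   Junior, Senior, Sophomore
  nrow = 3,
  dimnames = list(Year   = c("Junior", "Senior", "Sophomore"),
                  Gender = c("Female", "Male"))
))

grand_total <- sum(yg_manual)       # same value as margin.table(yg_manual)
row_totals  <- rowSums(yg_manual)   # same values as margin.table(yg_manual, 1)
col_totals  <- colSums(yg_manual)   # same values as margin.table(yg_manual, 2)

# addmargins() returns the table with a "Sum" row and a "Sum" column attached.
with_totals <- addmargins(yg_manual)
print(with_totals)
```

Seeing the totals alongside the counts in one table is often easier to read than three separate margin.table calls.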

prop.table computes overall or conditional proportions. Notice that the proportions sum to 1 over the entire table (no margin argument), within each row (margin 1), or within each column (margin 2).

prop.table(yg)
##            
##             Female Male
##   Junior      0.20 0.20
##   Senior      0.12 0.04
##   Sophomore   0.28 0.16
prop.table(yg,1)
##            
##                Female      Male
##   Junior    0.5000000 0.5000000
##   Senior    0.7500000 0.2500000
##   Sophomore 0.6363636 0.3636364
prop.table(yg,2)
##            
##                Female      Male
##   Junior    0.3333333 0.5000000
##   Senior    0.2000000 0.1000000
##   Sophomore 0.4666667 0.4000000
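
To see what prop.table is doing, the conditional proportions can also be computed by dividing each count by its row or column total. This is a minimal sketch using a hand-built copy of the table (yg_manual is our own name, not part of the original session).

```r
# Hand-built copy of the Year-by-Gender counts shown above (hypothetical name).
yg_manual <- as.table(matrix(
  c(5, 3, 7,    # Female column: Junior, Senior, Sophomore
    5, 1, 4),   # Male column:   Junior, Senior, Sophomore
  nrow = 3,
  dimnames = list(Year   = c("Junior", "Senior", "Sophomore"),
                  Gender = c("Female", "Male"))
))

# Overall proportions: every count divided by the grand total.
overall_props <- yg_manual / sum(yg_manual)

# Row-conditional proportions: R recycles the row totals down each column,
# so entry [i, j] is divided by the total of row i.
row_props <- yg_manual / rowSums(yg_manual)

# Column-conditional proportions: sweep() divides each column by its own total.
col_props <- sweep(yg_manual, 2, colSums(yg_manual), "/")

print(rowSums(row_props))   # each row sums to 1
print(colSums(col_props))   # each column sums to 1
```

For example, the Senior row of row_props says that 3 of the 4 seniors (0.75) are female, while the Female column of col_props says that 3 of the 15 women (0.20) are seniors.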

The chi-square test

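We will test the hypotheses \(H_0\): there is no relationship between gender and grade level, against \(H_a\): there is a relationship between gender and grade level.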

Both chisq.test and summary perform the test, given a table:

chisq.test(yg)
## Warning in chisq.test(yg): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  yg
## X-squared = 0.85227, df = 2, p-value = 0.653
summary(yg)
## Number of cases in table: 25 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 0.8523, df = 2, p-value = 0.653
##  Chi-squared approximation may be incorrect
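
The statistic can be reproduced by hand, which also shows where the warning comes from. The sketch below computes the expected count for each cell as (row total × column total) / grand total, then sums (observed − expected)² / expected. It uses a hand-built copy of the table (yg_manual, expected, and chi_sq are our own names). R issues the "approximation may be incorrect" warning when some expected counts fall below 5, as several do here.

```r
# Hand-built copy of the Year-by-Gender counts shown above (hypothetical name).
yg_manual <- as.table(matrix(
  c(5, 3, 7,    # Female column: Junior, Senior, Sophomore
    5, 1, 4),   # Male column:   Junior, Senior, Sophomore
  nrow = 3,
  dimnames = list(Year   = c("Junior", "Senior", "Sophomore"),
                  Gender = c("Female", "Male"))
))

n <- sum(yg_manual)

# Expected counts under independence: (row total * column total) / n.
expected <- outer(rowSums(yg_manual), colSums(yg_manual)) / n
print(expected)

# Pearson's statistic: sum over all cells of (observed - expected)^2 / expected.
chi_sq <- sum((yg_manual - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(yg_manual) - 1) * (ncol(yg_manual) - 1)

# Number of cells whose expected count is below 5 (the source of the warning).
small_cells <- sum(expected < 5)

print(c(chi_sq = chi_sq, df = df, small_cells = small_cells))
```

The same expected counts are available directly from the test object as chisq.test(yg)$expected.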

Here, \(\chi^2 = .8523\) with 2 degrees of freedom (DF). The chi-squared density with 2 DF looks like this:

x <- seq(0,15,.1)
plot(x,dchisq(x,2),type="l")

and we see that .8523 is not far along the tail. The area to the right of .8523 is:

1-pchisq(.8523,2)
## [1] 0.6530184

which is the P-value for the test. We fail to reject \(H_0\): there is no significant evidence of a relationship between gender and grade level.
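
The right-tail area can also be requested directly with the lower.tail argument of pchisq, which avoids the subtraction from 1. As a sanity check that holds only for 2 degrees of freedom, the right-tail area has the closed form \(e^{-x/2}\). The variable names below are ours.

```r
x_sq <- 0.8523   # the test statistic reported above

# Right-tail area as computed in the text.
p_complement <- 1 - pchisq(x_sq, df = 2)

# Same area, asking pchisq for the upper tail directly.
p_upper <- pchisq(x_sq, df = 2, lower.tail = FALSE)

# Closed form for the upper tail; valid only when df = 2.
p_closed <- exp(-x_sq / 2)

print(c(p_complement, p_upper, p_closed))
```

All three values should agree with the 0.6530184 shown above. The lower.tail = FALSE form is generally preferable for very small P-values, where 1 minus a number close to 1 loses precision.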

Manipulating tabular data

Data sets from our book come in a row summary form, like so:

load(url("http://math.slu.edu/~clair/stat1300/data/ex2532sleepq-fixed.rda"))
ex2532sleepq
##   OTC.Rx      Sleep Count
## 1    yes    optimal    37
## 2    yes borderline    53
## 3    yes       poor    84
## 4     no    optimal   266
## 5     no borderline   186
## 6     no       poor   245

To transform it into a crosstable, use xtabs. The following gives a table of Count with rows OTC.Rx and columns Sleep.

sleep <- xtabs(Count ~ OTC.Rx + Sleep, data=ex2532sleepq)
sleep
##       Sleep
## OTC.Rx optimal borderline poor
##    no      266        186  245
##    yes      37         53   84
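
The conversion also works in the other direction: as.data.frame turns a crosstable back into row summary form. The sketch below types the six rows in by hand so it runs without the .rda file. The names sleep_counts, sleep_manual, and long_again are ours, and the factor level order is an assumption chosen to match the printed table.

```r
# Row summary data typed in from the listing above (hypothetical name).
# Level order is assumed from the printed crosstable: no/yes and optimal/borderline/poor.
sleep_counts <- data.frame(
  OTC.Rx = factor(c("yes", "yes", "yes", "no", "no", "no"),
                  levels = c("no", "yes")),
  Sleep  = factor(rep(c("optimal", "borderline", "poor"), times = 2),
                  levels = c("optimal", "borderline", "poor")),
  Count  = c(37, 53, 84, 266, 186, 245)
)

# Row summary form -> crosstable.
sleep_manual <- xtabs(Count ~ OTC.Rx + Sleep, data = sleep_counts)
print(sleep_manual)

# Crosstable -> row summary form; responseName sets the name of the count column
# (the default would be "Freq").
long_again <- as.data.frame(sleep_manual, responseName = "Count")
print(long_again)
```

The rows of long_again come back in a different order than the original listing, because the first factor varies fastest, but the counts attached to each combination are unchanged.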

To use all the other variables in the data frame for rows and columns, put . on the right-hand side of the formula as shorthand:

sleep <- xtabs(Count ~ ., data=ex2532sleepq)
sleep
##       Sleep
## OTC.Rx optimal borderline poor
##    no      266        186  245
##    yes      37         53   84
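
Once the data are in crosstable form, the tools from the earlier sections apply unchanged. As a sketch, the block below rebuilds the table by hand (sleep_counts and sleep_manual are our own names, with factor level order assumed from the printed table) and passes it to chisq.test.

```r
# Row summary data typed in from the listing above (hypothetical name).
sleep_counts <- data.frame(
  OTC.Rx = factor(c("yes", "yes", "yes", "no", "no", "no"),
                  levels = c("no", "yes")),
  Sleep  = factor(rep(c("optimal", "borderline", "poor"), times = 2),
                  levels = c("optimal", "borderline", "poor")),
  Count  = c(37, 53, 84, 266, 186, 245)
)

# The "." shorthand cross-classifies Count by every other column.
sleep_manual <- xtabs(Count ~ ., data = sleep_counts)

# Conditional distribution of sleep quality within each OTC.Rx group.
sleep_by_group <- prop.table(sleep_manual, 1)
print(sleep_by_group)

# Chi-square test of independence between OTC.Rx and Sleep.
sleep_test <- chisq.test(sleep_manual)
print(sleep_test)

# Expected counts, to confirm the large-sample approximation is reasonable.
print(sleep_test$expected)
```

Unlike the class survey table, every expected count here is well above 5, so the approximation warning should not appear.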