1. Examine the structure of the iris data set. How many observations and variables are in the data set?
The data set has 150 observations and 5 variables.
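The counts above can be checked directly from the console. This is a minimal sketch using only base R; `iris` ships with the built-in datasets package, so no extra packages are assumed.

```r
# Inspect the structure: one line per variable, with its type and first values
str(iris)

# dim() returns the number of rows (observations), then columns (variables)
irisDim <- dim(iris)
print(irisDim)
```

`str()` also shows that four of the variables are numeric and that Species is a factor with three levels.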
2. Create a new data frame iris1 that contains only the species virginica and versicolor with sepal lengths longer than 6 cm and sepal widths greater than 2.5 cm. How many observations and variables are in the data set?
There are 63 observations and 5 variables. The filter below uses >=, so flowers measuring exactly 6 cm long or 2.5 cm wide are also included.
library(dplyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.0
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data(iris)
iris1 <- filter(iris, Species != "setosa" & Sepal.Length >= 6 & Sepal.Width >= 2.5)
iris1
##    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1           7.0         3.2          4.7         1.4 versicolor
## 2           6.4         3.2          4.5         1.5 versicolor
## 3           6.9         3.1          4.9         1.5 versicolor
## 4           6.5         2.8          4.6         1.5 versicolor
## 5           6.3         3.3          4.7         1.6 versicolor
## 6           6.6         2.9          4.6         1.3 versicolor
## 7           6.1         2.9          4.7         1.4 versicolor
## 8           6.7         3.1          4.4         1.4 versicolor
## 9           6.1         2.8          4.0         1.3 versicolor
## 10          6.3         2.5          4.9         1.5 versicolor
## 11          6.1         2.8          4.7         1.2 versicolor
## 12          6.4         2.9          4.3         1.3 versicolor
## 13          6.6         3.0          4.4         1.4 versicolor
## 14          6.8         2.8          4.8         1.4 versicolor
## 15          6.7         3.0          5.0         1.7 versicolor
## 16          6.0         2.9          4.5         1.5 versicolor
## 17          6.0         2.7          5.1         1.6 versicolor
## 18          6.0         3.4          4.5         1.6 versicolor
## 19          6.7         3.1          4.7         1.5 versicolor
## 20          6.1         3.0          4.6         1.4 versicolor
## 21          6.2         2.9          4.3         1.3 versicolor
## 22          6.3         3.3          6.0         2.5  virginica
## 23          7.1         3.0          5.9         2.1  virginica
## 24          6.3         2.9          5.6         1.8  virginica
## 25          6.5         3.0          5.8         2.2  virginica
## 26          7.6         3.0          6.6         2.1  virginica
## 27          7.3         2.9          6.3         1.8  virginica
## 28          6.7         2.5          5.8         1.8  virginica
## 29          7.2         3.6          6.1         2.5  virginica
## 30          6.5         3.2          5.1         2.0  virginica
## 31          6.4         2.7          5.3         1.9  virginica
## 32          6.8         3.0          5.5         2.1  virginica
## 33          6.4         3.2          5.3         2.3  virginica
## 34          6.5         3.0          5.5         1.8  virginica
## 35          7.7         3.8          6.7         2.2  virginica
## 36          7.7         2.6          6.9         2.3  virginica
## 37          6.9         3.2          5.7         2.3  virginica
## 38          7.7         2.8          6.7         2.0  virginica
## 39          6.3         2.7          4.9         1.8  virginica
## 40          6.7         3.3          5.7         2.1  virginica
## 41          7.2         3.2          6.0         1.8  virginica
## 42          6.2         2.8          4.8         1.8  virginica
## 43          6.1         3.0          4.9         1.8  virginica
## 44          6.4         2.8          5.6         2.1  virginica
## 45          7.2         3.0          5.8         1.6  virginica
## 46          7.4         2.8          6.1         1.9  virginica
## 47          7.9         3.8          6.4         2.0  virginica
## 48          6.4         2.8          5.6         2.2  virginica
## 49          6.3         2.8          5.1         1.5  virginica
## 50          6.1         2.6          5.6         1.4  virginica
## 51          7.7         3.0          6.1         2.3  virginica
## 52          6.3         3.4          5.6         2.4  virginica
## 53          6.4         3.1          5.5         1.8  virginica
## 54          6.0         3.0          4.8         1.8  virginica
## 55          6.9         3.1          5.4         2.1  virginica
## 56          6.7         3.1          5.6         2.4  virginica
## 57          6.9         3.1          5.1         2.3  virginica
## 58          6.8         3.2          5.9         2.3  virginica
## 59          6.7         3.3          5.7         2.5  virginica
## 60          6.7         3.0          5.2         2.3  virginica
## 61          6.3         2.5          5.0         1.9  virginica
## 62          6.5         3.0          5.2         2.0  virginica
## 63          6.2         3.4          5.4         2.3  virginica
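The exercise asks for sepals longer than 6 cm and wider than 2.5 cm, while the filter above uses >=. The sketch below compares the two readings side by side. The names `irisInclusive` and `irisStrict` are illustrative and do not appear in the original answers; `dim()` is used instead of printing the whole data frame.

```r
library(dplyr)

# Inclusive thresholds, as in the answer above (>=)
irisInclusive <- filter(iris,
                        Species != "setosa",
                        Sepal.Length >= 6,
                        Sepal.Width >= 2.5)

# Strict thresholds, a literal reading of "longer than" (>)
irisStrict <- filter(iris,
                     Species %in% c("virginica", "versicolor"),
                     Sepal.Length > 6,
                     Sepal.Width > 2.5)

# Compare the dimensions of the two results
dim(irisInclusive)
dim(irisStrict)

# Number of boundary rows that only the inclusive filter keeps
nrow(irisInclusive) - nrow(irisStrict)
```

Separating conditions with commas inside `filter()` is equivalent to joining them with `&`.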
3. Now, create an iris2 data frame from iris1 that contains only the columns for Species, Sepal.Length, and Sepal.Width. How many observations and variables are in the data set?
There are 63 observations and 3 variables.
iris2 <- select(iris1, -(3:4))
iris2
##    Sepal.Length Sepal.Width    Species
## 1           7.0         3.2 versicolor
## 2           6.4         3.2 versicolor
## 3           6.9         3.1 versicolor
## 4           6.5         2.8 versicolor
## 5           6.3         3.3 versicolor
## 6           6.6         2.9 versicolor
## 7           6.1         2.9 versicolor
## 8           6.7         3.1 versicolor
## 9           6.1         2.8 versicolor
## 10          6.3         2.5 versicolor
## 11          6.1         2.8 versicolor
## 12          6.4         2.9 versicolor
## 13          6.6         3.0 versicolor
## 14          6.8         2.8 versicolor
## 15          6.7         3.0 versicolor
## 16          6.0         2.9 versicolor
## 17          6.0         2.7 versicolor
## 18          6.0         3.4 versicolor
## 19          6.7         3.1 versicolor
## 20          6.1         3.0 versicolor
## 21          6.2         2.9 versicolor
## 22          6.3         3.3  virginica
## 23          7.1         3.0  virginica
## 24          6.3         2.9  virginica
## 25          6.5         3.0  virginica
## 26          7.6         3.0  virginica
## 27          7.3         2.9  virginica
## 28          6.7         2.5  virginica
## 29          7.2         3.6  virginica
## 30          6.5         3.2  virginica
## 31          6.4         2.7  virginica
## 32          6.8         3.0  virginica
## 33          6.4         3.2  virginica
## 34          6.5         3.0  virginica
## 35          7.7         3.8  virginica
## 36          7.7         2.6  virginica
## 37          6.9         3.2  virginica
## 38          7.7         2.8  virginica
## 39          6.3         2.7  virginica
## 40          6.7         3.3  virginica
## 41          7.2         3.2  virginica
## 42          6.2         2.8  virginica
## 43          6.1         3.0  virginica
## 44          6.4         2.8  virginica
## 45          7.2         3.0  virginica
## 46          7.4         2.8  virginica
## 47          7.9         3.8  virginica
## 48          6.4         2.8  virginica
## 49          6.3         2.8  virginica
## 50          6.1         2.6  virginica
## 51          7.7         3.0  virginica
## 52          6.3         3.4  virginica
## 53          6.4         3.1  virginica
## 54          6.0         3.0  virginica
## 55          6.9         3.1  virginica
## 56          6.7         3.1  virginica
## 57          6.9         3.1  virginica
## 58          6.8         3.2  virginica
## 59          6.7         3.3  virginica
## 60          6.7         3.0  virginica
## 61          6.3         2.5  virginica
## 62          6.5         3.0  virginica
## 63          6.2         3.4  virginica
4. Create an iris3 data frame from iris2 that orders the observations from largest to smallest sepal length. Show the first 6 rows of this data set.
iris3 <- arrange(iris2, desc(Sepal.Length))
head(iris3, 6)
##   Sepal.Length Sepal.Width   Species
## 1          7.9         3.8 virginica
## 2          7.7         3.8 virginica
## 3          7.7         2.6 virginica
## 4          7.7         2.8 virginica
## 5          7.7         3.0 virginica
## 6          7.6         3.0 virginica
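Several rows above share a sepal length of 7.7, and `arrange()` leaves such ties in their original relative order. If a fully determined order is wanted, a second sort key can be added. The sketch below breaks ties by descending sepal width; this tie-breaking rule is an assumption for illustration, not part of the exercise.

```r
library(dplyr)

# Rebuild iris2 so the example is self-contained
iris2 <- iris %>%
  filter(Species != "setosa" & Sepal.Length >= 6 & Sepal.Width >= 2.5) %>%
  select(Sepal.Length, Sepal.Width, Species)

# Sort by sepal length first, then by sepal width within ties
irisTop <- iris2 %>%
  arrange(desc(Sepal.Length), desc(Sepal.Width)) %>%
  head(6)

print(irisTop)
```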
5. Create an iris4 data frame from iris3 that creates a column with a sepal area (length * width) value for each observation. How many observations and variables are in the data set?
There are 63 observations and 4 variables.
iris4 <- mutate(iris3, Sepal.Area = Sepal.Length * Sepal.Width)
iris4
##    Sepal.Length Sepal.Width    Species Sepal.Area
## 1           7.9         3.8  virginica      30.02
## 2           7.7         3.8  virginica      29.26
## 3           7.7         2.6  virginica      20.02
## 4           7.7         2.8  virginica      21.56
## 5           7.7         3.0  virginica      23.10
## 6           7.6         3.0  virginica      22.80
## 7           7.4         2.8  virginica      20.72
## 8           7.3         2.9  virginica      21.17
## 9           7.2         3.6  virginica      25.92
## 10          7.2         3.2  virginica      23.04
## 11          7.2         3.0  virginica      21.60
## 12          7.1         3.0  virginica      21.30
## 13          7.0         3.2 versicolor      22.40
## 14          6.9         3.1 versicolor      21.39
## 15          6.9         3.2  virginica      22.08
## 16          6.9         3.1  virginica      21.39
## 17          6.9         3.1  virginica      21.39
## 18          6.8         2.8 versicolor      19.04
## 19          6.8         3.0  virginica      20.40
## 20          6.8         3.2  virginica      21.76
## 21          6.7         3.1 versicolor      20.77
## 22          6.7         3.0 versicolor      20.10
## 23          6.7         3.1 versicolor      20.77
## 24          6.7         2.5  virginica      16.75
## 25          6.7         3.3  virginica      22.11
## 26          6.7         3.1  virginica      20.77
## 27          6.7         3.3  virginica      22.11
## 28          6.7         3.0  virginica      20.10
## 29          6.6         2.9 versicolor      19.14
## 30          6.6         3.0 versicolor      19.80
## 31          6.5         2.8 versicolor      18.20
## 32          6.5         3.0  virginica      19.50
## 33          6.5         3.2  virginica      20.80
## 34          6.5         3.0  virginica      19.50
## 35          6.5         3.0  virginica      19.50
## 36          6.4         3.2 versicolor      20.48
## 37          6.4         2.9 versicolor      18.56
## 38          6.4         2.7  virginica      17.28
## 39          6.4         3.2  virginica      20.48
## 40          6.4         2.8  virginica      17.92
## 41          6.4         2.8  virginica      17.92
## 42          6.4         3.1  virginica      19.84
## 43          6.3         3.3 versicolor      20.79
## 44          6.3         2.5 versicolor      15.75
## 45          6.3         3.3  virginica      20.79
## 46          6.3         2.9  virginica      18.27
## 47          6.3         2.7  virginica      17.01
## 48          6.3         2.8  virginica      17.64
## 49          6.3         3.4  virginica      21.42
## 50          6.3         2.5  virginica      15.75
## 51          6.2         2.9 versicolor      17.98
## 52          6.2         2.8  virginica      17.36
## 53          6.2         3.4  virginica      21.08
## 54          6.1         2.9 versicolor      17.69
## 55          6.1         2.8 versicolor      17.08
## 56          6.1         2.8 versicolor      17.08
## 57          6.1         3.0 versicolor      18.30
## 58          6.1         3.0  virginica      18.30
## 59          6.1         2.6  virginica      15.86
## 60          6.0         2.9 versicolor      17.40
## 61          6.0         2.7 versicolor      16.20
## 62          6.0         3.4 versicolor      20.40
## 63          6.0         3.0  virginica      18.00
6. Create iris5 that calculates the average sepal length, the average sepal width, and the sample size of the entire iris4 data frame and print iris5.
iris5 <- summarize(iris4, meanLength = mean(Sepal.Length), meanWidth = mean(Sepal.Width), SampleSize = n())
print(iris5)
##   meanLength meanWidth SampleSize
## 1    6.64127  3.012698         63
7. Finally, create iris6 that calculates the average sepal length, the average sepal width, and the sample size for each species in the iris4 data frame and print iris6.
iris6 <- group_by(iris4, Species)

iris6 <- summarize(iris6, meanLength = mean(Sepal.Length), meanWidth = mean(Sepal.Width), SampleSize = n())

print(iris6)
## # A tibble: 2 × 4
##   Species    meanLength meanWidth SampleSize
##   <fct>           <dbl>     <dbl>      <int>
## 1 versicolor       6.40      2.97         21
## 2 virginica        6.76      3.04         42
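When the same summary function is applied to several columns, `across()` from dplyr avoids repeating `mean()` for each one. The following is a sketch of that alternative, not the approach used in the answer above. The name `irisBySpecies` and the `mean_` column prefix are illustrative choices.

```r
library(dplyr)

# Rebuild iris4 so the example is self-contained
iris4 <- iris %>%
  filter(Species != "setosa" & Sepal.Length >= 6 & Sepal.Width >= 2.5) %>%
  select(Sepal.Length, Sepal.Width, Species) %>%
  mutate(Sepal.Area = Sepal.Length * Sepal.Width)

# Apply mean() to both sepal columns within each species, then count rows
irisBySpecies <- iris4 %>%
  group_by(Species) %>%
  summarize(across(c(Sepal.Length, Sepal.Width), mean, .names = "mean_{.col}"),
            SampleSize = n())

print(irisBySpecies)
```

The result has one row per species that remains after filtering, with columns named mean_Sepal.Length and mean_Sepal.Width.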
8. In these exercises, you have successively modified different versions of the data frame: iris1, iris2, iris3, iris4, iris5, and iris6. At each stage, the output data frame from one operation serves as the input for the next. A more efficient way to do this is to use the pipe operator %>%, which comes from the magrittr package and is re-exported by dplyr and tidyr. See if you can rework all of your previous statements (except for iris5) into an extended piping operation that uses iris as the input and generates irisFinal as the output.
irisFinal <- iris %>%
  filter(Species != "setosa" & Sepal.Length >= 6 & Sepal.Width >= 2.5) %>% # iris1
  select(-(3:4)) %>% # iris2
  arrange(desc(Sepal.Length)) %>% # iris3
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) %>% # iris4
  group_by(Species) %>% # iris6
  summarize(meanLength = mean(Sepal.Length),
            meanWidth = mean(Sepal.Width),
            SampleSize = n())
print(irisFinal)
## # A tibble: 2 × 4
##   Species    meanLength meanWidth SampleSize
##   <fct>           <dbl>     <dbl>      <int>
## 1 versicolor       6.40      2.97         21
## 2 virginica        6.76      3.04         42
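The same pipeline can also be written with R's native pipe `|>`, which is available in R 4.1 and later and needs no package for the pipe itself. The sketch below assumes such an R version. The name `irisFinalNative` is illustrative, and columns are selected by name rather than by position.

```r
library(dplyr)

# Same steps as the %>% pipeline, using the native pipe (requires R >= 4.1)
irisFinalNative <- iris |>
  filter(Species != "setosa", Sepal.Length >= 6, Sepal.Width >= 2.5) |>
  select(Sepal.Length, Sepal.Width, Species) |>
  arrange(desc(Sepal.Length)) |>
  mutate(Sepal.Area = Sepal.Length * Sepal.Width) |>
  group_by(Species) |>
  summarize(meanLength = mean(Sepal.Length),
            meanWidth = mean(Sepal.Width),
            SampleSize = n())

print(irisFinalNative)
```

For simple chains like this one the two pipes behave the same way. They differ mainly in placeholder syntax, which this pipeline does not use.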
9. Create a ‘longer’ data frame using the original iris data set with three columns named “Species”, “Measure”, “Value”. The column “Species” will retain the species names of the data set. The column “Measure” will include whether the value corresponds to Sepal.Length, Sepal.Width, Petal.Length, or Petal.Width and the column “Value” will include the numerical values of those measurements.
irisLong <- iris %>%
  pivot_longer(cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width),
               names_to = "Measure",
               values_to = "Value")
print(irisLong)
## # A tibble: 600 × 3
##    Species Measure      Value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # ℹ 590 more rows
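One reason to reshape into the long format is that a single grouped summary then covers every measurement at once. The sketch below illustrates this by computing the mean of each measure for each species. The name `irisLongSummary` and the column `meanValue` are illustrative, and `cols = -Species` is used as a shorthand for listing the four measurement columns.

```r
library(dplyr)
library(tidyr)

# Reshape: every column except Species becomes a Measure/Value pair
irisLong <- iris %>%
  pivot_longer(cols = -Species,
               names_to = "Measure",
               values_to = "Value")

# One row per species and measure, holding the mean of that measurement
irisLongSummary <- irisLong %>%
  group_by(Species, Measure) %>%
  summarize(meanValue = mean(Value), .groups = "drop")

print(irisLongSummary)
```

With three species and four measures, the summary has 12 rows.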