1 The Panes of RStudio
Below are the four main panes in RStudio, along with their default positions. They are also shown in Figure 1.
- Source (top left)
- Console, Terminal (bottom left)
- Workspace/Environment, History, Connections (top right)
- Files, Plots, Packages, Help, Viewer (bottom right)
Figure 1: The four panes of RStudio
Let’s go through them one by one.
1.1 Source
The source is where your R ‘scripts’ go. Your script is simply a text document in which you write most of your code. Open a new script under the File menu (File -> New File -> R Script). Now, into your script, type the following piece of code.
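Any single line of R will do for this first try. The line below is only an assumed stand-in, chosen because its output is easy to recognise in the console:

```r
# A hypothetical first line of code: prints a short greeting to the console
print("Hello, RStudio!")
```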
Great! Now you need to run your piece of code. To do this, make sure your cursor is somewhere on the line of code you just typed, then hit ‘Command + Enter’ on macOS, or ‘Control + Enter’ on Windows. There is also a ‘Run’ button at the top right of the source pane, but you should not need this too often.
To run a chunk of code, you can highlight the relevant code, then use the same keys (‘Command + Enter’ on macOS, ‘Control + Enter’ on Windows).
You can open your script in a new window. To do this, click on the third icon from the left in the toolbar directly above the script.
You can change the theme of the editor. To do this, select the ‘Tools’ menu. Then select ‘Global Options’. Choose the ‘Appearance’ tab. Pick a sexy theme.
Data tables (spreadsheets) are also viewed in the source pane. To see this, type and run the code below. (cars is the name of a dataset that comes loaded with R.)
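A minimal sketch of how this is usually done, assuming the intended command is View(), which opens a data frame as a spreadsheet-style tab. The interactive() guard is an addition here so the line is simply skipped when R runs outside RStudio. In RStudio itself, plain View(cars) is enough.

```r
# Open the built-in 'cars' dataset in a spreadsheet-style tab.
# View() needs an interactive session such as RStudio, hence the guard.
if (interactive()) View(cars)

# Print the first few rows in the console as well
head(cars)
```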
Question: How many rows and columns are there in the cars dataset?
1.2 Console
The console, the bottom left pane, shows the lines of code that you have run as well as the output generated by those lines of code.
You can also type code directly into the console. Try doing that now. Type a random piece of code and press ‘Enter’.
Question: How is code written in the console different from that in the script?
1.3 Environment, History, Connections etc.
The Environment tab shows datasets that are loaded. To explore this tab, let’s load one of R’s built-in datasets. Run the following code, either from your script or your console:
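The dataset is not named at this point, so the sketch below assumes USArrests, one of R's built-in datasets with one row per US state. Treat that choice as an assumption. Any built-in data frame can be loaded the same way.

```r
# Assumption: USArrests stands in for the dataset used in this step
data("USArrests")

# Touching the object makes it show up fully in the Environment tab
head(USArrests)
```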
Now that the dataset is loaded, you can click on the blue dropdown icon to the left of the dataset to reveal a summary. Clicking directly on the dataset opens it in a ‘View’ tab.
The History tab gives you a history of past code that has been run.
The Connections tab allows you to connect to external databases. You will not need this.
Question: Are all US states included in this dataset?
You can remove an object from the workspace with the rm() function:
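A short sketch of both uses of rm(), assuming two throwaway objects (temp_value and other_value are hypothetical names). The final line is the usual idiom for clearing the whole workspace.

```r
# Create two hypothetical objects so there is something to remove
temp_value  <- 42
other_value <- "delete me"

# Remove a single object by name
rm(temp_value)

# Remove every object in the workspace
rm(list = ls())
```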
That last command is pretty difficult to remember. Just Google “how to clear workspace in R” when you forget it.
1.4 Files, Plots, Packages, Help, Viewer etc.
Let’s look now at the bottom right pane. We will consider each tab there in order.
First the Files tab. You can load data into your R workspace from the ‘Files’ panel but this is not recommended. We will look at better ways to load in data soon.
Next, the Plots tab. Here is where your plots show up. Try generating a simple plot with the following code:
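One possible snippet, assuming the built-in cars dataset is a reasonable thing to plot here. The axis labels are illustrative.

```r
# Scatter plot of stopping distance against speed, using the built-in 'cars' data
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)")
```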
You can zoom into the plot, and save it as well, by clicking on the respective icons above the plot. Try this.
Question: How many different image formats are available for exported plots?
Next, the Packages tab is where you can install and load R packages. Most of the functionality of R comes from its huge library of packages. Let’s install the ‘ggplot2’ package, which is a popular graphics package that we will need later. Click on the ‘Install’ icon and type in ggplot2.
Now, we need to load the package from our library into the current R session in order to use it. Do this by ticking the checkbox to the left of the package name.
We can also install and load a package in the R script or console, completely avoiding the dialogue boxes. This is shown below:
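The standard pair of commands looks like this. Installing needs an internet connection and only has to be done once per computer. Loading has to be repeated in every new R session.

```r
# Download and install the package (only needed once)
install.packages("ggplot2")

# Load the package into the current session
library(ggplot2)
```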
Next up, the Help tab shows the documentation for different functions and data frames. Try running each line below to see what documentation looks like:
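A few lines you could try, assuming mean, gsub and cars are suitable examples. Any function or built-in dataset name works the same way.

```r
?mean          # documentation for a function
help("gsub")   # the same request, written as a function call
?cars          # documentation for a built-in dataset
```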
Study the format of these help files. You will need to use them often.
2 R as a Calculator
R works as a simple calculator. It follows the expected order of operations. Type in the following expressions and observe their output.
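The expressions below are assumed examples rather than a fixed list. They show the basic operators and how parentheses change the order of operations.

```r
2 + 3 * 4      # multiplication happens before addition: 14
(2 + 3) * 4    # parentheses are evaluated first: 20
10 / 4         # division: 2.5
2^3            # exponentiation: 8
7 %% 3         # remainder after division: 1
```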
3 Functions in R
R has many built-in functions. To ‘call’ a function, you type the function’s name, then put the function’s ‘input’ in parentheses, that is, ‘function(input)’.
3.1 Arithmetic functions
Let’s look first at some arithmetic functions. Run these and observe their output.
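A handful of base R arithmetic functions, picked here as assumed examples:

```r
sqrt(49)       # square root: 7
abs(-12)       # absolute value: 12
exp(0)         # e raised to the power 0: 1
log10(1000)    # logarithm, base 10: 3
factorial(5)   # 5 * 4 * 3 * 2 * 1 = 120
```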
Your turn: 1. Write code to round the number 3.56 to its nearest whole number. 2. Write code to find the log, base 7, of the number 117649. Use Google to find out what the needed functions are.
3.2 Statistical functions
R has several built-in statistical functions. Run the code lines below.
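The lines below are an assumed reconstruction of the kind of call meant here: each number is passed as a separate argument, which is the natural first guess.

```r
# Passing the numbers one by one (this is the tempting, but wrong, way)
mean(1, 2, 3, 4, 5)
median(1, 2, 3, 4, 5)
```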
Hmm, the answers to those probably do not look right. Welcome to your first syntax problem! Most of your time learning R (or any other programming language) will be spent trying to figure out what went wrong with your syntax.
Here, the problem is this: when you want to refer to a bunch of objects as a group in R, you need to ‘concatenate’ them into a ‘vector’ with the c() function. A vector is an ordered collection of values of the same type; informally, you can think of it as a list.
Let’s try that again.
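A sketch of the second attempt, assuming the numbers in question are 1 to 5. They are now wrapped in c(), so each function receives a single vector. The extra spaces are deliberate and harmless.

```r
# Concatenate the numbers into one vector, then pass that vector to the function
mean( c( 1, 2, 3, 4, 5 ) )     # 3
median( c( 1, 2, 3, 4, 5 ) )   # 3
```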
That’s more like it!
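Note that R is largely ‘space-insensitive’: between the pieces of an expression you can use as many or as few spaces as you like. (The exception is inside a name, a number or an operator such as <-, where a space changes the meaning.) In the above code, we used more white space than necessary.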
Your turn: Write code to calculate the variance and standard deviation of the following set of numbers: 3,4,7,18,22,78. Use Google to find out what the functions for variance and standard deviation are.
3.3 Text functions
Now, some text functions. When you want something to be treated as text, you enclose it in single or double quotation marks. A piece of text enclosed in quotation marks is called a ‘string’. Let’s look at some simple text/string functions. Run the following pieces of code.
# change case
toupper('He unfollowed me!')
tolower('WOW')
# paste a bunch of text together
paste("cat", "and", "dog")
# Find and substitute
gsub('Celeste', 'Ceci', 'My name is Celeste. Celeste is my name')
This last command does the following:
- Takes the text ‘My name is Celeste. Celeste is my name’ (the third argument)
- Then substitutes all instances of ‘Celeste’ (the first argument) with ‘Ceci’ (the second argument)
Your turn: 1. Use the gsub function to replace the numerals in the paragraph below with the spelled-out numbers:
“So far we have seen 3 cases in Orange County. In San Francisco, that number is 5. Although 3 and 5 are small numbers, health officials are worried”
3.4 Writing your own functions
As you advance in programming expertise, you will eventually need to write your own functions. Let’s look at this very cursorily. Below, we write a function to calculate the roots of a quadratic equation.
# Define the function 'getroots' that takes in the coefficients a, b and c
getroots <- function(a,b,c) {
  # then applies the quadratic formula; the parentheses around 2*a matter,
  # because x / 2*a would be read as (x / 2) * a
  root1 <- (-b + sqrt(b^2 - 4*a*c)) / (2*a)
  root2 <- (-b - sqrt(b^2 - 4*a*c)) / (2*a)
  return(c(root1, root2)) # then returns both roots
}
Now, you should test that function on the following quadratic to see if it works. \[x^2 + 3x - 4 \]
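A minimal sketch of such a test. The function is repeated so that the snippet runs on its own, with the denominator written as (2 * a). For this quadratic, a = 1, b = 3 and c = -4, and factoring gives (x - 1)(x + 4), so the roots should be 1 and -4.

```r
# Quadratic formula, dividing by the whole of 2 * a
getroots <- function(a, b, c) {
  root1 <- (-b + sqrt(b^2 - 4 * a * c)) / (2 * a)
  root2 <- (-b - sqrt(b^2 - 4 * a * c)) / (2 * a)
  return(c(root1, root2))
}

# x^2 + 3x - 4 has coefficients a = 1, b = 3, c = -4
roots <- getroots(1, 3, -4)
roots   # 1 and -4
```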
Your turn : Write a function that takes in a set of numbers, then outputs the mean, median and mode, in that order, of the set of numbers.
4 Creating objects in R
To do most things in R, we need to assign values to objects.
We can use the ‘assignment operator’, <-, to assign a value to an ‘object’ in R. We can then call that object later.
For example:
# Create the object Aus, and store some text in it
Aus <- "A large continent"
# Now, call the object Aus
Aus
In RStudio you can type the assignment operator with a shortcut. For Macs, the default shortcut is Option + the minus key. On Windows, it is Alt + the minus key. There are a few rules, and some style conventions, for naming objects in R. See this page for details on those.
Below, we use the assignment operator to store the result of the gsub function. We can then call that object at a later time.
# store the output of the gsub function in the object called newname
newname <- gsub('Celeste', 'Ceci', 'My name is Celeste. Celeste is my name')
newname
For example, let’s say we want the old name back.
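One way to do that, sketched under the assumption that the result should go into a new object (oldname is a hypothetical name). The first line repeats the earlier assignment so the snippet runs on its own.

```r
# Recreate the stored text with 'Ceci' in it
newname <- gsub('Celeste', 'Ceci', 'My name is Celeste. Celeste is my name')

# Swap the names back, this time passing the stored object as the input text
oldname <- gsub('Ceci', 'Celeste', newname)
oldname
```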
Your turn: 1. Store the numbers 1 to 100 in an object called ‘numbers’ 2. Print the square of each of these numbers.
5 Vectors
For data analysis in R, you will be working mostly with vectors and vectorized operations.
As we noted before, a vector is simply an ordered collection of values of the same type. To build a vector from a set of values, we concatenate the values with the c() function. Most functions accept vectors as input, and many of them work on every element at once.
As an example, let’s create a vector of weights, then a vector of heights, and use those to calculate BMI values.
# names of some imaginary people
names <- c("Joe", "Jane", "Fatman", "Tallgirl")
# weights of these imaginary people
weights <- c(45, 50, 150, 80)
# their heights
heights <- c(1.3, 1.4, 1.1, 1.9)
# we calculate their bmis
bmis <- weights / heights^2
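# we paste the results; paste0 adds no separator, so no stray space appears before the comma,
# and round() keeps the BMI values to one decimal place
paste0(names, ", your BMI is: ", round(bmis, 1))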
You can think of each vector above as a column in a table. In fact, a column in a table is one type of vector which you will use a lot in data analysis. We will consider this in the next handout.
Your turn: 1. Create a vector of four random Celsius temperatures. 2. Convert these to Fahrenheit and store the results in a new object. 3. Use the ‘paste’ function to generate four sentences like this one “35 degrees Celsius is 95 degrees Fahrenheit”
5.1 Comments
By the way, have you noticed the lines preceded by a # symbol above? Anything written after that symbol is treated as a comment. Comments are a great way of keeping track of what each step of your code is doing. In general, you should write a comment before each ‘chunk’ of code. You will thank yourself for this in the future when you are trying to decipher what your code does. Others who read your code will also greatly appreciate all comments.