Variables
Variables, data structures and basic Operations
In R, data objects are manipulated using named data structures. The names of the objects might be called “variables”, although that term does not have a specific meaning in the official R documentation. R names are case sensitive and may contain alphanumeric characters (a-z, A-Z, 0-9), the dot/period (.) and the underscore (_). Names for data structures must follow these rules:
- Names that start with a digit or an underscore (e.g. 1a), names that are valid numerical expressions (e.g. .11), and names with dashes ('-') or spaces can only be used when they are quoted with backticks: `1a` and `.11`. Such names are printed with backticks:
  > list('.11' = "a")
  $`.11`
  [1] "a"
- All other combinations of alphanumeric characters, dots and underscores can be used freely; a reference with or without backticks points to the same object.
- Names that begin with . are considered system names and are not always visible using the ls() function.
There is no practical restriction on the number of characters in a variable name (R limits names to 10,000 bytes).
Some examples of valid object names are: foobar, foo.bar, foo_bar, .foobar
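The naming rules above can be checked directly in the console. The following is a minimal sketch: the object names are illustrative only, and make.names() is shown as one base R way to turn arbitrary strings into syntactic names.

```r
# Non-syntactic names must be quoted with backticks.
`1a` <- 10
`my var` <- 20
total <- `1a` + `my var`                       # 30

# For syntactic names, backticks are optional and refer to the same object.
foo.bar <- 1
same_object <- identical(foo.bar, `foo.bar`)   # TRUE

# make.names() converts arbitrary strings into syntactic names.
fixed <- make.names(c("1a", "my var", "foo_bar"))
print(fixed)                                   # "X1a" "my.var" "foo_bar"
```

Backtick-quoted names work, but syntactic names are usually easier to read and type.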
In R, variables are assigned values using the infix assignment operator <-. The operator = can also be used for assigning values to variables; however, its proper use is for associating values with parameter names in function calls. Note that omitting spaces around operators may confuse readers: the expression a<-1 is parsed as an assignment (a <- 1) rather than as a logical comparison (a < -1).
> foo <- 42
> fooEquals = 43
So foo is assigned the value 42. Typing foo in the console will output 42, while typing fooEquals will output 43.
> foo
[1] 42
> fooEquals
[1] 43
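The difference between = as an assignment and = as an argument label can be seen in a short sketch. The function scale_by and the variable names below are hypothetical, introduced only for illustration, and the sketch assumes that no object called values exists beforehand.

```r
# Hypothetical helper used only to illustrate argument matching.
scale_by <- function(values, factor = 2) values * factor

# '=' inside a call names an argument; no variable is created in the caller.
r1 <- scale_by(values = 1:3)
created_by_equals <- exists("values", inherits = FALSE)   # FALSE

# '<-' inside a call assigns in the caller and then passes the value on.
r2 <- scale_by(values <- 1:3)
created_by_arrow <- exists("values", inherits = FALSE)    # TRUE

# Spacing changes the meaning of these two expressions.
a <- 5
is_less <- a < -1   # comparison: FALSE, a is still 5
a<-1                # assignment: a is now 1
```

Writing spaces around <- consistently avoids the last ambiguity altogether.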
The following command assigns a value to the variable named x and prints the value at the same time:
> (x <- 5)
[1] 5
# actually two function calls: first one to `<-`; second one to the `(` function,
# which is often used in R help page examples for its side effect of printing
> is.function(`(`)
[1] TRUE
It is also possible to make assignments to variables using ->.
> 5 -> x
> x
[1] 5
Types of data structures
There are no scalar data types in R. Vectors of length-one act like scalars.
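A quick check illustrates that a single value is simply a vector of length one. This is a minimal sketch using only base R functions.

```r
x <- 42
is.vector(x)          # TRUE: a single number is already a vector
length(x)             # 1
identical(x, c(42))   # TRUE: c() of one value adds nothing
x[1]                  # 42: indexing works as for any other vector
```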
- Vectors: An atomic vector is a sequence of elements of the same type: a sequence of numbers, a sequence of logicals or a sequence of character strings. v <- c(2, 3, 7, 10) and v2 <- c("a", "b", "c") are both vectors.
- Matrices: A matrix of numbers, logicals or characters, for example a <- matrix(data = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), nrow = 4, ncol = 3, byrow = FALSE). Like a vector, a matrix must be made of same-class elements. To extract an element from a matrix, both the row and the column must be specified: a[1,2] returns [1] 5, which is the element in the first row, second column.
- Lists: A collection of elements that may have different types, for example mylist <- list(course = 'stat', date = '04/07/2009', num_isc = 7, num_cons = 6, num_mat = as.character(c(45020, 45679, 46789, 43126, 42345, 47568, 45674)), results = c(30, 19, 29, NA, 25, 26, 27)). Elements can be extracted from a list by name (if the list is named) or by index. In the given example, mylist$results and mylist[[6]] return the same element.
  Warning: if you try mylist[6], R won't give you an error, but it returns the result as a list of length one rather than the element itself. While mylist[[6]][2] is permitted (it gives you 19), mylist[6][2] does not give you 19: it indexes past the end of that one-element list and returns a list containing NULL.
- data.frame: An object with columns that are vectors of equal length, but (possibly) different types. Data frames are not matrices. For example, exam <- data.frame(matr = as.character(c(45020, 45679, 46789, 43126, 42345, 47568, 45674)), res_S = c(30, 19, 29, NA, 25, 26, 27), res_O = c(3, 3, 1, NA, 3, 2, NA), res_TOT = c(30, 22, 30, NA, 28, 28, 27)). Columns can be read by name (exam$matr, exam[, 'matr']) or by index (exam[1], exam[,1]); note that exam[1] returns a one-column data frame, whereas exam[,1] returns the column itself. Rows can also be read by name (exam['rowname', ]) or by index (exam[1,]). Data frames are actually just lists with a particular structure (a row names attribute and components of equal length).
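The differences between these structures can be sketched in a few lines. The objects below are small stand-ins chosen for illustration, not the data from the examples above.

```r
# Atomic vectors hold a single type; mixed input is coerced.
mixed <- c(1, "a", TRUE)
class(mixed)                      # "character"

# Matrices are filled column by column unless byrow = TRUE.
m <- matrix(1:12, nrow = 4, ncol = 3)
m[1, 2]                           # 5

# Lists: [[ ]] and $ extract an element, [ ] returns a shorter list.
lst <- list(course = "stat", results = c(30, 19, 29))
lst[["results"]][2]               # 19
class(lst["results"])             # "list"
class(lst[["results"]])           # "numeric"

# Data frames are lists of equal-length columns.
df <- data.frame(id = c("a", "b", "c"), score = c(30, 19, NA))
is.list(df)                       # TRUE
class(df[1])                      # "data.frame": single brackets keep the structure
class(df[, "score"])              # "numeric": the column itself
```

The single versus double bracket distinction is the same for lists and data frames, which follows from data frames being lists.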
Common operations and some cautionary advice
Default operations are done element by element. See ?Syntax for the rules of operator precedence. Most operators (and many other functions in base R) have recycling rules that allow arguments of unequal length.
Given these objects:
Example objects
> a <- 1
> b <- 2
> c <- c(2,3,4)
> d <- c(10,10,10)
> e <- c(1,2,3,4)
> f <- 1:6
> W <- cbind(1:4,5:8,9:12)
> Z <- rbind(rep(0,3),1:3,rep(10,3),c(4,7,1))
Some vector operations
> a+b # scalar + scalar
[1] 3
> c+d # vector + vector
[1] 12 13 14
> a*b # scalar * scalar
[1] 2
> c*d # vector * vector (componentwise!)
[1] 20 30 40
> c+a # vector + scalar
[1] 3 4 5
> c^2 # vector ^ scalar (componentwise)
[1] 4 9 16
> exp(c)
[1] 7.389056 20.085537 54.598150
Some vector operation warnings!
> c+e # a warning but no error, since recycling is assumed to be desired
[1] 3 5 7 6
Warning message:
In c + e : longer object length is not a multiple of shorter object length
R sums what it can and then reuses the shorter vector to fill in the blanks. The warning is given only because the length of the longer vector is not an exact multiple of the length of the shorter one. By contrast, c+f gives no warning whatsoever, because 6 is a multiple of 3.
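The silent c+f case, and one way to guard against unintended recycling, can be sketched as follows. The helper add_strict is hypothetical and not part of base R, and the vector is named cv here rather than c purely for readability.

```r
cv <- c(2, 3, 4)
e  <- c(1, 2, 3, 4)
f  <- 1:6

# Length 6 is a multiple of length 3, so cv is recycled silently.
cv + f                             # 3 5 7 6 8 10

# Hypothetical helper: refuse to add vectors of different lengths.
add_strict <- function(x, y) {
  stopifnot(length(x) == length(y))
  x + y
}

add_strict(cv, c(10, 10, 10))      # 12 13 14

# Mismatched lengths now stop with an error instead of a warning.
result <- tryCatch(add_strict(cv, e), error = function(err) "length mismatch")
result                             # "length mismatch"
```

Whether such a strict check is wanted depends on the code; recycling a length-one value is usually intentional, while recycling longer vectors is more often a mistake.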
Some matrix operation warnings!
> Z+W # matrix + matrix (componentwise)
> Z*W # matrix * matrix (the standard product is always componentwise)
For matrix multiplication use the %*% operator, e.g. V %*% W, where V stands for any matrix with as many columns as W has rows.
> W + a # matrix + scalar is still componentwise
     [,1] [,2] [,3]
[1,]    2    6   10
[2,]    3    7   11
[3,]    4    8   12
[4,]    5    9   13
> W + c # matrix + vector: no warning, and R recycles the vector down the columns
     [,1] [,2] [,3]
[1,]    3    8   13
[2,]    5   10   12
[3,]    7    9   14
[4,]    6   11   16
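Because recycling runs down the columns, adding a vector to every row of a matrix needs an explicit step, and matrix multiplication needs conformable dimensions. The sketch below rebuilds W and Z as defined above; the vector is named v here, and using sweep() and t(Z) is one option among several.

```r
W <- cbind(1:4, 5:8, 9:12)
Z <- rbind(rep(0, 3), 1:3, rep(10, 3), c(4, 7, 1))
v <- c(2, 3, 4)

# Add v to every row of W: element j of v is added to column j.
by_row <- sweep(W, 2, v, "+")
by_row[1, ]                         # 3 8 13

# An equivalent idiom: transpose, let recycling run down the columns, transpose back.
by_row2 <- t(t(W) + v)

# Matrix multiplication needs ncol(left) == nrow(right).
P <- t(Z) %*% W                     # (3 x 4) %*% (4 x 3)
dim(P)                              # 3 3

# W %*% Z is (4 x 3) %*% (4 x 3) and fails with a non-conformable error.
ok <- tryCatch({ W %*% Z; TRUE }, error = function(err) FALSE)
ok                                  # FALSE
```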
“Private” variables
A leading dot in a name of a variable or function in R is commonly used to denote that the variable or function is meant to be hidden.
So, declaring the following variables
> foo <- 'foo'
> .foo <- 'bar'
And then using the ls function to list objects will only show the first object.
> ls()
[1] "foo"
However, passing all.names = TRUE to the function will show the ‘private’ variable:
> ls(all.names = TRUE)
[1] ".foo" "foo"
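A leading dot only hides a name from the default listing; the object can still be used like any other. The sketch below uses a separate environment (the name env is illustrative) so that the listing does not depend on what is already in the workspace.

```r
env <- new.env()
assign("foo", "foo", envir = env)
assign(".foo", "bar", envir = env)

ls(env)                        # only "foo" is listed
ls(env, all.names = TRUE)      # lists both ".foo" and "foo"

# The hidden object is still fully accessible by name.
get(".foo", envir = env)       # "bar"
```

The dot is therefore a naming convention rather than an access control mechanism.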