data.table

Creating a data.table

Remarks

A data.table is an enhanced version of the data.frame class from base R. Its class() attribute is therefore the vector c("data.table", "data.frame"), and most functions that accept a data.frame will also accept a data.table. There are many ways to create, load, or coerce to a data.table, as shown here.
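As a quick check of the inheritance described above, here is a minimal sketch. It assumes the data.table package is installed; the column x is purely illustrative.

```r
library(data.table)

DT <- data.table(x = 1:3)

# The class vector lists "data.table" first, then "data.frame"
class(DT)
# [1] "data.table" "data.frame"

# Both class checks succeed
is.data.table(DT)
# [1] TRUE
is.data.frame(DT)
# [1] TRUE

# Base functions written for data.frames, such as nrow(), accept it as well
nrow(DT)
# [1] 3
```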

Coerce a data.frame

To copy a data.frame as a data.table, use as.data.table or data.table:

DF = data.frame(x = letters[1:5], y = 1:5, z = (1:5) > 3)

DT <- as.data.table(DF)
# or
DT <- data.table(DF)

Copying is rarely necessary. One exception is built-in datasets like mtcars, which must be copied because they cannot be modified in place.
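For a built-in dataset, the copy might look like the sketch below. The names cars_dt and kpl are hypothetical, and the conversion factor is only an approximation used for illustration; keep.rownames is an argument of as.data.table that moves the row names into a column.

```r
library(data.table)

# keep.rownames = TRUE moves the row names (car models) into a column named "rn"
cars_dt <- as.data.table(mtcars, keep.rownames = TRUE)

names(cars_dt)[1:3]
# [1] "rn"  "mpg" "cyl"

# The copy can be modified by reference; the built-in mtcars is left untouched
cars_dt[, kpl := mpg * 0.4251]  # approximate miles per gallon to km per litre
"kpl" %in% names(mtcars)
# [1] FALSE
```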

Build with data.table()

The package provides a constructor function of the same name, data.table():

DT <- data.table(
  x = letters[1:5], 
  y = 1:5, 
  z = (1:5) > 3
)
#    x y     z
# 1: a 1 FALSE
# 2: b 2 FALSE
# 3: c 3 FALSE
# 4: d 4  TRUE
# 5: e 5  TRUE

Unlike data.frame in R versions before 4.0.0, data.table does not coerce strings to factors by default:

sapply(DT, class)
#               x           y           z 
#     "character"   "integer"   "logical" 
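If factors are wanted, the constructor accepts a stringsAsFactors argument. A minimal sketch, with hypothetical object names DT_chr and DT_fct:

```r
library(data.table)

# Default: strings stay character
DT_chr <- data.table(x = letters[1:5], y = 1:5)

# Opt in to factor conversion explicitly
DT_fct <- data.table(x = letters[1:5], y = 1:5, stringsAsFactors = TRUE)

class(DT_chr$x)
# [1] "character"
class(DT_fct$x)
# [1] "factor"
```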

Read in with fread()

We can read from a text file:

dt <- fread("my_file.csv")

Unlike read.csv in R versions before 4.0.0, fread reads strings as character columns by default, not as factors.
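Since my_file.csv may not exist on your machine, here is a self-contained sketch that writes a small temporary file with fwrite and reads it back. The file contents and the names path, dt and dt_fct are made up for illustration.

```r
library(data.table)

# Write a small example file to a temporary location (hypothetical data)
path <- tempfile(fileext = ".csv")
fwrite(data.table(x = letters[1:3], y = 1:3), path)

# Strings come back as character columns by default
dt <- fread(path)
sapply(dt, class)
#           x           y 
# "character"   "integer" 

# Factors can be requested explicitly
dt_fct <- fread(path, stringsAsFactors = TRUE)
class(dt_fct$x)
# [1] "factor"

unlink(path)  # remove the temporary file
```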

See the fread help page (?fread) for more examples.

Modify a data.frame with setDT()

For efficiency, data.table offers a way to convert a data.frame or list into a data.table in place, without copying:

# example data.frame
DF = data.frame(x = letters[1:5], y = 1:5, z = (1:5) > 3)

# modification
setDT(DF)

Note that we do not <- assign the result, since the object DF has been modified in-place.

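The column classes of the data.frame are retained. For example, if DF was built where data.frame() converts strings to factors (the default before R 4.0.0), x stays a factor: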

sapply(DF, class)
#         x         y         z 
#  "factor" "integer" "logical" 
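The same call works on a plain list, as mentioned above. A minimal sketch, assuming a named list whose elements all have the same length; lst is a hypothetical name.

```r
library(data.table)

# A plain named list with equal-length elements (hypothetical example data)
lst <- list(x = letters[1:3], y = 1:3)

setDT(lst)  # no assignment needed; lst itself is converted

class(lst)
# [1] "data.table" "data.frame"
names(lst)
# [1] "x" "y"
```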

Copy another data.table with copy()

# example data
DT1 = data.table(x = letters[1:2], y = 1:2, z = (1:2) > 3)

Because data.tables are modified by reference, DT2 <- DT1 does not make a copy: both names point to the same object, so later modifications to the columns or other attributes of DT2 also affect DT1. When you want a real copy, use copy():

DT2 = copy(DT1)

To see the difference, here’s what happens without a copy:

DT2 <- DT1
DT2[, w := 1:2]

DT1
#    x y     z w
# 1: a 1 FALSE 1
# 2: b 2 FALSE 2
DT2
#    x y     z w
# 1: a 1 FALSE 1
# 2: b 2 FALSE 2

And with a copy:

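# recreate DT1 first, since the previous example added w to it
DT1 = data.table(x = letters[1:2], y = 1:2, z = (1:2) > 3)
DT2 <- copy(DT1)
DT2[, w := 1:2]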

DT1
#    x y     z
# 1: a 1 FALSE
# 2: b 2 FALSE
DT2
#    x y     z w
# 1: a 1 FALSE 1
# 2: b 2 FALSE 2

With copy(), changes to DT2 do not propagate to DT1.
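One way to confirm which case you are in is to compare memory addresses with data.table's address() function. This is only a sketch; DT_ref and DT_copy are hypothetical names.

```r
library(data.table)

DT1 <- data.table(x = letters[1:2], y = 1:2)

DT_ref  <- DT1        # plain assignment: a second name for the same object
DT_copy <- copy(DT1)  # copy(): a separate object

address(DT_ref) == address(DT1)
# [1] TRUE
address(DT_copy) == address(DT1)
# [1] FALSE
```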


This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0.