R Time Series Objects vs Data Frames: Advantages and Limitations

As I learn more and more about R, questions often arise about which packages/methods/tools to use for a given situation. R is a vast - and growing - universe and I’m not interested in learning everything in that universe. I’m interested in learning the shortest paths between where I am now and my objective. As an adherent of the tidyverse, I lean strongly toward solutions in that realm. But, to paraphrase an old saying, ‘tidyverse is a playground…not a jail’ and if a problem can be handled better by stepping outside the tidyverse, I’m all for that.
One of these areas is in dealing with time series: data sets comprised of repeated measurements over consistent time intervals (hourly, daily, monthly, etc). You can work with time series data using data frames, the fundamental building block of data analysis in R, but there are more specialized tools that offer more flexibility, specific capabilities and ease of use when analyzing time-based data. This can come into play in a wide variety of situations: weekly website visits, monthly sales, daily stock prices, annual GDP, electricity use by minute, that kind of thing.
So what are these time series advantages? How do we leverage them? What limitations of time series objects are good to be aware of? I’m not pretending this is a definitive guide, but I’ve been looking at this for a while and here are my observations…
(A word on forecasting: this is a MAJOR use case for time series but is not the main focus here and I’ll only touch on that briefly below.)
Time Series Essentials
ts is the basic class for time series objects in R. You can do a lot with ts but its functionality has been extended by other packages, in particular zoo and more recently xts.
xts is a leading, evolved R package for working with time series. It builds on zoo, an earlier package for handling time series in R. Datacamp has a very nice
So I’m just going to scratch the surface and hit some highlights with examples here.
Get a Time Series Object
At its most basic, a time series object is a vector, or sometimes a matrix, of observations at regular time intervals.
Examples in built-in R data sets include:
- annual Nile river flows
class(Nile)
## [1] "ts"
str(Nile)
## Time-Series [1:100] from 1871 to 1970: 1120 1160 963 1210 1160 1160 813 1230 1370 1140 ...
Nile
## Time Series:
## Start = 1871
## End = 1970
## Frequency = 1
## [1] 1120 1160 963 1210 1160 1160 813 1230 1370 1140 995 935 1110 994 1020
## [16] 960 1180 799 958 1140 1100 1210 1150 1250 1260 1220 1030 1100 774 840
## [31] 874 694 940 833 701 916 692 1020 1050 969 831 726 456 824 702
## [46] 1120 1100 832 764 821 768 845 864 862 698 845 744 796 1040 759
## [61] 781 865 845 944 984 897 822 1010 771 676 649 846 812 742 801
## [76] 1040 860 874 848 890 744 749 838 1050 918 986 797 923 975 815
## [91] 1020 906 901 1170 912 746 919 718 714 740
- monthly Air Passengers - yes, I know everybody uses Air Passengers for their time series example. So damn handy. Different examples below, I promise. ;)
class(AirPassengers)
## [1] "ts"
str(AirPassengers)
## Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...
AirPassengers
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1949 112 118 132 129 121 135 148 148 136 119 104 118
## 1950 115 126 141 135 125 149 170 170 158 133 114 140
## 1951 145 150 178 163 172 178 199 199 184 162 146 166
## 1952 171 180 193 181 183 218 230 242 209 191 172 194
## 1953 196 196 236 235 229 243 264 272 237 211 180 201
## 1954 204 188 235 227 234 264 302 293 259 229 203 229
## 1955 242 233 267 269 270 315 364 347 312 274 237 278
## 1956 284 277 317 313 318 374 413 405 355 306 271 306
## 1957 315 301 356 348 355 422 465 467 404 347 305 336
## 1958 340 318 362 348 363 435 491 505 404 359 310 337
## 1959 360 342 406 396 420 472 548 559 463 407 362 405
## 1960 417 391 419 461 472 535 622 606 508 461 390 432
Both these examples are time series of the ts class, and we can see right off that these are different data structures from data frames. A key thing to note about time series is that date/time is not in a column the way it would be in a data frame, but is in an index - similar to row.names in a data frame.
If we look at the index for the Nile river data, we can see the time values and we can check the start and end. This info corresponds to the structure info shown above, where start = 1871, end = 1970, and frequency = 1, meaning 1 observation per year, annual data.
## index() comes from the zoo package (also loaded automatically with xts)
library(zoo)
index(Nile)
## [1] 1871 1872 1873 1874 1875 1876 1877 1878 1879 1880 1881 1882 1883 1884 1885
## [16] 1886 1887 1888 1889 1890 1891 1892 1893 1894 1895 1896 1897 1898 1899 1900
## [31] 1901 1902 1903 1904 1905 1906 1907 1908 1909 1910 1911 1912 1913 1914 1915
## [46] 1916 1917 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930
## [61] 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945
## [76] 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960
## [91] 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970
start(Nile)
## [1] 1871 1
end(Nile)
## [1] 1970 1
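If you want to see how the pieces fit together, here is a minimal sketch of building a ts object from scratch with base R. The `sales` numbers are made up purely for illustration; the point is that the time information (start, end, frequency) is stored as attributes of the object rather than in a column.

```r
## hypothetical monthly values, invented purely for illustration
sales <- c(10, 12, 9, 14, 15, 13, 17, 18, 16, 20, 22, 19,
           21, 23, 20, 25, 27, 24)

## build a monthly ts starting January 2020 (frequency = 12 means monthly)
sales_ts <- ts(sales, start = c(2020, 1), frequency = 12)

## the time information lives in attributes, not in a column
start(sales_ts)
end(sales_ts)
frequency(sales_ts)

## window() slices by time rather than by row position
sales_2021 <- window(sales_ts, start = c(2021, 1))
sales_2021
```

With 18 monthly values starting in January 2020, the series runs to June 2021, so the window from January 2021 holds the last 6 observations.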
As discussed above, ts is useful, but xts offers additional flexibility and features.
Convert ts to xts
Converting to an xts object can often make the data more intuitive to deal with.
library(xts)
Nile_xts <- as.xts(Nile)
str(Nile_xts)
## An 'xts' object on 1871-01-01/1970-01-01 containing:
## Data: num [1:100, 1] 1120 1160 963 1210 1160 1160 813 1230 1370 1140 ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## NULL
head(Nile_xts)
## [,1]
## 1871-01-01 1120
## 1872-01-01 1160
## 1873-01-01 963
## 1874-01-01 1210
## 1875-01-01 1160
## 1876-01-01 1160
Air_xts <- as.xts(AirPassengers)
str(Air_xts)
## An 'xts' object on Jan 1949/Dec 1960 containing:
## Data: num [1:144, 1] 112 118 132 129 121 135 148 148 136 119 ...
## Indexed by objects of class: [yearmon] TZ: UTC
## xts Attributes:
## NULL
head(Air_xts)
## [,1]
## Jan 1949 112
## Feb 1949 118
## Mar 1949 132
## Apr 1949 129
## May 1949 121
## Jun 1949 135
- We can see here that xts has reshaped the data from a layout with rows by year and columns by month to more ‘tidy’ data with month-year as the index and observations in one column.
Native xts
Some data comes as xts time series out of the box. For example, the quantmod package fetches stock market data as xts time series automatically:
library(quantmod)
## use quantmod pkg to get some stock prices as time series
price <- getSymbols(Symbols='EA', from="2020-01-01", to=Sys.Date(), auto.assign=FALSE)
class(price)
## [1] "xts" "zoo"
head(price)
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-01-02 108 108 107 107 1901000 107
## 2020-01-03 106 108 105 107 1840300 107
## 2020-01-06 107 109 107 109 2934200 108
## 2020-01-07 109 109 108 108 1692400 108
## 2020-01-08 108 110 108 109 2651600 109
## 2020-01-09 110 110 108 109 1818600 109
As noted, a key characteristic of a time series object is that dates are in an index rather than being in a date column, as they would be in a typical data frame. Looking at the structure of the xts object, we can again see it is different from a data frame.
str(price)
## An 'xts' object on 2020-01-02/2022-02-04 containing:
## Data: num [1:529, 1:6] 108 106 107 109 108 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "EA.Open" "EA.High" "EA.Low" "EA.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2022-02-05 17:47:53"
Convert xts to data frame
If you want to work with the time series as a data frame, it is fairly straightforward to convert an xts object:
## dplyr provides %>% and select()
library(dplyr)
price_df <- as.data.frame(price)
## add Date field based on index (row names) of xts object
price_df$Date <- index(price)
## set data frame row names to numbers instead of dates
rownames(price_df) <- seq_len(nrow(price_df))
## reorder columns to put Date first
price_df <- price_df %>% select(Date, everything())
## check out structure using glimpse, as is the fashion of the times
glimpse(price_df)
## Rows: 529
## Columns: 7
## $ Date <date> 2020-01-02, 2020-01-03, 2020-01-06, 2020-01-07, 2020-01-0…
## $ EA.Open <dbl> 108, 106, 107, 109, 108, 110, 109, 109, 110, 110, 110, 112…
## $ EA.High <dbl> 108, 108, 109, 109, 110, 110, 109, 110, 110, 110, 111, 113…
## $ EA.Low <dbl> 107, 105, 107, 108, 108, 108, 108, 109, 109, 109, 110, 112…
## $ EA.Close <dbl> 107, 107, 109, 108, 109, 109, 109, 110, 110, 110, 111, 113…
## $ EA.Volume <dbl> 1901000, 1840300, 2934200, 1692400, 2651600, 1818600, 1756…
## $ EA.Adjusted <dbl> 107, 107, 108, 108, 109, 109, 108, 109, 109, 109, 111, 112…
A data frame is basically a straight-up table, whereas the xts object has other structural features.
Convert data frame to xts
## convert data frame to xts object by specifying the date field to use for xts index.
price_xts <- xts(price_df, order.by=as.Date(price_df$Date))
str(price_xts)
## An 'xts' object on 2020-01-02/2022-02-04 containing:
## Data: chr [1:529, 1:7] "2020-01-02" "2020-01-03" "2020-01-06" "2020-01-07" ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:7] "Date" "EA.Open" "EA.High" "EA.Low" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## NULL
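Notice, however, that in the process of converting an xts object to data frame and back to xts, the xts Attributes information has been lost. Notice also that the data is now chr with 7 columns: because the Date column was passed in along with the numeric columns, the whole data matrix was coerced to character. Leaving the Date column out of the data (and using it only in order.by) keeps the values numeric.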
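Here is a small sketch of that conversion with the Date column kept out of the data, so the values stay numeric. The data frame `df` and its values are hypothetical, made up so the example runs without downloading anything.

```r
library(xts)

## hypothetical data frame with a Date column and two numeric columns
df <- data.frame(
  Date   = as.Date("2022-01-03") + 0:4,
  Close  = c(100.5, 101.2, 99.8, 102.1, 103.0),
  Volume = c(1500, 1720, 1610, 1900, 2050)
)

## pass only the numeric columns as data; use Date for the index
df_xts <- xts(df[, c("Close", "Volume")], order.by = df$Date)

## the data matrix stays numeric
is.numeric(coredata(df_xts))

## for contrast: including the Date column coerces everything to character
df_chr <- xts(df, order.by = df$Date)
is.character(coredata(df_chr))
```

Both checks should come back TRUE: the first object is ready for calculations, while the second would need its columns converted back to numbers first.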
Saving/Exporting time series data
Due to the structure of an xts object, the best way to save/export for future use in R and preserve all its attributes is to save as an RDS file, using saveRDS. (additional helpful RDS info here.)
However, this won’t be helpful if you need to share the data with someone who is not using R. You can save as a CSV file using write.zoo (be sure to specify sep=",") and this will maintain the table structure of the data but will drop the attributes. It will automatically move the indexes into an Index column so if someone opens it in Excel/Google Sheets, they will see the dates/times.
Saving as RDS or CSV:
## save as RDS to preserve attributes
saveRDS(price, file="price.rds")
price_rds <- readRDS(file='price.rds')
str(price_rds)
## An 'xts' object on 2020-01-02/2022-02-04 containing:
## Data: num [1:529, 1:6] 108 106 107 109 108 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:6] "EA.Open" "EA.High" "EA.Low" "EA.Close" ...
## Indexed by objects of class: [Date] TZ: UTC
## xts Attributes:
## List of 2
## $ src : chr "yahoo"
## $ updated: POSIXct[1:1], format: "2022-02-05 17:47:53"
## save as CSV - be sure to include sep=","
write.zoo(price, file='price.csv', sep=",")
## read_csv() is from readr and returns a tibble, not an xts object
library(readr)
price_zoo <- read_csv('price.csv')
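If you want the CSV to come back into R as a time series rather than a plain table, one option is zoo's read.zoo followed by as.xts. This is just a sketch with a tiny made-up series and a temp file; the object names are hypothetical.

```r
library(xts)

## hypothetical two-column series, invented for illustration
x <- xts(cbind(Close  = c(100.5, 101.25, 99.75),
               Volume = c(1500, 1720, 1610)),
         order.by = as.Date("2022-01-03") + 0:2)

## write to a temporary CSV; the index goes into an "Index" column
csv_path <- tempfile(fileext = ".csv")
write.zoo(x, file = csv_path, sep = ",")

## read.zoo() rebuilds the index from the first column;
## format tells it to parse that column as dates
z <- read.zoo(csv_path, header = TRUE, sep = ",", format = "%Y-%m-%d")
x_back <- as.xts(z)
x_back
```

The values and dates survive the round trip, but any xts attributes (like src and updated on the quantmod data) would not, which is why RDS is still the better choice within R.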
Time Series Strengths
The structure of a time series leads to a variety of advantages related to time-based analysis, compared to data frames. A few of the main ones, at least from my perspective:
- Period/Frequency Manipulation: can easily change from granular periods, such as daily, to aggregated periods.
- Period calculations: counting number of periods in the data (months, quarters, years).
- Selection/subsetting based on date ranges.
- Visualization: a number of visualization options are designed to work with time series.
- Decomposition: breaking out time series into trend, seasonal, random components for analysis.
- Forecasting: time series objects are designed for applying various forecasting methods like Holt-Winters and ARIMA. This is well beyond the scope of this post, but we’ll show a quick ARIMA example.
No doubt everything you can do with time series can be done with data frames, but using a time series object can really expedite things.
Time Series Manipulation/Calculation
Period/Frequency Manipulation
Change the period granularity to less granular:
- easily change daily data to weekly, monthly, quarterly, yearly
## get periodicity (frequency) for data set
periodicity(price)
## Daily periodicity from 2020-01-02 to 2022-02-04
## aggregate by period
head(to.weekly(price)[,1:5])
## price.Open price.High price.Low price.Close price.Volume
## 2020-01-03 108 108 105 107 3741300
## 2020-01-10 107 110 107 109 10852800
## 2020-01-17 109 113 109 113 10221500
## 2020-01-24 113 114 112 112 8457000
## 2020-01-31 110 113 106 108 18435600
## 2020-02-07 108 111 104 109 15973600
head(to.monthly(price)[,1:5])
## price.Open price.High price.Low price.Close price.Volume
## Jan 2020 107.9 114 105.1 108 51708200
## Feb 2020 107.9 111 98.6 101 55140000
## Mar 2020 101.9 112 85.7 100 116497800
## Apr 2020 98.4 119 96.7 114 72981400
## May 2020 113.1 123 111.1 123 71400200
## Jun 2020 123.3 134 113.3 132 64143400
head(to.yearly(price)[,1:5])
## price.Open price.High price.Low price.Close price.Volume
## 2020-12-31 108 147 85.7 144 747474600
## 2021-12-31 143 150 120.1 132 641690300
## 2022-02-04 132 143 125.6 138 72451300
Notice that this isn’t a straight roll-up but an actual summary: for the monthly data, the Open is the first daily value in the month, the High is the max of daily data for the month, the Low is the minimum for the month, the Close is the last daily value, while volume is the sum for the month, all as you would expect.
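To convince yourself of how the summary works, here is a sketch that runs to.monthly on a single made-up daily column and compares the first month against doing it by hand. The `daily` series is hypothetical (random numbers, not real prices).

```r
library(xts)

## hypothetical daily series: 90 days of made-up values starting 2021-01-01
set.seed(42)
daily <- xts(round(100 + cumsum(rnorm(90)), 2),
             order.by = as.Date("2021-01-01") + 0:89)

## to.monthly() turns a single column into Open/High/Low/Close per month
monthly <- to.monthly(daily)
monthly

## the January summary, done by hand
jan_vals <- as.numeric(daily["2021-01"])
c(first = jan_vals[1], max = max(jan_vals),
  min = min(jan_vals), last = jan_vals[length(jan_vals)])
```

The four hand-calculated numbers should line up with the four columns in the first row of `monthly`.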
You can also pull out the values at the END of a period-length, including setting number of periods to skip over each iteration:
- get index for last day of period length specified in ‘on’ for every k period.
- apply index to dataset to extract the rows.
## every 2 weeks (on='weeks', k=2)
end_wk <- endpoints(price, on="weeks", k=2)
head(price[end_wk,])
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-01-03 105.6 107.8 105.1 107.2 1840300 106.5
## 2020-01-17 112.4 113.0 111.6 112.9 3053300 112.2
## 2020-01-31 110.8 110.8 105.5 107.9 6995800 107.2
## 2020-02-14 108.9 109.9 108.8 109.7 1227500 109.0
## 2020-02-28 100.3 101.9 98.6 101.4 6853700 100.7
## 2020-03-13 98.7 99.6 92.8 97.1 5842000 96.5
## every 6 months
end_mth <- endpoints(price, on='months', k=6)
head(price[end_mth,])
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-06-30 133 133 131 132 2177400 131
## 2020-12-31 142 144 142 144 1689900 143
## 2021-06-30 144 145 143 144 1799900 143
## 2021-12-31 134 135 132 132 1610900 132
See end of Period Calculations section for how to get an average during periods shown: averages for each 6 month period, for example.
Period Counts
## get the number of weeks, months, years in the dataset (including partial)
price_nw <- nweeks(price)
price_nm <- nmonths(price)
price_ny <- nyears(price)
The price data covers:
- 529 days
- 110 weeks
- 26 months
- 3 years (or portions thereof)
First/last dates:
## get earliest date
st_date <- start(price)
## get last date
end_date <- end(price)
- Start: 2020-01-02
- End: 2022-02-04
Selecting/Subsetting
Time series objects make it easy to slice the data by date ranges. This is an area where time series really shine compared to trying to do the same thing with a data frame.
- xts is super-efficient at interpreting date ranges based on minimal info.
- ‘/’ is a key symbol for separating dates - it is your friend.
- date ranges are inclusive of references used.
Note that in the following examples based on stock market data, dates are missing due to gaps in data - days when markets closed.
- quickly get entire YEAR
## subset on a YEAR (showing head and tail to confirm data is 2021 only)
head(price["2021"])
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2021-01-04 143 144 138 140 3587000 139
## 2021-01-05 140 141 138 141 2117800 141
## 2021-01-06 139 140 136 137 2398500 136
## 2021-01-07 137 141 137 141 2936200 140
## 2021-01-08 141 142 140 142 1902700 141
## 2021-01-11 142 142 139 141 2589800 141
tail(price["2021"])
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2021-12-23 131 133 131 133 1594000 133
## 2021-12-27 133 134 132 133 1377300 133
## 2021-12-28 133 135 133 133 1230700 133
## 2021-12-29 134 134 132 133 912300 133
## 2021-12-30 134 136 134 134 1177000 134
## 2021-12-31 134 135 132 132 1610900 132
- DURING selected month
## get data DURING selected month
price["2020-02"]
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-02-03 108 109 104.4 105 4155500 104
## 2020-02-04 106 107 105.3 107 4190500 106
## 2020-02-05 109 109 107.1 108 2895700 107
## 2020-02-06 109 110 108.4 110 2500600 109
## 2020-02-07 109 111 108.7 109 2231300 108
## 2020-02-10 109 110 108.2 109 2170800 108
## 2020-02-11 109 109 107.9 109 1195300 108
## 2020-02-12 110 110 108.6 110 1567200 109
## 2020-02-13 109 109 108.0 109 1627000 108
## 2020-02-14 109 110 108.8 110 1227500 109
## 2020-02-18 109 110 108.7 109 2171400 109
## 2020-02-19 110 111 109.4 110 1540000 109
## 2020-02-20 109 109 107.4 109 4034000 109
## 2020-02-21 108 109 106.8 108 2546900 107
## 2020-02-24 105 108 105.0 107 2817100 106
## 2020-02-25 108 109 105.2 105 3651300 105
## 2020-02-26 106 108 105.6 107 2858000 106
## 2020-02-27 104 106 102.7 103 4906200 102
## 2020-02-28 100 102 98.6 101 6853700 101
- FROM start of year to END OF SPECIFIC MONTH
## get data FROM start of a year to END OF SPECIFIC MONTH
price_jf <- price["2021/2021-02"]
head(price_jf, 4)
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2021-01-04 143 144 138 140 3587000 139
## 2021-01-05 140 141 138 141 2117800 141
## 2021-01-06 139 140 136 137 2398500 136
## 2021-01-07 137 141 137 141 2936200 140
tail(price_jf, 3)
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2021-02-24 138 140 137 138 3735500 137
## 2021-02-25 137 139 134 135 3042600 135
## 2021-02-26 136 138 134 134 3646600 133
- everything BEFORE specified date
## get everything BEFORE specified date (based on what is available)
price["/2020-01-06"]
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-01-02 108 108 107 107 1901000 107
## 2020-01-03 106 108 105 107 1840300 107
## 2020-01-06 107 109 107 109 2934200 108
- everything BETWEEN two dates
## get everything BETWEEN two dates
price["2021-06-01/2021-06-04"]
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2021-06-01 142 144 142 144 2610300 143
## 2021-06-02 144 144 141 141 1522100 141
## 2021-06-03 141 143 141 142 1574900 142
## 2021-06-04 143 146 143 145 1919500 145
- everything AFTER specified date
## get everything AFTER specified date
price["2022-01-18/"]
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2022-01-18 138 143 133 134 8758900 134
## 2022-01-19 136 138 135 137 3820300 137
## 2022-01-20 138 142 138 139 3116900 139
## 2022-01-21 138 141 138 139 3120000 139
## 2022-01-24 137 139 132 135 4262100 135
## 2022-01-25 134 134 130 131 2386400 131
## 2022-01-26 131 132 129 130 2333800 130
## 2022-01-27 131 134 131 131 1781500 131
## 2022-01-28 131 132 130 132 2155000 132
## 2022-01-31 131 136 129 133 4471000 133
## 2022-02-01 133 133 129 130 3828000 130
## 2022-02-02 126 138 126 137 5723700 137
## 2022-02-03 135 140 134 137 3409200 137
## 2022-02-04 135 138 135 138 2396700 138
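For comparison, here is a sketch of the same kind of selection done on both an xts object and a data frame. The data is hypothetical (one made-up value per day of 2021) so that it runs without fetching anything.

```r
library(xts)

## hypothetical daily values for all of 2021, invented for illustration
dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
vals  <- seq_along(dates)

x  <- xts(vals, order.by = dates)
df <- data.frame(Date = dates, value = vals)

## xts: a whole month from a partial date string
feb_xts <- x["2021-02"]

## data frame: both bounds of the range spelled out
feb_df <- df[df$Date >= as.Date("2021-02-01") &
             df$Date <= as.Date("2021-02-28"), ]

## xts: open-ended range, inclusive of the start date
from_dec_xts <- x["2021-12-15/"]

nrow(feb_xts)
nrow(feb_df)
nrow(from_dec_xts)
```

Both February selections return the same 28 rows; the xts version just gets there with a lot less typing, and you don't need to know how many days are in the month.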
Period Calculations
Time series objects lend themselves well to time-based calculations.
Simple arithmetic between two dates is not as straightforward as might be expected, but still easily doable:
## subtraction of a given metric between two dates
as.numeric(price$EA.Close["2022-01-21"])-as.numeric(price$EA.Close["2022-01-18"])
## [1] 5.1
## subtraction of one metric from another on same date
price$EA.Close["2022-01-18"]-price$EA.Open["2022-01-18"]
## EA.Close
## 2022-01-18 -4.53
lag.xts is versatile for lag calculations, calculating differences over time:
## calculates across all columns with one command - default is 1 period but can be set with k
head(price-lag.xts(price))
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-01-02 NA NA NA NA NA NA
## 2020-01-03 -2.36 -0.60 -1.64 -0.14 -60700 -0.139
## 2020-01-06 1.37 1.56 1.51 1.58 1093900 1.570
## 2020-01-07 2.05 -0.06 1.10 -0.39 -1241800 -0.388
## 2020-01-08 -0.82 0.75 0.05 1.10 959200 1.093
## 2020-01-09 1.82 0.34 0.49 -0.13 -833000 -0.129
## set k for longer lag - this example starting at a date beyond available data for the lag calculations, so no NAs
head(price["2020-01-13/"]-lag.xts(price, k=7))
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-01-13 1.11 1.72 1.86 2.48 -43600 2.464
## 2020-01-14 4.08 2.42 3.59 2.38 -116900 2.365
## 2020-01-15 2.84 1.14 2.51 0.83 -1455100 0.825
## 2020-01-16 1.00 2.04 2.22 2.87 415900 2.852
## 2020-01-17 4.19 2.99 3.77 3.44 401700 3.418
## 2020-01-21 2.55 2.63 3.50 3.05 346400 3.031
## works for individual column
price$EA.Close["2022-01-18/"]-lag.xts(price$EA.Close, k=2)
## EA.Close
## 2022-01-18 3.07
## 2022-01-19 6.47
## 2022-01-20 4.97
## 2022-01-21 2.10
## 2022-01-24 -3.68
## 2022-01-25 -8.00
## 2022-01-26 -5.22
## 2022-01-27 0.05
## 2022-01-28 1.94
## 2022-01-31 1.60
## 2022-02-01 -1.98
## 2022-02-02 4.51
## 2022-02-03 7.35
## 2022-02-04 0.54
diff for calculating differences, based on combination of lag and difference order:
head(diff(price, lag=1, differences=1))
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-01-02 NA NA NA NA NA NA
## 2020-01-03 -2.36 -0.60 -1.64 -0.14 -60700 -0.139
## 2020-01-06 1.37 1.56 1.51 1.58 1093900 1.570
## 2020-01-07 2.05 -0.06 1.10 -0.39 -1241800 -0.388
## 2020-01-08 -0.82 0.75 0.05 1.10 959200 1.093
## 2020-01-09 1.82 0.34 0.49 -0.13 -833000 -0.129
head(diff(price, lag=1, differences=2))
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-01-02 NA NA NA NA NA NA
## 2020-01-03 NA NA NA NA NA NA
## 2020-01-06 3.73 2.16 3.15 1.72 1154600 1.71
## 2020-01-07 0.68 -1.62 -0.41 -1.97 -2335700 -1.96
## 2020-01-08 -2.87 0.81 -1.05 1.49 2201000 1.48
## 2020-01-09 2.64 -0.41 0.44 -1.23 -1792200 -1.22
- first example: diff with lag=1, differences=1 gives same result as lag.xts with k=1 (or default)
- second example: diff with differences=2 gives the ‘second order difference’: difference between the differences.
- EA.Open:
- 3.73 = 1.37-(-2.36)
- 0.68 = 2.05-1.37
- -2.87 = -0.82-2.05
- …
Useful for some forecasting methods, among other applications.
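The relationship between lag, first difference and second-order difference is easier to see on a tiny series. This is a sketch with made-up values picked so the arithmetic is easy to check by eye.

```r
library(xts)

## hypothetical values, chosen so the arithmetic is easy to follow
x <- xts(c(10, 13, 11, 16, 18), order.by = as.Date("2022-01-03") + 0:4)

d1 <- diff(x, lag = 1, differences = 1)   # NA, 3, -2, 5, 2
d2 <- diff(x, lag = 1, differences = 2)   # NA, NA, -5, 7, -3

## second-order difference is the difference of the first differences
d2_manual <- diff(d1)

## and the first difference matches x minus its one-period lag
d1_manual <- x - lag.xts(x, k = 1)

cbind(x, d1, d2)
```

So -5 is (-2) - 3, 7 is 5 - (-2), and -3 is 2 - 5, following the same pattern as the EA.Open numbers above.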
Returns for calculating % change period over period:
- functions in quantmod package designed for financial asset prices, but can be applied to other xts data.
- various periodicity: daily, weekly, monthly, quarterly, yearly or ALL at once (allReturns())
head(dailyReturn(price))
## daily.returns
## 2020-01-02 -0.00556
## 2020-01-03 -0.00130
## 2020-01-06 0.01474
## 2020-01-07 -0.00359
## 2020-01-08 0.01015
## 2020-01-09 -0.00119
head(monthlyReturn(price))
## monthly.returns
## 2020-01-31 -0.000185
## 2020-02-28 -0.060693
## 2020-03-31 -0.011838
## 2020-04-30 0.140661
## 2020-05-29 0.075442
## 2020-06-30 0.074626
- applied to Air Passenger xts to get % change, even though not financial returns:
head(monthlyReturn(Air_xts))
## monthly.returns
## Jan 1949 0.0000
## Feb 1949 0.0536
## Mar 1949 0.1186
## Apr 1949 -0.0227
## May 1949 -0.0620
## Jun 1949 0.1157
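If you'd rather not pull in quantmod just for % change, the same calculation can be sketched by hand with lag.xts: current value divided by previous value, minus 1. This uses the built-in AirPassengers data; `air_change` is just a name picked for the example.

```r
library(xts)

Air_xts <- as.xts(AirPassengers)

## period-over-period % change: current value / previous value - 1
air_change <- Air_xts / lag.xts(Air_xts, k = 1) - 1

head(round(air_change, 4))
```

Feb 1949 works out to 118/112 - 1 = 0.0536 and Mar 1949 to 132/118 - 1 = 0.1186, matching the monthlyReturn output above. The one difference is the first row: there is no prior period, so the by-hand version gives NA where monthlyReturn shows 0.0000.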
Average for period:
- Using the indexes obtained in the ‘endpoints’ example at the end of the Period/Frequency Manipulation section above, calculate averages for the periods.
period.apply(price, INDEX=end_mth, FUN=mean)
## EA.Open EA.High EA.Low EA.Close EA.Volume EA.Adjusted
## 2020-06-30 111 113 110 112 3454968 111
## 2020-12-31 133 134 131 133 2465653 132
## 2021-06-30 140 142 139 140 2508342 140
## 2021-12-31 138 139 136 137 2583249 137
## 2022-02-04 133 136 131 133 3018804 133
Rolling Average:
You can also calculate a rolling (moving) average quickly with ‘rollmean’ function from zoo:
## get subset of data for demo
price_c <- price[,'EA.Close']
price_c <- price_c['/2020-02-28']
## calc rolling mean and add to original data
## - k=3 means a 3-period window
## - align='right' puts the calculated number at the last date in the rolling period
price_c$EA_Close_rm <- rollmean(price_c, k=3, align='right')
## quick dygraph - more on this below
library(dygraphs)
dygraph(price_c, width='100%')
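To see exactly what k and align are doing, here is a sketch on a tiny made-up series where the rolling means are easy to work out in your head.

```r
library(zoo)

## hypothetical values with an easy-to-check rolling mean
z <- zoo(c(1, 2, 3, 4, 5, 6), order.by = as.Date("2022-01-03") + 0:5)

## 3-period window, result stamped at the last date of each window
rm_right <- rollmean(z, k = 3, align = "right")
rm_right

## same window with leading NAs kept, so the length matches the input
rm_padded <- rollmean(z, k = 3, align = "right", fill = NA)
rm_padded
```

The first rolling value is mean(1, 2, 3) = 2 and it lands on the third date, 2022-01-05. Without fill = NA the result is 2 rows shorter than the input, which is why the first two dates have no rolling mean in the chart above.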
Visualization
Time series objects offer some different visualization opportunities than data frames. Below are a couple of options.
Plot.ts
You can do a quick, simple plot with plot.ts(). Note that in this case the x-axis is the numerical index of the data point, and doesn’t show the date.
plot.ts(price$EA.Close)
Dygraphs
The dygraphs package offers flexibility and interactivity for time series.
- easily show multiple metrics at once.
- scroll over to see details.
- select chart area to zoom in.
library(dygraphs)
dygraph(price[,1:4], width='100%')
- subset for individual columns.
- easily add annotations for events.
## use dyEvent to add annotations
graph <- dygraph(price$EA.Close, width='100%')
graph <- dyEvent(graph, "2020-02-21","Start of Covid 19", labelLoc = 'top')
graph <- dyEvent(graph, "2021-06-10","New product announcements", labelLoc = 'top')
## print chart
graph
Decomposition Plots
Decomposition of a time series enables you to view it broken out into 3 key components (in addition to observed values):
- overall trend
- seasonal pattern
- random component (noise)
This can make it easier to ‘separate the signal from the noise’ and get a clearer sense of what is going on.
There has to be data over a long enough period to assess any seasonal trend, so this requires:
- frequency > 1, where 1=annual data; typically it would be at least 4 (quarterly), 12 (monthly), 52 (weekly), 365 (daily).
- data covering at least 2 full periods (2 years or more of monthly data, for example): one year is not enough to establish a seasonal pattern over time.
- if you get ‘Error in decompose(): time series has no or less than 2 periods’ it is usually due to violating one or both of the above conditions.
- need to translate xts object to ts for this.
## Air Passengers has enough data
ap_decomp <- decompose(AirPassengers)
plot(ap_decomp)
apx_decomp <- decompose(ts(Air_xts, frequency=12))
plot(apx_decomp)
- same results with both approaches, although the original ts object maintains dates on x-axis, making it easier to interpret.
- interpretation: steady upward trend; peaks at mid-year; randomness fairly large at first, settles down, then appears to be growing over time.
- coincides with what we see in the observed data but makes the patterns more evident.
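One way to see what the decomposition is actually doing: with the default additive model, the three components add back up to the observed series. A quick sketch using the built-in AirPassengers data:

```r
## decompose() uses an additive model unless type = "multiplicative" is given
ap <- decompose(AirPassengers)

## rebuild the series from its parts
rebuilt <- ap$trend + ap$seasonal + ap$random

## the moving-average trend is undefined at both ends of the series
sum(is.na(ap$trend))

## everywhere the trend exists, the parts add up to the observed values
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(AirPassengers[ok]))
```

The trend is a centered moving average over one full period, so with monthly data the first 6 and last 6 months have no trend value (12 NAs in total), and the random component is simply what is left over after removing trend and seasonal.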
If we fetch some longer daily data for stock price, we can do the same:
## fetch some longer price data
price_d <- getSymbols('EA', from='2016-01-01', to='2021-12-31', auto.assign = FALSE)
price_decomp <- decompose(ts(price_d$EA.Close, frequency=365), type="additive")
plot(price_decomp)
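- we provide 6 full years of data and most of that is used to calculate the decomposition.
- x-axis is the period number, which is only loosely a year number: the data holds trading days only (roughly 252 per year), so frequency=365 makes each period longer than a calendar year. Setting frequency=252 would line the periods up with calendar years more closely.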
- TREND: trending up to about half-way through year 2, then down until about the same point in year 3, then back up, looking like a peak in mid year 4. Not willing to stretch out beyond that. ;)
- SEASONAL: a pattern has been detected where there tends to be a dip at the beginning of the year, rising up to a peak toward the end of the first quarter, dropping sharply, a smaller peak mid-year, a peak in q3 or early q4, then a drop with a smaller bump at the end of the year.
- RANDOM: as to be expected with stock price in general, lots of randomness involved!
Looks like there may be money to be made riding the seasonal wave! Please: do not buy or sell stocks based on this information. ;)
Forecasting
A primary use case for time series objects is forecasting. This is a whole other, involved topic way beyond the scope of this post.
Here is a quick example to show how easy forecasting can be in R. Note that we need to bring in the forecast package for this. (There is also the amazing [tidyverts eco-system](https://tidyverts.org/) for working with time series that I have recently discovered - again, a whole other topic for another time.)
Get an ARIMA Model
Some basic terms, over-simplified for our purposes here:
- ARIMA stands for AutoRegressive Integrated Moving Average
- One of the most widely-used time series forecasting methods, although certainly not the only one.
- 3 essential parameters for ARIMA are p,d,q: p=number of autoregressive (lag) terms, d=degree of differencing, q=number of moving-average terms (lagged forecast errors).
library(forecast)
## get closing prices for forecasting
price_cl <- price[,4]
## get a model for the time series - using auto.arima for simplicity
fitA <- auto.arima(price_cl, seasonal=FALSE) ## can add trace=TRUE to see comparison of different models
## show model
fitA
## Series: price_cl
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.089
## s.e. 0.043
##
## sigma^2 estimated as 5.69: log likelihood=-1208
## AIC=2420 AICc=2420 BIC=2428
The model we get back is ARIMA(0,1,1) which means p=0, d=1, q=1. We can generate a model by setting these parameters manually, but auto.arima automatically checks a variety of models and selects the best. When comparing models, lowest AIC and BIC are preferred.
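For the manual route, here is a minimal sketch using base R's arima() from the stats package (not the forecast package used above), so it runs with no extra installs. It uses the built-in Nile series rather than the stock prices, and the alternative order (1,1,1) is picked arbitrarily just to have something to compare against.

```r
## fit ARIMA(p = 0, d = 1, q = 1) by hand on the built-in Nile series
fit_manual <- arima(Nile, order = c(0, 1, 1))

## a single MA coefficient, as the (0,1,1) order implies
coef(fit_manual)

## fit an arbitrary alternative order and compare AIC (lower is preferred)
fit_alt <- arima(Nile, order = c(1, 1, 1))
c(manual = AIC(fit_manual), alt = AIC(fit_alt))

## forecast 5 periods ahead; $pred is the forecast, $se the standard error
fc <- predict(fit_manual, n.ahead = 5)
fc$pred
```

This is essentially what auto.arima automates: fit a set of candidate orders, compare them on an information criterion, and keep the winner.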
We can check the accuracy of the model. The most useful item here for interpretation and comparison is MAPE (mean absolute percentage error).
## check accuracy - based on historical data
accuracy(fitA)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.0631 2.38 1.76 0.0311 1.39 0.999 -0.00216
fitAa <- accuracy(fitA)
100-fitAa[,5]
## [1] 98.6
So in this case a MAPE of 1.39 can be seen as accuracy of 98.61%.
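MAPE is simple enough to calculate by hand, which helps with interpreting it. The sketch below assumes the usual definition (average of the absolute error as a percentage of the actual value) and uses base R's arima() on the built-in Nile series, so the number will differ from the stock price model above.

```r
## base R model on built-in data, standing in for the stock price model
fit <- arima(Nile, order = c(0, 1, 1))

## in-sample errors: actual minus fitted
err <- residuals(fit)

## MAPE: average of |error / actual|, expressed as a percentage
mape <- mean(abs(err / Nile)) * 100
mape

## MAE for comparison, in the units of the data
mae <- mean(abs(err))
mae
```

Keep in mind this is measured on the same historical data the model was fitted to, so it tells you how well the model describes the past rather than how well it will predict the future.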
We can also plot the residuals of the model for visual inspection.
## check residuals
tsdisplay(residuals(fitA), main='Residuals of Simple ARIMA Forecasting Model for Stock Price')
As usual with residuals, we are looking for mean around 0, roughly evenly distributed. For ARIMA we also get ACF and PACF, where we are looking for bars to be short and at least within blue dotted lines. So looks like we are good to go here.
Create A Forecast
We just need a little more code to create and plot a forecast. We can set the forecast period to whatever we want, based on the periodicity of the data. In this case the data is daily and we are looking out 30 days.
days <- 30
fcastA <- forecast(fitA, h=days)
plot(fcastA)
That was easy! And we can use this approach to quickly iterate over various models, if we are not convinced that auto.arima is the best. Of course you can use data frames to create forecasts of various sorts but the xts object makes it super-easy to apply common time series methods.
This also reveals a shortcoming of time series forecasting:
- dependence on pattern recognition and pattern repetition, which can lead to a conservative forecast, especially with noisy data.
- as a result, the forecast is: ‘steady as she goes, with possibility of moving either quite a bit higher or quite a bit lower’.
So not that useful. To be fair, stock market prices may not actually be predictable, so it is a perfectly reasonable outcome that grounds us in reality.
Conclusion
Time series objects are obviously a powerful way to work with time-based data and a go-to when your data is based on time. Particular strengths include:
- Ease of manipulation such as aggregation by date periods, selecting date ranges, period calculations.
- Some great visualization options for exploring the data.
- Forecasting which is really the bread and butter of time series objects.
There are some cases where you may prefer to stick with data frames:
- Multi-dimensional data: time series work best when each row represents a distinct time. If you are dealing with multi-dimensional data where dates are broken down by customer, or region, etc., especially in tidy format, you may want to stick with data frame.
- Visualization preferences: if you are more comfortable with using ggplot2 (or other visualization tools geared toward data frames) a data frame may be preferable. Or if the document you are producing has ggplot2 charts, you may want to maintain standard presentation.
- Forecasting needs: if you are doing time series forecasting you will want to use a time series object. If you’re not doing forecasting, there is less of a need. Limitation is that time series forecasting is based only on historical trends in the data and doesn’t include things like correlation with other factors.
Ultimately, the right tool for the job depends on a variety of situational factors, and having a collection of tools at your disposal helps you avoid the ‘when all you have is a hammer…’ pitfall. If your data is based on time, time series should be in consideration.
So that’s quite a lot for one blog post - hopefully helps you make the most of your ‘time’!
Resources
Additional resources that may be helpful with time-series and xts in particular:
- xts Cheat Sheet.
- supplementary info. to cheat sheet.
- xts package vignette.
- time series section in R Cookbook 2nd Edition.
- tsibble package info. - time series for tidyverse.