Jeromy Anglim's Blog: Psychology and Statistics


Thursday, May 17, 2012

Getting Started with R Markdown, knitr, and RStudio 0.96

This post examines the features of R Markdown using knitr in RStudio 0.96. This combination of tools provides an exciting improvement in usability for reproducible analysis. Specifically, this post (1) discusses getting started with R Markdown and knitr in RStudio 0.96; (2) provides a basic example of producing console output and plots using R Markdown; (3) highlights several code chunk options such as caching and controlling how input and output are displayed; (4) demonstrates use of standard Markdown notation as well as the extended features of formulas and tables; and (5) discusses the implications of R Markdown. This post was produced with R Markdown. The source code is available here as a gist. The post may be most useful if the source code and displayed post are viewed side by side. In some instances, I include a copy of the R Markdown in the displayed HTML, but most of the time I assume you are reading the source and post side by side.

Getting started

To work with R Markdown, if necessary:

  • Install R
  • Install the latest version of RStudio (at time of posting, this is 0.96)
  • Install the latest version of the knitr package: install.packages("knitr")

To run the basic working example that produced this blog post:

opts_knit$set(upload.fun = imgur_upload)  # upload all images to imgur.com

Prepare for analyses

set.seed(1234)
library(ggplot2)
library(lattice)

Basic console output

To insert an R code chunk, you can type it manually or just press Chunks - Insert chunks or use the shortcut key. This will produce the following code chunk:

```{r}

```

Pressing tab when inside the braces will bring up code chunk options.

The R code chunk labelled basicconsole is as follows:

```{r basicconsole}
x <- 1:10
y <- round(rnorm(10, x, 1), 2)
df <- data.frame(x, y)
df
```

The code chunk input and output is then displayed as follows:

x <- 1:10
y <- round(rnorm(10, x, 1), 2)
df <- data.frame(x, y)
df
##     x    y
## 1   1 1.31
## 2   2 2.31
## 3   3 3.36
## 4   4 3.27
## 5   5 5.04
## 6   6 6.11
## 7   7 8.43
## 8   8 8.98
## 9   9 8.38
## 10 10 9.27

Plots

Images generated by knitr are saved in a figures folder. However, they also appear to be represented in the HTML output using a data URI scheme. This means that you can paste the HTML into a blog post or discussion forum and you don't have to worry about finding a place to store the images; they're embedded in the HTML.

Simple plot

Here is a basic plot using base graphics:

```{r simpleplot}
plot(x)
```
plot(x)

plot of chunk simpleplot

Note that unlike traditional Sweave, there is no need to write fig=TRUE.

Multiple plots

Also, unlike traditional Sweave, you can include multiple plots in one code chunk:

```{r multipleplots}
boxplot(1:10~rep(1:2,5))
plot(x, y)
```
boxplot(1:10 ~ rep(1:2, 5))

plot of chunk multipleplots

plot(x, y)

plot of chunk multipleplots

ggplot2 plot

Ggplot2 plots work well:

qplot(x, y, data = df)

plot of chunk ggplot2ex

lattice plot

As do lattice plots:

xyplot(y ~ x)

plot of chunk latticeex

Note that unlike traditional Sweave, there is no need to print lattice plots directly.

R Code chunk features

Create Markdown code from R

The following code hides the command input (i.e., echo=FALSE), and outputs the content directly as code (i.e., results='asis', which is similar to results=tex in Sweave).

```{r dotpointprint, results='asis', echo=FALSE}
cat("Here are some dot points\n\n")
cat(paste("* The value of y[", 1:3, "] is ", y[1:3], sep="", collapse="\n"))
```

Here are some dot points

  • The value of y[1] is 1.31
  • The value of y[2] is 2.31
  • The value of y[3] is 3.36

Create Markdown table code from R

```{r createtable, results='asis', echo=FALSE}
cat("x | y", "--- | ---", sep="\n")
cat(apply(df, 1, function(X) paste(X, collapse=" | ")), sep = "\n")
```
x | y
--- | ---
1 | 1.31
2 | 2.31
3 | 3.36
4 | 3.27
5 | 5.04
6 | 6.11
7 | 8.43
8 | 8.98
9 | 8.38
10 | 9.27
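
The two cat() calls above can be wrapped in a small reusable function. The following is a minimal sketch rather than anything provided by knitr: the name df_to_markdown is hypothetical, and it assumes that no cell contains a pipe character.

```r
# Hypothetical helper (not part of knitr): build the lines of a Markdown
# pipe table from a data frame. Assumes no cell contains a "|" character.
df_to_markdown <- function(d) {
  header <- paste(names(d), collapse = " | ")
  rule   <- paste(rep("---", ncol(d)), collapse = " | ")
  # Convert each column to character up front, so apply() sees a plain
  # character matrix
  cells  <- as.data.frame(lapply(d, as.character), stringsAsFactors = FALSE)
  rows   <- apply(cells, 1, paste, collapse = " | ")
  c(header, rule, rows)
}

df <- data.frame(x = 1:3, y = c(1.31, 2.31, 3.36))
cat(df_to_markdown(df), sep = "\n")
```

Used inside a chunk with results='asis' and echo=FALSE, the printed lines are passed through as Markdown, in the same way as the createtable chunk.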

Control output display

The following code suppresses display of R input commands (i.e., echo=FALSE) and removes any preceding text from console output (comment=""; the default is comment="##").

```{r echo=FALSE, comment=""}
head(df)
```
  x    y
1 1 1.31
2 2 2.31
3 3 3.36
4 4 3.27
5 5 5.04
6 6 6.11

Control figure size

The following is an example of a smaller figure using fig.width and fig.height options.

```{r smallplot, fig.width=3, fig.height=3}
plot(x)
```
plot(x)

plot of chunk smallplot
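
Rather than repeating fig.width and fig.height in every chunk header, defaults can be set once in a chunk near the top of the document. This is a minimal sketch, assuming knitr is installed; the values are only illustrative.

```r
# Set document-wide defaults for figure size; individual chunks can still
# override these in their own chunk headers.
knitr::opts_chunk$set(fig.width = 3, fig.height = 3)

# Inspect the current defaults
knitr::opts_chunk$get(c("fig.width", "fig.height"))
```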

Cache analysis

Caching analyses is straightforward. Here's example code. On the first run on my computer, this took about 10 seconds. On subsequent runs, this code was not run.

If you want to rerun cached code chunks, just delete the contents of the cache folder.

```{r longanalysis, cache=TRUE}
for (i in 1:5000) {
    lm((i+1)~i)
}
```
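
Deleting the cache can also be done from R. The following is a sketch only: clear_cache is a hypothetical helper, and the folder name "cache" is an assumption, since the actual location depends on the cache.path chunk option in use. The demonstration works in a temporary directory so that no real cache is touched.

```r
# Hypothetical helper: remove a knitr cache directory so that cached chunks
# are re-run on the next knit. The default folder name "cache" is an
# assumption; it depends on the cache.path chunk option of your document.
clear_cache <- function(cache_dir = "cache") {
  if (dir.exists(cache_dir)) {
    unlink(cache_dir, recursive = TRUE)
  }
  invisible(!dir.exists(cache_dir))
}

# Demonstration in a temporary directory with a made-up cache file
demo_dir <- file.path(tempdir(), "cache")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "longanalysis_example.RData"))
clear_cache(demo_dir)
```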

Basic markdown functionality

For those not familiar with standard Markdown, the following may be useful. See the source code for how to produce such points. However, RStudio does include a Markdown quick reference button that adequately covers this material.

Dot Points

Simple dot points:

  • Point 1
  • Point 2
  • Point 3

and numeric dot points:

  1. Number 1
  2. Number 2
  3. Number 3

and nested dot points:

  • A
    • A.1
    • A.2
  • B
    • B.1
    • B.2

Equations

Equations are included by using LaTeX notation and including them either between single dollar signs (inline equations) or double dollar signs (displayed equations). If you hang around the Q&A site CrossValidated you'll be familiar with this idea.

There are inline equations such as $y_i = \alpha + \beta x_i + e_i$.

And displayed formulas:

$$\frac{1}{1+\exp(-x)}$$

knitr provides self-contained HTML code that calls a MathJax script to display formulas. However, in order to include the script in my blog posts I took the script and incorporated it into my blogger template. If you are viewing this post through syndication or an RSS reader, this may not work. You may need to view this post on my website.

Tables

Tables can be included using the following notation:

A | B | C
--- | --- | ---
1 | Male | Blue
2 | Female | Pink

Hyperlinks

  • If you like this post, you may wish to subscribe to my RSS feed.

Images

Here's an example image:

image from redmond barry building unimelb

Code

Here is Markdown R code chunk displayed as code:

```{r}
x <- 1:10
x
```

And then there's inline code such as x <- 1:10.

Quote

Let's quote some stuff:

To be, or not to be, that is the question: Whether 'tis nobler in the mind to suffer The slings and arrows of outrageous fortune,

Conclusion

  • R Markdown is awesome.
    • The ratio of markup to content is excellent.
    • For exploratory analyses, blog posts, and the like R Markdown will be a powerful productivity booster.
    • For journal articles, LaTeX will presumably still be required.
  • The RStudio team have made the whole process very user friendly.
    • RStudio provides useful shortcut keys for compiling to HTML, and running code chunks. These shortcut keys are presented in a clear way.
    • The incorporated extensions to Markdown, particularly formula and table support, are particularly useful.
    • Jump-to-chunk feature facilitates navigation. It helps if your code chunks have informative names.
    • Code completion on R code chunk options is really helpful. See also chunk options documentation on the knitr website.
  • Other recent posts on R markdown include those by :

Questions

The following are a few questions I encountered along the way that might interest others.

Annoying <br/>'s

Question: I asked on the RStudio discussion site: Why does Markdown to HTML insert <br/> on new lines?

Answer: I just do a find and delete on this text for now. Specifically, I have a sed command that extracts just the content between the body tags and removes br tags. I can then readily incorporate the result into my blog posts.

sed -i -e '1,/<body>/d' -e'/^<\/body>/,$d' -e 's/<br\/>$//' filename.html
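
For those who would rather stay inside R, roughly the same clean-up can be sketched in base R. The function extract_body is hypothetical and only approximates the sed command: it assumes the body tags sit on their own lines, with the closing tag at the start of its line.

```r
# Hypothetical helper approximating the sed command above: keep only the
# lines strictly between the <body> and </body> lines, and strip a trailing
# <br/> from each kept line.
extract_body <- function(lines) {
  start <- grep("<body>", lines, fixed = TRUE)[1]
  end   <- grep("^</body>", lines)[1]
  stopifnot(!is.na(start), !is.na(end), start < end)
  body  <- lines[seq_len(end - start - 1) + start]
  sub("<br/>$", "", body)
}

html <- c("<html>", "<body>", "<p>Some text<br/>", "more text</p>",
          "</body>", "</html>")
extract_body(html)
```

For a real file, the input would come from readLines("filename.html") and the result could be written back with writeLines().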

Temporarily disable caching

Question: I asked on StackOverflow about How to set cache=FALSE for a knitr markdown document and override code chunk settings?

Answer: Delete the cache folder. But there are other possible workflows.

Equivalent of Sexpr

Question: I asked on Stack Overflow: Is there an R Markdown equivalent to Sexpr in Sweave?

Answer: Include the code between the delimiters “backtick r space” and “backtick”. E.g., in the source code I have calculated 2 + 2 = 4.
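
The inline syntax can be tried out without creating a file by knitting a string directly. This is a small sketch, assuming knitr is installed; the variable names src and out are just for illustration.

```r
# Knit a one-line R Markdown string. The inline expression between
# "backtick r space" and "backtick" is evaluated and replaced by its value.
src <- "In the source code I have calculated 2 + 2 = `r 2 + 2`."
out <- knitr::knit(text = src, quiet = TRUE)
cat(out, "\n")
```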

Image format

Question: When using the URI scheme images don't appear to display in RSS feeds of my blog. What's a good strategy?

Answer: One strategy is to upload to imgur. The following provides an example of exporting to imgur.

Add the following lines of code near the top of the file:

``` {r optsknit}
opts_knit$set(upload.fun = imgur_upload) # upload all images to imgur.com
```

I found that the function failed when I was at work behind a firewall, but worked at home.

43 comments:

  1. Great post. I wonder how to get tables like in LaTeX using the code:

    ```{r out, results = "asis"}
    print(
    xtable(
    x = df
    )
    , table.placement = "H"
    , caption.placement = "top"
    , include.rownames = TRUE
    , include.colnames = TRUE
    , size = "small"
    )

    ```

    1. Great question. Given that Markdown just passes on HTML code, I guess, the question is how to produce HTML tables that do what you want. Where you want control, I imagine HTML would be a better option than trying to use the Github-enhanced Markdown syntax for tables.

      As I understand it xtable supports html output:

      http://blog.revolutionanalytics.com/2010/02/making-publicationready-tables-with-xtable.html

      That said, I'm a little fuzzy on the details. I'm currently preparing a presentation for Melbourne R Users on reproducible data analysis; as part of that I want to better understand table creation options. So I might write a separate post about tables in the near future.

    2. I've been using xtable to produce my tables. The only trick is to make sure your chunk options include results='asis' and to print the xtable as html. For example:

      ```{r covars, results='asis'}
      # Sample information table
      pDataTable <- xtable(pData(affyRaw))
      print(pDataTable, type='html')
      ```

  2. Another question, how to center graphs?

    1. I haven't felt the need to do that for simple blog posts. For now, if I wanted that level of control, I'd probably switch to LaTeX.

      That said, it's interesting to think how far you can push Markdown and HTML.

      It seems like fig.pos might only work for LaTeX at present (I haven't tried: http://yihui.name/knitr/options ).

      I'd look into putting div or table html tags around the R code chunk and restrict output to one plot per code chunk.

      http://www.w3schools.com/tags/att_div_align.asp

    2. Better to avoid table and use div. More: Positioning Div

      ling

  3. Markdown is supposed to be simple, so it is not straightforward to center graphs. In Rnw/HTML/reStructured documents, the chunk option fig.align='center' will do the job. For markdown, you'd better customize the CSS for images when it is converted to HTML, e.g. display: block; margin: auto

  4. Thanks for your post, now I understand a little better how this package works. But I'm still having some questions (I'm still a little confused...)

    What's the main difference between this package and knitr?

    Also, I already used the brew package to successfully generate automated reports in LaTeX (I used loops to open more than 200 images and apply the same template to each one with a few lines of code). Is there an option to use internal loops to generate the same as I did with LaTeX, but with HTML code?

  5. 1. The most important features are in knitr. knitr does the conversion from R Markdown to Markdown.
    R Studio provides a pleasant user interface and a particular Github-enabled version of a Markdown to HTML converter.

    2. Yes you can print from loops in knitr.

    For example, this code produces a set of 10 plots.

    ```{r}
    for (i in 1:10) {
    plot(i, main=i)
    }
    ```

    Here's the resulting HTML:

    http://cloud.github.com/downloads/jeromyanglim/assorted-files/loops.html

    1. Thanks for the answer, now I got it (I guess).

      There's only one more thing. With brew, you can loop text with R code. Can you do that with Markdown and knitr? I guess for the moment you can't.

      For example, I try this (like I did with brew), expecting to get 10 plots, with text below each one:

      ```{r}
      for (i in 1:10) {
      plot(i, main=i)
      ```
      *this is a test*
      ```{r}
      }
      ```

      Didn't work... but I tried with brew first, and then used the normal procedure, and it worked well:

      <% for (i in 1:3) { -%>
      ```{r}
      plot(<%cat(i)%>, main=<%cat(i)%>)
      ```
      *this is a test*
      <% } -%>

      Is there any easier way to get it?

  6. Jeromy, outstanding post. As for tables, is there any way to control borders and lines? What I am looking for is to have model results automatically populating tables that I can then just copy and paste into manuscripts. The closer they might be to publication quality the better.

    thanks

    1. I've achieved a lot of control over table borders and lines with manual functions in R that produce LaTeX with Sweave.
      However, I'm still exploring HTML options like xtable for working with R Markdown. Thanks for reminding me that I need to think more about it. In the meantime, perhaps it's a question for Stack Overflow. I asked it here: http://stackoverflow.com/q/10774285/180892

    2. Very good, helpful post. I wrote up some notes about using xtable to generate HTML tables and styling them using custom CSS, which might address some of the questions in the comments here.

    3. Thanks so much. That's really useful.

  7. Jeromy, do you know how I can force the HTML page created to support multilingual characters (UTF-8 encoding) under Windows? Thank you.

    1. No, not really. Perhaps it is a question for Stack Overflow.
      Just a few thoughts: Rstudio lets you control the file encoding and permits UTF-8. If you know what needs to appear in your HTML file, you could use a sed command to insert it over the HTML, or you could use the markdown package to have more control over markdown to HTML conversion.

  8. Great post.
    I followed the instructions in this post along with others, and
    I got everything to work except that my graphics don't look as good as yours on blogspot. Your graphics are nicely blurry and pixelized, but mine aren't. Also, do you upload the R Markdown document? I just copy and paste the produced HTML, other than the html and /html tags, into Blogger as a new post. Is that the workflow you are using?

    1. With regards to images, I'm not sure, there are certainly graphics options on the knitr site. I also use the imgur option above for the final version.

      With regards to workflows, for this post I just copied the HTML content between BODY tags into my HTML editor on blogspot.

      Going forward I'll probably use a script using the markdown package which allows me to only output that part of the HTML file by default.
      See this answer:

      http://stackoverflow.com/a/10969107/180892

    2. Thank you very much. You answered everything I wanted to know! I am going to look into more details of my own question based on your suggestion and let you know!

  9. Since you're an R, Vim, and markdown guy -- do you know how to get R and markdown syntax highlighting in Vim for a .Rmd file?

    1. As much as I miss Vim key bindings, I've generally been using R Studio when working with R Markdown.

      There is some discussion of Vim support here:

      https://github.com/yihui/knitr/issues/252

      and the latest release (at time of posting) of the Vim-r-plugin mentions knitr support.

      http://www.vim.org/scripts/script.php?script_id=2628

  10. How is it possible to suppress the print outputs when loading a library? I have tried echo=FALSE, results='hide', warning=FALSE. However, I am still getting the library messages.

    Cheers

    1. Try message=FALSE

      http://yihui.name/knitr/options

    2. Ahh, perfect. And, thanks for the fast reply!

  11. Thanks for a great post. Could you show how to adjust the size of an image? My image was many times larger than I wanted it to be.

    1. If you are referring to images generated by R, then check out the knitr figure options listed here: http://yihui.name/knitr/options

      If you are referring to external images, a couple of options include: (a) resizing the source image, (b) using raw HTML with the IMG tag with various size attributes: http://www.w3schools.com/tags/tag_img.asp

    2. Thank you for your suggestion. I was referring to the external image. Will try the IMG tag.

  12. Dear Jeromy, I really appreciated this post. You explained things clearly and concisely: thank you!
    I had some problems knitting the gist though:

    Error in parse_block(g) : duplicated label 'basicconsole'
    Calls: knit ... process_file -> split_file -> lapply -> FUN -> parse_block

    What I found was that if I changed the word "basicconsole" at line 54, this (and a bunch of other duplicated label errors) went away and I could knit things beautifully.

    Any words of wisdom on that?

    1. I agree that you can't duplicate code chunk labels. The gist works fine for me. I imagine the issue is that there are several R Markdown code blocks which are tab indented. They look like R code chunks, except they are only displayed as code. I included these so that they would only be displayed, not run, in the resulting markdown file. Perhaps at some point the tabs were removed from your version of the R Markdown file, and code that was meant to be displayed as markdown code began to be read as actual R code chunks.

  13. Great post, and many thanks for sharing your code. One question. What does the "" command produce and why did you use it? Thanks in advance.

  14. I assume you are talking about the triple backtick ``` .
    It is used as a delimiter for R code blocks in R Markdown files. RStudio makes it easy to insert with an "insert code chunk" menu item and shortcut key.

  15. Thank you, this post was great! I'm trying to print a short 20-row dataframe about 6 columns wide. It prints fine in the r console, but only prints two columns at a time via rmd. I tried out.width and fig.width with no luck. Is this what you guys are talking about with xtable()?

    1. I'm not sure about console printing issues. But yes, you could use xtable with the HTML option to export a data.frame to an HTML table. e.g., see some discussion here: http://jeromyanglim.blogspot.com.au/2012/06/how-to-convert-sweave-latex-to-knitr-r.html

  16. Got it, thanks! The syntax was simpler than I thought. I was being stubborn and wanting to print a data frame the regular way.

    What's your coding process like? I like to write my chunks and run them on the console as I go, then knit it every few chunks. I'm not caching because I know one day it will come back and bite me, so knitting is a bit clunky. Just wondering if you have a best practice flow / cadence when writing your scripts.

    Thanks again for your blogs!

    Cheers

    1. It sounds like I do something similar. I've been using R Studio for the last year or so. I'll typically have a first chunk that imports all the libraries and prepares the data for analysis. Thus, I can run that to get the workspace into a state that I can start running analyses. I also try to minimise dependencies between other chunks.

      And yes, I generally do a lot of sending code from the script editor to the console and knitting the whole lot from time to time.

  17. Hi, I used the https://gist.github.com/2716336,
    downloaded as raw, and opened in RStudio 0.97 with knitr 0.9.
    On "Knit HTML" I get "duplicated label 'basicconsole'".
    It seems it can't distinguish between the stuff that is indented
    by two tabs to show the code, and the actual chunk you want to be executed.
    Am I missing anything obvious here ?
    Yours,
    Steffen

    1. I agree. knitr did not use to have this problem. Perhaps the parser was updated and it isn't realising that tab indented code blocks are just plain code. I've lodged an issue: https://github.com/yihui/knitr/issues/443

      I updated the gist to remove the duplicate labels. The gist hopefully works again. However, knitr seems to process these tab indented code blocks. So, perhaps this is just the new behaviour. In Sweave, the behaviour was to only process R Code chunks if the start and end symbols occur at the start of the new lines. Perhaps this new behaviour is intended.

    2. Sorry about that. I have mentioned the solution at https://github.com/yihui/knitr/issues/443#issuecomment-11747373

    3. Note the <br/> problem has gone in the latest version of markdown.

  18. Hi Jeromy,
    Thanks for a great post! I like your method for displaying tables---a simple, explicit method. Since first reading this post, however, I learned of pander's table functions. Have you tried it?
    Not being very adept with Windows command prompt, I was happy to learn from the following post that so much can be done within R: http://quantifyingmemory.blogspot.ca/2013/02/reproducible-research-with-r-knitr.html
    For my own work and in the interest of being able to advise grad students (in epidemiology), who may be even a little less computer savvy than me, I am trying to find straightforward literate programming/reproducible research techniques within RStudio and understand where there is overlap in the various packages and related tools. I'll really appreciate any further comments and posts you can add.

    1. Thanks for sharing. I'll have to check out pander.

  19. Great cheat sheet for knitr newbies, thanks! I have a follow-up, though. Each chunk generates a .png, whose default name is the name of the chunk. For example,
    ```{r ChunkName}
    p<-ggplot(ds, aes(x=variable1))
    p<-p+geom_bar()
    print(p)
    ```
    will produce a .png named "ChunkName.png"

    If there are several graphs to be produced by the same chunk, they will be automatically ordered. So
    ```{r ChunkName}
    years <- 1997:1999
    for (year in years){
    p<-ggplot(ds[ds$year==year,], aes(x=variable1))
    p<-p+geom_bar()
    print(p)
    }
    ```
    will produce 3 graphs: "ChunkName1.png", "ChunkName2.png" and "ChunkName3.png". However, I'd like to name the pngs myself, tying them to some automatic variable (like year, in this example), so that it produced something like
    "ChunkName_1997.png", "ChunkName_1998.png" and "ChunkName_1999.png".

    Any ideas how to override the defaults?


    1. I guess you could use the traditional graphics devices like png and pdf
      http://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/Devices.html

      Otherwise, I'm not sure. I guess you could scan the knitr options, but I haven't seen anything designed to do this: http://stat.ethz.ch/R-manual/R-devel/library/grDevices/html/Devices.html
