Quick costum bioanalyzer plots using R

The default plots made by the bioanalyzer are, in my opinion, not amazing. For one, the lines are really thin. Furthermore, it would be nice to draw two traces is the same plot, to compare different samples. Luckily, the bioanalyzer software lets you export the raw data. This short post will show how to quickly get bioanalyzer trace data into R and ready for plotting.

On the way we will make use of the tidyverse packages.

library(tidyverse)

Importing one file into R

You can get the raw data out of the bioanalyzer by going File > Export. Somewhat depending on the options you select next, it will spit out many different files:

list.files(path = "./20210106data")
##  [1] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52.xml"            
##  [2] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_GEL.bmp"        
##  [3] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_GEL_Ladder.bmp" 
##  [4] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_GEL_Sample1.bmp"
##  [5] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_GEL_Sample2.bmp"
##  [6] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_GEL_Sample3.bmp"
##  [7] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_GEL_Sample4.bmp"
##  [8] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_GEL_Sample5.bmp"
##  [9] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_GEL_Sample6.bmp"
## [10] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Ladder.csv"     
## [11] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Results.csv"    
## [12] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample1.csv"    
## [13] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample2.csv"    
## [14] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample3.csv"    
## [15] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample4.csv"    
## [16] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample5.csv"    
## [17] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample6.csv"

Annoyingly, the raw data for each well of the bioanalyzer chip is found in a separate csv file. Manually pasting these together in excel is too much clicking, and more importantly error prone. Depending on your location, excel may see the comma as the thousand separator, which makes converting the csv file into excel columns with Data > Text to Columns go wrong for some rows (specifically those with a round number for time and high fluorescence, like 40,123). Importing the csv files into R will not run into this problem.

I generally start out by creating an R project in the folder containing the bioanalyzer data. Among other things, this avoids having to set the working directory.

The next step is to separate the csv files containing the raw data from the rest. These files always end in “Sample” + the well number + “.csv” (at least for RNA nano and pico chips). We can use this pattern to isolate them, by creating a regular expression and feeding this pattern to list.files(). We are not interested in what comes before Sample, so we can preface Sample with the * symbol. This means zero or more of anything can occur. Sample is always followed by one or more digits. We tell the function to look for a digit using \\d and use + to allow for one or more them. Finally, all these files should end in .csv. The dollar sign signifies that the string should end there.

# Ignore the path = , this is only here because of how the website is structured. 
# If you have made an r project inside the folder containing the bioanalyzer 
# data you won't need it. 

samples <- list.files(path = "./20210106data", pattern = "*Sample\\d+.csv$")
samples
## [1] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample1.csv"
## [2] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample2.csv"
## [3] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample3.csv"
## [4] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample4.csv"
## [5] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample5.csv"
## [6] "2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample6.csv"

It worked!

Let’s look at what happens when we import one of these files.

# Again, the path = , and paste0 shenanigans is only needed for the website. 
# In a normal situation just read_csv(samples[1]) would be enough.

path <- "./20210106data/"
sample1 <- read_csv(paste0(path, samples[1]))
Table 1: Import attempt one
Data File Name2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52.xad
Data File PathC:Files (x86)\2100 bioanalyzer\2100 expert\2020-05-25
Date CreatedMonday, May 25, 2020 9:42:52 AM
Date Last ModifiedMonday, May 25, 2020 10:06:42 AM
Version CreatedB.02.10.SI764
Version Last ModifiedB.02.10.SI764
Assay NameEukaryote Total RNA Nano Series II
Assay PathC:Files (x86)\2100 bioanalyzer\2100 expert
Assay TitleEukaryote Total RNA Nano
Assay Version2.6
Number of Samples Run12
Sample Name0 ctrk
Number of Events1060
TimeValue
171.657798E-02
17.052.474964E-02

The first few rows are all metadata we are not really interested in. We only need the Time and Value information. Therefore, we can skip the first 17 rows (including some empty rows).

sample1 <- read_csv(paste0(path, samples[1]),
                    skip = 17)
## Warning: 2 parsing failures.
##  row   col expected    actual                                                                                             file
## 1061 Time  a double Alignment './20210106data/2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample1.csv'
## 1061 Value a double On        './20210106data/2100 expert_Eukaryote Total RNA Nano_DE13805377_2020-05-25_11-42-52_Sample1.csv'

But now we get 2 parsing failures. That is because, all the way at the bottom of the file, some text (“alignment, on”) appears where only numbers are expected, which results in NA values in the final row of our table.

Table 2: Import attempt two
TimeValue
69.700.7465172
69.750.7465172
69.800.7465172
69.850.7465172
69.900.7465172
69.950.7465172
NANA

To prevent this we can tell read_csv to not read this far into the file.

sample1 <- read_csv(paste0(path, samples[1]),
                    skip = 17,
                    n_max = 1060)

No failures this time!

Importing multiple files into r

We have successfully imported the information of one well into r. But there are 11 or 12 wells on a chip. One option is to simply copy paste the code 12 times and change the numbers each time. But this leads to a lot of lines of code and increases the risk of errors.

sample1 <- read_csv(paste0(path, samples[1]),
                    skip = 17,
                    n_max = 1060)

sample2 <- read_csv(paste0(path, samples[2]),
                    skip = 17,
                    n_max = 1060)

# O no, a typo
sample2 <- read_csv(paste0(path, samples[3]),
                    skip = 17,
                    n_max = 1060)
# Etc. 

Instead, we can use a single for loop to import all the data files. In this loop we also save them into one big data table. We will use mutate to make a new column that contains the well number each measurement originates from. str_extract is used to extract this information from the file name.

# Create an empty table to save the data in.
all_data <- tibble()

for(i in samples){
  # Import one by one
  data <- read_csv(paste0(path, i), 
                   skip = 17, 
                   n_max = 1060)
  
  # Add the well number information
  data <- mutate(data, sample_nr = str_extract(i, pattern = "Sample\\d+"))
  
  # Store in the big table
  all_data <- rbind(all_data, data)
}
Table 3: Note the extra column with the well information
TimeValuesample_nr
17.000.0165780Sample1
17.050.0247496Sample1
17.100.0391493Sample1
17.150.0626396Sample1
17.200.0790722Sample1
17.250.0928879Sample1
17.300.0869223Sample1
17.350.0712983Sample1

It would be even more convenient if the sample_nr column only contained, well … , numbers. Again str_extract can help use remove all the unwanted symbols. Even though we are only left with numbers, we still need to convert it to a numeric, because a string will stay a string until told otherwise.

all_data <- all_data %>%
  mutate(sample_nr = as.numeric(str_extract(sample_nr, pattern = "\\d+$")))
Table 4: Almost there
TimeValuesample_nr
17.000.01657801
17.050.02474961
17.100.03914931
17.150.06263961
17.200.07907221
17.250.09288791
17.300.08692231
17.350.07129831

Adding sample information

Now it is also easy to add additional information about each sample. You can simply prepare a small table in excel that contains at least a sample_nr column, and any number of additional columns with extra information about each sample.

exp_design <- read_delim("20210106_exp_design.csv", delim = ";")
Table 5: Example of a small experimental design table
sample_nrtreatmentreplicate
1CTRL1
2CTRL2
3CTRL3
4treatment1
5treatment2
6treatment3

Finally we can join these two tables together based on the sample_nr column.

all_data <- all_data %>%
  left_join(exp_design, by = "sample_nr")
Table 6: The final table
TimeValuesample_nrtreatmentreplicate
17.000.01657801CTRL1
17.050.02474961CTRL1
17.100.03914931CTRL1
17.150.06263961CTRL1
17.200.07907221CTRL1
17.250.09288791CTRL1
17.300.08692231CTRL1
17.350.07129831CTRL1

Saving and/or plotting

At this point you could save the final table if you want to do further analysis in other software.

write_csv(all_data, "bioanalyzer_data.csv")

Or you can make plots in r!

library(ggpubr)
library(ggsci)

all_data %>%
  ggplot(aes(x = Time, y = Value, color = as.factor(replicate)))+
    geom_line()+
    facet_grid(cols = vars(treatment))+
    scale_x_continuous(breaks = c(20, 40, 60))+
    scale_color_jco(name = "Replicate")+
    theme_pubr()+
    labs(x = "Time in seconds",
         y = "Fluorescence (a.u.)")

Jorik Bot
Jorik Bot
PhD student

Site under construction.

comments powered by Disqus

Related