Follow the steps here.
FAQ
What are the steps to start a project?
Where can I find information on sample processing and analysis procedures?
All the information on the materials and methods used for sample processing as well as the description of delivered files can be found in the PDF report attached to the delivery.
What are replicate and coverage recommendations for epigenomics BS-seq experiments?
High coverage (10-30X) matters most when the goal is to detect short DMRs with small methylation differences, which is especially important when analyzing closely related sample types. However, specificity and sensitivity are maximized by increasing the number of replicates per group rather than the sequencing depth (10X per sample is enough). Biological replicates should be analyzed separately, not pooled, to increase statistical power. At least two biological replicates should be used for differentially methylated region (DMR) analysis; the appropriate number depends on the degree of within-group heterogeneity and the magnitude of between-group differences. Read more here.
Can whole-genome sequencing be used for the assessment of mitochondrial and chloroplast DNA?
Organelle reads (mtDNA in animals and fungi, mtDNA and cpDNA in plants) can be separated from nuclear reads in WGS data. The number of organelle copies per cell varies greatly, but a eukaryotic cell can contain thousands of mtDNA/cpDNA copies compared with only two copies of the nuclear DNA. As a rough estimate, a human WGS with a mean nuclear genome coverage of 30–40× simultaneously provides 3,000–4,000× mean coverage of the mitochondrial genome (experiment on total genomic DNA isolated from peripheral blood using standard methods).
What should I use to elute nucleic acids?
For most NGS library prep protocols, DNA must be resuspended in Tris-HCl (pH 8.0-8.5), the buffer used to elute DNA in most commercial kits. EDTA must not be present in the solution unless specific handling has been agreed in advance. UltraPure water is a second-choice alternative. For RNA, elution in RNase-free water is mandatory.
How can I prevent samples from spilling during shipment?
What is the advantage of using NovaSeq 6000 in a metabarcoding experiment?
Many of our customers are switching to the NovaSeq 6000 2x250bp sequencing mode, as it offers many advantages at reduced prices. Unless your pipeline strictly requires overlapping reads on an amplicon longer than 480bp, there is no benefit to staying on MiSeq. If the amplicon you want to cover is shorter than 480bp, or if you do not need the two reads to overlap anyway, sequencing in 250bp PE mode on a NovaSeq platform gives you access to:
- prices that can be as much as 40% lower than standard MiSeq sequencing;
- faster turnaround (360 libraries @ 100k fragments take at least 4 MiSeq runs, whilst on NovaSeq everything fits in one lane);
- much more data: you will likely get in the order of 200-300k fragments per sample on average;
- better quality: the drop in Q30 on NovaSeq PE250 is smaller than on MiSeq.
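As a rough illustration of the 480bp threshold, the sketch below checks whether two paired-end reads can still overlap across an amplicon. The function name and the minimum overlap of 20bp are assumptions made for this example (480bp is what 2x250bp leaves once about 20bp are reserved for merging); your merging tool may require a different overlap.

```python
def reads_can_overlap(amplicon_len: int, read_len: int = 250, min_overlap: int = 20) -> bool:
    """Return True if two paired-end reads of read_len bases overlap by at
    least min_overlap bases when sequencing an amplicon of amplicon_len bases.

    min_overlap=20 is an assumed value for illustration only.
    """
    overlap = 2 * read_len - amplicon_len
    return overlap >= min_overlap


if __name__ == "__main__":
    for length in (420, 480, 481, 550):
        print(length, reads_can_overlap(length))
```

With longer reads (for example `read_len=300`) the same check accepts longer amplicons, which is the only case in which the text above suggests staying on MiSeq.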
Do reads contain adapters?
Unless otherwise agreed, reads are delivered with adapter read-through masked. When a read-through of at least 5bp is found with respect to the sample-specific adapters (barcode included), the bases are masked with the N character, so the read keeps its original length. No quality clipping is applied to the delivered raw reads, although it is routinely used in our standard bioinformatic pipelines.
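The masking rule can be pictured with the minimal sketch below. It is not the tool used in production: it only handles exact matches at the 3' end of the read, whereas real adapter detection usually tolerates mismatches. The function name and the adapter sequence are illustrative assumptions.

```python
def mask_adapter_readthrough(read: str, adapter: str, min_match: int = 5) -> str:
    """Replace adapter read-through at the 3' end of a read with N.

    A read-through is called when the read suffix matches the start of the
    adapter exactly over at least min_match bases (simplifying assumption:
    no mismatches allowed). The read length is left unchanged.
    """
    # Scan start positions left to right so the longest read-through wins.
    for start in range(len(read) - min_match + 1):
        tail = read[start:]
        n = min(len(tail), len(adapter))
        if n >= min_match and tail[:n] == adapter[:n]:
            return read[:start] + "N" * len(tail)
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter prefix, for illustration only
    print(mask_adapter_readthrough("ACGTACGTAC" + "AGATCGG", adapter))
```

A read carrying fewer than 5 adapter bases at its end is returned untouched, in line with the 5bp minimum described above.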
How can I eliminate impurities and inhibitors from DNA extracts?
DNA purification is of vital importance since it can help determine the success or failure of library preparation. Clean-up of DNA to remove buffer salts, enzymes, or other substances that can inhibit PCR and other reactions can be done with ethanol precipitation, silica column-based kits or magnetic beads. In metabarcoding experiments a widely applied method is the dilution of the sample or the extracted nucleic acid, which will automatically result in a dilution of the PCR inhibitors. However, keep in mind that dilution is accompanied by a decrease in sensitivity.
What causes variable sequencing depth among samples of the same batch?
Variability in the number of reads produced per sample comes down to how libraries are loaded on the flow cell. To get as close as possible to the desired number of reads for each sample, it is important to pool samples in equimolar ratios so that each library is evenly represented. If the libraries to be pooled do not have the same or similar size ranges, the sample distribution (especially after size selection) will require re-pooling and additional sequencing runs. Sample pooling and normalization can be tricky: however carefully the libraries are balanced, the performance of an individual library is not always predictable. It is important to note, however, that such discrepancies are not due to quality issues.
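To make the idea of equimolar pooling concrete, here is a small sketch that converts a library concentration into molarity and derives the volume to pipette for each library. It uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names and the example values are assumptions for illustration, not our laboratory procedure.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation)


def molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/µL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def equimolar_volumes(libraries: dict, fmol_per_library: float) -> dict:
    """Volume (µL) of each library that contributes fmol_per_library.

    libraries maps a name to (concentration in ng/µL, mean fragment size in bp).
    1 nM equals 1 fmol/µL, so volume = fmol / nM.
    """
    return {
        name: fmol_per_library / molarity_nM(conc, size)
        for name, (conc, size) in libraries.items()
    }


if __name__ == "__main__":
    libs = {"lib_A": (6.6, 500), "lib_B": (13.2, 500)}  # hypothetical libraries
    print(equimolar_volumes(libs, fmol_per_library=40))
```

Two libraries with the same concentration in ng/µL but different fragment sizes end up with different molarities, which is why size differences between libraries complicate pooling.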
How should I approach the removal of UMI sequence in RNA-seq?
In our RNA-seq experiments, UMIs are sequenced as an extra read after the i7 index, so they never need to be removed: they have never been part of any R1 or R2 sequence. They are separated and incorporated in the read name during the demultiplexing of indexes.
Why do I have adapter dimers in the final library?
Adapter dimers in the final library are diagnosed with chip-based capillary electrophoresis, such as BioAnalyzer, as a peak at 120-170 bp. Because they can bind to the flow cell and form clusters, they take a significant portion of sequencing reads away from the desired library fragments and can negatively affect sequencing data quality. The presence of dimers can be caused by insufficient or degraded starting material. It is essential to clean final libraries prior to sequencing. However, when dealing with poor libraries, eliminating dimers mostly depletes the library quantity, which in turn results in poor sequencing yields. In metabarcoding experiments it is useful to include control samples: not just positive ones, such as mock communities, but also negative controls. A blank is usually used when dealing with samples with low microbial biomass to rule out possible contaminants. Note also that input DNA may sometimes be abundant yet lack template for the amplification. You can find more information here.
How many biological replicates are needed in an RNA-seq experiment?
In this RNA-seq study with 48 biological replicates in each of two conditions, the authors tried to answer this question and provide guidelines for experimental design. With three biological replicates, only 20%–40% of the significantly differentially expressed (SDE) genes identified with the full set of 42 clean replicates are detected. Detecting >85% of all SDE genes regardless of fold change requires more than 20 biological replicates. The authors suggest using at least six biological replicates, rising to at least 12 when it is important to identify SDE genes for all fold changes.
How does the count normalization with DESeq2 work?
You can find a detailed description of count normalization with DESeq2 at https://hbctraining.github.io/DGE_workshop_salmon_online/lessons/02_DGE_count_normalization.html
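In short, DESeq2 normalizes counts with the median-of-ratios method. The sketch below reproduces the idea in plain Python on a toy matrix; it is a simplified illustration, not DESeq2's own code, and the function names are ours.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts is a list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples (pseudo-reference).
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / s for c, s in zip(row, sf)] for row in counts]


if __name__ == "__main__":
    toy = [[10, 20], [100, 200], [5, 10], [0, 7]]  # sample 2 sequenced twice as deep
    print(size_factors(toy))
    print(normalize(toy))
```

In the toy matrix the second sample has exactly twice the counts of the first, so its size factor is twice as large and the normalized counts of the two samples coincide.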
Which protocol should be used to extract RNA?
We strongly recommend commercial kits for totalRNA or miRNA extraction (e.g. Spectrum Plant Total RNA Kit, TRI-REAGENT, RNeasy, MirVANA or MirPremier).
What should I do before RNA sample shipment?
To obtain high-quality sequencing data, customers must provide good-quality RNA. In detail:
· the 260/280 ratio of your RNA sample should be >1.8;
· RNA samples should be resuspended in nuclease-free water;
· on a gel, high-quality RNA should show two prominent ribosomal RNA bands, with the 28S band (at 4.5 kb) about twice as intense as the 18S band (at 1.9 kb);
· on an Agilent Bioanalyzer 2100, RNA should have an RNA Integrity Number (RIN) > 8.
Customers need to provide the Agilent 2100 Bioanalyzer analysis results or, at least, a gel electrophoresis image showing the RNA quality. Quantify your RNA samples with a spectrophotometer (e.g. Nanodrop) or a fluorimeter (e.g. QuBit).
Which instruments should be used to evaluate sample quality?
Customers need to provide the Agilent 2100 Bioanalyzer analysis results or, at least, a gel electrophoresis image showing the RNA quality. Quantify your RNA samples with a spectrophotometer (e.g. Nanodrop) or a fluorimeter (e.g. QuBit).
Are there alternatives to dry ice in order to ship RNA?
If you are not able to send RNA samples on dry ice, you can send them lyophilized with RNAstable (Biomatrica, http://biomatrica.com/rnastable.php).
What causes important human DNA contamination in 16S rRNA gene sequence analysis?
Human contamination in 16S rRNA metabarcoding is quite common. The amount of off-target amplification depends on the ratio of human to bacterial DNA. Thus, the issue usually does not affect stool and skin samples, which contain lower amounts of human DNA (stool <10% and skin <90%). Nevertheless, contamination can heavily impact the analysis of biopsies, where over 97% of the DNA present is of human origin.
What is the sample volume and concentration required to perform mRNA-Seq?
The minimum total amount requested is 60 ng (minimum concentration of 0.6 ng/µL, minimum volume of 20 µL). For de novo (paired-end) applications, as well as for human and mouse FFPE samples, we suggest sending a minimum of 200 ng. For other species, send at least 500 ng in case of degraded RNA. Keep in mind that the use of degraded RNA can result in low yield, over-representation of the 3’ ends of the RNA molecules or failure of the protocol. Please refer to "RNA-smallRNA_Sample_preparation_guidelines_V5" document for more details.
What is the sample volume and concentration required to perform stranded Total RNA-Seq?
The total amount requested is 1 µg in at least 20 µl (minimum concentration of 50 ng/µl).
This protocol also works with degraded RNAs and FFPE RNAs, although the success rate is not guaranteed.
A DNase I step is mandatory after the RNA isolation. RNA that has DNA contamination will result in an underestimation of the amount of the RNA used and poor data quality. Look at species compatibility here. Please refer to "RNA-smallRNA_Sample_preparation_guidelines" document for more details.
What is the sample volume and concentration required to perform smallRNA-Seq?
The total amount requested is 2 µg (minimum concentration of 200 ng/µl) of total RNA or 100 ng of previously isolated microRNA (minimum concentration of 10 ng/µl) in 10 µl of nuclease-free water or 10 mM Tris-HCl, pH 8.5. Please refer to "RNA-smallRNA_Sample_preparation_guidelines" document for more details.
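The input requirements for the three RNA-seq protocols above can be checked before shipment with a few lines of code. The sketch below encodes only the minimum amount, concentration and volume quoted in the answers (total RNA input for smallRNA-Seq); the dictionary keys and the function name are our own choices for this example, and the guideline documents remain the reference.

```python
# Minimum requirements quoted in the answers above (ng, ng/µL, µL).
REQUIREMENTS = {
    "mRNA-Seq": {"min_ng": 60, "min_ng_per_ul": 0.6, "min_ul": 20},
    "stranded Total RNA-Seq": {"min_ng": 1000, "min_ng_per_ul": 50, "min_ul": 20},
    "smallRNA-Seq (total RNA)": {"min_ng": 2000, "min_ng_per_ul": 200, "min_ul": 10},
}


def check_sample(protocol: str, conc_ng_per_ul: float, volume_ul: float) -> list:
    """Return a list of unmet requirements (empty if the sample qualifies)."""
    req = REQUIREMENTS[protocol]
    total_ng = conc_ng_per_ul * volume_ul  # total mass = concentration x volume
    problems = []
    if total_ng < req["min_ng"]:
        problems.append(f"total amount {total_ng:g} ng < {req['min_ng']} ng")
    if conc_ng_per_ul < req["min_ng_per_ul"]:
        problems.append(f"concentration {conc_ng_per_ul:g} < {req['min_ng_per_ul']} ng/µL")
    if volume_ul < req["min_ul"]:
        problems.append(f"volume {volume_ul:g} < {req['min_ul']} µL")
    return problems


if __name__ == "__main__":
    print(check_sample("mRNA-Seq", conc_ng_per_ul=3.0, volume_ul=20))
    print(check_sample("stranded Total RNA-Seq", conc_ng_per_ul=40, volume_ul=20))
```

Each condition is checked independently, so a sample that meets the minimum concentration and volume can still fail on the total amount.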
What is the depth of coverage that I need?
There is no official recommendation for sequencing coverage level. Coverage requirements depend on the application, and standards are set by your field and by scientific journals. Keep in mind that every base in the sample has to be sequenced several times to allow for a reliable base call and that, in addition, reads are not distributed evenly over an entire genome or target region (many bases will be covered by fewer reads than the average, while other bases will be covered by more reads than average). Increasing coverage improves the detection of rare variants in highly heterogeneous samples, such as cancer, and permits the detection of rarely expressed genes.
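As a back-of-the-envelope aid, mean coverage can be estimated as the number of reads times the read length divided by the size of the genome or target. The sketch below also estimates the fraction of bases falling below a given depth under an idealized Poisson model; this model is an assumption of the example, and real data are usually more uneven than it predicts. Function names and figures are illustrative.

```python
import math


def mean_coverage(n_reads: int, read_len: int, target_size: int) -> float:
    """Mean depth of coverage: total sequenced bases / target size."""
    return n_reads * read_len / target_size


def fraction_below(depth: int, mean_cov: float) -> float:
    """Fraction of bases covered by fewer than `depth` reads,
    assuming per-base depth follows a Poisson(mean_cov) distribution."""
    return sum(
        math.exp(-mean_cov) * mean_cov ** k / math.factorial(k)
        for k in range(depth)
    )


if __name__ == "__main__":
    cov = mean_coverage(600_000_000, 150, 3_000_000_000)  # hypothetical run
    print(f"mean coverage: {cov:.1f}x")
    print(f"bases below 20x: {fraction_below(20, cov):.2%}")
```

Even in this idealized setting a share of the bases sits well below the mean, which is why the average coverage alone does not tell you whether every position is adequately covered.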
Why doesn’t target enrichment yield an even coverage distribution?
PCR-based methods require highly multiplexed oligonucleotide pairs targeted to heterogeneous sequences, with a range of melting temperatures and GC content, to generate hundreds or thousands of amplicons in a single tube. This leads to differences in amplicon representation and uneven sequence coverage. In hybridization-based methods the efficiency of capture is not uniform. High GC content in regions such as the 5’UTR, promoter regions and the first exons of genes affects enrichment efficiency, as do repeat elements, tandem repeats and pseudogenes, resulting in an uneven distribution of coverage. Finally, and no less importantly, a lower quantity or quality of DNA is often found to introduce bias in the downstream analysis.
Should I treat my RNA samples with DNase?
SmallRNA-Seq and mRNA-Seq do not require DNase treatment. DNase treatment is instead recommended for other protocols, such as total RNA-Seq.
Should I remove duplicates in RNA-seq?
Duplicates in RNA-seq are not necessarily an artifact. In fact, high rates of read duplicates are commonly observed in RNA-seq libraries, and they do not necessarily indicate poor library complexity caused by low sample input or over-amplification. In general, for paired-end reads, removal of duplicates could be part of a standard procedure, since alignments that start at the same location for both read 1 and read 2 are very unlikely to occur by chance because of the variation in fragment size. However, for short RNA (e.g. small transcripts, miRNA) that is very highly expressed, there may be many legitimate duplicate copies with exactly the same fragment size and position.
What is the optimal sequencing depth in ChIP-seq experiments?
An important consideration in experimental design is the minimum number of sequenced reads required to obtain statistically significant results. The number of reads to produce, i.e. the required sequencing depth, depends on the nature of the mark and the state of the cell in each experiment. However, you can find some good guidelines here. Jung et al. observed that sufficient depth is often reached at <20 million reads for fly, while for human they suggest 40-50 million reads as a practical minimum for most marks.
How should duplicated reads be treated in ChIP-seq experiments?
The most widely accepted solution is the one implemented in MACS2. If the read length parameter is set to zero, MACS2 detects the read length automatically and proceeds to filter out duplicate reads. By default it calculates the maximum number of duplicate reads at a single position warranted by the sequencing depth, and removes redundant reads in excess of this number. Alternatively, you can choose to keep only one read, or all duplicates.
What is the best control for ChIP-Seq: Input, IgG or Untagged Strain?
Most labs use Input, since IgG can be biased for two reasons: most IgG antibodies are not obtained from true preimmune serum from the same animal in which the specific antibody was raised; and IgG antibodies usually immunoprecipitate much less DNA than specific antibodies do, so limited genomic regions from the control may be over-amplified during the library construction step. You can find a good overview here.
What is peak model building in ChIP-seq data?
MACS2 models the distance between the paired forward and reverse strand peaks from the data. It slides a window across the genome to find enriched regions, which have M-fold more reads than background. The size of the window is twice the bandwidth parameter. The expected background is the number of reads times their length divided by the mappable genome size. Note that the mappable genome size is always less than the real genome size because of repetitive sequence. The regions' fold enrichment must be higher than 10 and less than 30 (these values can be changed if not enough regions are found). A smaller value for the lower cutoff provides more regions for model building, but it can also bring spurious data into the model and thereby adversely affect the peak finding results. MACS2 uses 1000 enriched regions to model the distance between the forward and reverse strand peaks, predicting the fragment size.
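The arithmetic behind the enrichment filter can be sketched as follows. This is only an illustration of the formulas described above, not MACS2's actual implementation; the function names, the read counts and the mappable genome size used in the example are assumptions.

```python
def background_coverage(n_reads: int, read_len: int, mappable_genome_size: float) -> float:
    """Expected background: number of reads x read length / mappable genome size."""
    return n_reads * read_len / mappable_genome_size


def window_fold_enrichment(reads_in_window: int, read_len: int,
                           bandwidth: int, background: float) -> float:
    """Coverage inside a window of size 2 x bandwidth, relative to background."""
    window_size = 2 * bandwidth
    window_coverage = reads_in_window * read_len / window_size
    return window_coverage / background


def usable_for_model(fold: float, lower: float = 10, upper: float = 30) -> bool:
    """Keep regions whose fold enrichment lies between the two cutoffs."""
    return lower < fold < upper


if __name__ == "__main__":
    bg = background_coverage(20_000_000, 50, 2.7e9)  # hypothetical human sample
    fold = window_fold_enrichment(90, 50, bandwidth=300, background=bg)
    print(f"background {bg:.3f}x, fold enrichment {fold:.2f}, usable: {usable_for_model(fold)}")
```

Lowering `lower` lets more windows through, which mirrors the trade-off described above between having enough regions and keeping spurious ones out of the model.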
How does MACS2 detect peaks?
In the peak detection phase, MACS2 extends the reads in the 3' direction to the fragment length obtained from modeling. If the model building failed or if it was switched off, the reads are extended to the value of the extension size parameter. If a control sample is available, MACS2 scales the samples linearly to the same read number. It then selects candidate peaks by scanning the genome again, now using a window size which is twice the fragment length. MACS2 calculates a p-value for each peak using a dynamic Poisson distribution to capture local biases in read background levels. If a control sample is available, it is used to calculate the local background. Finally, q-values are calculated using the Benjamini-Hochberg correction.
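The two statistical steps mentioned above, a Poisson p-value per candidate peak followed by the Benjamini-Hochberg correction, can be illustrated with the generic sketch below. It is not MACS2 code: the local background rate `lam` is simply passed in as a number, and the function names and example values are ours.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam): chance of seeing at least k reads
    when lam reads are expected from the background."""
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - below)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Hypothetical candidate peaks: (observed reads, expected local background).
    peaks = [(25, 5.0), (12, 6.0), (8, 7.5)]
    pvals = [poisson_upper_tail(k, lam) for k, lam in peaks]
    print(pvals)
    print(benjamini_hochberg(pvals))
```

A peak with a read count close to its local background gets a large p-value and, after correction, a large q-value, so it would not pass a typical q-value cutoff.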
What is the effect of sequencing depth on new microRNA discovery?
Sequencing depth is one of the most crucial factors for both differential expression analysis and the discovery of rare or novel microRNAs, and it can vary from tissue to tissue. Having said that, 10 million reads are sufficient for thorough discovery and effective differential expression analysis. For more information, please look here.
Does mRNA-seq detect long non-coding RNAs?
About half of lncRNAs are polyA+ and half are polyA-. If the RNA-seq library is polyA+ enriched, the analysis will be biased toward the lncRNAs that are polyA+.
Does poly(A) RNA selection in mRNA-seq cause coverage bias at the 3’ end?
Poly(A) RNA selection does not necessarily cause bias in gene body coverage, since double-strand cDNA generation is performed using a mixture of random and poly(dT) primers.