Genome Sequencing
De novo and Reference-Guided Genome Assembly
Genome Analysis
Whole genome sequencing provides comprehensive information about the entire genetic material of an organism. There are two approaches for assembling high-throughput sequencing reads into longer contiguous genomic sequences. The de novo approach is used for non-model genomes that have no reference genome: sequenced reads are compared to each other, and overlapping reads are used to build longer contiguous sequences (contigs). The contigs are then ordered and oriented using long jumping distance (LJD) libraries. The reference-based approach maps each read to a reference genome sequence to identify genetic variation such as single nucleotide polymorphisms (SNPs), insertions and deletions (indels), and copy number variants. It also supports genome-wide association studies (GWAS) and building haplotypes from genome assemblies.
Eurofins Genomics offers whole genome sequencing of humans, animals, plants, and microorganisms such as bacteria, viruses, and fungi. Sequencing is performed on a combination of platforms, including Illumina HiSeq, MiSeq, and NextSeq and PacBio, with a range of read lengths and library types (paired-end and mate-pair). Libraries with wide jumping distances are available to tackle any genome size, from bacterial genomes to large and complex eukaryotic genomes. The long paired-end reads are used to determine the orientation and relative position of the contigs generated during assembly of the shotgun reads. These long jumping distance (LJD) libraries offer ultra-high-throughput, cost-efficient scaffolding of contigs.
Eurofins provides various genome assembly services as listed below:
- Bacterial/Fungal De novo Genome Assembly
- PacBio Bacterial/Fungal De novo Assembly
- De novo Genome Assembly up to 1 Gb
- Large Genome De novo Assembly >1 Gb
- Fungal hybrid De novo assembly and analysis
- Large genome hybrid De novo assembly and analysis
- Reference-guided genome analysis
- PacBio reference-guided analysis
Workflow & Deliverables
Bioinformatics workflow for genome analysis:
a. Quality check of raw reads:
The raw reads will be subjected to quality filtering and adapter trimming. Primer sequences, poly(A) tails, and reads derived from ribosomal DNA templates will be removed. The resulting high-quality data will be used for downstream analysis.
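As an illustration only, the trimming step can be sketched in a few lines of Python. This is not the production pipeline: the adapter prefix, the Phred cutoff of 20, the minimum length of 30, and the Phred+33 quality encoding are all assumptions chosen for the example, and only exact adapter matches are clipped.

```python
# Minimal sketch of adapter clipping and 3' quality trimming for one read.
# All constants below are illustrative assumptions, not pipeline settings.
ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix; use the library kit's sequence
MIN_QUAL = 20              # hypothetical Phred quality cutoff
MIN_LEN = 30               # hypothetical minimum read length after trimming


def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def trim_read(seq, qual, adapter=ADAPTER, min_qual=MIN_QUAL):
    """Clip an exact adapter match, then trim low-quality bases from the 3' end."""
    pos = seq.find(adapter)
    if pos != -1:
        seq, qual = seq[:pos], qual[:pos]
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_qual:
        end -= 1
    return seq[:end], qual[:end]


def passes_filter(seq, min_len=MIN_LEN):
    """Keep only reads that are still long enough after trimming."""
    return len(seq) >= min_len
```

Dedicated trimming tools also handle partial and mismatched adapter matches, which this sketch does not attempt.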
b. De novo assembly:
The de novo assembly of high-quality reads will be carried out using a suitable assembler. Assemblers are often sensitive to their input parameters, so we will perform multiple assembly runs with different k-mer sizes to optimize the result. The paired-end (PE) data will be assembled using a range of values for k-mer length, coverage cutoff, insert length, insert length standard deviation, and expected coverage for scaffold assembly. The best assembly will be selected based on scaffold N50 and maximum scaffold length. The final assembly will then be evaluated on scaffold N50, assembly coverage (depth), the proportion of reads participating in the assembly, GC content, assembly completeness, and accuracy.
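To make the selection criterion concrete, the sketch below computes N50 from a list of scaffold lengths and picks the run with the highest N50, using maximum scaffold length as the tie-breaker. The function names and the dictionary layout (k-mer size mapped to scaffold lengths) are hypothetical and chosen only for this example.

```python
def n50(lengths):
    """Return N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def pick_best_assembly(runs):
    """Select the k-mer size whose assembly has the best (N50, max length).

    `runs` maps a k-mer size to the list of scaffold lengths from that run.
    """
    return max(runs, key=lambda k: (n50(runs[k]), max(runs[k], default=0)))


# Example with made-up scaffold lengths for three k-mer sizes.
runs = {31: [100, 50, 50], 51: [150, 30, 20], 71: [80, 80, 40]}
best_k = pick_best_assembly(runs)
```

In practice N50 alone can reward misjoined scaffolds, which is why the final evaluation described above also considers completeness and accuracy.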
c. Reference-based analysis
The reference genome and gene information (GFF3 file) will be downloaded from public databases.
High-quality reads will be aligned against the reference genome with optimized parameters.
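Once reads are aligned, summary figures such as genome coverage can be derived from the alignment coordinates. The following is a minimal sketch, assuming alignments on a single reference sequence are already available as 0-based, half-open (start, end) intervals; real pipelines would read these from a BAM file with a dedicated library.

```python
def coverage_summary(alignments, genome_length):
    """Compute breadth of coverage and mean depth from alignment intervals.

    `alignments` is an iterable of 0-based, half-open (start, end) tuples
    with 0 <= start < end <= genome_length.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in alignments:
        delta[start] += 1
        delta[end] -= 1

    covered = 0      # positions with depth >= 1
    total_depth = 0  # sum of per-position depth
    current = 0
    for pos in range(genome_length):
        current += delta[pos]
        if current > 0:
            covered += 1
        total_depth += current

    return {
        "breadth": covered / genome_length,
        "mean_depth": total_depth / genome_length,
    }


# Two overlapping reads on a 10 bp toy reference.
summary = coverage_summary([(0, 5), (3, 8)], genome_length=10)
```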
d. Gene prediction
Ab initio gene predictors are statistical models that are trained to find features of genes, such as start and stop codons and the coding sequences (CDS) of the genes. The draft genome assembly will be used as input to a gene prediction program to predict the coding regions in the given sample.
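The features mentioned above can be illustrated with a naive open reading frame (ORF) scan. Note that this is a deliberately simplified stand-in, not an ab initio predictor: it uses no trained statistical model, considers only the forward strand, and assumes the standard ATG start codon and TAA/TAG/TGA stop codons. The `min_codons` threshold is a hypothetical parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) half-open coordinates of forward-strand ORFs.

    An ORF here is an in-frame stretch from the first ATG after the previous
    stop codon up to and including the next stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence containing one ORF: ATG AAA TTT TAG
toy = "CCATGAAATTTTAGGG"
orfs = find_orfs(toy)
```

Trained predictors additionally score codon usage and signal motifs, which is what allows them to separate real genes from chance ORFs.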
e. Annotation
The predicted coding regions will be annotated against the NCBI non-redundant protein database (Nr), Swiss-Prot, Kyoto Encyclopedia of Genes and Genomes (KEGG), and Clusters of Orthologous Groups (COG) databases using BLASTX (Basic Local Alignment Search Tool) with an E-value cutoff of ≤ 1e-05.
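As a sketch of how the E-value cutoff might be applied, the function below keeps the best-scoring hit per query from BLAST tabular output. It assumes the default 12-column tabular layout (`-outfmt 6`), in which the E-value is the 11th column and the bit score the 12th; the function name is hypothetical.

```python
E_VALUE_CUTOFF = 1e-5  # cutoff stated in the annotation step


def best_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)} for the best hit
    per query whose E-value is at or below the cutoff.

    Assumes the default 12-column BLAST tabular format.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue
        # Keep the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The result can be built by passing an open file handle, for example `best_hits(open(path))`, where `path` points to a tabular BLASTX result file.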
For pathway analysis, the coding regions will be mapped to reference canonical pathways in KEGG. The coding genes are classified mainly under top-level KEGG categories including Metabolism, Cellular processes, Genetic information processing, and Environmental information processing. The output of the KEGG analysis includes KEGG Orthology (KO) assignments, the corresponding Enzyme Commission (EC) numbers, and the metabolic pathways of the predicted coding genes, obtained using the KEGG Automatic Annotation Server (KAAS).
Gene Ontology (GO) annotations of the coding genes will be determined using OmicsBox. GO terms will be assigned to the coding genes for functional categorization. Genes will be grouped into three categories: biological process, molecular function, and cellular component.
Deliverables:
De novo genome assembly
- Quality filtration of reads
- De novo assembly generating scaffolds/contigs
- Assembly statistics
- In silico validation of assembly using RNA-Seq data (for large complex plants)
- GC percentage
- Repeat identification
- Gene prediction
- Gene annotation
- GO analysis
- SSR discovery
- SNP/Indel discovery (if more than one sample)
- Phylogenetic analysis
- KEGG pathway analysis
- Comparative genomics with closely related genomes
- Circos plot
- COG orthologous groups analysis
- Comprehensive report with publication-standard methodology, graphs and tables.
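As one example from the list above, SSR (simple sequence repeat) discovery can be approximated with a regular expression that looks for short units repeated in tandem. This is a rough sketch: the unit size limit of 6 bp and the minimum of 4 repeats are hypothetical defaults, and only perfect repeats are reported.

```python
import re


def find_ssrs(seq, min_repeats=4, max_unit=6):
    """Return a list of (start, unit, repeat_count) for perfect tandem repeats.

    `start` is the 0-based position of the repeat; units are 1..max_unit bp
    and must occur at least `min_repeats` times in a row.
    """
    seq = seq.upper()
    # Lazy unit match so the shortest repeating unit is preferred.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1)
    )
    results = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        results.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return results


# Toy sequence with an (AC)4 repeat starting at position 3.
ssrs = find_ssrs("TTGACACACACAGGT")
```

Dedicated SSR tools typically use different minimum repeat counts for each unit length and can also report imperfect or compound repeats.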
Reference-guided genome analysis
- Quality filtration of reads
- Mapping of high-quality reads to the reference genome
- Alignment summary (# reads mapped, # uniquely mapped reads, # reads unmapped, genome coverage)
- Consensus sequence in FASTA format
- Gene prediction using GTF/GFF
- SNP/Indel identification
- SNP/Indel annotation
- Core gene analysis
- Comparative genomics
- Phylogenetic analysis
- Comprehensive report with publication-standard methodology, graphs and tables.
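To show how a consensus sequence and SNP list relate to the mapped reads, here is a naive sketch that works from per-position base counts. It is not the variant caller used in the service: the input layout (one dictionary of base counts per reference position), the minimum depth of 10, and the 0.8 allele fraction are assumptions for illustration, and indels are ignored.

```python
def call_snps(reference, base_counts, min_depth=10, min_fraction=0.8):
    """Build a consensus string and a list of SNPs from per-position counts.

    `base_counts[i]` maps each observed base to its read count at reference
    position i. Positions with low depth or no dominant base become "N".
    SNPs are returned as (1-based position, reference base, observed base).
    """
    consensus = []
    snps = []
    for pos, (ref_base, counts) in enumerate(zip(reference, base_counts)):
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # too few reads to decide
            continue
        top_base, top_count = max(counts.items(), key=lambda kv: kv[1])
        if top_count / depth >= min_fraction:
            consensus.append(top_base)
            if top_base != ref_base:
                snps.append((pos + 1, ref_base, top_base))
        else:
            consensus.append("N")  # ambiguous position
    return "".join(consensus), snps


# Toy pileup for a 4 bp reference.
pileup = [{"A": 12}, {"C": 1, "T": 11}, {"G": 3}, {"T": 6, "A": 6}]
consensus, snps = call_snps("ACGT", pileup)
```

Production callers model base and mapping quality as well as diploid genotypes, so this threshold rule should be read only as a way to picture the deliverable.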