Title: | Virus-Host Codon Usage Co-Adaptation Analysis |
---|---|
Description: | Analyze the co-adaptation of codon usage between a virus and its host, and calculate several codon usage bias measures: effective number of codons (ENc) Novembre (2002) <doi:10.1093/oxfordjournals.molbev.a004201>, codon adaptation index (CAI) Sharp and Li (1987) <doi:10.1093/nar/15.3.1281>, relative codon deoptimization index (RCDI) Puigbò et al (2010) <doi:10.1186/1756-0500-3-87>, similarity index (SiD) Zhou et al (2013) <doi:10.1371/journal.pone.0077239>, synonymous codon usage orderliness (SCUO) Wan et al (2004) <doi:10.1186/1471-2148-4-19>, and relative synonymous codon usage (RSCU) Sharp et al (1986) <doi:10.1093/nar/14.13.5125>. The package also provides a statistical measure of dinucleotide over- and underrepresentation under three different models, and implements several methods for visualizing codon usage, such as ENc.GC3plot() and PR2.plot(). |
Authors: | Ali Mostafa Anwar [aut, cre], Mohamed Soudy [aut] |
Maintainer: | Ali Mostafa Anwar <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2025-01-24 04:42:49 UTC |
Source: | https://github.com/aliyoussef96/vhcub |
Measure the Codon Adaptation Index (CAI; Sharp and Li, 1987) of DNA sequences.
CAI.values(df.virus, ENc.set.host, df.host, genetic.code = "1", set.len = 5, threshold = 0)
df.virus |
a data frame with seq_name and its virus DNA sequence. |
ENc.set.host |
a data frame with ENc values of a host. |
df.host |
a data frame with seq_name and its host DNA sequence. |
genetic.code |
a single string that uniquely identifies a genetic code to use. |
set.len |
a number giving the percentage of the total host genes to use as reference genes. |
threshold |
optional numeric, specifying sequence length, in codons, used for filtering. |
For more information about CAI, see Sharp and Li (1987).
A data.frame containing the computed CAI value for each DNA sequence in df.virus.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate CAI
enc.df.h <- ENc.values(fasta.h)
cai.df <- CAI.values(fasta.v, enc.df.h, fasta.h)
## End(Not run)
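As a rough illustration of what a CAI calculation involves, the base-R sketch below derives relative adaptiveness weights from a reference set and takes their geometric mean over a gene's codons (Sharp and Li, 1987). It is not vhcub's implementation: the helper names (`split_codons`, `relative_adaptiveness`, `cai`), the hard-coded standard genetic code, and the 0.5 pseudo-count for unused codons are assumptions made for this example, and the way `CAI.values` picks reference genes from `ENc.set.host` and `set.len` is not reproduced here.

```r
# Standard genetic code (NCBI table 1), built in TCAG order.
bases  <- c("T", "C", "A", "G")
grid   <- expand.grid(b3 = bases, b2 = bases, b1 = bases)
codons <- paste0(grid$b1, grid$b2, grid$b3)
aa     <- strsplit(paste0("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR",
                          "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"), "")[[1]]
names(aa) <- codons

# Hypothetical helper: split a coding sequence into complete codons.
split_codons <- function(dna) {
  s <- toupper(dna)
  starts <- seq(1, by = 3, length.out = nchar(s) %/% 3)
  substring(s, starts, starts + 2)
}

# Relative adaptiveness w: codon count divided by the count of the most
# used synonymous codon in the reference set.
relative_adaptiveness <- function(reference_seqs) {
  all_codons <- unlist(lapply(reference_seqs, split_codons))
  counts <- as.numeric(table(factor(all_codons, levels = codons)))
  counts[counts == 0] <- 0.5  # assumption: pseudo-count avoids log(0)
  w <- counts / ave(counts, aa, FUN = max)
  names(w) <- codons
  w
}

# CAI: geometric mean of w over the gene's codons, skipping Met and Trp
# (single-codon amino acids) and stop codons.
cai <- function(dna, w) {
  cds <- split_codons(dna)
  cds <- cds[cds %in% codons]                   # drop ambiguous codons
  cds <- cds[!(aa[cds] %in% c("M", "W", "*"))]
  exp(mean(log(w[cds])))
}

w <- relative_adaptiveness(c("GCTGCTGCTGCC"))
cai_best  <- cai("GCTGCT", w)  # only the preferred Ala codon
cai_mixed <- cai("GCTGCC", w)  # one preferred and one less used codon
```

A gene that uses only the reference set's preferred codons scores 1; any less-used codon pulls the geometric mean down.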
A measure of statistical dinucleotide over- and underrepresentation. Random sequences are generated by shuffling (with or without replacement) all bases in the sequence.
dinuc.base(df.virus, permutations = 500, exact_numbers = FALSE)
df.virus |
data frame with seq_name and its DNA sequence. |
permutations |
the number of permutations for the z-score computation. |
exact_numbers |
if TRUE, an exact analytical calculation is used. |
For more information, see the seqinr package.
A data.frame containing the computed statistic for each dinucleotide in every DNA sequence within df.virus.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate zscore using (base model)
base <- dinuc.base(fasta.v, permutations = 500)
## End(Not run)
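The idea behind a permutation z-score under the base model can be sketched in a few lines of base R: count each dinucleotide in the observed sequence, recount it in shuffled copies, and standardize. This is a simplified illustration, not the seqinr or vhcub code. The helper names `count_dinuc` and `dinuc_z` are hypothetical, raw counts are used as the statistic, and only shuffling without replacement is shown.

```r
dinucs <- as.vector(outer(c("A", "C", "G", "T"), c("A", "C", "G", "T"), paste0))

# Count overlapping dinucleotides in a vector of single bases.
count_dinuc <- function(bases) {
  pairs <- paste0(bases[-length(bases)], bases[-1])
  as.numeric(table(factor(pairs, levels = dinucs)))
}

# z-score of each observed count against counts from shuffled sequences.
dinuc_z <- function(dna, permutations = 500) {
  bases <- strsplit(toupper(dna), "")[[1]]
  obs   <- count_dinuc(bases)
  # sample() without further arguments permutes the bases, so base
  # composition is preserved while neighbour relationships are destroyed.
  sims  <- replicate(permutations, count_dinuc(sample(bases)))
  z     <- (obs - rowMeans(sims)) / apply(sims, 1, sd)
  names(z) <- dinucs
  z
}

set.seed(42)  # make the permutations reproducible
z_example <- dinuc_z(strrep("ACGT", 10), permutations = 200)
```

In this toy sequence "AC" always follows the repeat pattern, so its z-score is positive, while "AA" never occurs and scores negative. A dinucleotide whose shuffled counts never vary has a standard deviation of zero and yields a non-finite score.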
A measure of statistical dinucleotide over- and underrepresentation. Random sequences are generated by shuffling (with or without replacement) codons.
dinuc.codon(df.virus, permutations = 500, exact_numbers = FALSE)
df.virus |
data frame with seq_name and its DNA sequence. |
permutations |
the number of permutations for the z-score computation. |
exact_numbers |
if TRUE, an exact analytical calculation is used. |
For more information, see the seqinr package.
A data.frame containing the computed statistic for each dinucleotide in every DNA sequence within df.virus.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate zscore using (codon model)
codon <- dinuc.codon(fasta.v, permutations = 500)
## End(Not run)
A measure of statistical dinucleotide over- and underrepresentation. Random sequences are generated by shuffling (with or without replacement) synonymous codons.
dinuc.syncodon(df.virus, permutations = 500, exact_numbers = FALSE)
df.virus |
data frame with seq_name and its DNA sequence. |
permutations |
the number of permutations for the z-score computation. |
exact_numbers |
if TRUE, an exact analytical calculation is used. |
For more information, see the seqinr package.
A data.frame containing the computed statistic for each dinucleotide in every DNA sequence within df.virus.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate zscore using (syncodon model)
syncodon <- dinuc.syncodon(fasta.v, permutations = 500)
## End(Not run)
Make an ENc-GC3 scatterplot, where the y-axis represents the ENc values and the x-axis represents the GC3 content. The red fitted line shows the expected ENc values when codon usage bias is affected solely by GC3.
ENc.GC3plot(enc.df, gc.df)
enc.df |
a data frame with ENc values. |
gc.df |
a data frame with GC3 values. |
For more information about the ENc-GC3 plot, see Butt et al. (2016).
A ggplot object.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

enc.df <- ENc.values(fasta.v)
gc.df <- GC.content(fasta.v)

ENc.GC3plot(enc.df, gc.df)
## End(Not run)
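The expected-ENc curve in such a plot is conventionally drawn from Wright's (1990) approximation, ENc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3 fraction. The sketch below evaluates that formula in base R. It assumes GC3 is expressed as a fraction between 0 and 1 (divide by 100 first if the values are percentages), the function name `expected_enc` is hypothetical, and whether `ENc.GC3plot` draws its red line from exactly this expression is an assumption.

```r
# Expected ENc when codon usage is shaped by GC3 composition alone
# (Wright, 1990). gc3 must be a fraction in [0, 1].
expected_enc <- function(gc3) {
  2 + gc3 + 29 / (gc3^2 + (1 - gc3)^2)
}

# Evaluate the curve on a coarse grid of GC3 values.
gc3_grid <- seq(0, 1, by = 0.25)
curve_df <- data.frame(GC3 = gc3_grid,
                       ENc_expected = expected_enc(gc3_grid))
```

Genes whose observed ENc falls well below this curve show more codon bias than GC3 composition alone would explain.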
Measure the Effective Number of Codons (ENc) of DNA sequences, using the modified version of the statistic (Novembre, 2002).
ENc.values(df.fasta, genetic.code = "1", threshold = 0)
df.fasta |
a data frame with seq_name and its DNA sequence. |
genetic.code |
a single string that uniquely identifies a genetic code to use. |
threshold |
optional numeric, specifying sequence length, in codons, used for filtering. |
For more information about ENc, see Novembre (2002).
A data.frame containing the computed ENc value for each DNA sequence in df.fasta.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate ENc
enc.df <- ENc.values(fasta.v)
## End(Not run)
Read fasta format files and convert them to data frames.
fasta.read(virus.fasta, host.fasta)
virus.fasta |
path to the virus fasta file. |
host.fasta |
path to the host fasta file. |
A list of two data.frames: the first holds the virus DNA sequences and the second holds the host DNA sequences.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]
## End(Not run)
Calculates overall GC content as well as GC at first, second, and third codon positions.
GC.content(df.virus)
df.virus |
data frame with seq_name and its DNA sequence. |
A data.frame with the overall GC content as well as the GC content at the first, second, and third codon positions for every DNA sequence in df.virus.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate GC content
gc.df <- GC.content(fasta.v)
## End(Not run)
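To make the four quantities concrete, the following base-R sketch computes overall GC, GC1, GC2, and GC3 for a single coding sequence. The function name `gc_by_position` is hypothetical and the values are returned as fractions. Whether `GC.content` reports fractions or percentages, and how it names its columns, is not specified above, so treat this as an illustration of the definitions only.

```r
# Hypothetical helper: GC fractions of one coding sequence, overall and
# at each codon position.
gc_by_position <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  # Drop trailing bases that do not form a complete codon.
  bases <- bases[seq_len(length(bases) - length(bases) %% 3)]
  is_gc <- bases %in% c("G", "C")
  pos   <- rep(1:3, length.out = length(bases))  # codon position of each base
  c(GC  = mean(is_gc),
    GC1 = mean(is_gc[pos == 1]),
    GC2 = mean(is_gc[pos == 2]),
    GC3 = mean(is_gc[pos == 3]))
}

# Codons ATG GCC GCG TAA: third positions are G, C, G, A.
gc_example <- gc_by_position("ATGGCCGCGTAA")
```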
Make a Parity Rule 2 (PR2) plot, where the AT-bias [A3/(A3 + T3)] at the third codon position of the four-codon amino acids of entire genes is the ordinate and the GC-bias [G3/(G3 + C3)] is the abscissa. The center of the plot, where both coordinates are 0.5, is where A = U and G = C (PR2), with no bias between the influence of the mutation and selection rates.
PR2.plot(fasta.df)
fasta.df |
a data frame with seq_name and its DNA sequence. |
For more information about the PR2 plot, see Butt et al. (2016).
A ggplot object.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

PR2.plot(fasta.v)
## End(Not run)
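The two PR2 coordinates for a single gene can be sketched in base R as follows. The function name `pr2_point` is hypothetical, and the choice of fourfold-degenerate families (Ala, Gly, Pro, Thr, Val, plus the four-codon blocks of Leu, Ser, and Arg in the standard genetic code) is an assumption of this example. `PR2.plot` may select families differently.

```r
# Hypothetical helper: PR2 coordinates of one coding sequence.
pr2_point <- function(dna) {
  s      <- toupper(dna)
  starts <- seq(1, by = 3, length.out = nchar(s) %/% 3)
  cds    <- substring(s, starts, starts + 2)
  # First two bases of the fourfold-degenerate codon blocks
  # (standard genetic code; an assumption of this sketch).
  fourfold_prefix <- c("GC", "GG", "CC", "AC", "GT", "CT", "TC", "CG")
  third <- substr(cds[substr(cds, 1, 2) %in% fourfold_prefix], 3, 3)
  n <- function(b) sum(third == b)  # count of base b at third positions
  c(GC_bias = n("G") / (n("G") + n("C")),   # abscissa
    AT_bias = n("A") / (n("A") + n("T")))   # ordinate
}

# Codons GCA GCA GCT GGG CCC: third positions are A, A, T, G, C.
pr2_example <- pr2_point("GCAGCAGCTGGGCCC")
```

A point at (0.5, 0.5) corresponds to the parity condition. The toy gene above sits at a GC-bias of 0.5 but is shifted toward A on the AT axis.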
Measure the Relative Codon Deoptimization Index (RCDI) of DNA sequences.
RCDI.values(fasta.virus, fasta.host, enc.host, set.len = 5)
fasta.virus |
a data frame with virus seq_name and its DNA sequence. |
fasta.host |
a data frame with host seq_name and its DNA sequence. |
enc.host |
a data frame of the host's ENc values. |
set.len |
a number giving the percentage of the total host genes to use as reference genes. |
For more information about RCDI, see Puigbò et al. (2010).
A data.frame containing the computed RCDI value for each DNA sequence in fasta.virus.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate RCDI
enc.df.h <- ENc.values(fasta.h)
rcdi.df <- RCDI.values(fasta.v, fasta.h, enc.df.h)
## End(Not run)
Measure the Relative Synonymous Codon Usage (RSCU) of DNA sequences.
RSCU.values(df.fasta)
df.fasta |
a data frame with seq_name and its DNA sequence. |
For more information about RSCU, see Sharp et al. (1986).
A data.frame containing the computed RSCU value of each codon for each DNA sequence in df.fasta.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate RSCU
RSCU.H <- RSCU.values(fasta.h)
RSCU.V <- RSCU.values(fasta.v)
## End(Not run)
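RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal base-R sketch of that definition for one sequence is given below. The function name `rscu` and the hard-coded standard genetic code are assumptions of this example, and the layout of the data.frame returned by `RSCU.values` is not reproduced.

```r
# Standard genetic code (NCBI table 1), built in TCAG order.
bases  <- c("T", "C", "A", "G")
grid   <- expand.grid(b3 = bases, b2 = bases, b1 = bases)
codons <- paste0(grid$b1, grid$b2, grid$b3)
aa     <- strsplit(paste0("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR",
                          "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"), "")[[1]]

# Hypothetical helper: RSCU of every codon in one coding sequence.
rscu <- function(dna) {
  s      <- toupper(dna)
  starts <- seq(1, by = 3, length.out = nchar(s) %/% 3)
  cds    <- substring(s, starts, starts + 2)
  counts <- as.numeric(table(factor(cds, levels = codons)))
  # Expected count under equal usage = mean count within the
  # synonymous family of the codon's amino acid.
  out <- counts / ave(counts, aa, FUN = mean)
  names(out) <- codons
  out  # NaN for amino acids absent from the sequence
}

# Four Ala codons: GCT twice, GCC once, GCA once, GCG never.
rscu_example <- rscu("GCTGCTGCCGCA")
```

Values above 1 mark codons used more often than expected under equal usage, and values below 1 mark codons used less often.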
Measure the Synonymous Codon Usage Orderliness (SCUO) of DNA sequences (Wan et al., 2004).
SCUO.values(df.fasta, genetic.code = "1", threshold = 0)
df.fasta |
a data frame with seq_name and its DNA sequence. |
genetic.code |
a single string that uniquely identifies a genetic code to use. |
threshold |
optional numeric, specifying sequence length, in codons, used for filtering. |
For more information about SCUO, see Wan et al. (2004).
A data.frame containing the computed SCUO value for each DNA sequence in df.fasta.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate SCUO
SCUO.df <- SCUO.values(fasta.v)
## End(Not run)
Measure the Similarity Index (SiD) between a virus and its host codon usage.
SiD.value(rscu.host, rscu.virus)
rscu.host |
a data frame with the host's RSCU codon values. |
rscu.virus |
a data frame with the virus's RSCU codon values. |
For more information about SiD, see Zhou et al. (2013).
A numeric value representing the SiD.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta file
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# Calculate SiD
RSCU.H <- RSCU.values(fasta.h)
RSCU.V <- RSCU.values(fasta.v)
SiD <- SiD.value(RSCU.H, RSCU.V)
## End(Not run)
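The similarity index described by Zhou et al. (2013) treats the two RSCU profiles as vectors: R(A,B) is the cosine of the angle between them and D(A,B) = (1 - R(A,B)) / 2. The base-R sketch below works on plain numeric vectors of matching codon order. The function name `sid` is hypothetical, and whether `SiD.value` applies exactly this expression to its data-frame inputs is an assumption.

```r
# Hypothetical helper: similarity index between two RSCU vectors whose
# elements refer to the same codons in the same order.
sid <- function(rscu_a, rscu_b) {
  stopifnot(length(rscu_a) == length(rscu_b))
  # Cosine similarity between the two RSCU profiles.
  r <- sum(rscu_a * rscu_b) / sqrt(sum(rscu_a^2) * sum(rscu_b^2))
  (1 - r) / 2
}

sid_same     <- sid(c(2, 1, 1, 0), c(2, 1, 1, 0))  # identical profiles
sid_opposite <- sid(c(2, 0), c(0, 2))              # no shared preference
```

Because RSCU values are non-negative, the cosine lies between 0 and 1 and the index between 0 (identical usage patterns) and 0.5 (orthogonal patterns).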
vhcub calculates several codon usage bias measures: effective number of codons (ENc), codon adaptation index (CAI), relative codon deoptimization index (RCDI), similarity index (SiD), synonymous codon usage orderliness (SCUO), and relative synonymous codon usage (RSCU). It also provides a statistical measure of dinucleotide over- and underrepresentation under three different models, and implements several methods for visualizing codon usage, such as ENc.GC3plot and PR2.plot.
fasta.read: read fasta format files and convert them to data.frames.
GC.content: calculates overall GC content as well as GC at first, second, and third codon positions.
RSCU.values: measure the Relative Synonymous Codon Usage (RSCU) of DNA sequences.
SCUO.values: measure the Synonymous Codon Usage Orderliness (SCUO) of DNA sequences.
RCDI.values: measure the Relative Codon Deoptimization Index (RCDI) of DNA sequences.
CAI.values: measure the Codon Adaptation Index (CAI; Sharp and Li, 1987) of DNA sequences.
ENc.values: measure the Effective Number of Codons (ENc) of DNA sequences, using the modified version of the statistic.
dinuc.syncodon: measure of statistical dinucleotide over- and underrepresentation; random sequences are generated by shuffling (with or without replacement) synonymous codons.
dinuc.codon: measure of statistical dinucleotide over- and underrepresentation; random sequences are generated by shuffling (with or without replacement) codons.
dinuc.base: measure of statistical dinucleotide over- and underrepresentation; random sequences are generated by shuffling (with or without replacement) all bases in the sequence.
ENc.GC3plot: make an ENc-GC3 scatterplot, where the y-axis represents the ENc values and the x-axis represents the GC3 content. The red fitted line shows the expected ENc values when codon usage bias is affected solely by GC3.
PR2.plot: make a Parity Rule 2 (PR2) plot, where the AT-bias [A3/(A3 + T3)] at the third codon position of the four-codon amino acids of entire genes is the ordinate and the GC-bias [G3/(G3 + C3)] is the abscissa. The center of the plot, where both coordinates are 0.5, is where A = U and G = C (PR2), with no bias between the influence of the mutation and selection rates.
Ali Mostafa Anwar [email protected] and Mohamed Soudy [email protected]
## Not run:
# read DNA from fasta files
fasta <- fasta.read("virus.fasta", "host.fasta")
fasta.v <- fasta[[1]]
fasta.h <- fasta[[2]]

# calculate GC content
gc.df <- GC.content(fasta.v)

# measure of statistical dinucleotide over- and underrepresentation
syncodon <- dinuc.syncodon(fasta.v, permutations = 10)
base <- dinuc.base(fasta.v, permutations = 10)
codon <- dinuc.codon(fasta.v, permutations = 10)

# calculate ENc
enc.df <- ENc.values(fasta.v)
enc.df.h <- ENc.values(fasta.h)

# calculate SCUO and CAI
SCUO.df <- SCUO.values(fasta.v)
cai.df <- CAI.values(fasta.v, enc.df.h, fasta.h)

# calculate RSCU
RSCU.H <- RSCU.values(fasta.h)
RSCU.V <- RSCU.values(fasta.v)

# calculate SiD
SiD <- SiD.value(RSCU.H, RSCU.V)

# calculate RCDI
rcdi.df <- RCDI.values(fasta.v, fasta.h, enc.df.h)

# plot ENc.GC3plot
ENc.GC3plot(enc.df, gc.df)

# plot PR2.plot
PR2.plot(fasta.v)
## End(Not run)