--- title: "RScelestial Vignette" author: "Mohammad-Hadi Foroughmand-Araabi" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{RScelestial Vignette} %\VignetteEngine{knitr::rmarkdown} \VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(RScelestial) # We load igraph for drawing trees. If you do not want to draw, # there is no need to import igraph. library(igraph) ``` # Installing RScelestial The RScelestial package could be installed easily as follows ```{r eval=FALSE} install.packages("RScelestial") ``` # Simulation Here we show a simulation. We build a data set with following command. ```{r} # Following command generates ten samples with 20 loci. # Rate of mutations on each edge of the evolutionary tree is 1.5. D = synthesis(10, 20, 5, seed = 7) D ``` # Inferring the phylogenetic tree ```{r run-scelestial-0} seq = as.ten.state.matrix(D$seqeunce) SP = scelestial(seq, return.graph = TRUE) SP ``` You can draw the graph with following command ```{r fig.width=5, fig.height=5} tree.plot(SP, vertex.size = 30) ``` Also, we can make a rooted tree with cell "C8" as the root of the tree as follows: ```{r fig.width=5, fig.height=5} SP = scelestial(seq, root.assign.method = "fix", root = "C8", return.graph = TRUE) tree.plot(SP, vertex.size = 30) ``` Setting root.assign.method to "balance" lets the algorithm decide for a root that produces minimum height tree. ```{r fig.width=5, fig.height=5} SP = scelestial(seq, root.assign.method = "balance", return.graph = TRUE) tree.plot(SP, vertex.size = 30) ``` # Evaluating results Following command calculates the distance array between pairs of samples. ```{r} D.distance.matrix <- distance.matrix.true.tree(D) D.distance.matrix SP.distance.matrix <- distance.matrix.scelestial(SP) SP.distance.matrix ## Difference between normalized distance matrices vertices <- rownames(SP.distance.matrix) sum(abs(D.distance.matrix[vertices,vertices] - SP.distance.matrix)) ``` # Running Scelestial on multiple sequence alignment Given a multiple sequence alignment, Scelestial infers the phylogeny of them. Here we present a simple example. First we load libraries to load a multiple alignment. ```{r load-libraries} library(stringr) if (!require("seqinr")) install.packages("seqinr") library(seqinr) ``` In this example, we load a multiple alignment from seqinr package. ```{r load-data} data(phylip, package = "seqinr") ``` Then we clean the data and build a zero-one matrix representing taxa and characters. Note that Scelestial accept matrices with taxa as its columns and characters as its rows. ```{r data-cleaning} # Removing non-informative columns and duplicate rows. mcb <- toupper(t(sapply(seq(phylip$seq), function(i) unlist(strsplit(phylip$seq[[i]], ''))))) ccb <- as.character(phylip$seq) occb <- order(ccb) cbColMask <- sapply(seq(ncol(mcb)), function(j) length(levels(as.factor(mcb[,j]))) == 1) cbRowMask <- rep(TRUE, length(ccb)) for (i in seq(length(ccb))) { if (i == 1 || ccb[occb[i]] != ccb[occb[i-1]]) { cbRowMask[occb[i]] <- FALSE } } mcbRows <- apply(mcb[!cbRowMask, !cbColMask], MARGIN = 1, FUN = function(a) paste0(str_replace(a, "-", "X"), collapse = "")) ``` Executing Scelestial on the input matrix. ```{r run-scelestial} n.seq <- data.frame(nodes = phylip$nam[!cbRowMask], seq = mcbRows) seq2 <- data.frame(t(as.ten.state.matrix.from.node.seq(n.seq)), stringsAsFactors = TRUE) # Running Scelestial SP = scelestial(seq2, return.graph = TRUE) ``` ```{r plot} tree.plot(SP, vertex.size=20, vertex.label.dist=0, asp = 0, vertex.label.cex = 1) ```