Lab 10 Entry: Analysis of EPIC markers

Carter Pope

Professor John Paul

Molecular Ecology

11/6/18


On October 31st, I arrived at the lab and was given the sad news that none of our class’s plant DNA worked. Instead, we used a previous class’s plant DNA as a temporary replacement so that we could continue our work with Geneious. I began by opening Geneious on my computer and making a new folder to hold my EPIC sequence reads, both forward and reverse. After importing these sequences from Canvas into my new folder, I built an alignment using MUSCLE and trimmed the poor-quality bases at the beginning and end of the sequences.

Once these edits were made, I zoomed in and looked for polymorphic sites, heterozygous sites, and sites that were both heterozygous and polymorphic. Polymorphic sites were found in the following columns: 50, 91, 106, 131, and 411. Heterozygous sites were found in the following columns: 271, 391, 403, 412, 457, 458, 465, and 468. The sites that were both heterozygous and polymorphic were in columns 403 and 412. I then edited each heterozygous site to the matching IUPAC ambiguity code.
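As a rough illustration of the ambiguity-code edit described above, here is a minimal Python sketch that maps the two bases seen at a heterozygous site to the standard two-base IUPAC symbol. The function name `iupac_code` is made up for this example and is not part of Geneious; in the lab the edits were made by hand in the alignment viewer.

```python
# Standard IUPAC symbols for sites where two different bases are present.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_code(base1, base2):
    """Return the IUPAC symbol for a site showing the two given bases.

    Hypothetical helper for illustration only. If both bases are the
    same, the site is homozygous and the base itself is returned.
    """
    b1, b2 = base1.upper(), base2.upper()
    if b1 == b2:
        return b1
    return IUPAC_TWO_BASE[frozenset((b1, b2))]


if __name__ == "__main__":
    # A site with overlapping A and G peaks would be edited to R.
    print(iupac_code("A", "G"))
```

The lookup uses `frozenset` keys so that the order of the two bases does not matter (A/G and G/A give the same symbol).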

Next, I created consensus sequences from the forward and reverse reads. I selected the forward and reverse pair for each individual DNA code and aligned them to make a new assembly document. I edited out bad sections of each new alignment, corrected any obviously resolvable bases that had been called as “N”, and saved the changes. I repeated these two steps for every DNA code that had both a forward and a reverse read (TG0248 and JP1132 were the only two that did not). Then I selected all of my newly edited assemblies and created a new consensus sequence list. I extracted this sequence list to a new subfolder and labeled it “5334 Consensus Sequences”. I then ran a “Batch Rename” on this list and entered 34 in the text field in order to remove every character except the sequence name from the selected documents, which is needed to concatenate the alignments of different markers.

However, after working with Professor Paul and Jinwoo Kim during office hours, we discovered that entering 34 cut off the fourth digit on the far right of the identification code. This led to a problem: several DNA samples now had the same code and could no longer be distinguished. Therefore, I redid the steps starting from extracting the sequence list into a new subfolder, this time labeling it “Correct 5334 Consensus Sequences” and entering 33 in the text field. This fixed the problem, and I finished by adding TG0248 and JP1132, the two samples without both a forward and reverse read, to “Correct 5334 Consensus Sequences”.
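The renaming problem above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample codes and the suffix below are invented (I did not record the exact document names Geneious generated), and the sketch assumes the characters were removed from the end of each name. It shows how trimming one character too many collapses different sample codes into the same label.

```python
from collections import Counter

# Hypothetical suffix and sample codes, invented for this illustration.
SUFFIX = "_5334_forward_reverse_assembly_cs"
names = [code + SUFFIX for code in ("AB0011", "AB0012", "CD0207")]


def trim_end(name, n):
    """Remove the last n characters from a document name."""
    return name[:-n]


def duplicates(labels):
    """Return the labels that occur more than once, sorted."""
    counts = Counter(labels)
    return sorted(label for label, count in counts.items() if count > 1)


# Trimming exactly the suffix length keeps each sample code intact.
exact = [trim_end(name, len(SUFFIX)) for name in names]

# Trimming one extra character also removes the last digit of the code.
too_many = [trim_end(name, len(SUFFIX) + 1) for name in names]

if __name__ == "__main__":
    print(exact, duplicates(exact))
    print(too_many, duplicates(too_many))
```

Checking for duplicate names right after a batch rename, as `duplicates` does here, would have caught the problem before the alignment step.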

I then built an alignment of the “Correct 5334 Consensus Sequences” with MUSCLE using the default parameters, and made a last-minute edit that cut off a few hundred poor-quality base pairs toward the end of the alignment. I saved these edits and renamed the alignment “5334 Nucleotide Alignment”.

At home, I continued working in Geneious by inferring a Bayesian tree from this single-locus alignment. I ran MrBayes on my “5334 Nucleotide Alignment” with M. lewisii (TG0248) selected as the outgroup. By running the analysis long enough, I was able to get a good posterior distribution and a trace that looked like a “fuzzy caterpillar”.

The tree on the left shows equal branch lengths and the one on the right shows proportional branch lengths.

My laptop has trouble transferring and handling large image files, so I had to take a picture of my laptop screen with my iPhone. Link: 5334 nucleotide tree-288bbml

Based on the trees that I created, I can infer that many clades are strongly supported by the MrBayes analysis. For the most part, the clades appear to represent geographic populations: the plants within a clade tend to come from a similar geographic setting. However, plants in different clades can also come from generally similar geographic settings.

 

My posterior distribution and trace are shown below:

 

Thoughts: For some reason, I can’t find the options for working with my node labels, and the number of decimal places shown for them doesn’t change when I edit the settings. I might have to go in during office hours to track down any mistakes I may have made.
