Plant Population Genetics II – Analysis of EPIC Markers

October 31, 2018

Due to unfortunate circumstances, there were no forward or reverse reads for the four EPIC markers used in the class data that included the twenty-five samples of Mimulus cardinalis. Thus, data from previous classes was used to score for polymorphisms, code for heterozygotes, and assemble EPIC marker alignments. The following protocol was performed:

  1. A new folder named ‘Plant Population EPIC Marker 5525’ was created under the ‘Local’ category.
  2. The 5525_Forward_Reverse file was downloaded from Canvas and imported into Geneious inside the ‘Plant Population EPIC Marker 5525’ folder, creating a new folder named EPIC_5525_Forward_Reverse.
  3. A MUSCLE alignment was generated by selecting all the sequences and choosing ‘Multiple Align’.
  4. The low-quality start and end of each sequence were trimmed back to where the good reads began.
  5. Polymorphisms were scored and the columns recorded.
  6. Heterozygous sites were identified by checking whether two clear peaks were present at the same position in both the forward and reverse reads of the same sample. These sites were recorded.
  7. Heterozygous sites were re-coded with the symbols according to the IUPAC Ambiguity Code list by selecting the heterozygous site and replacing the nucleotide with the corresponding letter.
  8. After all heterozygous sites were re-coded, the document was saved, making sure to apply changes to the original sequences.
  9. An assembly was generated for each forward and reverse read pair by selecting the pair and using ‘De Novo Assembly’ with default settings.
  10. Each new assembly document was opened and the sequences were trimmed at the start and end to where the good reads began. Positions marked ambiguously with an N on one read but clear on the other were edited by replacing the N with the clear nucleotide.
  11. Heterozygous sites and ambiguities were double-checked, and the document was saved. This was repeated for the assembly of each sample.
  12. All assembly documents were selected and ‘Generate consensus sequence’ was clicked; a sequence list was then created, generating a new file named ‘Consensus sequences’.
  13. ‘Consensus sequences’ was selected and a new folder named ‘5525 Consensus Sequences’ was created by clicking ‘Sequence’, then ‘Extract sequences from list’, followed by ‘Extract to a new subfolder’.
  14. All files in the new folder were selected and batch-renamed by removing 33 characters from the end of each name.
  15. The outgroup (TG0248…) was copied from the ‘EPIC_5525_Forward_Reverse’ folder and pasted into the ‘5525 Consensus Sequences’ folder. The name was then edited to ‘TG0248’.
  16. All sequences were selected and used to build an alignment using ‘Multiple Align’ with MUSCLE and default settings.
  17. The alignment was opened and double-checked for possible additional edits, then saved and renamed to ‘5525 Nucleotide alignment’.
  18. A MrBayes tree inference was run on ‘5525 Nucleotide alignment’ using the following parameters: HKY85 substitution model, gamma rate variation, TG0248 outgroup, 3,000,000 chain length, 500 subsampling frequency, 300,000 burn-in length, and 24,278 random seed. The results are shown below:


Based on the first nucleotide alignment created with all forward and reverse reads present, there were a number of polymorphic sites as well as heterozygous sites, and some columns were both. There were twelve apparent polymorphic sites, in columns 41, 48, 56, 74, 86, 140, 191, 239, 303, 315, 329, and 334. Five heterozygous sites were identified, in columns 86 (C/T), 140 (C/G), 191 (A/G), 239 (C/G), and 334 (A/C). In both cases, more sites may have gone unidentified due to human error in scoring. All five heterozygous columns (86, 140, 191, 239, and 334) also appear in the list of polymorphic columns, so these sites exhibited both polymorphism and heterozygosity.
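
The re-coding of heterozygous sites was done by hand in Geneious, but the mapping itself is mechanical. The following is a minimal Python sketch of how the allele pairs reported above translate to standard IUPAC ambiguity codes. The dictionary and function names are illustrative and not part of Geneious; only the column numbers and allele pairs come from the scoring described here.

```python
# Standard IUPAC ambiguity codes for sites with two possible bases.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def ambiguity_code(allele_a, allele_b):
    """Return the IUPAC symbol for a site carrying the two given bases."""
    a, b = allele_a.upper(), allele_b.upper()
    for base in (a, b):
        if base not in "ACGT" or len(base) != 1:
            raise ValueError(f"not a nucleotide: {base!r}")
    if a == b:
        # Homozygous site: no ambiguity code is needed.
        return a
    return IUPAC_TWO_BASE[frozenset((a, b))]


# Heterozygous columns and allele pairs as recorded in the alignment.
heterozygous_sites = {
    86: ("C", "T"),
    140: ("C", "G"),
    191: ("A", "G"),
    239: ("C", "G"),
    334: ("A", "C"),
}

site_codes = {
    column: ambiguity_code(*alleles)
    for column, alleles in heterozygous_sites.items()
}

for column in sorted(site_codes):
    first, second = heterozygous_sites[column]
    print(f"column {column}: {first}/{second} -> {site_codes[column]}")
```

The order of the two alleles does not matter, since each pair is looked up as a set, so C/T and T/C both give Y.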

According to the inferred tree, the clade at the top, comprising samples JP1132, JP1158, JP1133, JP1134, JP1156, and JP1228, was strongly supported (100%), as was the clade that grouped JP1152 separately from the rest of the 19 samples (approximately 99%). It is possible that the top clade represents populations that are geographically closer, given their genetic similarity and the high posterior probability. However, since the bottom clade had a posterior probability of only approximately 51%, factors other than geographic proximity may contribute to its lower support.
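
To put these support values in context, the chain settings from step 18 determine how many sampled trees the posterior probabilities are based on, since the support for a clade is the fraction of retained trees that contain it. The sketch below works through that arithmetic in Python. It assumes the burn-in length is measured in the same units as the chain length and ignores the initial state of the chain; the variable and function names are illustrative, not Geneious or MrBayes settings.

```python
# MCMC settings reported in step 18 of the protocol.
chain_length = 3_000_000
subsampling_frequency = 500
burn_in_length = 300_000

# One tree is sampled every `subsampling_frequency` steps of the chain.
total_samples = chain_length // subsampling_frequency
# Assumption: burn-in is given in chain steps, so convert it to samples.
burn_in_samples = burn_in_length // subsampling_frequency
retained_samples = total_samples - burn_in_samples
burn_in_fraction = burn_in_length / chain_length


def approx_trees_with_clade(support, retained):
    """Approximate number of retained trees containing a clade,
    given its posterior probability (fraction between 0 and 1)."""
    return round(support * retained)


print(f"sampled trees: {total_samples}")
print(f"discarded as burn-in: {burn_in_samples} ({burn_in_fraction:.0%})")
print(f"retained trees: {retained_samples}")
print(f"trees behind 51% support: {approx_trees_with_clade(0.51, retained_samples)}")
```

Under these assumptions, a support value near 51% means the clade appeared in only about half of the retained trees, which is why it is treated as weakly supported.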
