Neither of my two samples that were sent for sequencing worked, so I worked with three of Elaine’s samples. They were labeled as white tuna, red snapper, and yellowtail.
First, we loaded the sequences into Geneious and spent some time becoming familiar with the layout of the program and with how to read and edit the sequences.
Each sample has a forward read and a reverse read. We assembled the two together, which lets us see what the consensus sequence looks like. The reads can vary in quality, so building a consensus from the forward and reverse reads gives a sequence we can have greater confidence in. Where one read is of low quality but the other is of high quality, we still have a good idea of what those bases should be, and the consensus should reflect that. Below is an image of the program while viewing the de novo assembly.
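To make the idea of a quality-aware consensus concrete, here is a minimal Python sketch. It is not how Geneious builds its consensus; it assumes the two reads are already aligned position by position, that the reverse read has already been reverse-complemented, and that each base has a numeric quality score. The function name and the example reads and scores are made up for illustration.

```python
def consensus_from_reads(fwd_bases, fwd_quals, rev_bases, rev_quals):
    """Pick, at each aligned position, the base call with the higher quality.

    Assumes both reads cover the same positions in the same orientation.
    """
    if not (len(fwd_bases) == len(fwd_quals) == len(rev_bases) == len(rev_quals)):
        raise ValueError("reads and quality lists must all have the same length")

    consensus = []
    for fb, fq, rb, rq in zip(fwd_bases, fwd_quals, rev_bases, rev_quals):
        if fb == rb:
            consensus.append(fb)      # both reads agree
        elif fq > rq:
            consensus.append(fb)      # forward call is more trustworthy
        elif rq > fq:
            consensus.append(rb)      # reverse call is more trustworthy
        else:
            consensus.append("N")     # conflict with equal quality: ambiguous
    return "".join(consensus)


# Invented example: the forward read has a low-quality N at the fifth base.
forward = "ACGTNCGA"
forward_q = [40, 38, 35, 30, 2, 33, 36, 40]
reverse = "ACGTTCGA"
reverse_q = [12, 30, 36, 39, 41, 40, 38, 20]

print(consensus_from_reads(forward, forward_q, reverse, reverse_q))  # ACGTTCGA
```

The low-quality N in the forward read is replaced by the high-quality T from the reverse read, which is the behavior described above.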
Now we need to edit the reads so that the consensus sequence we generate reflects the high-quality areas of our reads. The beginning of each read is usually of low quality, so we trim those areas. We also had to look through the rest of the sequence for bases labeled “N”. An N may appear where the two reads conflict; if one read is of much higher quality than the other at that position, we edit the poor-quality read to match the better one. Elaine’s samples had no low-quality regions within the sequence, so I only had to trim the ends. Once finished editing, we created the consensus sequence.
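The end-trimming step can be sketched in a few lines of Python. This is only an illustration of the idea, not what Geneious does internally; the quality cutoff of 20, the function name, and the example read are all assumptions.

```python
def trim_low_quality_ends(bases, quals, min_quality=20):
    """Remove bases from both ends until a base with quality >= min_quality.

    Low-quality bases in the interior are left alone; they would be
    reviewed by hand against the other read.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")

    start, end = 0, len(bases)
    while start < end and quals[start] < min_quality:
        start += 1                    # walk in from the 5' end
    while end > start and quals[end - 1] < min_quality:
        end -= 1                      # walk in from the 3' end
    return bases[start:end], quals[start:end]


# Invented example read with poor quality at both ends.
read = "NNACGTACGN"
scores = [3, 8, 25, 40, 38, 12, 41, 39, 30, 5]

trimmed_read, trimmed_scores = trim_low_quality_ends(read, scores)
print(trimmed_read)  # ACGTACG
```

Note that the base with quality 12 in the middle of the read survives the trim, since only the ends are cut.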
We used BLAST (Basic Local Alignment Search Tool) to compare each consensus sequence to a database of other sequences. Elaine’s reads were all of great quality, so the top hits for all three samples ranged from 99.8% to 100%.
Labeled White Tuna – BLAST Thunnus alalunga (Albacore)
Labeled Red Snapper – BLAST Oreochromis niloticus (Nile Tilapia/gross)
Labeled Yellowtail – BLAST Seriola quinqueradiata (Yellowtail)
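If the percentages reported for the top hits are read as percent identity (an assumption; the write-up does not say which BLAST statistic they are), the number can be sketched as identical positions divided by alignment length. The function name and the two short sequences below are invented for illustration and are not the actual sample data.

```python
def percent_identity(aligned_query, aligned_hit):
    """Percent of alignment columns where query and hit carry the same base.

    Both strings must come from the same pairwise alignment, with '-'
    marking gaps. Gap columns count toward the length but never match.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")

    matches = sum(
        1 for q, h in zip(aligned_query, aligned_hit) if q == h and q != "-"
    )
    return 100 * matches / len(aligned_query)


# Invented 10-base alignment with a single mismatch at the last position.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))  # 90.0
```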
Sequence Alignment with top hits:
Finally, I aligned the white tuna sample sequence with 5 other Thunnus alalunga sequences from the BLAST search. Across the 5 sequences, there were only two polymorphisms, at sites 443 and 530. At both sites, the polymorphism was between two bases, C and T.
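Spotting polymorphic sites like these amounts to scanning each column of the alignment for more than one base. Here is a minimal Python sketch of that scan, assuming the sequences are already aligned to equal length. The function name and the toy alignment are invented; the real alignment was viewed in Geneious.

```python
def polymorphic_sites(aligned_seqs):
    """Return {site: sorted bases} for columns with more than one base.

    Sites are 1-based. Gaps ('-') and ambiguous calls ('N') are ignored.
    """
    if len({len(seq) for seq in aligned_seqs}) != 1:
        raise ValueError("need at least one sequence, all of the same length")

    ignored = {"-", "N"}
    sites = {}
    for site, column in enumerate(zip(*aligned_seqs), start=1):
        bases = {base for base in column if base not in ignored}
        if len(bases) > 1:
            sites[site] = sorted(bases)
    return sites


# Toy alignment (not the real Thunnus alalunga data).
alignment = [
    "ACGTCA",
    "ACGTTA",
    "ACGTCA",
    "ATGTCA",
]

print(polymorphic_sites(alignment))  # {2: ['C', 'T'], 5: ['C', 'T']}
```

In the toy data, sites 2 and 5 each vary between C and T, the same kind of two-base polymorphism seen at sites 443 and 530.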