Phylogenomic methods are incredibly popular, and there are various baits and probes for capturing different pieces of the insect genome for analysis. One of the next challenges of 'big data' will be developing ways to combine all of these sources!
My previous post covered a method for visualizing data on tree tips -- specifically I showed a plot of the number of loci recovered after doing a quick combination of genomes, transcriptomes, and UCEs (using the Phyluce pipeline). I'm happy to write that our research paper on the combination of phylogenomic data has been published in Molecular Phylogenetics and Evolution. Bossert et al. 2019 report successful results when three types of datasets are combined for Apidae (the largest bee family).
Bossert, S., Murray, E.A., Almeida, E.A.B., Brady, S.G., Blaimer, B.B. & Danforth, B.N. (2019) Combining transcriptomes and ultraconserved elements to illuminate the phylogeny of Apidae. Molecular Phylogenetics and Evolution, 130, 121-131. doi.org/10.1016/j.ympev.2018.10.012
Kieran, T.J., Gordon, E.R., Forthman, M., Hoey-Chamberlain, R., Kimball, R.T., Faircloth, B.C., Weirauch, C. & Glenn, T.C. (2019) Insight from an ultraconserved element bait set designed for Hemipteran phylogenetics integrated with genomic resources. Molecular Phylogenetics and Evolution, 130, 297-303.
Phylogenomic datasets can be gappier than matrices concatenated from a small number of hand-selected, Sanger-sequenced genes. Here is a nice way to visualize the percent data coverage and see the distribution of missing data across the tree.
I have a dataset of hundreds of loci acquired using ultraconserved element (UCE) probe matching. Besides my own data, there are other types of data that can potentially be incorporated (published genomes, transcriptomes, etc.). I want to see how the topology may be sensitive to different data and parameter permutations. I can use this handy barplot visual for dataset exploration, such as identifying problem clades with low data coverage.
Programs and packages involved in producing this figure:
Load your tree and your data (count data, continuous measurements, etc.) into R, use the handy script from Liam Revell's site, and in a short time you'll have a beautiful and informative figure!
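The barplot only needs one number per tip. As a minimal sketch (this is not the R script from Liam Revell's site), here is one way the percent data coverage per taxon could be computed from a concatenated alignment in Python. The toy FASTA text, the taxon names, and the decision to count "-", "?", and "N" as missing data are all assumptions made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {taxon: sequence} dictionary."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the taxon name.
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in seqs.items()}


def percent_coverage(seqs, missing="-?Nn"):
    """Percent of alignment positions that hold data, per taxon."""
    coverage = {}
    for taxon, seq in seqs.items():
        if not seq:
            coverage[taxon] = 0.0
            continue
        present = sum(1 for ch in seq if ch not in missing)
        coverage[taxon] = 100.0 * present / len(seq)
    return coverage


# Hypothetical toy alignment, standing in for a real concatenated matrix.
toy_alignment = """>Apis_mellifera
ACGTACGTAC
>Bombus_sp
ACGT----AC
>Xylocopa_sp
??????GTAC
"""

coverage = percent_coverage(parse_fasta(toy_alignment))
# List the gappiest taxa first, since those are the ones to inspect.
for taxon, pct in sorted(coverage.items(), key=lambda kv: kv[1]):
    print(f"{taxon}\t{pct:.1f}")
```

The resulting table (taxon name, percent coverage) can be written to a file and read into R as the data vector for the barplot.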
Branstetter, M.G., Danforth, B.N., Pitts, J.P., Faircloth, B.C., Ward, P.S., Buffington, M.L., Gates, M.W., Kula, R.R. & Brady, S.G. (2017) Phylogenomic insights into the evolution of stinging wasps and the origins of ants and bees. Current Biology, 27, 1019-1025.
Branstetter, M.G., Longino, J.T., Ward, P.S. & Faircloth, B.C. (2017) Enriching the ant tree of life: enhanced UCE bait set for genome-scale phylogenetics of ants and other Hymenoptera. Methods in Ecology and Evolution, 8, 768-776.
Faircloth, B.C. (2015) PHYLUCE is a software package for the analysis of conserved genomic loci. Bioinformatics, btv646.
Lopez-Osorio, F., Pickett, K.M., Carpenter, J.M., Ballif, B.A. & Agnarsson, I. (2017) Phylogenomic analysis of yellowjackets and hornets (Hymenoptera: Vespidae, Vespinae). Molecular Phylogenetics and Evolution, 107, 10-15.
Use the free image processing program Fiji (a distribution of ImageJ) to make a gif out of your stack of z-stepped images! You'll produce a nice little video clip that scrolls through all of the layered shots that you've taken of your specimen.
Make a gif using ImageJ and your stack of images:
Setting a starting tree in the program BEAST can be a complicated issue, and I've been asked about troubleshooting for it. Here is a full XML file with annotations, as an example of how to designate a starting tree and how to force BEAST to keep it as a fixed topology.
BEAST is a widely used phylogenetic dating program and has an excellent graphical interface in BEAUti, where users can control almost all of the parameters and inputs they'd need. BEAUti is the front-end program that produces the XML file that is then used by BEAST for tree estimation. XML stands for 'eXtensible Markup Language'; it is both human and machine readable, and is similar to HTML.
One piece that must be manually edited in a text editor is the user-specified starting tree, if desired. Why use a starting tree? For large and difficult datasets, one can start in a good area of parameter space, so that the Markov chain isn't wasting time jumping around to sample the presumably 'incorrect' topologies. I'm not sure how much a starting tree increases efficiency, since alternate topologies can still be sampled (it's certainly not needed for small, straightforward datasets), but I may update my opinion in the future based on how well it works with phylogenomic data.
There are two different XML editing tasks I'll cover.
Setting a user-specified starting topology
The default starting tree is a random tree, which is coded in an element (content and attributes surrounded by an opening and closing tag) named 'init', our tree initializer. The whole element needs to be replaced by a user-specified starting tree in Newick format. Take a look at the example document!
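Before pasting a Newick string into the XML, it can help to confirm that its tip labels match the taxon names in the alignment, since a mismatch is an easy mistake to make. The following is a minimal sketch of such a check in Python, not part of BEAST or BEAUti. The function names, the toy tree, and the taxon list are hypothetical, and the regular expression assumes simple, unquoted labels.

```python
import re


def newick_tip_labels(newick):
    """Return tip labels from a simple Newick string (unquoted labels only).

    A tip label is whatever directly follows "(" or "," and runs up to the
    next ":", ",", ")", or ";". Internal node labels follow ")" and branch
    lengths follow ":", so neither is picked up.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def compare_taxa(newick, alignment_taxa):
    """Return (taxa missing from the tree, tips absent from the alignment)."""
    tips = set(newick_tip_labels(newick))
    taxa = set(alignment_taxa)
    return sorted(taxa - tips), sorted(tips - taxa)


# Hypothetical starting tree and alignment taxon names.
starting_tree = "((Apis:1.0,Bombus:1.0):0.5,Xylocopa:1.5);"
alignment_taxa = ["Apis", "Bombus", "Xylocopa", "Nomada"]

missing_from_tree, extra_in_tree = compare_taxa(starting_tree, alignment_taxa)
print("In alignment but not in tree:", missing_from_tree)
print("In tree but not in alignment:", extra_in_tree)
```

In this toy case the check reports that Nomada is in the alignment but absent from the starting tree.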
Fixing the starting tree topology
There are four lines to comment out: the operators for subtree-slide, wide exchange, narrow exchange, and Wilson-Balding. Commenting out these four operators prevents the topology from updating, but still allows for estimation of the node ages (i.e., branch lengths will be modified even though the topology will not). In an XML file, comments are surrounded by ' <!-- ' and ' --> ', which means anything between them will not be processed.
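If you have many XML files to edit, the same change can be scripted. Below is a minimal sketch in Python that wraps the four topology operators in comment markers. It assumes the operators appear as self-closing 'operator' elements whose 'spec' attribute is SubtreeSlide, Exchange, or WilsonBalding, as in typical BEAUti output for BEAST 2. The toy XML, its ids, and the function name are hypothetical, so check the 'spec' values in your own file first.

```python
import re
import xml.etree.ElementTree as ET

# Assumed spec names of the operators that change the topology.
TOPOLOGY_SPECS = ("SubtreeSlide", "Exchange", "WilsonBalding")

OPERATOR_RE = re.compile(r"<operator\b[^>]*/>")
COMMENT_RE = re.compile(r"(<!--.*?-->)", re.DOTALL)


def _wrap_if_topology_operator(match):
    tag = match.group(0)
    spec = re.search(r'\bspec="([^"]+)"', tag)
    # Accept both short and fully qualified spec names.
    if spec and spec.group(1).split(".")[-1] in TOPOLOGY_SPECS:
        return "<!-- " + tag + " -->"
    return tag


def comment_out_topology_operators(xml_text):
    """Wrap topology-changing operators in XML comment markers."""
    pieces = COMMENT_RE.split(xml_text)
    for i, piece in enumerate(pieces):
        # Leave existing comments alone, since XML comments cannot be nested.
        if not piece.startswith("<!--"):
            pieces[i] = OPERATOR_RE.sub(_wrap_if_topology_operator, piece)
    return "".join(pieces)


# Hypothetical fragment, far simpler than a real BEAUti file.
toy_xml = """<run>
  <operator id="treeScaler" spec="ScaleOperator" scaleFactor="0.5" tree="@Tree" weight="3.0"/>
  <operator id="subtreeSlide" spec="SubtreeSlide" tree="@Tree" weight="15.0"/>
  <operator id="narrow" spec="Exchange" tree="@Tree" weight="15.0"/>
  <operator id="wide" spec="Exchange" isNarrow="false" tree="@Tree" weight="3.0"/>
  <operator id="wilsonBalding" spec="WilsonBalding" tree="@Tree" weight="3.0"/>
</run>"""

fixed_xml = comment_out_topology_operators(toy_xml)
# The parser skips comments, so only the active operators are listed.
remaining = [op.get("id") for op in ET.fromstring(fixed_xml).iter("operator")]
print(fixed_xml)
print(remaining)
```

The operators that rescale node heights are left in place, which is what lets the node ages keep updating while the topology stays fixed.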
There are a couple of nice sites with information on user-input starting trees, yet translating that to your own data can still be a bit of a struggle. I hope the annotated XML file adds a bit of guidance by clarifying the changes needed.
There is now an updated and more-detailed tutorial at: http://www.beast2.org/fix-starting-tree/.
Also, thanks to great info from: http://www.northernbotanist.com/?page_id=732.
For the PC owners: Have you ever used Windows PowerShell? It's something akin to Command Prompt, and can be utilized for task automation. It provides a really easy way to batch rename hundreds of files in a folder.
Specifically, I had hundreds of individual file names containing a "-" that needed to be replaced or removed, and I was looking for a straightforward way to change them. I found Windows PowerShell, a task automation framework first released in 2006. It turned out to be pretty easy to use for this task and others like it, so I thought I'd share it here!
Here's how to get set up:
All your files should be in one folder, which you'll designate as your working directory. Open PowerShell on your PC. You can just type "powershell" into the search box in Windows 10. A window will open.
To navigate to your folder of files, type "cd" (change directory) at the prompt and then add a space. Drag and drop your working folder into the PowerShell window. The drag and drop puts the whole file path into place for you. Press enter and you'll see that you're now operating out of your folder.
To do a batch rename on the files in your current directory:
How does it work? You are 'piping' ("|") the contents of your directory ("dir") to the "rename-item" cmdlet (read: command-lette). If you were to enter only "dir" at the prompt, you would see all the files in your folder displayed; we use the last section of the script to run through all of these files.
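The same list-then-rename idea works outside PowerShell too. As a rough sketch, here is a Python version that lists a folder and renames each file whose name contains "-". This is not the PowerShell command itself. The function name is hypothetical, and replacing "-" with "_" is an assumed choice (pass an empty string as the replacement to remove the character instead). The demo runs in a temporary folder, so no real files are touched.

```python
import tempfile
from pathlib import Path


def batch_replace_in_names(folder, old="-", new="_"):
    """Rename every file in `folder` whose name contains `old`.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    """
    renamed = []
    # sorted() builds the full listing first, so renaming during the loop is safe.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or old not in path.name:
            continue
        target = path.with_name(path.name.replace(old, new))
        if target.exists():
            # Skip rather than overwrite an existing file.
            continue
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed


# Demo in a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample-01.txt", "sample-02.txt", "notes.txt"):
        (Path(tmp) / name).touch()
    changes = batch_replace_in_names(tmp)
    final_names = sorted(p.name for p in Path(tmp).iterdir())

print(changes)
print(final_names)
```

To use it on real files, call the function with the path to your working folder, ideally after testing on a copy.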
I initially learned of this command at: https://www.howtogeek.com/111859/how-to-batch-rename-files-in-windows-4-ways-to-rename-multiple-files/