DNA data used to build the Quercus phylogenetic tree in the article "A consistent species richness-climate relationship for oaks across the Northern Hemisphere"


Xiaoting Xu, Dimitar Dimitrov, Nawal Shrestha, Carsten Rahbek, Zhiheng Wang




The attached zip file contains:

1. Aligned_DNA_Data (folder): all aligned sequences for each DNA marker, which we used to generate a consensus sequence for each species.


Over 4000 Quercus accessions were downloaded from GenBank (keyword "Quercus"; search field "Organism"; accessed 2012-10-06) and aligned with MAFFT v.7 (Katoh et al., 2005; https://mafft.cbrc.jp/alignment/server/) using the E-INS-i or FFT-NS-i method, followed by visual inspection for inconsistent or erroneous sequences. Taxa with unclear taxonomic placement or labelled as hybrids were removed. In total, we obtained sequence data for 145 taxa and 22 gene markers. Sequences showing "aberrant" behaviour or producing long branches in the best-scoring maximum likelihood (ML) tree were also removed.
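The taxon-filtering step described above can be sketched in Python as follows. This is a minimal sketch, not the authors' procedure: it assumes the aligned sequences are stored as FASTA text (the file format of Aligned_DNA_Data is not stated here) and uses a hypothetical rule that flags a label as a hybrid if it contains "×", a standalone "x", or the word "hybrid". The helper names `parse_fasta` and `drop_hybrids` are likewise hypothetical.

```python
import re

# Hypothetical hybrid rule (assumption): botanical hybrid names are often
# written with "×" or a standalone " x "; some labels say "hybrid" outright.
HYBRID_PATTERN = re.compile(r"×|\sx\s|hybrid", re.IGNORECASE)


def parse_fasta(text):
    """Parse FASTA text into an insertion-ordered dict {label: sequence}."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:].strip()
            records[label] = []
        elif label is not None:
            # Sequence lines may be wrapped; normalise to upper case.
            records[label].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def drop_hybrids(records):
    """Return only the records whose label does not look like a hybrid."""
    return {name: seq for name, seq in records.items()
            if not HYBRID_PATTERN.search(name)}
```

Taxa with unclear taxonomic placement cannot be detected from labels alone, so in this sketch they would still need to be removed by hand.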


2. FinalData.nex: concatenated matrix of 11 DNA markers for 42 species (40 Quercus species and 2 outgroup species).


We used consensus sequences in the final analysis. When more than one sequence was available for the same gene and taxon, a consensus sequence was constructed using MATLAB R2013a (The MathWorks, Inc., Natick, MA, US). We then built an ML tree using rapid bootstrapping with 1000 replicates, and used RogueNaRok with default parameters to remove rogue taxa (Aberer et al., 2013).
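The exact consensus rule applied in MATLAB is not described above. As an illustration only, a column-wise majority-rule consensus can be sketched in Python under these assumptions: only unambiguous bases (A, C, G, T) vote, a tie between the most frequent bases gives "N", and a column with no unambiguous base becomes a gap only if every input sequence has a gap there. The function name `majority_consensus` is hypothetical.

```python
from collections import Counter

# Unambiguous nucleotides allowed to "vote" in a column (assumption).
BASES = frozenset("ACGT")


def majority_consensus(seqs, ambiguous="N", gap="-"):
    """Column-wise majority-rule consensus of equal-length aligned sequences.

    Gaps and ambiguity codes do not vote. A column with no unambiguous base
    becomes `gap` if all sequences have a gap there, otherwise `ambiguous`.
    A tie between the two most frequent bases also yields `ambiguous`.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("expected one or more sequences of equal aligned length")
    consensus = []
    for column in zip(*(s.upper() for s in seqs)):
        counts = Counter(c for c in column if c in BASES)
        if not counts:
            consensus.append(gap if all(c == gap for c in column) else ambiguous)
            continue
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            consensus.append(ambiguous)  # tied majority: leave ambiguous
        else:
            consensus.append(ranked[0][0])
    return "".join(consensus)
```

For example, `majority_consensus(["ACGT-A", "ACGA-A", "ACTT-N"])` returns `"ACGT-A"`, and `majority_consensus(["AC", "GC"])` returns `"NC"` because the first column is tied.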
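Once each species has one sequence per marker, the per-marker alignments are joined into a single matrix such as FinalData.nex. The Python sketch below shows one way to do this; it is not the authors' code. It assumes input of the form {marker: {taxon: aligned sequence}}, pads taxa lacking a marker with "?", replaces spaces in taxon names with underscores, and records each marker's position as a NEXUS CHARSET. The helpers `concatenate` and `to_nexus` and the marker names in the example are hypothetical.

```python
def concatenate(markers, missing="?"):
    """Build a supermatrix from {marker_name: {taxon: aligned_sequence}}.

    Taxa lacking a marker are padded with `missing` over that marker's length.
    Returns (taxa, matrix, charsets); charsets holds 1-based inclusive
    (marker_name, start, end) ranges in the concatenated alignment.
    """
    taxa = sorted({taxon for aln in markers.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    charsets = []
    start = 1
    for name, aln in markers.items():  # markers kept in input order
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {name!r} is not a single aligned length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * length))
        charsets.append((name, start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return taxa, matrix, charsets


def to_nexus(taxa, matrix, charsets, missing="?"):
    """Serialise the supermatrix as a NEXUS DATA block plus a SETS block."""
    nchar = len(matrix[taxa[0]])
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(taxa)} NCHAR={nchar};",
        f"  FORMAT DATATYPE=DNA MISSING={missing} GAP=-;",
        "  MATRIX",
    ]
    for taxon in taxa:
        # Unquoted NEXUS tokens cannot contain spaces, so use underscores.
        lines.append(f"    {taxon.replace(' ', '_')}  {matrix[taxon]}")
    lines += ["  ;", "END;", "BEGIN SETS;"]
    lines += [f"  CHARSET {name} = {s}-{e};" for name, s, e in charsets]
    lines.append("END;")
    return "\n".join(lines) + "\n"
```

With two hypothetical markers, `{"marker_A": {"Quercus robur": "ACGT", "Quercus suber": "ACGA"}, "marker_B": {"Quercus robur": "TT-", "Outgroup sp1": "TTA"}}`, the resulting matrix has 3 taxa and 7 characters, with "Quercus suber" padded as `ACGA???`.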



Aberer, A.J., Krompass, D. & Stamatakis, A. (2013) Pruning Rogue Taxa Improves Phylogenetic Accuracy: An Efficient Algorithm and Webservice. Systematic Biology, 62, 162-166.

Katoh, K., Kuma, K., Toh, H. & Miyata, T. (2005) MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research, 33, 511-518.