mergeit is part of the EIGENSOFT package and can be used to merge exactly two EIGENSTRAT datasets without converting to PACKEDPED format first. In this post, I’ll show how to merge the sample we created in Pseudohaploid Genotyping for Ancient DNA: BAM to EIGENSTRAT with the AADR dataset.
Setting up EIGENSOFT
mergeit ships with EIGENSOFT, which you can install via conda:
conda install -c bioconda eigensoft
If you haven’t installed conda yet, see the Miniconda setup in the pseudohaploid genotyping post.
Alternatively, if you prefer to compile from source, see: From EIGENSTRAT to PACKEDPED.
Setting up a parameter file
Like other EIGENSOFT tools, mergeit requires a parameter file:
geno1: aadr.geno
snp1: aadr.snp
ind1: aadr.ind
geno2: eigenstrat_output.geno
snp2: eigenstrat_output.snp
ind2: eigenstrat_output.ind
genooutfilename: merged.geno
snpoutfilename: merged.snp
indoutfilename: merged.ind
Note: The first dataset (geno1, snp1, ind1) acts as the reference: only SNPs present in dataset 1 will appear in the output. Since we’re merging our sample into AADR, put AADR as dataset 1. Adjust the input file names to match your own data.
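Because only SNPs present in dataset 1 reach the output, it can help to check beforehand how many of your sample's SNPs also occur in aadr.snp. The following is a minimal Python sketch of such a check, not part of EIGENSOFT. It assumes that SNPs are matched by the ID in the first whitespace-separated column of each .snp file. The function names read_snp_ids and snp_overlap are hypothetical helpers introduced for illustration.

```python
from pathlib import Path


def read_snp_ids(snp_path):
    """Return the set of SNP IDs (first whitespace-separated column) in a .snp file."""
    ids = set()
    with open(snp_path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                ids.add(fields[0])
    return ids


def snp_overlap(reference_snp, sample_snp):
    """Return (shared, total): how many sample SNP IDs also occur in the reference."""
    reference_ids = read_snp_ids(reference_snp)
    sample_ids = read_snp_ids(sample_snp)
    shared = sample_ids & reference_ids
    return len(shared), len(sample_ids)


if __name__ == "__main__":
    # File names follow the parameter file in this post; adjust as needed.
    ref, sample = Path("aadr.snp"), Path("eigenstrat_output.snp")
    if ref.exists() and sample.exists():
        shared, total = snp_overlap(ref, sample)
        print(f"{shared} of {total} sample SNP IDs are also in {ref}")
```

If the shared count is much lower than the total, the two datasets probably use different SNP sets or naming schemes, and that is worth resolving before merging.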
Save the parameter file as mergeit.par and run:
mergeit -p mergeit.par
This produces merged.geno, merged.snp, and merged.ind, ready for downstream analysis with ADMIXTOOLS.
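Before moving on, a quick sanity check on the merged files can catch a silently failed merge. Here is a rough Python sketch that only reads the text-based .ind and .snp files (one record per line), so it does not depend on how the .geno file is encoded. It assumes that no sample ID occurs in both inputs, so the merged individual count should equal the sum of the two inputs, and it relies on the note above that the output cannot contain more SNPs than dataset 1. count_records and check_merge are hypothetical helper names, not EIGENSOFT functions.

```python
from pathlib import Path


def count_records(path):
    """Count non-blank lines in a text file (one record per line in .ind and .snp)."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_merge(prefix1, prefix2, merged_prefix):
    """Compare individual and SNP counts of a merged dataset against its inputs."""
    n_ind1 = count_records(f"{prefix1}.ind")
    n_ind2 = count_records(f"{prefix2}.ind")
    n_ind_merged = count_records(f"{merged_prefix}.ind")
    n_snp1 = count_records(f"{prefix1}.snp")
    n_snp_merged = count_records(f"{merged_prefix}.snp")
    return {
        "n_individuals": n_ind_merged,
        "n_snps": n_snp_merged,
        # Assumption: no sample ID is shared between the two inputs.
        "individuals_ok": n_ind_merged == n_ind1 + n_ind2,
        # The output should be non-empty and no larger than dataset 1.
        "snps_ok": 0 < n_snp_merged <= n_snp1,
    }


if __name__ == "__main__":
    # Prefixes follow the parameter file in this post; adjust as needed.
    needed = ["aadr.ind", "aadr.snp", "eigenstrat_output.ind",
              "merged.ind", "merged.snp"]
    if all(Path(name).exists() for name in needed):
        print(check_merge("aadr", "eigenstrat_output", "merged"))
```

If individuals_ok or snps_ok comes back False, check mergeit's console output for warnings before passing the files to ADMIXTOOLS.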