EIGENSTRAT

Downloading and Converting AADR v66

In April 2026, new AADR versions were released on Harvard Dataverse. One of the more important additions is a set of compatibility datasets, introduced to reduce platform-specific bias when co-analyzing ancient DNA generated with different experimental setups. This matters most when combining data types such as Agilent capture (AG), Twist capture (TW), and shotgun sequencing (SG), because each can introduce systematic differences that affect downstream analyses. The compatibility panels are meant to minimize that problem and make mixed-platform datasets more directly comparable. ...

How to Subset Genetic Samples by Population Labels with awk (Create PLINK --keep file)

In an earlier post, "PLINK PCA Tutorial: Running PCA in PLINK (Commands + Output)", I showed the manual way to build a subset from the .ind/.fam files. That works, but it gets tedious fast if you want to keep thousands of samples. Below is an awk one-liner that generates a PLINK --keep file automatically from a list of populations. First, prepare the list of populations to keep: create a text file (e.g. pops) in the same directory as your reference .ind and .fam files, and put one population label per line: ...
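As a rough illustration of the idea (not necessarily the exact one-liner from the post), the sketch below builds a --keep file from a population list. It assumes the .ind file has three whitespace-separated columns (sample ID, sex, population label) and that the sample ID sits in column 2 of the .fam file. The file names demo.ind, demo.fam and keep.txt, and the toy contents, are hypothetical.

```shell
# Toy inputs (hypothetical names and contents) so the sketch runs on its own.
printf '%s\n' 'PopA' 'PopC' > pops
printf '%s\n' 'S1 M PopA' 'S2 F PopB' 'S3 M PopC' 'S4 F PopA' > demo.ind
printf '%s\n' '1 S1 0 0 1 1' '2 S2 0 0 2 1' '3 S3 0 0 1 1' '4 S4 0 0 2 1' > demo.fam

# Pass 1 (pops): remember the wanted population labels.
# Pass 2 (.ind): remember sample IDs whose label (column 3) is wanted.
# Pass 3 (.fam): print FID and IID for the remembered sample IDs.
awk '
  FILENAME == ARGV[1] { pops[$1]; next }
  FILENAME == ARGV[2] { if ($3 in pops) ids[$1]; next }
  ($2 in ids)         { print $1, $2 }
' pops demo.ind demo.fam > keep.txt

cat keep.txt
```

The resulting keep.txt holds one "FID IID" pair per line, which is the layout PLINK expects for --keep, for example: plink --bfile demo --keep keep.txt --make-bed --out demo_subset. Taking the family ID from the .fam file rather than guessing it avoids mismatches when the FID column is not the population label.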

Converting EIGENSTRAT/PACKEDANCESTRYMAP to PACKEDPED

The files downloaded in the previous blog post are distributed as an EIGENSTRAT-style .geno/.snp/.ind dataset. This naming can be confusing: the .snp and .ind files are the usual EIGENSTRAT metadata files, but the .geno file may be either plain-text EIGENSTRAT or binary PACKEDANCESTRYMAP. Converting to PACKEDPED (PLINK's binary .bed/.bim/.fam format) makes downstream processing with the PLINK toolset easier: extracting sample subsets, filtering SNPs, and running a wide range of analyses all become straightforward. ...
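A minimal sketch of such a conversion with convertf from the EIGENSOFT/AdmixTools package is shown below. The prefixes dataset and dataset_plink and the parameter file name par.PACKEDPED are placeholders, not the actual AADR file names. The conversion step is guarded so it only runs if convertf is installed and the input file is present.

```shell
# Placeholder prefixes (assumptions); replace with your actual dataset names.
IN_PREFIX=dataset
OUT_PREFIX=dataset_plink

# Write a convertf parameter file that requests PACKEDPED output.
cat > par.PACKEDPED <<EOF
genotypename:    ${IN_PREFIX}.geno
snpname:         ${IN_PREFIX}.snp
indivname:       ${IN_PREFIX}.ind
outputformat:    PACKEDPED
genotypeoutname: ${OUT_PREFIX}.bed
snpoutname:      ${OUT_PREFIX}.bim
indivoutname:    ${OUT_PREFIX}.fam
EOF

# Run the conversion only when convertf and the input genotype file exist.
if command -v convertf >/dev/null 2>&1 && [ -f "${IN_PREFIX}.geno" ]; then
  convertf -p par.PACKEDPED
else
  echo "convertf or ${IN_PREFIX}.geno not found; wrote par.PACKEDPED only"
fi
```

convertf detects whether the input .geno is plain-text or packed, so the same parameter file should work for either variant; check the resulting .fam before using it, since the family ID and phenotype columns depend on convertf options not set here.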