Recently, in April 2026, new AADR versions were released on Harvard Dataverse. Among the more important additions are the new compatibility datasets, introduced to reduce platform-specific bias when co-analyzing ancient DNA generated with different experimental setups. In practice, this matters when combining data produced with different capture reagents, such as Agilent (AG) and Twist (TW), or with shotgun sequencing (SG), because each platform can introduce systematic differences that may affect downstream population genetic analyses. The compatibility panels were added to minimize that problem and make mixed-platform datasets more directly comparable.
Downloading the New Compatibility 2M SNP Subset
Below are the commands to download the latest compatibility dataset. At the moment, this seems like the most sensible choice if you want to work with mixed-platform ancient DNA in ADMIXTOOLS, especially when combining data generated with different capture reagents or with shotgun data.
wget -O v66_compatibility.tgeno "https://dataverse.harvard.edu/api/access/datafile/13663735"
wget -O v66_compatibility.snp "https://dataverse.harvard.edu/api/access/datafile/13664262"
wget -O v66_compatibility.ind "https://dataverse.harvard.edu/api/access/datafile/13663701"
# Optional: sample metadata / annotations
wget -O v66_compatibility.anno "https://dataverse.harvard.edu/api/access/datafile/13663707"
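A quick check after downloading can catch failed or empty transfers early. The snippet below is a minimal sketch, not part of the AADR instructions: check_nonempty is a hypothetical helper name, and it only tests that each file exists and is non-empty. It does not validate the file contents.

```shell
# Hypothetical helper: report whether each given file exists and is non-empty.
# Returns 0 if all files pass, 1 if any file is missing or empty.
check_nonempty() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the files above, this would be called as check_nonempty v66_compatibility.tgeno v66_compatibility.snp v66_compatibility.ind v66_compatibility.anno. Running wc -l on the .snp and .ind files also gives the number of SNPs and individuals, which is handy to note down before converting.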
Note: the newer AADR releases are now distributed in TGENO format. That works fine with the original ADMIXTOOLS implementation, but not directly with admixtools2 in R. If you want to use the R version, you first need to convert the dataset to classic EIGENSTRAT format with convertf.
Converting TGENO to EIGENSTRAT
If you are using admixtools2, you need a recent build of convertf that supports TGENO input. Older binaries, including ones built from earlier releases of the Reich Lab tools, will not work here.
To compile the current version on a Debian-based Linux system:
sudo apt update
sudo apt install -y \
build-essential \
gfortran \
liblapack-dev \
liblapacke-dev \
libgsl-dev \
libopenblas-dev
git clone https://github.com/DReichLab/AdmixTools
cd AdmixTools/src
make clobber
make LDLIBS="-llapacke" install
After compilation, the binaries will be inside:
AdmixTools/bin
You can either copy convertf somewhere in your $PATH:
sudo cp ../bin/convertf /usr/local/bin/
or just move the binary into the dataset directory and run it locally from there. For occasional use, that is usually enough.
Next, create a parameter file, for example convert.par:
genotypename: v66_compatibility.tgeno
snpname: v66_compatibility.snp
indivname: v66_compatibility.ind
outputformat: EIGENSTRAT
genooutfilename: v66_EIGENSTRAT.geno
snpoutfilename: v66_EIGENSTRAT.snp
indoutfilename: v66_EIGENSTRAT.ind
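If you expect to repeat this for other releases or subsets, the parameter file can also be generated from the shell. This is only a convenience sketch: write_convert_par is a hypothetical helper name, and it assumes the input files share a common prefix with the extensions .tgeno, .snp, and .ind, as the downloads above do.

```shell
# Hypothetical helper: write a convertf parameter file for TGENO -> EIGENSTRAT.
# Usage: write_convert_par IN_PREFIX OUT_PREFIX [PAR_FILE]
write_convert_par() {
    in_prefix=$1
    out_prefix=$2
    par_file=${3:-convert.par}   # defaults to convert.par in the current directory
    cat > "$par_file" <<EOF
genotypename: $in_prefix.tgeno
snpname: $in_prefix.snp
indivname: $in_prefix.ind
outputformat: EIGENSTRAT
genooutfilename: $out_prefix.geno
snpoutfilename: $out_prefix.snp
indoutfilename: $out_prefix.ind
EOF
}
```

Calling write_convert_par v66_compatibility v66_EIGENSTRAT produces the same convert.par as the one written by hand above.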
Now run the conversion:
convertf -p convert.par
# If convertf is in the current directory instead:
./convertf -p convert.par
This may take a while, since convertf is single-threaded and not especially speed optimised. After it finishes, you will have an admixtools2-compatible EIGENSTRAT version of the v66 compatibility dataset.
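Because the conversion runs for a long time, it is worth confirming that the output is complete before loading it anywhere. In text EIGENSTRAT format, the .geno file has one line per SNP and one character per individual on each line, so its dimensions should match the .snp and .ind files. The following is a rough sketch of that check; check_eigenstrat is a hypothetical helper name, and it only inspects the first genotype line for width.

```shell
# Hypothetical helper: compare EIGENSTRAT file dimensions for a given prefix.
# Usage: check_eigenstrat PREFIX   (expects PREFIX.geno, PREFIX.snp, PREFIX.ind)
check_eigenstrat() {
    prefix=$1
    n_snp=$(wc -l < "$prefix.snp" | tr -d ' ')
    n_ind=$(wc -l < "$prefix.ind" | tr -d ' ')
    n_geno=$(wc -l < "$prefix.geno" | tr -d ' ')
    # Width of the first genotype line, ignoring line-ending characters
    width=$(head -n 1 "$prefix.geno" | tr -d '\r\n' | wc -c | tr -d ' ')
    echo "SNPs: $n_snp  individuals: $n_ind  geno rows: $n_geno  geno width: $width"
    # Succeeds only if rows match SNPs and columns match individuals
    [ "$n_geno" -eq "$n_snp" ] && [ "$width" -eq "$n_ind" ]
}
```

For the files produced here, the call would be check_eigenstrat v66_EIGENSTRAT. A non-zero exit status suggests a truncated or interrupted conversion.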
One thing worth pointing out here: the TGENO files may appear to load in admixtools2 without throwing an obvious error, but in practice the results are not reliable. For example, basic f4 tests can come out non-significant where they would normally be significant. So even if it seems to “work”, it is better to convert first and avoid bad output.
General Thoughts on V66
I have not used this SNP set enough yet to say anything definitive about how it compares to 1240k in practice, but my first impression is positive. The idea behind the compatibility panels makes sense, and for mixed-platform analyses they will likely be more useful than forcing everything onto older sets without accounting for platform effects.
What I like less is the current .ind file labeling. Population groupings seem more country-based than site-based in many cases, which feels like a step back compared to v62. The older naming was often easier to work with when you actually wanted archaeologically meaningful grouping rather than broader geographic bins.
That said, this is not a major problem. You can always relabel or regroup samples manually depending on the analysis. In practice, that is often necessary anyway.
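Relabeling can be done with a small awk script rather than by hand. The sketch below assumes the usual three-column .ind layout (sample ID, sex, population label) and a mapping file of your own with two whitespace-separated columns, sample ID and new label. Both relabel_ind and the mapping file are hypothetical and not part of the AADR release.

```shell
# Hypothetical helper: replace population labels in an .ind file.
# Usage: relabel_ind MAP_FILE IND_FILE > NEW_IND_FILE
# MAP_FILE lines: "<sample_id> <new_label>"; samples not listed keep their label.
relabel_ind() {
    awk '
        FILENAME == ARGV[1] { label[$1] = $2; next }   # first file: load the mapping
        {
            pop = ($1 in label) ? label[$1] : $3       # fall back to the original label
            print $1, $2, pop
        }
    ' "$1" "$2"
}
```

Writing the result to a new file, for example relabel_ind relabel.txt v66_EIGENSTRAT.ind > v66_relabeled.ind, keeps the original labels available for comparison.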
For now, I would say the compatibility dataset looks promising, especially for ADMIXTOOLS workflows where platform-specific bias is a real concern. Whether it turns out to be broadly preferable to the standard 1240k sets is something that will only become clear after more actual use.
If you want to continue from dataset preparation to actual analysis, see "Running f4-Statistics with Admixtools in R" and, for ancestry modeling, "Running qpAdm in R: Testing and Interpreting Ancestry Models".