Downloading and Converting AADR v66

In April 2026, new AADR versions were released on Harvard Dataverse. Among the more important additions are the compatibility datasets, introduced to reduce platform-specific bias when co-analyzing ancient DNA generated with different experimental setups. This matters most when combining data produced with different capture reagents, such as Agilent (AG) and Twist (TW), or with shotgun sequencing (SG), because each platform can introduce systematic differences that affect downstream analyses. The compatibility panels are meant to minimize that problem and make mixed-platform datasets more directly comparable.

Downloading the New Compatibility 2M SNP Subset

Below are the commands to download the latest AADR compatibility dataset. At the moment, this seems like the most sensible choice if you want to work with mixed-platform ancient DNA in ADMIXTOOLS, especially when combining data generated with different capture reagents or with shotgun data. If, instead, you need a broader set of modern samples, for example for PCA or ADMIXTURE, you should choose an _HO dataset, since these datasets include more modern individuals.

wget -O v66_compatibility.tgeno "https://dataverse.harvard.edu/api/access/datafile/13663735"
wget -O v66_compatibility.snp "https://dataverse.harvard.edu/api/access/datafile/13664262"
wget -O v66_compatibility.ind "https://dataverse.harvard.edu/api/access/datafile/13663701"

# Optional: sample metadata / annotations
wget -O v66_compatibility.anno "https://dataverse.harvard.edu/api/access/datafile/13663707"
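
Large downloads occasionally end up truncated or empty, so a quick sanity check before conversion can save time. The following is a minimal sketch, not part of the AADR release: check_aadr_files is a hypothetical helper name, and it only assumes the v66_compatibility.* file naming used in the commands above.

```shell
# Hypothetical helper: confirm the three dataset files exist and are non-empty,
# then report how many individuals and SNPs they describe.
check_aadr_files() {
  prefix="$1"
  for ext in tgeno snp ind; do
    f="${prefix}.${ext}"
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
  # .ind has one line per individual, .snp one line per SNP (blank lines ignored)
  n_ind=$(awk 'NF { n++ } END { print n + 0 }' "${prefix}.ind")
  n_snp=$(awk 'NF { n++ } END { print n + 0 }' "${prefix}.snp")
  echo "individuals: ${n_ind}  SNPs: ${n_snp}"
}

# Usage: check_aadr_files v66_compatibility
```

The function only reads the files, so it is safe to run repeatedly, for example after an interrupted download.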

Note: the newer AADR releases are now distributed in TGENO format. That works fine with the original ADMIXTOOLS implementation, but not directly with admixtools2 in R. If you want to use the R version, you first need to convert the dataset to PACKEDANCESTRYMAP format with convertf.

Converting TGENO to PACKEDANCESTRYMAP

For use in admixtools2, you will need a recent enough build of convertf that supports TGENO input and can convert the current TGENO format to PACKEDANCESTRYMAP. Older binaries, including the EIGENSOFT version I compiled in a previous post alongside smartpca, will not work here.

To compile the current version on a Debian-based Linux system:

sudo apt update
sudo apt install -y \
  build-essential \
  gfortran \
  liblapack-dev \
  liblapacke-dev \
  libgsl-dev \
  libopenblas-dev

git clone https://github.com/DReichLab/AdmixTools
cd AdmixTools/src
make clobber
make LDLIBS="-llapacke" install

After compilation, the binaries will be inside:

AdmixTools/bin

You can either copy convertf somewhere in your $PATH:

sudo cp ../bin/convertf /usr/local/bin/

or just move the binary into the dataset directory and run it locally from there. For occasional use, that is usually enough.

Next, create a parameter file, for example convert.par:

genotypename:    v66_compatibility.tgeno
snpname:         v66_compatibility.snp
indivname:       v66_compatibility.ind
outputformat:    PACKEDANCESTRYMAP
genooutfilename: v66_EIGENSTRAT.geno
snpoutfilename:  v66_EIGENSTRAT.snp
indoutfilename:  v66_EIGENSTRAT.ind
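
If you expect to convert more than one AADR file set (for example an _HO dataset as well), the same parameter file can be generated from a pair of prefixes instead of being edited by hand. This is only a sketch: write_convert_par is a hypothetical helper name, and it assumes every input set follows the prefix.tgeno / prefix.snp / prefix.ind naming used above.

```shell
# Hypothetical helper: print a convertf parameter file for a TGENO input prefix
# and an output prefix, using the same keys as convert.par above.
write_convert_par() {
  in_prefix="$1"
  out_prefix="$2"
  cat <<EOF
genotypename:    ${in_prefix}.tgeno
snpname:         ${in_prefix}.snp
indivname:       ${in_prefix}.ind
outputformat:    PACKEDANCESTRYMAP
genooutfilename: ${out_prefix}.geno
snpoutfilename:  ${out_prefix}.snp
indoutfilename:  ${out_prefix}.ind
EOF
}

# Usage: write_convert_par v66_compatibility v66_EIGENSTRAT > convert.par
```

Called with the prefixes from this post, it reproduces the convert.par shown above.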

Now we run:

convertf -p convert.par

# If convertf is in the current directory instead:
./convertf -p convert.par

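This may take a while. After it finishes, you will have an admixtools2-compatible PACKEDANCESTRYMAP copy of the v66 compatibility dataset, written to the v66_EIGENSTRAT.* files named in convert.par. Despite the prefix, the .geno file is the packed binary variant of the EIGENSTRAT format, not the plain-text one.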
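
A cheap way to check that the conversion went through is to compare the number of individuals and SNPs before and after. The sketch below uses hypothetical helper names (count_lines, compare_counts) and assumes convert.par sets no filtering options, in which case I would expect the counts to match. A mismatch is not necessarily an error, but it is worth a look before running any analysis.

```shell
# Hypothetical helpers: compare .ind and .snp line counts between the
# original TGENO file set and the converted file set.
count_lines() {
  # Count non-blank lines, regardless of a trailing newline
  awk 'NF { n++ } END { print n + 0 }' "$1"
}

compare_counts() {
  in_prefix="$1"
  out_prefix="$2"
  status=0
  for ext in ind snp; do
    a=$(count_lines "${in_prefix}.${ext}")
    b=$(count_lines "${out_prefix}.${ext}")
    echo "${ext}: input=${a} output=${b}"
    if [ "$a" -ne "$b" ]; then
      status=1   # assumption: without filters, counts should be identical
    fi
  done
  return "$status"
}

# Usage: compare_counts v66_compatibility v66_EIGENSTRAT
```

The function returns a non-zero status when any count differs, so it can be chained in a script with && before the analysis step.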

One thing worth pointing out here: the TGENO files may appear to load in admixtools2 without throwing an obvious error, but the results are not reliable. For example, basic f4 tests can come out non-significant where they would normally be significant. So even if it seems to “work”, it is better to convert first and avoid bad output.

Alternative: AdmixPy

As an alternative, AdmixPy, my own implementation of f-statistics, qpAdm, and qpWave in Python, works directly on the TGENO files, so no conversion step is needed. It also reads EIGENSTRAT, PACKEDANCESTRYMAP, and PLINK binary input, and runs on Linux, macOS, and Windows. See Introducing AdmixPy for setup and a worked f4 example.

General Thoughts on v66

My first impression of this SNP set is positive. The compatibility panels are a sensible and useful approach, especially for mixed-platform analyses, where they can help account for platform effects more effectively than relying on older SNP sets alone.

What I like less is the current .ind file labeling. In many cases, population groupings seem to be country-based rather than site-based, which feels like a step back compared to v62. The older naming was often easier to work with when you wanted archaeologically meaningful groupings rather than broader geographic bins.

That said, this is not a major problem. Depending on the analysis, you can always relabel or regroup samples manually using the annotation files shipped with the releases.
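
Relabeling can be scripted once you have decided on the groupings. The sketch below assumes a hypothetical two-column mapping file (sample ID, new label) that you build yourself, for example from columns of the .anno file. It does not read the .anno file directly, because its column layout differs between releases. The only format assumption is the standard three-column .ind layout: sample ID, sex, population label.

```shell
# Hypothetical helper: rewrite the population label (third column) of an .ind
# file for every sample ID listed in a two-column mapping file.
relabel_ind() {
  map="$1"   # mapping file: sample ID, new label (whitespace-separated)
  ind="$2"   # original .ind file: sample ID, sex, population label
  awk '
    FILENAME == ARGV[1] { if (NF >= 2) new[$1] = $2; next }
    NF >= 3 && ($1 in new) { $3 = new[$1] }   # relabeled lines get single-space separators
    { print }
  ' "$map" "$ind"
}

# Usage: relabel_ind relabel.tsv v66_EIGENSTRAT.ind > v66_relabelled.ind
```

Samples missing from the mapping file keep their original label, and the result goes to a new file so the shipped .ind stays untouched.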

For now, I would say the compatibility dataset looks promising, especially for ADMIXTOOLS workflows where platform-specific bias is a real concern.

If you want to continue from dataset preparation to actual analysis, see: Running f4-Statistics with Admixtools in R and Running qpAdm in R: Testing and Interpreting Ancestry Models for ancestry modeling.
