How to Download the AADR Dataset (Linux & WSL)

A Linux environment is all but unavoidable for bioinformatics data processing and preparation. Any distribution you are comfortable with will work.

For Windows users, the Windows Subsystem for Linux (WSL) provides a good alternative to dual booting or setting up a full virtual machine.

Installing WSL with Debian

Open PowerShell as Administrator and run:

wsl --install -d Debian

Once installed, update the system:

sudo apt update && sudo apt upgrade -y
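
To confirm that the shell you are working in is really the WSL Debian instance, a quick check along these lines can help. This is a minimal sketch, assuming that WSL kernels mention "microsoft" in /proc/version, which is common but not guaranteed; `is_wsl` is an illustrative helper name, not part of WSL or Debian.

```shell
# is_wsl is an illustrative helper, not part of WSL or Debian.
# Assumption: WSL kernels mention "microsoft" in /proc/version.
is_wsl() {
    # Optional argument: file to inspect (defaults to /proc/version)
    grep -qi microsoft "${1:-/proc/version}" 2>/dev/null
}

if is_wsl; then
    echo "Running under WSL"
else
    echo "Not running under WSL (or detection failed)"
fi
```

On a native Linux installation the second message is expected and harmless.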

Downloading A Genetic Dataset

Before running PCA, ADMIXTURE, qpAdm, etc., you need actual genotype data. A comprehensive resource is the Allen Ancient DNA Resource (AADR).

If you’re only interested in ancient samples, download the following three files from the AADR:
v62.0_1240k_public.geno, .snp, and .ind.
(If you want information on sample origins, you should also download the corresponding .anno file.)

If you’d like to include modern samples as well, which can be useful for personal genetic comparisons, download the same file types, but with the prefix v62.0_HO_public.
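
Once either set of files is on disk, a quick sanity check can catch a truncated or missing download. The sketch below assumes the usual layout of these files, where the .ind file has one line per individual and the .snp file has one line per SNP. The .geno file may be in a packed binary format, so it is only checked for existence and non-zero size. `summarize_triple` is an illustrative name.

```shell
# summarize_triple is an illustrative helper name.
# Usage: summarize_triple v62.0_HO_public
summarize_triple() {
    prefix="$1"
    # All three files must exist and be non-empty
    for ext in geno snp ind; do
        if [ ! -s "${prefix}.${ext}" ]; then
            echo "missing or empty: ${prefix}.${ext}" >&2
            return 1
        fi
    done
    # Assumption: one line per individual (.ind) and one line per SNP (.snp)
    n_ind=$(wc -l < "${prefix}.ind" | tr -d ' ')
    n_snp=$(wc -l < "${prefix}.snp" | tr -d ' ')
    echo "individuals: ${n_ind}  SNPs: ${n_snp}"
}
```

For example, `summarize_triple v62.0_1240k_public` would report the counts for the ancient-only files, provided they are in the current directory.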

Since these files are large, it’s best to download them with wget from the Linux command line, using the direct download links. This avoids the interruptions that browser downloads of large files are prone to.

Example using `wget`:

# Install wget
sudo apt install wget

# Download the three v62.0 HO files
# (the 1240k files have their own datafile IDs on the Dataverse page)
wget -O v62.0_HO_public.geno "https://dataverse.harvard.edu/api/access/datafile/10537419"
wget -O v62.0_HO_public.snp "https://dataverse.harvard.edu/api/access/datafile/10537421"
wget -O v62.0_HO_public.ind "https://dataverse.harvard.edu/api/access/datafile/10537420"
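
If a transfer is interrupted, the three commands above can be wrapped in a small loop that resumes partial files instead of starting over. This is a sketch rather than a definitive script: it reuses the datafile IDs from the example, relies on wget's `-c` (continue) option, and assumes the server honors range requests. The names `fetch_aadr`, `PREFIX`, `FILES`, and the `WGET` override are illustrative.

```shell
# Sketch: resumable download of the three HO files from the example above.
# fetch_aadr, PREFIX, FILES and the WGET override are illustrative names.
PREFIX="v62.0_HO_public"
BASE_URL="https://dataverse.harvard.edu/api/access/datafile"

# extension:datafile-ID pairs, copied from the example commands
FILES="geno:10537419 snp:10537421 ind:10537420"

fetch_aadr() {
    for pair in $FILES; do
        ext="${pair%%:*}"   # part before the colon
        id="${pair##*:}"    # part after the colon
        # -c continues a partially downloaded file; -O sets the output name
        ${WGET:-wget} -c -O "${PREFIX}.${ext}" "${BASE_URL}/${id}" || return 1
    done
}
```

Calling `fetch_aadr` starts the downloads. Running `WGET=echo fetch_aadr` first prints the commands without touching the network, which is a cheap way to check the file names and URLs.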
