Convert Raw DNA Files to EIGENSTRAT for ADMIXTOOLS and Merge with AADR

Commercial raw DNA exports are not provided in the file formats normally used by ADMIXTOOLS, ADMIXTOOLS 2, AADR-based workflows, or PLINK. Files from 23andMe, AncestryDNA, FamilyTreeDNA, MyHeritage, and Living DNA are usually plain-text vendor exports, while downstream workflows often require PLINK PACKEDPED or EIGENSTRAT/PACKEDANCESTRYMAP files.

EIGENSTRAT is often used loosely to refer to the .geno/.snp/.ind triplet. Strictly speaking, EIGENSTRAT is the plain-text version of that triplet; PACKEDANCESTRYMAP is the packed binary form of the same three files. ADMIXTOOLS and ADMIXTOOLS 2 work with either, but PACKEDANCESTRYMAP takes far less disk space and loads much faster, which is why it’s the practical default used here.
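
To make the distinction concrete, here is a minimal sketch of what the plain-text EIGENSTRAT triplet looks like for a single individual. It is an illustration only, not the toolkit's code: the function name eigenstrat_text and the three SNP records are made up. The genetic-position column is set to 0.0 as a placeholder.

```python
# Illustrative sketch of the plain-text EIGENSTRAT triplet for ONE individual.
# The SNP records and the function name are hypothetical examples.

SNPS = [
    # (snp_id, chromosome, physical_position, ref_allele, alt_allele, genotype)
    # genotype = number of copies of the reference allele (0, 1, 2); 9 = missing
    ("rs0001", "1", 752566, "G", "A", 2),
    ("rs0002", "1", 776546, "A", "G", 1),
    ("rs0003", "2", 101234, "C", "T", 9),
]


def eigenstrat_text(snps, iid, sex="U", label="Sample"):
    """Return the contents of the .geno, .snp and .ind files as strings."""
    # .geno: one line per SNP, one character per individual (here: one).
    geno = "".join(f"{gt}\n" for *_, gt in snps)
    # .snp: SNP id, chromosome, genetic position, physical position, ref, alt.
    snp = "".join(
        f"{sid}\t{chrom}\t0.0\t{pos}\t{ref}\t{alt}\n"
        for sid, chrom, pos, ref, alt, _ in snps
    )
    # .ind: sample id, sex (M/F/U), population label.
    ind = f"{iid}\t{sex}\t{label}\n"
    return {".geno": geno, ".snp": snp, ".ind": ind}


if __name__ == "__main__":
    for ext, text in eigenstrat_text(SNPS, "sample_001", "M").items():
        print(ext)
        print(text)
```

PACKEDANCESTRYMAP stores the same genotype values in a packed binary .geno file, while .snp and .ind stay plain text.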

This toolkit converts commercial raw DNA exports into both PACKEDPED and PACKEDANCESTRYMAP, and can merge the result with an AADR dataset.

It takes one or more raw DNA exports for the same individual and writes:

.bed .bim .fam
.geno .snp .ind

It can also merge the converted sample directly into an existing AADR-style dataset, including the newer AADR releases that use TGENO-style genotype storage.

The purpose is to reduce the amount of manual format handling normally required for this workflow. There is no need to compile convertf, no need to compile mergeit, no PLINK dependency for the raw conversion step, and no manual concatenation or conversion of several vendor files to a common 23andMe-style format first. The wrapper detects the input layouts, creates a merged master raw file for the same individual, writes the converted outputs, and keeps the result in a clear folder structure.

The toolkit is available for $12.

Download: Raw DNA to AADR Toolkit - Convert 23andMe, Ancestry & More to PackedAncestryMap And Packedped

The free workflow

If you want to analyze your own DNA with ADMIXTOOLS (qpAdm, qpWave, f-statistics) or run PCA and ADMIXTURE against a reference, you first need to convert your raw file and merge it into an AADR-style reference dataset.

The manual workflow can involve a lot of brittle steps:

convert the commercial raw file to 23andMe-style format
use PLINK to convert it to PACKEDPED
compile mergeit/convertf
convert the latest AADR v66 dataset with convertf to PACKEDANCESTRYMAP by setting up a parameter file and waiting at least an hour
convert your own PLINK-derived PACKEDPED dataset to PACKEDANCESTRYMAP
set up another parameter file to merge both PACKEDANCESTRYMAP datasets
hope that after several hours you have a working merged dataset

In the best case, this works after a lot of manual setup and waiting.

In the worst case, all you get is a cryptic error message. You might then try converting the AADR dataset to PACKEDPED and merging with PLINK’s --bmerge, which can lead to another round of allele, SNP, and strand issues, followed by another conversion back to the PACKEDANCESTRYMAP format needed for ADMIXTOOLS.

Advantages of this Bundle

A major advantage is that the bundle can automatically create a single master raw file from several commercial DNA exports for the same individual.

For example, if you have one AncestryDNA file, one FamilyTreeDNA file, and one MyHeritage file for the same person, you can pass all of them to the wrapper. The tool reads the different vendor formats, normalizes them, merges them by rsID, and writes one merged raw file before producing PACKEDPED and PACKEDANCESTRYMAP output.
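
The idea of merging by rsID can be sketched in a few lines of Python. This is a simplified illustration, not the wrapper's actual implementation: it assumes each vendor file has already been parsed into (rsid, chromosome, position, genotype) records, and the "first non-missing call wins" policy, the MISSING set, and the function name merge_by_rsid are assumptions made for the example.

```python
# Hypothetical sketch: merge already-normalized records from several
# vendor exports of the SAME individual, keyed by rsID.

MISSING = {"--", "00", ""}  # assumed no-call encodings


def merge_by_rsid(sources):
    """sources: list of record lists; each record is
    (rsid, chromosome, position, genotype) with genotype like "AG" or "--".

    Returns (merged, conflicts) where merged maps
    rsid -> (chromosome, position, genotype) and conflicts lists rsIDs
    whose non-missing calls disagree between files.
    """
    merged = {}
    conflicts = []
    for records in sources:
        for rsid, chrom, pos, gt in records:
            # Treat calls as unphased: "GA" and "AG" are the same genotype.
            gt = "--" if gt in MISSING else "".join(sorted(gt))
            prev = merged.get(rsid)
            if prev is None or prev[2] == "--":
                # New SNP, or fill a no-call from an earlier file.
                merged[rsid] = (chrom, pos, gt)
            elif gt != "--" and gt != prev[2]:
                # Keep the first call, but remember the disagreement.
                conflicts.append(rsid)
    return merged, conflicts


if __name__ == "__main__":
    ancestry = [("rs1", "1", 100, "AG"), ("rs2", "1", 200, "--")]
    ftdna = [("rs1", "1", 100, "GA"), ("rs2", "1", 200, "CC"),
             ("rs3", "2", 300, "TT")]
    merged, conflicts = merge_by_rsid([ancestry, ftdna])
    for rsid in sorted(merged):
        print(rsid, *merged[rsid])
    print("conflicts:", conflicts)
```

Keeping the first call and logging conflicts is only one possible policy; setting conflicting sites to missing would be a more conservative alternative.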

Here is the practical difference this bundle makes: instead of preparing several intermediate files by hand and moving between different tools, the wrapper does the format detection, raw-file merging, conversion, and optional AADR merge in one reproducible workflow.

The main advantages are:

accepts raw DNA exports from 23andMe, AncestryDNA, FamilyTreeDNA, MyHeritage, and Living DNA directly, no manual conversion to 23andMe format first
can merge multiple vendor exports for the same individual into a single master raw file (by rsID), handling concatenation, sorting, deduplication, and vendor-format cleanup automatically
preserves the merged master file in 01_merged_raw/ for reuse
writes both PLINK PACKEDPED (.bed/.bim/.fam) and PACKEDANCESTRYMAP (.geno/.snp/.ind) output
no PLINK dependency, no need to compile ADMIXTOOLS, no manual convertf or mergeit parameter files
merges the new sample directly into an AADR-style PACKEDANCESTRYMAP dataset, intersecting SNPs and checking allele compatibility automatically
clear output folder structure and logs for easy inspection
runs on Windows, Linux, and macOS with minimal dependencies (Python + NumPy)
end-to-end in under 10 minutes

Usage

The workflow has two basic steps.

First, create a merged master raw file and convert it to PACKEDPED and PACKEDANCESTRYMAP. You can pass a single --input for one vendor file, or multiple --input flags to merge several exports for the same individual:

python tools/bundle_raw_convert.py \
  --input my_ancestry_raw.txt \
  --input my_ftdna_raw.csv \
  --input my_myheritage_raw.txt \
  --iid sample_001 \
  --gender M \
  --ind-label Sample \
  --out-dir output

This reads the vendor files (one or more), merges them by rsID when multiple are given, and writes:

output/01_merged_raw/sample_001.merged.txt
output/02_packedped/sample_001.bed
output/02_packedped/sample_001.bim
output/02_packedped/sample_001.fam
output/03_packedancestrymap/sample_001.geno
output/03_packedancestrymap/sample_001.snp
output/03_packedancestrymap/sample_001.ind

Second, merge the new sample into an existing AADR-style dataset:

python tools/mergeit_fast.py \
  /path/to/aadr/v66 \
  output/03_packedancestrymap/sample_001 \
  output/04_aadr_merged/sample_001.aadr

mergeit_fast.py automatically detects the genotype layout of each input dataset. It recognizes:

packed SNP-major .geno with a GENO header
packed sample-major .tgeno with a TGENO header
plain-text SNP-major genotype data
plain-text sample-major genotype data
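
As a rough illustration of how such a layout check can work, the sketch below classifies a genotype file from its first bytes. It is not the code used by mergeit_fast.py; the function names and return labels are hypothetical. It relies only on the GENO and TGENO header prefixes named above and on the plain-text genotype characters 0, 1, 2, and 9. For plain-text files, the orientation is inferred by comparing the file's shape with the row counts of the .snp and .ind files.

```python
# Hypothetical sketch of genotype-layout detection (not mergeit_fast.py itself).

def classify_geno_head(head: bytes) -> str:
    """Classify a genotype file from its leading bytes."""
    # Check TGENO first for clarity; b"TGENO" does not start with b"GENO".
    if head.startswith(b"TGENO"):
        return "packed-sample-major"
    if head.startswith(b"GENO"):
        return "packed-snp-major"
    first_line = head.split(b"\n", 1)[0].strip()
    # Plain-text genotype lines contain only the characters 0, 1, 2 and 9.
    if first_line and set(first_line) <= set(b"0129"):
        return "plain-text"
    return "unknown"


def text_orientation(n_lines, line_len, n_snps, n_inds) -> str:
    """Infer the orientation of a plain-text genotype matrix from its shape."""
    if (n_lines, line_len) == (n_snps, n_inds):
        return "snp-major"      # one line per SNP
    if (n_lines, line_len) == (n_inds, n_snps):
        return "sample-major"   # one line per individual
    return "unknown"


def detect_layout(path) -> str:
    """Read the first block of a file and classify it."""
    with open(path, "rb") as fh:
        return classify_geno_head(fh.read(4096))


if __name__ == "__main__":
    print(classify_geno_head(b"TGENO ..."))
    print(classify_geno_head(b"0129\n2210\n"))
    print(text_orientation(n_lines=3, line_len=2, n_snps=3, n_inds=2))
```

If the number of SNPs happens to equal the number of individuals, the shape alone cannot tell the two plain-text orientations apart; this sketch then reports snp-major.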

The merged output is always written as PACKEDANCESTRYMAP, which can be used directly with ADMIXTOOLS 2 in R:

merged_prefix.geno
merged_prefix.snp
merged_prefix.ind

So you do not need to manually convert every input to the same internal genotype layout before merging. The merger reads the supported .geno or .tgeno input layout, intersects SNPs, checks chromosome, position, and allele compatibility, applies allele flips where needed, and writes a PACKEDANCESTRYMAP result.
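
The allele check can be pictured as a small decision rule per shared SNP. The sketch below is an assumption about how such a rule might look, not a description of the merger's internals: the function names, the handling of strand complements, and the decision to treat palindromic A/T and C/G sites as ambiguous are all illustrative choices.

```python
# Hypothetical sketch of an allele-compatibility rule for one shared SNP.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def allele_action(ref_pair, new_pair):
    """Compare (allele1, allele2) of the reference dataset with the new sample.

    Returns "keep", "flip", "ambiguous" or "drop".
    """
    r1, r2 = ref_pair
    n1, n2 = new_pair
    # Palindromic pairs (A/T, C/G) cannot be resolved from alleles alone.
    if COMPLEMENT.get(r1) == r2:
        return "ambiguous"
    if (n1, n2) == (r1, r2):
        return "keep"
    if (n1, n2) == (r2, r1):
        return "flip"
    # Try the opposite strand.
    c1, c2 = COMPLEMENT.get(n1), COMPLEMENT.get(n2)
    if (c1, c2) == (r1, r2):
        return "keep"
    if (c1, c2) == (r2, r1):
        return "flip"
    return "drop"


def apply_action(genotype, action):
    """Genotypes count copies of allele1 (0, 1, 2); 9 means missing."""
    if action == "flip" and genotype != 9:
        return 2 - genotype
    return genotype


if __name__ == "__main__":
    print(allele_action(("A", "G"), ("G", "A")))
    print(apply_action(2, "flip"))
```

A caller would skip SNPs that come back as "drop" or "ambiguous" and apply apply_action to the rest.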

After downloading and unzipping the bundle, see the README.md file in the root directory. It explains the full workflow in more detail, including setting up Python, installing the requirements, downloading the AADR v66 files, running the conversion script, and merging your sample into the reference dataset.
