Commercial raw DNA exports are not provided in the file formats normally used by ADMIXTOOLS, ADMIXTOOLS 2, AADR-based workflows, or PLINK-based population genetics pipelines. Files from 23andMe, AncestryDNA, FamilyTreeDNA, MyHeritage, and Living DNA are usually plain-text vendor exports, while downstream workflows often require PLINK PACKEDPED or EIGENSTRAT/PACKEDANCESTRYMAP files.
This toolkit provides a direct conversion and merge workflow for those formats.
It takes one or more raw DNA exports for the same individual and writes:
.bed .bim .fam
.geno .snp .ind
It can also merge the converted sample directly into an existing AADR-style dataset, including the newer AADR releases that use TGENO-style genotype storage.
The purpose is to reduce the amount of manual format handling normally required for this workflow. There is no need to compile convertf, no need to compile mergeit, no PLINK dependency for the raw conversion step, and no manual concatenation or conversion of several vendor files to a common 23andMe-style format first. The wrapper detects the input layouts, creates a merged master raw file for the same individual, writes the converted outputs, and keeps the result in a clear folder structure.
The toolkit is available for $12.
Download: Raw DNA to AADR Toolkit - Convert 23andMe, Ancestry & More to PackedAncestryMap And Packedped
The free workflow
If you work with AADR, ADMIXTOOLS, qpAdm, qpWave, PCA pipelines, or other EIGENSTRAT-based tools, you often need to add a modern sample to a reference dataset.
The manual workflow can involve a lot of brittle steps:
- convert the commercial raw file to 23andMe-style format
- use PLINK to convert it to PACKEDPED
- compile ADMIXTOOLS
- convert the latest AADR v66 dataset with convertf to PACKEDANCESTRYMAP by setting up a parameter file and waiting around an hour
- convert your own PLINK-derived PACKEDPED dataset to PACKEDANCESTRYMAP
- set up another parameter file to merge both PACKEDANCESTRYMAP datasets
- hope that after several hours you have a working merged dataset
In the best case, this works after a lot of manual setup and waiting.
In the worst case, all you get is a cryptic error message. You might then try converting the AADR dataset to PACKEDPED and merging with PLINK’s --bmerge, which can lead to another round of allele, SNP, and strand issues, followed by another conversion back to the PACKEDANCESTRYMAP format needed for ADMIXTOOLS 2.
Advantages of this bundle
A major advantage is that the bundle can automatically create a single master raw file from several commercial DNA exports for the same individual.
For example, if you have one AncestryDNA file, one FamilyTreeDNA file, and one MyHeritage file for the same person, you can pass all of them to the wrapper. The tool reads the different vendor formats, normalizes them, merges them by rsID, and writes one merged raw file before producing PACKEDPED and PACKEDANCESTRYMAP output.
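The idea of merging by rsID can be pictured with a minimal sketch. This is not the bundle's actual code. It assumes only two tab-separated layouts (four columns with the genotype in one field, or five columns with two allele fields), comment lines starting with `#`, and a conflict policy where the first non-missing call wins. The names `parse_raw`, `merge_by_rsid`, and `MISSING` are hypothetical.

```python
import io

# Assumed no-call markers ("--" and "0 0" joined to "00"); real exports may use others.
MISSING = {"--", "00", ""}

def parse_raw(handle):
    """Yield (rsid, chrom, pos, genotype) from a tab-separated raw export."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment header
        fields = line.split("\t")
        if fields[0].lower() == "rsid":
            continue  # column header row
        if len(fields) == 4:
            rsid, chrom, pos, genotype = fields
        elif len(fields) == 5:
            rsid, chrom, pos, allele1, allele2 = fields
            genotype = allele1 + allele2
        else:
            continue  # unknown layout: skip rather than guess
        yield rsid, chrom, int(pos), genotype.upper()

def merge_by_rsid(handles):
    """Merge several exports for one person; first non-missing call per rsID wins."""
    merged = {}
    for handle in handles:
        for rsid, chrom, pos, genotype in parse_raw(handle):
            previous = merged.get(rsid)
            if previous is None or previous[2] in MISSING:
                merged[rsid] = (chrom, pos, genotype)
    return merged

# Usage with two tiny in-memory "exports" for the same individual.
file_a = io.StringIO("# four-column style\nrs1\t1\t100\tAG\nrs2\t1\t200\t--\n")
file_b = io.StringIO(
    "rsid\tchromosome\tposition\tallele1\tallele2\nrs2\t1\t200\tC\tT\n"
)
merged = merge_by_rsid([file_a, file_b])
# rs2 was a no-call in the first file and is filled in from the second.
```

A real merger would also have to decide what to do when two vendors report different non-missing calls for the same rsID. The sketch simply keeps the first one.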
Here is the practical difference this bundle makes: instead of preparing several intermediate files by hand and moving between different tools, the wrapper does the format detection, raw-file merging, conversion, and optional AADR merge in one reproducible workflow.
The main advantages are:
- accepts common commercial raw DNA exports directly
- supports 23andMe, AncestryDNA, FamilyTreeDNA, MyHeritage, and Living DNA-style files
- no need to manually convert everything to 23andMe format first
- can automatically create a single merged master raw file from multiple vendor exports for the same individual
- merges raw files by rsID before conversion
- avoids manual concatenation, sorting, deduplication, and vendor-format cleanup
- preserves the merged master file in 01_merged_raw/ for other use
- no PLINK dependency for the raw DNA conversion step
- no need to compile ADMIXTOOLS just to run convertf or mergeit
- no manual convertf parameter files for the basic conversion workflow
- no manual mergeit parameter files for the AADR merge workflow
- writes PLINK PACKEDPED files: .bed, .bim, .fam
- writes PACKEDANCESTRYMAP files: .geno, .snp, .ind
- can merge the new sample directly into an AADR-style PACKEDANCESTRYMAP dataset
- intersects SNPs automatically during merge
- checks allele compatibility during merge
- uses clear output folders instead of dumping everything into one directory
- writes logs so problems are easier to inspect
- works on Windows, Linux, and macOS
- keeps the workflow in Python
- is significantly faster than the manual workflow: the whole procedure takes no more than 10 minutes
Usage
The workflow has two basic steps.
First, create a merged master raw file and convert it to PACKEDPED and PACKEDANCESTRYMAP:
python tools/bundle_raw_convert.py \
--input my_ancestry_raw.txt \
--input my_ftdna_raw.csv \
--input my_myheritage_raw.txt \
--iid sample_001 \
--gender M \
--ind-label Sample \
--out-dir output
This reads the different vendor files, merges them by rsID for the same individual, and writes:
output/01_merged_raw/sample_001.merged.txt
output/02_packedped/sample_001.bed
output/02_packedped/sample_001.bim
output/02_packedped/sample_001.fam
output/03_packedancestrymap/sample_001.geno
output/03_packedancestrymap/sample_001.snp
output/03_packedancestrymap/sample_001.ind
Second, merge the new sample into an existing AADR-style dataset:
python tools/mergeit_fast.py \
/path/to/aadr/v66 \
output/03_packedancestrymap/sample_001 \
output/04_aadr_merged/sample_001.aadr
mergeit_fast.py automatically detects the genotype layout of each input dataset. It recognizes:
- packed SNP-major .geno with a GENO header
- packed sample-major .tgeno with a TGENO header
- plain-text SNP-major genotype data
- plain-text sample-major genotype data
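As a rough illustration of header-based detection, the sketch below classifies a genotype file by its first bytes. It is not the code in mergeit_fast.py. It assumes that packed files begin with the ASCII tag GENO or TGENO, as the list above describes, and that anything else is plain text. The function name `detect_genotype_layout` is hypothetical. Telling the two plain-text orientations apart would need the SNP and sample counts from the .snp and .ind files, which the sketch leaves out.

```python
def detect_genotype_layout(path):
    """Classify a genotype file by the ASCII tag at the start of its header."""
    with open(path, "rb") as handle:
        head = handle.read(5)
    # Check TGENO first so the longer tag is not mistaken for something else.
    if head.startswith(b"TGENO"):
        return "packed sample-major"
    if head.startswith(b"GENO"):
        return "packed SNP-major"
    # Assumption: no recognized tag means plain-text genotype data.
    return "plain text"
```

Calling `detect_genotype_layout` on each input's genotype file lets a merger pick the matching reader before any SNPs are compared.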
The merged output is always written as PACKEDANCESTRYMAP, which can be used directly with ADMIXTOOLS 2 in R:
merged_prefix.geno
merged_prefix.snp
merged_prefix.ind
So you do not need to manually convert every input to the same internal genotype layout before merging. The merger reads the supported .geno or .tgeno input layout, intersects SNPs, checks chromosome, position, and allele compatibility, applies allele flips where needed, and writes a PACKEDANCESTRYMAP result.
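The allele check and flip can be sketched as follows. This is an illustration under stated assumptions, not the bundle's logic: a genotype is stored as the number of copies of the first listed allele (0, 1, 2, or 9 for missing), strand flips are resolved by complementing the alleles, and A/T and C/G SNPs are rejected because their strand cannot be inferred from the alleles alone. The name `align_genotype` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def align_genotype(ref_alleles, sample_alleles, count):
    """Re-express a sample genotype against the reference SNP's allele order.

    count is the number of copies of sample_alleles[0] (0, 1, 2) or 9 for missing.
    Returns the adjusted count, or None if the alleles are incompatible.
    """
    ref = tuple(ref_alleles)
    sample = tuple(sample_alleles)
    if ref[0] == COMPLEMENT.get(ref[1]):
        return None  # A/T or C/G SNP: strand is ambiguous
    # Same alleles as seen from the opposite strand.
    flipped = tuple(COMPLEMENT.get(allele, "N") for allele in sample)
    for candidate in (sample, flipped):
        if candidate == ref:
            return count  # same allele order
        if candidate == ref[::-1]:
            # Allele order swapped: count the other allele instead.
            return count if count == 9 else 2 - count
    return None  # different alleles: drop the SNP

# Usage: the reference lists A/G, the sample file lists the same SNP as C/T.
adjusted = align_genotype(("A", "G"), ("C", "T"), 2)
```

In the usage line, C/T is the opposite-strand view of G/A, so two copies of C correspond to zero copies of the reference's first allele A.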
After downloading and unzipping the bundle, see the README.md file in the root directory. It explains the full workflow in more detail, including setting up Python, installing the requirements, downloading the AADR v66 files, running the conversion script, and merging your sample into the reference dataset.