Computational population genetics, clearly explained

From PCA and ADMIXTURE to imputation and f-statistics. In-depth writing on population genetics, ancient DNA, bioinformatics tools, and pipelines.

Latest Posts

Downloading and Converting AADR v66

In April 2026, new AADR versions were released on Harvard Dataverse. Among the more important additions are new compatibility datasets designed to reduce platform-specific bias when co-analyzing ancient DNA generated with different experimental setups. In practice, this matters when combining data from different platforms, such as Agilent (AG) and Twist (TW) capture or shotgun (SG) sequencing, because these approaches can introduce systematic differences that may affect downstream population genetic analyses. The compatibility panels were added to minimize that problem and make mixed-platform datasets more directly comparable. ...

April 17, 2026
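For the AADR v66 post: before co-analyzing mixed-platform data, it can help to see how many samples come from each platform. The sketch below assumes, hypothetically, that the platform is encoded as a dot-suffix (.AG, .TW, .SG) on the sample ID in the first column of an EIGENSTRAT .ind file. Check the AADR annotation file for the actual convention before relying on it.

```python
from collections import Counter
from typing import Iterable

# Platform codes as named in the post; the ID-suffix convention is a hypothetical assumption.
PLATFORMS = {"AG": "Agilent capture", "TW": "Twist capture", "SG": "shotgun"}


def platform_of(sample_id: str) -> str:
    """Return the platform implied by a '.AG', '.TW' or '.SG' ID suffix (assumed convention)."""
    _, dot, suffix = sample_id.rpartition(".")
    return PLATFORMS.get(suffix, "unlabelled") if dot else "unlabelled"


def count_platforms(ind_lines: Iterable[str]) -> Counter:
    """Tally samples per platform from EIGENSTRAT .ind lines (ID, sex, population)."""
    counts: Counter = Counter()
    for line in ind_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[platform_of(fields[0])] += 1
    return counts
```

Any iterable of lines works, including an open .ind file handle.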

dt: A Modern awk Alternative for Common Data Workflows

I recently published dt, a modern data transformation tool designed to make the awk workflows commonly used on this blog more intuitive, expressive, and fast. It is written in Rust because Rust compiles to a single self-contained binary, and it uses Polars for the actual data processing, giving you columnar operations that handle large files efficiently. The syntax uses explicit functions (filter(), select(), mutate()) that you chain together with pipes, so transformations read like a recipe instead of a regex puzzle. There’s also an interactive REPL that shows you the result after each operation, letting you build complex pipelines step by step, catch mistakes early, and undo errors with .undo. ...

February 11, 2026
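The step-by-step REPL with .undo described in the dt post can be pictured as a stack of intermediate results. The Python sketch below is a toy model of that idea, not dt's implementation or syntax: Session, filter_rows, select, and mutate are hypothetical stand-ins for dt's filter(), select(), and mutate(), working on lists of dicts instead of Polars frames.

```python
from typing import Callable

Row = dict[str, object]
Table = list[Row]
Step = Callable[[Table], Table]


class Session:
    """Toy REPL session: keeps every intermediate table so the last step can be undone."""

    def __init__(self, table: Table) -> None:
        self.history: list[Table] = [table]

    @property
    def current(self) -> Table:
        return self.history[-1]

    def apply(self, step: Step) -> Table:
        # Each step builds a new table from the current one; nothing is mutated in place.
        self.history.append(step(self.current))
        return self.current

    def undo(self) -> Table:
        # Drop the most recent result, but never the original input.
        if len(self.history) > 1:
            self.history.pop()
        return self.current


def filter_rows(pred: Callable[[Row], bool]) -> Step:
    return lambda t: [r for r in t if pred(r)]


def select(*cols: str) -> Step:
    return lambda t: [{c: r[c] for c in cols} for r in t]


def mutate(name: str, fn: Callable[[Row], object]) -> Step:
    return lambda t: [{**r, name: fn(r)} for r in t]
```

Keeping full snapshots makes undo trivial at the cost of memory; a real tool working on large files would more likely store the operations and replay them.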

Fast, Transparent f4-Based Admixture Screening in R

In this post, I will build a transparent admixture-screening workflow from scratch in R using f4-statistics and constrained regression. The main advantage is automation: instead of hand-writing every candidate model, the script tests many 2-way, 3-way, and 4-way source combinations in one pass and ranks them by fit. The result is not a replacement for qpAdm, but a fast screening layer that can help you identify promising models before you validate them more formally. ADMIXTOOLS 2 already includes batch tools such as qpadm_multi() and qpadm_rotate(), so the point here is not that qpAdm cannot be automated. The point is that this custom workflow is compact, transparent, easy to modify, and useful for exploratory model search. ...

February 3, 2026
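One plausible form of the f4 screening idea: by linearity, if a target is a mixture of sources with weights summing to 1, then each of its f4-statistics against a fixed set of outgroup pairs equals the same weighted sum of the sources' f4-statistics. The Python sketch below assumes those f4 values are already computed (y for the target, one column of X per source), fits sum-to-one weights by ordinary least squares, and ranks every k-way combination by residual sum of squares, listing combinations whose weights all fall in [0, 1] first. It ignores standard errors and the covariance weighting qpAdm uses, so treat it as a screening toy rather than the post's actual R code.

```python
from itertools import combinations

import numpy as np


def fit_weights(y: np.ndarray, X: np.ndarray) -> tuple[np.ndarray, float]:
    """Least-squares mixture weights for y ~ X @ w with sum(w) == 1 (needs >= 2 columns).

    y: target f4 values, shape (n,); X: one column of f4 values per source, shape (n, k).
    """
    # Substitute w_k = 1 - (w_1 + ... + w_{k-1}) so the sum-to-one constraint is exact.
    last = X[:, -1]
    A = X[:, :-1] - last[:, None]
    b = y - last
    head, *_ = np.linalg.lstsq(A, b, rcond=None)
    w = np.append(head, 1.0 - head.sum())
    rss = float(np.sum((X @ w - y) ** 2))
    return w, rss


def screen(y, X, names, k):
    """Fit every k-way source combination; feasible fits (all weights in [0, 1]) first, then by RSS."""
    results = []
    for idx in combinations(range(X.shape[1]), k):
        w, rss = fit_weights(y, X[:, list(idx)])
        feasible = bool(np.all((w >= 0) & (w <= 1)))
        results.append(([names[i] for i in idx], w, rss, feasible))
    results.sort(key=lambda r: (not r[3], r[2]))
    return results
```

Calling screen once per k (2, 3, 4) gives the one-pass ranking of candidate models; the top hits are what you would then validate formally with qpAdm.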

How to Merge EIGENSTRAT Datasets Using mergeit

mergeit is part of the EIGENSOFT package and can merge exactly two EIGENSTRAT datasets without converting them to PACKEDPED format first. In this post, I’ll show how to merge the sample we created in Pseudohaploid Genotyping for Ancient DNA: BAM to EIGENSTRAT with the AADR dataset. To set up EIGENSOFT, install it via conda (conda install -c bioconda eigensoft). If you haven’t installed conda yet, see the Miniconda setup in the pseudohaploid genotyping post. ...

January 12, 2026
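mergeit reads its inputs from a parameter file passed with -p. The Python sketch below builds a minimal one for two EIGENSTRAT datasets named by prefix (prefix.geno, prefix.snp, prefix.ind). The function name and the example prefixes are hypothetical placeholders, and EIGENSOFT's documentation lists further mergeit options that this sketch leaves out.

```python
def mergeit_parfile(prefix1: str, prefix2: str, out_prefix: str) -> str:
    """Return mergeit parameters that merge two EIGENSTRAT datasets into one."""
    params = {}
    # Inputs: dataset 1 and dataset 2, each as a .geno/.snp/.ind triplet.
    for n, prefix in (("1", prefix1), ("2", prefix2)):
        params[f"geno{n}"] = f"{prefix}.geno"
        params[f"snp{n}"] = f"{prefix}.snp"
        params[f"ind{n}"] = f"{prefix}.ind"
    # Outputs: the merged triplet, written in EIGENSTRAT format.
    for ext in ("geno", "snp", "ind"):
        params[f"{ext}outfilename"] = f"{out_prefix}.{ext}"
    params["outputformat"] = "EIGENSTRAT"
    return "".join(f"{key}: {value}\n" for key, value in params.items())
```

For example, save mergeit_parfile("my_sample", "aadr", "merged") to a file such as merge.par and run mergeit -p merge.par.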

Pseudohaploid Genotyping for Ancient DNA: BAM to EIGENSTRAT

This is a follow-up to my previous post Processing Ancient DNA: From FASTQ to Aligned BAM, where I aligned an ancient DNA sample to the hs37d5 reference genome, producing a filtered BAM compatible with the AADR dataset. In this post, I’ll cover pseudohaploid genotype calling with pileupCaller and converting the output to EIGENSTRAT format for use with ADMIXTOOLS. Since we created this BAM ourselves in the previous post, we already know it’s aligned to hs37d5. However, if you’re starting from a BAM file produced elsewhere, you’ll need to verify the reference genome first. I’ll start by showing how to check BAM headers to identify the reference genome. ...

January 4, 2026
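For the reference check in the pseudohaploid genotyping post: the @SQ lines of a BAM header (printed by samtools view -H) list each contig's name (SN) and length (LN). The Python sketch below is one heuristic, not necessarily the post's procedure. It uses the chr1 length to tell GRCh37 (249,250,621 bp) from GRCh38 (248,956,422 bp), the chr prefix to tell naming styles apart, and the hs37d5 decoy contig to recognize hs37d5 itself. The function names are hypothetical.

```python
# chr1 length in bp for the two common human reference builds.
CHR1_LENGTH_TO_BUILD = {249250621: "GRCh37", 248956422: "GRCh38"}


def parse_sq(header: str) -> dict[str, int]:
    """Map contig name (SN) to length (LN) from the @SQ lines of a SAM/BAM header."""
    contigs: dict[str, int] = {}
    for line in header.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:] if ":" in field)
        if "SN" in tags and "LN" in tags:
            contigs[tags["SN"]] = int(tags["LN"])
    return contigs


def guess_reference(header: str) -> str:
    """Guess the reference build and contig naming style from header text."""
    contigs = parse_sq(header)
    chr1_len = contigs.get("1", contigs.get("chr1"))
    build = CHR1_LENGTH_TO_BUILD.get(chr1_len, "unknown build")
    if build == "GRCh37" and "hs37d5" in contigs:
        return "hs37d5 (GRCh37 with decoy)"
    naming = "chr-prefixed" if "chr1" in contigs else "no chr prefix"
    return f"{build}, {naming} contig names"
```

Pass the captured output of samtools view -H as the header string. A GRCh37 result without the decoy contig still matches AADR coordinates, but naming style matters when tools expect 1 rather than chr1.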