Computational population genetics, clearly explained

In-depth writing on genetic data, ancient DNA, and the methods used to study ancestry and human history.

Latest Posts

The Genetic Origins of the Proto-Anatolians

The origins of the Proto-Anatolians are often treated as one of the more obscure problems in Indo-European archaeogenetics, but the genetic data may be less ambiguous than this framing suggests. Since Anatolian is widely regarded as the earliest-splitting branch of Indo-European, the relevant question is whether the earlier Eneolithic steppe-related ancestry behind Yamnaya, especially the Caucasus-Lower Volga or CLV component, also moved south of the Caucasus and into Anatolia. For this purpose, I use Progress-2 specifically as a practical proxy for the north Caucasus-facing part of this Eneolithic steppe-related ancestry. ...

May 10, 2026

Downloading and Converting AADR v66

In April 2026, new AADR versions were released on Harvard Dataverse. Among the more important additions are the new compatibility datasets introduced for reducing platform-specific bias when co-analyzing ancient DNA generated with different experimental setups. This matters when combining data produced with Agilent (AG) or Twist (TW) capture reagents and by shotgun sequencing (SG), because these can introduce systematic differences that may affect downstream population genetic analyses. The compatibility panels were added to minimize that problem and make mixed-platform datasets more directly comparable. ...

April 17, 2026

dt: A Modern awk Alternative for Common Data Workflows

I recently published dt, a modern data transformation tool designed to make the awk workflows commonly used on this blog more intuitive, more expressive, and faster. It is written in Rust because it compiles to a single binary that runs anywhere, and it uses Polars for the actual data processing, giving you columnar operations that handle large files efficiently. The syntax uses explicit functions (filter(), select(), mutate()) chained together with pipes, making common transformations easier to read and modify. There’s also an interactive REPL that shows you the result after each operation, letting you build complex pipelines step-by-step, catch mistakes early, and undo errors with .undo. ...

February 11, 2026

Fast, Transparent f4-Based Admixture Screening in R

In this post, I will build a transparent admixture-screening workflow from scratch in R using f4-statistics and constrained regression. The main advantage is automation: instead of hand-writing every candidate model, the script tests many 2-way, 3-way, and 4-way source combinations in one pass and ranks them by fit. The result is not a replacement for qpAdm, but a fast screening layer that can help you identify promising models before you validate them more formally. ADMIXTOOLS 2 already includes batch tools such as qpadm_multi() and qpadm_rotate(), so the point here is not that qpAdm cannot be automated. The point is that this custom workflow is compact, transparent, easy to modify, and useful for exploratory model search. ...

February 3, 2026

How to Merge EIGENSTRAT Datasets Using mergeit

mergeit is part of the EIGENSOFT package and can be used to merge exactly two EIGENSTRAT/PACKEDANCESTRYMAP datasets without converting to PACKEDPED format first. In this post, I’ll show how to merge the sample we created in "Pseudohaploid Genotyping for Ancient DNA: BAM to EIGENSTRAT" with the AADR dataset. Setting up EIGENSOFT: you can install it via conda with the command conda install -c bioconda eigensoft. If you haven’t installed conda yet, see the Miniconda setup in the pseudohaploid genotyping post. ...

January 12, 2026