In this post, I will build a transparent admixture-screening workflow from scratch in R using f4-statistics and constrained regression. The main advantage is automation: instead of hand-writing every candidate model, the script tests many 2-way, 3-way, and 4-way source combinations in one pass and ranks them by fit. ADMIXTOOLS 2 already includes batch tools such as qpadm_multi() and qpadm_rotate(), so the point is not that qpAdm cannot be automated. The point is that this custom workflow is compact, transparent, easy to modify, and useful for exploratory model search before you validate the strongest candidates more formally.
qpAdm remains the standard formal framework for testing admixture models with left and right populations. It estimates admixture weights, computes model fit, optionally constrains weights to be non-negative, and uses block jackknife or bootstrap resampling to estimate uncertainty. The script below is a complementary f4-space screening tool, not a substitute for qpAdm or qpWave.
The idea
If a target population can be approximated as a mixture of several source-related ancestries, then the target’s pattern of f4-statistics relative to a fixed outgroup and a fixed reference set should also be approximable as a weighted combination of the source f4-vectors. This relies on the same broad f-statistic framework that underlies ADMIXTOOLS methods more generally.
In this post I compute f4-statistics of the form:
f4(Outgroup, Test; Ref1, RefX)
for the target and for each candidate source. Each population is therefore represented by a vector of allele-sharing relationships across the reference panel. If the target lies near the convex combination of some sources in this f4 space, that source set is a plausible ancestry model worth checking more carefully.
What the script actually optimizes
The core fit is a constrained weighted least-squares problem:
subject to:
and
Here, y is the target f4-vector, X is the matrix of source f4-vectors, and se contains the marginal standard errors of the target f4 entries. So this is not ordinary Euclidean distance in raw coordinates. It is a diagonally weighted residual objective, solved under biologically meaningful constraints using quadratic programming.
Note one simplification: the fit weights by the marginal standard errors of each f4 entry, but it does not use the full covariance matrix among f4-statistics the way qpAdm does. The weights are still useful as screening estimates, but the p-values from this script should be treated as approximate fit scores rather than as formal qpAdm-equivalent tests.
The role of the outgroup and references
The outgroup anchors the statistic, while the reference populations define the coordinate system in which the target and sources are compared. This is why the reference choice matters: a model is never tested in an absolute vacuum, but relative to the chosen right-side information. In qpWave terms, the left and right populations define an f4 matrix whose rank tells us about the minimum number of ancestry streams separating them, and qpAdm extends this logic to test whether a target can be explained by the chosen left populations relative to the chosen right populations.
This has two practical consequences.
First, more references are not automatically better. A reference that is symmetrically related to all candidate sources contributes little discriminatory information. Second, changing the right set can change whether a model looks acceptable, because the model is conditional on that set. The same caveat applies to qpAdm.
What this workflow is good for
This workflow is especially useful when you have a target, a stable right set, and a medium-sized panel of candidate sources, and you want to answer questions like:
- Which 2-way, 3-way, or 4-way models are geometrically plausible?
- Which sources recur across many good-fitting models?
- Which sources look interchangeable?
- Which candidates are obviously poor proxies?
Because the script builds one f4 basis and reuses it across all combinations, it can screen large model spaces quickly. That makes it a good first-pass tool before running competition tests, qpWave checks, or full qpAdm validation on the strongest candidates.
What it is not
This script does not test the rank structure of the full left-right f4 matrix the way qpWave does, and it does not use the covariance-aware qpAdm machinery. If a model looks good here, the right next step is still to check it with qpAdm, and in many cases also with qpWave or source-competition tests.
That limitation should not be confused with the limitations of PCA-distance models or ADMIXTURE-style component fitting, though. PCA is mainly a visualization and dimensionality-reduction method, and its distances are measured in a reduced coordinate system whose geometry is optimized for variance, not ancestry modeling. ADMIXTURE estimates cluster-like components that depend on the chosen sample set, K value, and optimization run. This script instead fits mixtures directly in a space defined by allele-sharing asymmetries relative to explicit outgroups and reference populations. So it sits in an intermediate position: less rigorous than qpAdm because it lacks full covariance-aware model testing, but more interpretable and statistically anchored than PCA-based or ADMIXTURE-based ancestry modeling.
Installation
You will need admixtools, tidyverse, and quadprog:
install.packages("remotes")
install.packages("tidyverse")
install.packages("quadprog")
remotes::install_github("uqrmaie1/admixtools")
library(admixtools)
library(tidyverse)
library(quadprog)
Example setup
Below I will model the Samara Yamnaya Early Bronze Age group.
# Specify your target population
target_name <- "Russia_Samara_EBA_Yamnaya.AG"
# Specify the outgroup
outgroup <- "Mbuti"
# EIGENSTRAT prefix
data <- "data"
Candidate sources:
potential_sources <- c(
"Turkey_N",
"Turkey_Cayonu_PPN",
"Iran_HajjiFiruz_N",
"Georgia_KotiasKlde_Mesolithic",
"Russia_Samara_EN_Mesolithic",
"Tajikistan_Mesolithic",
"Hungary_EN_Starcevo-1",
"Azerbaijan_MenteshTepe_N_ShomutepeShulaveri",
"Russia_AfontovaGora_UP",
"Russia_YanaRiver_UP",
"Russia_Vologda_Mesolithic"
)
Right populations (references):
references <- c(
"Turkey_Epipaleolithic",
"Russia_Malta_UP",
"Switzerland_Epipaleolithic",
"Georgia_Satsurblia_LateUP",
"Iran_BeltCave_Mesolithic",
"Iraq_PPNA"
)
These references are intended to capture different axes of West Eurasian variation, including western hunter-gatherer-related ancestry, Caucasus-related ancestry, Ancient North Eurasian-related ancestry, early Iranian farmer-related ancestry, and Anatolian-related ancestry. The reference panel should contain populations that help distinguish among the candidate sources.
In retrospect, I would have included a broader set of reference populations: with only six references the script has five f4 entries to fit, so a 4-way model is left with just two degrees of freedom. When the fit is that loosely constrained, many source combinations will look good regardless of whether they are real, which feeds directly into the optimistic p-values discussed below. A wider reference panel would add f4 entries, tighten the fits, and make the ranking more meaningful.
The full script
# --- 1. INPUT CHECKS ---
stopifnot(length(references) >= 2)
potential_sources <- unique(setdiff(potential_sources, target_name))
if (length(potential_sources) < 2) {
stop("You need at least two candidate sources.")
}
# --- 2. INTERNAL HELPER FUNCTIONS ---
solve_weights <- function(target_vec, source_mat, se_vec, tol = 1e-10) {
row_ok <- is.finite(target_vec) &
is.finite(se_vec) & se_vec > 0 &
apply(source_mat, 1, function(z) all(is.finite(z)))
y <- as.numeric(target_vec[row_ok])
X <- source_mat[row_ok, , drop = FALSE]
se <- as.numeric(se_vec[row_ok])
if (length(y) == 0 || nrow(X) == 0) return(NULL)
# Weighted least squares objective:
# minimize sum(((y - X %*% w) / se)^2)
y_w <- y / se
X_w <- sweep(X, 1, se, "/")
# Skip rank-deficient combinations
if (qr(X_w)$rank < ncol(X_w)) return(NULL)
n <- ncol(X_w)
Dmat <- crossprod(X_w) + diag(tol, n)
dvec <- crossprod(X_w, y_w)
# Constraints: sum(weights) = 1 and weights >= 0
Amat <- cbind(rep(1, n), diag(n))
bvec <- c(1, rep(0, n))
fit <- try(quadprog::solve.QP(Dmat, dvec, Amat, bvec, meq = 1), silent = TRUE)
if (inherits(fit, "try-error")) return(NULL)
w <- fit$solution
w[abs(w) < 1e-8] <- 0
if (any(w < -1e-6)) return(NULL)
w <- pmax(w, 0)
w <- w / sum(w)
list(weights = w, keep = row_ok)
}
evaluate_model <- function(y, X, se, w) {
y_pred <- as.numeric(X %*% w)
residuals <- y - y_pred
z_resid <- residuals / se
chi_sq <- sum(z_resid^2)
df_val <- length(y) - (length(w) - 1)
p_val <- if (df_val > 0) pchisq(chi_sq, df_val, lower.tail = FALSE) else NA_real_
rss <- sum(residuals^2)
max_abs_z <- max(abs(z_resid))
list(
chi_sq = chi_sq,
df = df_val,
p_value = p_val,
rss = rss,
max_abs_z = max_abs_z
)
}
# --- 3. DATA PREPARATION ---
cat("Calculating f4 basis...\n")
all_pops <- c(target_name, potential_sources)
f4_results <- f4(
data,
pop1 = outgroup,
pop2 = all_pops,
pop3 = references[1],
pop4 = references[-1],
allsnps = FALSE
)
# allsnps = FALSE keeps SNP selection consistent across these f4 entries.
# This can reduce SNP counts, but it avoids mixing different SNP subsets
# across the basis vectors.
f4_matrix_wide <- f4_results %>%
dplyr::select(pop2, pop4, est, se) %>%
tidyr::pivot_wider(names_from = pop2, values_from = c(est, se))
y <- as.numeric(f4_matrix_wide[[paste0("est_", target_name)]])
target_se <- as.numeric(f4_matrix_wide[[paste0("se_", target_name)]])
x_full <- f4_matrix_wide %>%
dplyr::select(starts_with("est_")) %>%
dplyr::select(-all_of(paste0("est_", target_name)))
colnames(x_full) <- gsub("^est_", "", colnames(x_full))
x_full <- as.matrix(x_full)
if (ncol(x_full) < 2) {
stop("Not enough candidate sources after preprocessing.")
}
# --- 4. COMBINATORIAL SEARCH ---
results_list <- list()
source_sizes <- 2:min(4, ncol(x_full))
for (m in source_sizes) {
cat("Testing all", m, "source combinations...\n")
combos <- combn(colnames(x_full), m)
for (i in seq_len(ncol(combos))) {
curr_names <- combos[, i]
x_sub <- x_full[, curr_names, drop = FALSE]
fit <- solve_weights(y, x_sub, target_se)
if (!is.null(fit)) {
w <- fit$weights
keep <- fit$keep
y_use <- y[keep]
se_use <- target_se[keep]
x_use <- x_sub[keep, , drop = FALSE]
stats <- evaluate_model(y_use, x_use, se_use, w)
results_list[[length(results_list) + 1]] <- tibble::tibble(
n_sources = length(w),
model = paste(curr_names, collapse = " + "),
chi_sq = stats$chi_sq,
df = stats$df,
rss = stats$rss,
max_abs_z = stats$max_abs_z,
p_value = stats$p_value,
weights = paste(
paste0(curr_names, "=", round(w * 100, 1), "%"),
collapse = "\n "
)
)
}
}
}
if (length(results_list) == 0) {
stop("No valid models were found.")
}
# --- 5. OUTPUT ---
final_table <- dplyr::bind_rows(results_list) %>%
dplyr::mutate(
status = dplyr::case_when(
is.na(p_value) ~ "NA",
p_value > 0.05 ~ "PASS",
TRUE ~ "FAIL"
)
) %>%
dplyr::arrange(dplyr::desc(p_value), chi_sq)
top20 <- final_table %>%
dplyr::slice_head(n = 20)
cat("\n--- TOP 20 MODELS (RANKED BY APPROXIMATE P-VALUE) ---\n\n")
for (i in seq_len(nrow(top20))) {
cat(sprintf("[%d]\n", i))
cat("model: ", top20$model[i], "\n", sep = "")
cat("weights:\n", top20$weights[i], "\n", sep = "")
cat(sprintf("p_value: %.6g\n", top20$p_value[i]))
cat(sprintf("chi_sq: %.6g\n", top20$chi_sq[i]))
cat(sprintf("df: %s\n", top20$df[i]))
cat(sprintf("max_abs_z: %.6g\n", top20$max_abs_z[i]))
cat("status: ", top20$status[i], "\n\n", sep = "")
}
A note on allsnps = FALSE
ADMIXTOOLS 2 documents that, when computing f4-statistics directly from genotype data, different f4 entries may otherwise be based on different SNP subsets if some data are missing. Setting allsnps = FALSE forces a common SNP set across the populations involved in the calculation, which is attractive here because the script is treating the f4 values as coordinates in one shared basis. The trade-off is that SNP counts can drop, sometimes a lot, when many populations are included.
Interpreting the output
The script returns a ranked table of source combinations and their fitted weights. The most useful columns are:
model, the source combinationweights, the fitted ancestry proportions under the simplex constraintp_value, an approximate fit score under the script’s diagonal weighting schememax_abs_z, the largest standardised residual, showing the biggest unexplained deviation left by the model. Values near or above 3 are worth inspecting, because they indicate that at least one f4 entry is misfit by roughly three standard errorsstatus, a rough pass/fail flag
Because the script does not use the full covariance structure among f4-statistics, its p-values are usually more optimistic than formal qpAdm p-values. They are still useful for ranking and screening, but they should not be treated as the final arbiters of model validity.
Example result
In my Yamnaya example, the top model was:
[1]
model: Turkey_Cayonu_PPN + Georgia_KotiasKlde_Mesolithic + Russia_Vologda_Mesolithic
weights:
Turkey_Cayonu_PPN=19%
Georgia_KotiasKlde_Mesolithic=25.4%
Russia_Vologda_Mesolithic=55.7%
p_value: 0.980515
chi_sq: 0.181524
df: 3
max_abs_z: 0.34791
status: PASS
with fitted weights of roughly:
- 55.7%
Russia_Vologda_Mesolithic - 25.4%
Georgia_KotiasKlde_Mesolithic - 19%
Turkey_Cayonu_PPN
That is exactly the kind of result this script is meant to surface: a small number of source combinations that look geometrically plausible in f4 space and are worth testing more formally.
When I ran the same model in qpAdm, I got:
target left weight se z
<chr> <chr> <dbl> <dbl> <dbl>
1 Russia_Samara_EBA_Yamnaya Turkey_Cayonu_PPN 0.180 0.0733 2.46
2 Russia_Samara_EBA_Yamnaya Georgia_KotiasKlde_Mesolithic 0.287 0.0651 4.40
3 Russia_Samara_EBA_Yamnaya Russia_Vologda_Mesolithic 0.533 0.0361 14.8
$rankdrop
# A tibble: 3 × 7
f4rank dof chisq p dofdiff chisqdiff p_nested
<int> <int> <dbl> <dbl> <int> <dbl> <dbl>
1 2 4 11.2 2.45e- 2 6 61.2 2.55e-11
2 1 10 72.4 1.51e-11 8 359. 1.09e-72
3 0 18 431. 2.53e-80 NA NA NA
The fitted proportions are fairly close between the two approaches, which suggests that the screening script is capturing a real ancestry direction in f4 space. However, this qpAdm model does not formally pass: the rankdrop p-value for the 3-way model is 0.0245, below the conventional 0.05 threshold. So this source combination should be treated as a useful screening hit rather than as a validated admixture model.
One possible reason is that qpAdm can become overly conservative when the right set is larger than necessary or contains redundant reference populations. Harney et al. specifically caution that too many right populations can bias qpAdm p-values downward and lead to rejection of otherwise plausible models. In practice, this means the model may be worth re-evaluating with a thinner, more informative right set, chosen on methodological grounds rather than tuned ad hoc to force a pass.
In a later sensitivity check, thinning the right set (outgroup + references) by removing
Iraq_PPNAandIran_BeltCave_Mesolithicmade the same 3-source qpAdm model marginally pass (p = 0.0606) underallsnps = FALSE, reinforcing the point that model fit is conditional on right-population choice. RemovingMbutiincreased the fit further (p = 0.279).