The origins of the Proto-Anatolians are often treated as one of the more obscure problems in Indo-European archaeogenetics, but the genetic data may be less ambiguous than this framing suggests. Since Anatolian is widely regarded as the earliest-splitting branch of Indo-European, the relevant question is whether the earlier Eneolithic steppe-related ancestry behind Yamnaya, especially the Caucasus-Lower Volga or CLV component, also moved south of the Caucasus and into Anatolia. For this purpose, I use Progress-2 specifically as a practical proxy for the north Caucasus-facing part of this Eneolithic steppe-related ancestry.
Languages are obviously not genetics, and ancient DNA does not identify speech communities by itself. But this caveat should not become a license to ignore demographic evidence whenever it points in an inconvenient direction. A repeated, statistically supported ancestry signal along a coherent geographic route is not proof of language, but it is evidence about movement and population formation. If competing scenarios are allowed to rest on linguistic reconstruction, archaeological interpretation, and historical inference, then genetic evidence should not be excluded simply because it is probabilistic rather than deductive.
In this post, I will show possible evidence for an eastern route into Anatolia, through or around the Caucasus.
The Caucasus Route and Arrival from the East
To test whether Eneolithic steppe-related ancestry appeared in the Southern Caucasus by the Chalcolithic, I will begin with f4-statistics in the following arrangement:
f4(Outgroup, Progress-2; Neolithic baseline, Target)
This test asks whether the target shares more alleles with Progress-2 than the Neolithic baseline does. A positive result would mean that the target is shifted toward Eneolithic north Caucasus steppe-related ancestry relative to that baseline.
A significantly positive result would therefore suggest that the target cannot be explained as simply lying on the Southern Caucasus Neolithic cline, but instead carries additional affinity to Eneolithic steppe-related ancestry. This provides a first test of whether ancestry related to groups north of the Caucasus had already entered the Southern Caucasus by the Chalcolithic.
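The logic of this test can be made concrete with a minimal Python sketch. It computes an f4-statistic and its Z-score from allele frequencies. The sketch assumes the arrangement f4(O, P; B, T): O is an outgroup, P stands in for Progress-2, B is the Neolithic baseline and T is the target.

Several details are illustrative assumptions and are not the settings behind the tables in this post:
- the outgroup;
- the equal-sized SNP blocks;
- the unweighted jackknife;
- the simulated data.

ADMIXTOOLS, for instance, uses a weighted block jackknife over genomic blocks.

```python
import math
import random
from typing import List, Sequence, Tuple


def f4_jackknife(o: Sequence[float], p: Sequence[float],
                 b: Sequence[float], t: Sequence[float],
                 n_blocks: int = 20) -> Tuple[float, float, float]:
    """Return (f4, SE, Z) for f4(O, P; B, T) from aligned allele frequencies.

    Per SNP the statistic is (o - p) * (b - t); its mean is positive when
    T shares more drift with P than B does. The SE comes from a simple,
    unweighted delete-one-block jackknife over consecutive SNP blocks.
    """
    vals = [(oi - pi) * (bi - ti) for oi, pi, bi, ti in zip(o, p, b, t)]
    n = len(vals)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least 2 blocks and one SNP per block")
    total = sum(vals)
    f4 = total / n
    size = n // n_blocks  # any leftover SNPs fall into the last block
    bounds = [(i * size, n if i == n_blocks - 1 else (i + 1) * size)
              for i in range(n_blocks)]
    # Mean of the remaining SNPs when each block is left out in turn.
    loo = [(total - sum(vals[s:e])) / (n - (e - s)) for s, e in bounds]
    mean_loo = sum(loo) / n_blocks
    var = (n_blocks - 1) / n_blocks * sum((x - mean_loo) ** 2 for x in loo)
    se = math.sqrt(var)
    z = f4 / se if se > 0 else math.nan
    return f4, se, z


def simulate_freqs(n_snps: int = 5000, seed: int = 1
                   ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Hypothetical toy data: T inherits half of the drift private to P."""
    rng = random.Random(seed)
    o, p, b, t = [], [], [], []
    for _ in range(n_snps):
        q = rng.uniform(0.2, 0.8)   # shared ancestral frequency
        d = rng.uniform(-0.1, 0.1)  # drift on the P lineage only
        o.append(q + rng.uniform(-0.05, 0.05))
        p.append(q + d + rng.uniform(-0.05, 0.05))
        b.append(q + rng.uniform(-0.05, 0.05))
        t.append(q + 0.5 * d + rng.uniform(-0.05, 0.05))
    return o, p, b, t


if __name__ == "__main__":
    f4, se, z = f4_jackknife(*simulate_freqs())
    print(f"f4 = {f4:.5f}, SE = {se:.5f}, Z = {z:.1f}")
```

In the toy data, T carries half of P's private drift, so f4 comes out positive with a large Z. Swapping B and T flips the sign, which is a quick way to check the orientation of the test.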
With Areni-1 Chalcolithic in the target position
As a first example, I place Areni-1 Chalcolithic in the target position:
| Target | f4 | SE | Z | P |
|---|---|---|---|---|
| Areni-1 Chalcolithic | 0.00219 | 0.000541 | 4.05 | 5.03e-5 |
This indicates that Areni-1 Chalcolithic shares significantly more alleles with Progress-2 than Mentesh Tepe Neolithic does. So, it does not behave as a simple continuation of the Southern Caucasus Neolithic baseline, but instead shows excess affinity to Eneolithic steppe-related ancestry from north of the Caucasus.
The point here is that this ancestry signal is already visible south of the Caucasus by the Chalcolithic. This supports the first step of the eastern-route argument. Before turning to Anatolia itself, we can already detect a link between Eneolithic steppe-related groups north of the Caucasus and populations on its southern side, whether through a movement of people or at least through some gene flow.
With Arslantepe Late Chalcolithic in the target position
As the next step, I place Arslantepe Late Chalcolithic in the target position. Arslantepe is relevant for two reasons. First, it lies in eastern Anatolia, southwest of the Caucasus. Second, a Late Chalcolithic individual (ART038) from the site carries Y-DNA haplogroup R1b-V1636. This is the same paternal lineage found among the men buried in the kurgans at Progress-2, the Eneolithic steppe-related population used here as the north Caucasus-facing reference. Arslantepe is therefore an obvious test case for whether the ancestry signal seen south of the Caucasus also becomes detectable in the Upper Euphrates region.
First, using Çayönü as an eastern Anatolian Neolithic baseline:
| Baseline | Target | f4 | SE | Z | P |
|---|---|---|---|---|---|
| Çayönü Neolithic | Arslantepe Late Chalcolithic | 0.00160 | 0.000390 | 4.11 | 3.94e-5 |
This result is clearly positive. Arslantepe Late Chalcolithic shares significantly more alleles with the Eneolithic steppe-related source than Çayönü Neolithic does, suggesting that it cannot be modeled as a simple continuation of the local eastern Anatolian Neolithic baseline.
The same test can also be repeated against the Southern Caucasus Neolithic baseline used above:
| Baseline | Target | f4 | SE | Z | P |
|---|---|---|---|---|---|
| Mentesh Tepe Neolithic | Arslantepe Late Chalcolithic | 0.00132 | 0.000579 | 2.28 | 0.0223 |
Here the signal is weaker, but it still points in the same direction. Arslantepe Late Chalcolithic shows more affinity to Eneolithic steppe-related ancestry than the Southern Caucasus Neolithic baseline does. However, with Z = 2.28 (p = 0.022) the result falls below the |Z| ≥ 3 threshold commonly used for f-statistics. It is therefore suggestive rather than strongly significant.
Taken together, these two tests are important because they move the signal from the Southern Caucasus into eastern Anatolia. Against the local eastern Anatolian Neolithic baseline, the excess is clear; against the Southern Caucasus Neolithic baseline, it is more modest but still positive. This fits the expectation of an eastern route, where steppe-related ancestry first appears south of the Caucasus and then becomes detectable in eastern Anatolia during the Late Chalcolithic, potentially already in a gradually diluted form.
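The P column in these f4 tables appears to be a two-sided normal p-value derived from Z. The post does not state this, but the tabulated values are consistent with it up to rounding. The short sketch below shows the conversion, using the Z-scores from the f4 tables in this section.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value: P(|N(0, 1)| >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


# Z-scores copied from the f4 tables in this post.
for label, z in [("Areni-1 Chalcolithic vs Mentesh Tepe", 4.05),
                 ("Arslantepe LC vs Çayönü", 4.11),
                 ("Arslantepe LC vs Mentesh Tepe", 2.28)]:
    # The last field flags the |Z| >= 3 convention for a robust f-statistic.
    print(f"{label}: Z = {z:.2f}, p = {two_sided_p(z):.2e}, |Z| >= 3: {abs(z) >= 3}")
```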
Quantification of Eneolithic Steppe-related ancestry in Anatolia
After showing the gradual appearance of excess steppe-related allele sharing along a Caucasus route, I will now try to formally quantify Eneolithic steppe-related ancestry with qpAdm in several northern Near Eastern groups relevant to this question.
My preference is to treat qpAdm as a convex ancestry-modeling problem. In this setup, the left populations are the proposed ancestry sources, while the right populations serve as external anchors that are differentially related to those sources. I therefore prefer a well-constrained and stable right-population setup over qpAdm rotation strategies, where increasingly close or related populations are moved to the right side in search of a better fit. In my view, a carefully chosen right set should test the model rather than optimize it.
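The role of the right populations can be illustrated with a simplified two-source weight estimate. The statistic f4(X, O; Ra, Rb) is linear in its first position. So if a target is modeled as T = w·S1 + (1 - w)·S2, it should satisfy x = y2 + w(y1 - y2) across all right-population comparisons.

The sketch below solves this by plain least squares. It is not qpAdm's estimator, which uses generalized least squares with a jackknife covariance and judges fit with a rank test. The function name and the input vectors are hypothetical. The sketch does show why the right set must be differentially related to the sources: if y1 and y2 are identical, w cannot be estimated.

```python
from typing import Sequence, Tuple


def two_source_weight(x: Sequence[float], y1: Sequence[float],
                      y2: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weight w for T ~ w*S1 + (1 - w)*S2.

    x, y1 and y2 hold the same f4 statistics (one per right-population
    comparison), computed with the target, source 1 and source 2 in the
    first position. The mixture model predicts x = y2 + w * (y1 - y2).
    Returns (w, residual sum of squares).
    """
    diff = [a - b for a, b in zip(y1, y2)]
    denom = sum(d * d for d in diff)
    if denom < 1e-15:  # arbitrary tolerance for "no separation"
        # The right set does not separate the two sources: w is unidentifiable.
        raise ValueError("sources are not differentially related to the right set")
    w = sum(d * (xi - b) for d, xi, b in zip(diff, x, y2)) / denom
    rss = sum((xi - b - w * d) ** 2 for d, xi, b in zip(diff, x, y2))
    return w, rss


if __name__ == "__main__":
    # Hypothetical f4 vectors, not values from this post.
    y1 = [0.010, -0.020, 0.005, 0.030]
    y2 = [-0.010, 0.004, 0.020, -0.005]
    x = [0.866 * a + 0.134 * b for a, b in zip(y1, y2)]
    w, rss = two_source_weight(x, y1, y2)
    print(f"w = {w:.3f}, rss = {rss:.2e}")
```

In a convex model the recovered w should fall between 0 and 1. A value outside that range is usually taken as a sign that the sources or the right set are misspecified.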
For the following models, I use this right-population set:
- Ju_hoan_North
- Iraq_PPNA
- Georgia_KotiasKlde_Mesolithic
- Russia_Vologda_Mesolithic
- Switzerland_Epipaleolithic
- Tajikistan_Mesolithic
- Turkey_Epipaleolithic
- Israel_Natufian
- Iran_BeltCave_Mesolithic
Arslantepe Late Chalcolithic
For Arslantepe Late Chalcolithic, I model the target as a two-way mixture of Çayönü PPN and Progress-2 Eneolithic Steppe.
The model is accepted with a good fit:
| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Arslantepe Late Chalcolithic | Çayönü PPN | 0.866 | 0.0255 | 34.0 |
| Arslantepe Late Chalcolithic | Progress-2 Eneolithic Steppe | 0.134 | 0.0255 | 5.25 |
| Model | rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 6.26 | 0.510 |
The model estimates Arslantepe Late Chalcolithic as approximately 86.6% Çayönü PPN-related and 13.4% Progress-2 Eneolithic Steppe-related, with the steppe-related component being clearly significant.
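The rank and dof columns follow directly from the sizes of the left and right sets. In the rank test that qpAdm builds on, the f4 matrix has (number of left populations - 1) rows and (number of right populations - 1) columns. A rank-r fit leaves (rows - r) × (columns - r) degrees of freedom. A quick check with the nine right populations listed above reproduces the values in the tables.

```python
RIGHT = [
    "Ju_hoan_North", "Iraq_PPNA", "Georgia_KotiasKlde_Mesolithic",
    "Russia_Vologda_Mesolithic", "Switzerland_Epipaleolithic",
    "Tajikistan_Mesolithic", "Turkey_Epipaleolithic", "Israel_Natufian",
    "Iran_BeltCave_Mesolithic",
]


def qpadm_dof(n_left: int, n_right: int, rank: int) -> int:
    """dof of the rank-r test on the (n_left - 1) x (n_right - 1) f4 matrix."""
    return (n_left - 1 - rank) * (n_right - 1 - rank)


# Left = target + two sources; a two-way model corresponds to rank 1.
full = qpadm_dof(n_left=3, n_right=len(RIGHT), rank=1)
# Dropping one source leaves target + one source, i.e. rank 0.
dropped = qpadm_dof(n_left=2, n_right=len(RIGHT), rank=0)
print(full, dropped)  # 7 8
```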
The popdrop results are also informative:
| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 6.26 | 0.510 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 33.6 | 4.81e-5 | Steppe source required |
| Çayönü PPN | 8 | 672 | 7.87e-140 | Local Anatolian source required |
When Progress-2 Eneolithic Steppe is removed, the model fails, which shows that the steppe-related source is not simply decorative but necessary for the fit. At the same time, the overwhelming failure after removing Çayönü PPN confirms that most of the ancestry remains local eastern Anatolian-related.
This result matches the earlier f4-statistics well. Arslantepe Late Chalcolithic carries a mostly local eastern Anatolian ancestry profile, but with a significant Eneolithic steppe-related contribution. In quantitative terms, this contribution is modest, around 13%, but it is statistically required and fits the pattern expected from gradual dilution along an eastern route into Anatolia.
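The P values in the fit and popdrop tables appear to be upper-tail chi-square probabilities for the listed chisq and dof. The sketch below recomputes them with only the standard library, using the closed-form survival function for integer degrees of freedom. With SciPy available, `scipy.stats.chi2.sf(chisq, dof)` gives the same quantity.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square with integer dof k.

    even k: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
    odd k:  erfc(sqrt(x/2))
            + sqrt(2x/pi) * exp(-x/2) * sum_{r=1}^{(k-1)/2} x^(r-1) / (1*3*...*(2r-1))
    """
    if k < 1 or x < 0:
        raise ValueError("need k >= 1 and x >= 0")
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    total, term = 0.0, 1.0
    for r in range(1, (k - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # builds x^(r-1) / (1*3*...*(2r-1))
        total += term
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total)


# chisq and dof copied from the Arslantepe Late Chalcolithic tables.
for label, chisq, dof in [("full model", 6.26, 7),
                          ("drop Progress-2", 33.6, 8),
                          ("drop Çayönü", 672, 8)]:
    print(f"{label}: p = {chi2_sf(chisq, dof):.3g}")
```

For the full Arslantepe model this gives p ≈ 0.51, matching the table. Small differences elsewhere come from rounding of chisq.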
Arslantepe38, Royal Tomb
The same model can also be applied to ART038, the R1b-V1636 individual from the Arslantepe Royal Tomb:
| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| ART038 | Çayönü PPN | 0.885 | 0.0460 | 19.2 |
| ART038 | Progress-2 Eneolithic Steppe | 0.115 | 0.0460 | 2.49 |
| Model | f4 rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 4.48 | 0.723 |
| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 4.48 | 0.723 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 12.1 | 0.148 | Çayönü-only model still accepted |
| Çayönü PPN | 8 | 363 | 1.24e-73 | Local Anatolian source required |
The model estimates ART038 as about 88.5% Çayönü PPN-related and 11.5% Progress-2 Eneolithic Steppe-related. The steppe-related component is only marginally significant (Z = 2.49). Even so, adding this source clearly improves the fit, lowering the chisq from 12.1 in the Çayönü-only model to 4.48 in the two-way model.
At the same time, the steppe source is not strictly required here, since the Çayönü-only model remains formally acceptable with p = 0.148. This should be interpreted cautiously, especially because this is a single individual rather than a population average. That makes the result more vulnerable to quality-related issues and individual-level noise. Still, three points favor including Eneolithic steppe-related ancestry in the model: the improved fit, the direction of the estimate, and the paternal link to Progress-2 through R1b-V1636. Including it is reasonable, though not required in this individual case.
Tilbeşar Höyük (Gaziantep) Bronze Age, I14649
A further relevant case is I14649 from Bronze Age Tilbeşar Höyük, an R1b-V1636 individual from the Gaziantep region. The site lies roughly 50 km west of Carchemish, the later Neo-Hittite capital. Historically, this sample predates written evidence for Anatolian speakers in the region, so it cannot be treated as linguistically identifiable in any direct sense.
What makes this individual interesting is the apparent mobility of R1b-V1636-bearing groups only a few centuries after its appearance at Late Chalcolithic Arslantepe. By the Bronze Age, the same paternal lineage is found farther southwest at Tilbeşar Höyük, suggesting that the movement did not end at Arslantepe. Instead, it may reflect a broader movement of people, or at least male-mediated ancestry, from the Anatolian Upper Euphrates region into southeastern Anatolia and the northern Levantine frontier zone, a region that later also becomes relevant for Luwian-speaking groups.
| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Tilbeşar Höyük BA, I14649 | Çayönü PPN | 0.852 | 0.0534 | 16.0 |
| Tilbeşar Höyük BA, I14649 | Progress-2 Eneolithic Steppe | 0.148 | 0.0534 | 2.77 |
| Model | rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 9.47 | 0.221 |
This gives an estimate of roughly 85.2% Çayönü PPN-related and 14.8% Progress-2 Eneolithic Steppe-related ancestry. The model passes with p = 0.221, and the steppe-related component is significant with Z = 2.77.
I do not claim that this is necessarily the best or most realistic model for this individual. The purpose here is more limited: even with a constrained right-population setup, the model passes and detects a meaningful Eneolithic steppe-related component. It is also notable that a single-source Çayönü model does not pass, making the addition of a steppe-related source difficult to dismiss in this specific test.
Oylum Höyük Middle Bronze Age
A further test can be made with the Middle Bronze Age average from Oylum Höyük. This is useful because it moves the analysis beyond single individuals and asks whether a similar signal is also visible at the population level in southeastern Anatolia.
| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Oylum Höyük MBA | Çayönü PPN | 0.912 | 0.0251 | 36.3 |
| Oylum Höyük MBA | Progress-2 Eneolithic Steppe | 0.0878 | 0.0251 | 3.49 |
| Model | rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 11.1 | 0.135 |
| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 11.1 | 0.135 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 23.5 | 0.00275 | Steppe source required |
| Çayönü PPN | 8 | 815 | 1.21e-170 | Local Anatolian source required |
The model estimates Oylum Höyük Middle Bronze Age as approximately 91.2% Çayönü PPN-related and 8.8% Progress-2 Eneolithic Steppe-related. It passes with p = 0.135, and the steppe-related component is statistically significant with Z = 3.49.
As before, I do not claim that this is necessarily the best model for Oylum Höyük MBA. There may be better alternatives using more intermediate populations closer in time and geography.
It should also be noted that Çayönü PPN is not being used here as a generic Mesopotamian Neolithic source, like Shanidar PPNB. Çayönü is closer to the broader Chalcolithic and Bronze Age Anatolian and Levantine profile, which makes it a more appropriate regional baseline for this test.
Kalehöyük, Kārum and Old Hittite Periods
The same two-way model can also be applied to Kalehöyük, first in the Kārum period and then in the Old Hittite period. This is especially relevant because the Old Hittite period, roughly 1750 to 1500 BCE, now falls within a historically Anatolian-speaking context.
| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Kalehöyük Kārum Period | Çayönü PPN | 0.872 | 0.0304 | 28.6 |
| Kalehöyük Kārum Period | Progress-2 Eneolithic Steppe | 0.128 | 0.0304 | 4.20 |
| Model | rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 7.92 | 0.340 |
| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 7.92 | 0.340 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 25.7 | 0.00119 | Steppe source required |
| Çayönü PPN | 8 | 555 | 1.10e-114 | Local Anatolian source required |
For the Kārum period average, the model estimates about 87.2% Çayönü PPN-related and 12.8% Progress-2 Eneolithic Steppe-related ancestry. The model passes with p = 0.340, and the steppe-related component is significant with Z = 4.20. Removing the steppe source causes the model to fail (p = 0.00119).
| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Kalehöyük Old Hittite Period | Çayönü PPN | 0.817 | 0.0315 | 26.0 |
| Kalehöyük Old Hittite Period | Progress-2 Eneolithic Steppe | 0.183 | 0.0315 | 5.81 |
| Model | rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 3.08 | 0.878 |
| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 3.08 | 0.878 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 36.7 | 1.32e-5 | Steppe source required |
| Çayönü PPN | 8 | 524 | 5.50e-108 | Local Anatolian source required |
For the Old Hittite period average, the estimate rises to about 18.3% Progress-2 Eneolithic Steppe-related ancestry (Z = 5.81), and the fit is very strong with p = 0.878. Again, the steppe-related source is required, since removing it causes the model to fail (p = 1.32e-5).
The Old Hittite period fit is especially noteworthy. A model using a rather eastern Anatolian Neolithic source such as Çayönü PPN might not be the first expectation for central Anatolia, yet it produces a good fit, especially given that the right-population set is fairly constrained.
Replacing Çayönü PPN with a more central Anatolian Neolithic source, Tepecik-Çiftlik, does not improve the situation. In fact, the two-way model with Tepecik-Çiftlik and Eneolithic Steppe fails under the same right-population set, even though the estimated steppe-related ancestry remains around 12.2%. This makes the stronger Çayönü-based fit more notable.
Ovaören MA2213, Early Bronze Age II
One final central Anatolian case is Ovaören MA2213 from Early Bronze Age II. Interestingly, the full Ovaören average of three samples does not pass in this setup, but MA2213 individually does. This is also the Ovaören individual reported to share an IBD link of 15.2 cM with Vonyucka-1 in the North Caucasus steppes, making it especially useful for testing Steppe-related connections into Bronze Age Anatolia.
| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Ovaören MA2213, EBA II | Çayönü PPN | 0.853 | 0.0355 | 24.0 |
| Ovaören MA2213, EBA II | Progress-2 Eneolithic Steppe | 0.147 | 0.0355 | 4.14 |
| Model | rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 6.42 | 0.491 |
| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 6.42 | 0.491 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 27.7 | 5.30e-4 | Steppe source required |
| Çayönü PPN | 8 | 486 | 6.43e-100 | Local Anatolian source required |
The model estimates MA2213 as roughly 85.3% Çayönü PPN-related and 14.7% Progress-2 Eneolithic Steppe-related. The model passes comfortably with p = 0.491, and the steppe-related component is significant with Z = 4.14. Removing the steppe source causes the model to fail.
Summary
| Target | Steppe-related estimate | Z (steppe weight) | Model fit p |
|---|---|---|---|
| Arslantepe LC | 13.4% | 5.25 | 0.510 |
| ART038 | 11.5% | 2.49 | 0.723 |
| Tilbeşar BA I14649 | 14.8% | 2.77 | 0.221 |
| Oylum MBA | 8.8% | 3.49 | 0.135 |
| Kalehöyük Kārum | 12.8% | 4.20 | 0.340 |
| Kalehöyük Old Hittite | 18.3% | 5.81 | 0.878 |
| Ovaören MA2213 | 14.7% | 4.14 | 0.491 |
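Approximate 95% intervals can be derived from the weights and standard errors in the qpAdm tables as a rough complement to this summary. The calculation assumes a normal approximation: weight ± 1.96 × SE. These intervals are not reported in the original analysis. They simply restate the tabulated SEs on the percentage scale.

```python
# (weight, SE) pairs copied from the qpAdm tables in this post.
ESTIMATES = {
    "Arslantepe LC": (0.134, 0.0255),
    "ART038": (0.115, 0.0460),
    "Tilbeşar BA I14649": (0.148, 0.0534),
    "Oylum MBA": (0.0878, 0.0251),
    "Kalehöyük Kārum": (0.128, 0.0304),
    "Kalehöyük Old Hittite": (0.183, 0.0315),
    "Ovaören MA2213": (0.147, 0.0355),
}


def normal_ci(weight: float, se: float, z_crit: float = 1.96):
    """Approximate 95% interval: weight +/- z_crit * SE (normal approximation)."""
    return weight - z_crit * se, weight + z_crit * se


for name, (w, se) in ESTIMATES.items():
    lo, hi = normal_ci(w, se)
    print(f"{name:24s} {w:6.1%}  95% CI {lo:6.1%} to {hi:6.1%}  Z = {w / se:.2f}")
```

On these numbers none of the intervals reaches zero, but the single-individual estimates are wide. ART038, for example, spans roughly 2.5% to 20.5%.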
Conclusion
Taken together, the results point to a consistent pattern. Eneolithic steppe-related ancestry is first detectable south of the Caucasus, then appears in eastern Anatolia, and later remains visible in several Chalcolithic and Bronze Age Anatolian contexts. The signal is not large, and in some cases it is clearly diluted, but it is repeatedly detectable and often statistically required. Nor should the signal necessarily be expected to be large. If early Proto-Anatolian speakers first existed as a subculture within a mostly local Anatolian environment, rather than as a mass population already spreading across Anatolia, then even a modest ancestry signal could be historically meaningful. This is worth keeping in mind, especially against exaggerated maps of Luwian or Anatolian-speaking territory that project much later distributions too far back, often with an excessive focus on western Anatolia.
This does not mean that every individual carrying such ancestry was necessarily an Anatolian speaker, nor that a simple two-way qpAdm model is the final word on the ancestry of these populations. More intermediate sources and more regionally specific models may improve individual fits in some cases. But the broader direction of the evidence is difficult to ignore: the eastern route is not merely a theoretical possibility. It is supported by a trail of genetic signals moving from the Eneolithic steppe and northern Caucasus zone, through the Southern Caucasus, and into Anatolia.
By contrast, the Balkan route is harder to reconcile with this pattern. It would require the relevant ancestry to enter Anatolia from the west or northwest. Yet the clearest signals discussed here appear first in the Southern Caucasus and eastern Anatolia, and only later in central and southeastern Anatolia.