The origins of the Proto-Anatolians are often treated as one of the more obscure problems, but the genetic data may not be that ambiguous. Anatolian is regarded as the earliest-splitting branch of “Indo-European”, and its divergence is deep enough that some linguists distinguish a pre–Proto-Indo-European stage, sometimes called “Indo-Anatolian”, from the Proto-Indo-European reconstructed from the non-Anatolian branches. Under either framing, the relevant question is the same: whether the earlier Eneolithic steppe-related ancestry behind Yamnaya, particularly the Caucasus–Lower Volga (CLV) component, also moved south of the Caucasus into Anatolia. For this purpose, I use Progress-2 specifically as a proxy for the north Caucasus-facing part of this Eneolithic steppe-related ancestry, since it sits directly at the northern end of the Caucasus and is therefore well placed to represent groups that may have passed through the region.

Languages are obviously not genetics, and ancient DNA does not identify speech communities by itself. But this caveat should not become a license to ignore genetic evidence whenever it points in a direction one does not prefer. A repeated, statistically supported ancestry pattern along a coherent geographic route is not proof of language, but it is evidence about movement and population formation. If competing scenarios are allowed to rest on linguistic reconstruction, archaeological interpretation, and historical inference, then genetic evidence should not be excluded. If anything, it is more grounded than any of these.

In this post, I will show possible evidence for an eastern route into Anatolia, through or around the Caucasus.


The Caucasus Route and Arrival from the East

To test whether Eneolithic steppe-related ancestry appeared in the Southern Caucasus by the Chalcolithic, I will begin with $f_4$-statistics in the following arrangement:

$$f_4(\text{Outgroup}, \text{Progress-2}; \text{Neolithic baseline}, \text{Target})$$

This test asks whether the target shares more alleles with Progress-2 than the Neolithic baseline does. A positive result would mean that the target is shifted toward Eneolithic north Caucasus steppe-related ancestry relative to that baseline.

A significantly positive result in this arrangement would therefore suggest that the target cannot be explained as simply lying on the Southern Caucasus Neolithic cline, but instead carries additional affinity to Eneolithic steppe-related ancestry. This provides a first test of whether ancestry related to groups north of the Caucasus had already entered the Southern Caucasus by the Chalcolithic.
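To make the arrangement concrete, here is a minimal Python sketch of how an $f_4$-statistic is estimated from allele frequencies: the per-SNP product of two frequency differences, averaged across SNPs. The frequencies are invented toy values and the function name `f4` is only illustrative. Real analyses run dedicated tools on genome-wide data and estimate standard errors with a block jackknife, which this sketch omits.

```python
from statistics import mean


def f4(a, b, c, d):
    """Mean over SNPs of (a - b) * (c - d), given per-SNP allele frequencies."""
    return mean((pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d))


# Toy allele frequencies at six SNPs (invented for illustration only).
outgroup = [0.10, 0.80, 0.50, 0.30, 0.90, 0.20]
progress2 = [0.60, 0.30, 0.90, 0.70, 0.40, 0.70]
baseline = [0.20, 0.70, 0.40, 0.30, 0.80, 0.30]
# The toy target is shifted slightly toward progress2 relative to the baseline.
target = [0.30, 0.60, 0.50, 0.40, 0.70, 0.40]

value = f4(outgroup, progress2, baseline, target)
print(f"f4(Outgroup, Progress-2; Baseline, Target) = {value:.4f}")
```

In this toy setup the statistic comes out positive because the target is displaced from the baseline in the same direction in which Progress-2 is displaced from the outgroup. Swapping the last two populations flips the sign.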


With Areni-1 Chalcolithic in the target position

As a first example, I place Areni-1 Chalcolithic in the target position:

| Target | $f_4$ | SE | Z | P |
|---|---|---|---|---|
| Areni-1 Chalcolithic | 0.00219 | 0.000541 | 4.05 | 5.03e-5 |

This indicates that Areni-1 Chalcolithic shares significantly more alleles with Progress-2 than Mentesh Tepe Neolithic (6000-4000 BCE) does. So, it does not behave as a simple continuation of the Southern Caucasus Neolithic baseline, but instead shows excess affinity to Eneolithic steppe-related ancestry from north of the Caucasus.
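The reported Z-score and P-value follow from the estimate and its standard error. The sketch below assumes the usual convention that Z is the estimate divided by its standard error and that P is the two-sided tail of a standard normal distribution. The helper name `z_and_p` is hypothetical, and because the inputs are the rounded table values, the result only approximately reproduces the reported P.

```python
import math


def z_and_p(estimate, se):
    """Return the Z-score and the two-sided normal P-value for an estimate."""
    z = estimate / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Rounded values from the Areni-1 Chalcolithic row above.
z, p = z_and_p(0.00219, 0.000541)
print(f"Z = {z:.2f}, two-sided P = {p:.2e}")
```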

This supports the first step of the eastern-route argument: before turning to Anatolia itself, we can already observe a detectable movement, or at least possible gene flow, linking Eneolithic steppe-related groups north of the Caucasus with populations on its southern side by the Chalcolithic.


With Arslantepe Late Chalcolithic in the target position

As the next step, I place Arslantepe Late Chalcolithic (average of the samples: ART020, ART027, ART017) in the target position. Arslantepe is relevant here because it lies in eastern Anatolia, to the west of the Caucasus, and because a Late Chalcolithic individual (ART038) from the site carries Y-DNA haplogroup R1b-V1636. This is the same paternal lineage found among the men buried in the kurgans at Progress-2, the Eneolithic steppe-related population used here as the northern Caucasus-facing reference. Arslantepe is therefore an obvious test case for whether the ancestry affinity seen south of the Caucasus also becomes detectable in the Upper Euphrates region.

First, using Çayönü as an eastern Anatolian Neolithic baseline:

$$f_4(\text{Ju\_hoan\_North}, \text{Progress-2}; \text{Çayönü Neolithic}, \text{Arslantepe Late Chalcolithic})$$

| Baseline | Target | $f_4$ | SE | Z | P |
|---|---|---|---|---|---|
| Çayönü Neolithic | Arslantepe Late Chalcolithic | 0.00160 | 0.000390 | 4.11 | 3.94e-5 |

This result is clearly positive. Arslantepe Late Chalcolithic shares significantly more alleles with the Eneolithic steppe-related source than Çayönü Neolithic does, suggesting that it cannot be modeled as a simple continuation of the local eastern Anatolian Neolithic baseline.

These results should not be read as requiring a simple direct migration from Areni-1 to Arslantepe. Rather, Areni-1 shows that Progress-2-related ancestry was already present south of the Caucasus by the Chalcolithic. Arslantepe then shows that a related signal also appears farther west in eastern Anatolia. The point is therefore not that the same population moved step by step from Areni into Anatolia, but that a related ancestry signal is detectable at successive points along the route.


Having shown the gradual appearance of steppe-related allele sharing along a Caucasus route, I will now try to formally quantify Eneolithic steppe-related ancestry with qpAdm in several northern Near Eastern groups relevant to this question.

My preference is to treat qpAdm as a convex ancestry-modeling problem. In this setup, the left populations are the proposed ancestry sources, while the right populations serve as external anchors that are differentially related to those sources. I’ll therefore prefer a well-constrained and stable right-population setup over heavy reliance on rotation, in which potential sources are gradually shifted into the right populations. Rotation is often defended as a stress test, and used carefully it can be exactly that. In practice, however, this can slide into an artificial optimisation, where increasingly close or related populations to other sources are moved to the right side until a preferred model becomes feasible. In my view, a carefully chosen right set should test the model rather than be tuned until the model passes.
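The convex framing can be illustrated with a deliberately simplified Python sketch. It assumes each population is summarised by a vector of $f_4$-statistics against the right populations, and it finds the mixture weight by ordinary least squares, clamped to the interval from 0 to 1. This is not how qpAdm itself works internally: the real method weights the residuals by a jackknife-estimated covariance matrix and tests the rank of the $f_4$ matrix. The function name and all numbers are invented for illustration.

```python
def two_way_weight(target, src_a, src_b):
    """Least-squares w in [0, 1] such that target ~ w * src_a + (1 - w) * src_b."""
    diff = [a - b for a, b in zip(src_a, src_b)]
    den = sum(d * d for d in diff)
    if den == 0:
        raise ValueError("the right populations do not distinguish the two sources")
    num = sum((t - b) * d for t, b, d in zip(target, src_b, diff))
    # For a single weight, the constrained optimum is the clamped solution.
    return min(1.0, max(0.0, num / den))


# Toy f4 vectors against four right populations (invented values).
local = [0.010, -0.004, 0.006, 0.002]
steppe = [-0.002, 0.008, 0.001, -0.005]
# A synthetic target built as 85% local and 15% steppe.
mixed = [0.85 * a + 0.15 * b for a, b in zip(local, steppe)]

w_local = two_way_weight(mixed, local, steppe)
print(f"local weight = {w_local:.3f}, steppe weight = {1 - w_local:.3f}")
```

The sketch also shows why the right populations matter: if they relate to both sources in the same way, the denominator collapses and the weight cannot be identified.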

For the following models, I use this right-population set:

Ju_hoan_North
Iraq_PPNA
Georgia_KotiasKlde_Mesolithic
Russia_Vologda_Mesolithic
Switzerland_Epipaleolithic
Tajikistan_Mesolithic
Turkey_Epipaleolithic
Israel_Natufian
Iran_BeltCave_Mesolithic

In keeping with the convex framing, all of these right populations temporally precede the sources used on the left, but only immediately so rather than by a wide margin, which is what lets them function as informative anchors instead of distant outgroups.


Arslantepe Late Chalcolithic

For Arslantepe Late Chalcolithic, I model the target as a two-way mixture of Çayönü PPN and Progress-2 Eneolithic Steppe.

The model is accepted with a good fit:

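| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Arslantepe Late Chalcolithic | Çayönü PPN | 0.866 | 0.0255 | 34.0 |
| Arslantepe Late Chalcolithic | Progress-2 Eneolithic Steppe | 0.134 | 0.0255 | 5.25 |

| Model | $f_4$ rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 6.26 | 0.510 |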

The model estimates Arslantepe Late Chalcolithic as approximately 86.6% Çayönü PPN-related and 13.4% Progress-2 Eneolithic Steppe-related, with the steppe-related component being clearly significant.

The popdrop results are also informative:

| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 6.26 | 0.510 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 33.6 | 4.81e-5 | Steppe source required |
| Çayönü PPN | 8 | 672 | 7.87e-140 | Local Anatolian source required |

When Progress-2 Eneolithic Steppe is removed, the model fails, which shows that the steppe-related source is not simply decorative but necessary for the fit. At the same time, the overwhelming failure after removing Çayönü PPN confirms that most of the ancestry remains local eastern Anatolian-related.
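The P-values in the popdrop table are upper-tail probabilities of a chi-square distribution with the stated degrees of freedom. The following sketch recomputes them with the Python standard library only, using the closed-form series for integer degrees of freedom. The function name `chi2_sf` is an illustrative choice, and since the chisq values are rounded in the table, the results match the reported figures only approximately.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability of a chi-square variable with integer dof."""
    if x <= 0:
        return 1.0
    chi = math.sqrt(x)
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # normal density at chi
    if dof % 2 == 1:
        # Odd dof: normal tail plus a finite series in odd powers of chi.
        total = math.erfc(chi / math.sqrt(2))
        term = chi
        for r in range(1, (dof - 1) // 2 + 1):
            total += 2 * pdf * term
            term *= x / (2 * r + 1)
        return total
    # Even dof: exp(-x/2) times a truncated exponential series in x/2.
    term = 1.0
    series = 1.0
    for r in range(1, (dof - 2) // 2 + 1):
        term *= x / (2 * r)
        series += term
    return math.sqrt(2 * math.pi) * pdf * series


p_full = chi2_sf(6.26, 7)       # full two-way model
p_no_steppe = chi2_sf(33.6, 8)  # Progress-2 dropped
p_no_local = chi2_sf(672, 8)    # Çayönü PPN dropped
print(f"{p_full:.3f}  {p_no_steppe:.2e}  {p_no_local:.2e}")
```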

This result aligns with the earlier $f_4$-statistic. Arslantepe Late Chalcolithic carries a mostly local eastern Anatolian ancestry profile, but with a significant Eneolithic steppe-related contribution. In quantitative terms, this contribution is modest, around 13%, but it is statistically required.


Arslantepe ART038

The same model can also be applied to ART038, the R1b-V1636 individual from the Arslantepe Royal Tomb:

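| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| ART038 | Çayönü PPN | 0.885 | 0.0460 | 19.2 |
| ART038 | Progress-2 Eneolithic Steppe | 0.115 | 0.0460 | 2.49 |

| Model | $f_4$ rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 4.48 | 0.723 |

| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 4.48 | 0.723 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 12.1 | 0.148 | Çayönü-only model still accepted |
| Çayönü PPN | 8 | 363 | 1.24e-73 | Local Anatolian source required |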

The model estimates ART038 as about 88.5% Çayönü PPN-related and 11.5% Progress-2 Eneolithic Steppe-related. The steppe-related component approaches significance, with a Z-score of 2.49, and adding this source improves the fit markedly, lowering the chisq from 12.1 in the Çayönü-only model to 4.48 in the two-way model ($p=0.723$).

At the same time, the steppe source is not strictly required here, since the Çayönü-only model remains formally acceptable with $p=0.148$. This result should be regarded as tentative, especially because this is a single ancient individual rather than a population average, making the result more prone to ploidy-related artifacts, coverage issues, and individual-level variation. Still, given the improved fit, the direction of the estimate, and the paternal link to Progress-2 through R1b-V1636, including Eneolithic steppe-related ancestry in the model is reasonable, though not required in this individual case.
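One conventional way to put a number on "improves the fit" is a nested-model comparison, which this post does not itself report. The sketch assumes that the drop in chisq between the one-source and two-source models is approximately chi-square distributed with one degree of freedom, matching the single degree of freedom that separates the two rows of the popdrop table. The function name `nested_p_one_dof` is hypothetical, and the inputs are the rounded chisq values for ART038.

```python
import math


def nested_p_one_dof(chisq_reduced, chisq_full):
    """Approximate P-value for the chisq drop gained by one extra source.

    Assumes the difference follows a chi-square distribution with 1 dof,
    whose upper tail equals erfc(sqrt(delta / 2)).
    """
    delta = max(chisq_reduced - chisq_full, 0.0)
    return math.erfc(math.sqrt(delta / 2))


# ART038: Çayönü-only model versus Çayönü + Progress-2.
p_nested = nested_p_one_dof(12.1, 4.48)
print(f"nested comparison P = {p_nested:.4f}")
```

Under these assumptions the improvement is nominally significant even though the one-source model is not rejected outright, which fits the reading that the steppe source is supported but not strictly required for this individual.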


Tilbeşar Höyük (Gaziantep) Bronze Age, I14649

A further noteworthy sample is I14649 from Bronze Age Tilbeşar Höyük, represented here by the Gaziantep Bronze Age label. The site lies roughly 50 km west of Carchemish, the later Neo-Hittite capital. Historically, this sample predates written evidence for Anatolian speakers in the region, so it cannot be treated as linguistically identifiable in any direct sense.

What makes this individual interesting is the apparent mobility of R1b-V1636-bearing groups around the time of, and shortly after, their appearance at Late Chalcolithic Arslantepe. By the Bronze Age, the same paternal lineage is found farther southwest at Tilbeşar Höyük, showing that this lineage was not confined to the Upper Euphrates zone. Rather than requiring a simple movement directly from Arslantepe to Tilbeşar, this may reflect a broader dispersal of related groups across central and southeastern Anatolia, including the northern Levantine frontier zone, a region that later also becomes relevant for Luwian-speaking groups.

| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Turkey Southeast Gaziantep BA | Turkey Central Tepecik Ciftlik Neolithic | 0.832 | 0.0622 | 13.4 |
| Turkey Southeast Gaziantep BA | Progress-2 Eneolithic Steppe | 0.168 | 0.0622 | 2.69 |

| Model | $f_4$ rank | dof | chisq | P |
|---|---|---|---|---|
| Turkey Central Tepecik Ciftlik Neolithic + Russia_Eneolithic_Steppe | 1 | 7 | 3.65 | 0.820 |

This gives an estimate of roughly 83.2% Tepecik-Çiftlik Neolithic-related and 16.8% Progress-2 Eneolithic Steppe-related ancestry. The model passes comfortably with $p=0.820$, and the steppe-related component is nominally significant with $Z=2.69$.

Removing the steppe component reduces the model fit to $p=0.218$, so while it is not strictly required with these right populations, it still notably improves the fit.


Kalehöyük Old Hittite Period

The same two-way model can also be applied to Old Hittite Period Kalehöyük:

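| Target | Source | Weight | SE | Z |
|---|---|---|---|---|
| Kalehöyük Old Hittite Period | Çayönü PPN | 0.817 | 0.0315 | 26.0 |
| Kalehöyük Old Hittite Period | Progress-2 Eneolithic Steppe | 0.183 | 0.0315 | 5.81 |

| Model | $f_4$ rank | dof | chisq | P |
|---|---|---|---|---|
| Çayönü PPN + Progress-2 Eneolithic Steppe | 1 | 7 | 3.08 | 0.878 |

| Dropped source | dof | chisq | P | Interpretation |
|---|---|---|---|---|
| None | 7 | 3.08 | 0.878 | Full model accepted |
| Progress-2 Eneolithic Steppe | 8 | 36.7 | 1.32e-5 | Steppe source required |
| Çayönü PPN | 8 | 524 | 5.50e-108 | Local Anatolian source required |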

For the Old Hittite period average, the estimate rises to about 18.3% Progress-2 Eneolithic Steppe-related ancestry, and the fit is very strong with $p=0.878$. Again, the steppe-related source is required, since removing it causes the model to fail.

The Old Hittite period fit is especially noteworthy. A model using a rather eastern Anatolian Neolithic source such as Çayönü PPN might not be the first expectation for central Anatolia, yet it produces a good fit even with a fairly constrained right-population set.

Replacing Çayönü PPN with a more central Anatolian Neolithic source, Tepecik-Çiftlik, does not improve the situation. In fact, the two-way model with Tepecik-Çiftlik and Eneolithic Steppe fails with $p=0.00134$, even though the estimated steppe-related ancestry remains around 12.2%. This makes the stronger Çayönü-based fit more notable.


Summary

| Target | Steppe-related estimate | Z | Model P-value |
|---|---|---|---|
| Arslantepe LC | 13.4% | 5.25 | 0.510 |
| ART038 | 11.5% | 2.49 | 0.723 |
| Tilbeşar BA I14649 | 16.8% | 2.69 | 0.820 |
| Kalehöyük Old Hittite | 18.3% | 5.81 | 0.878 |
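To give a sense of the uncertainty behind these point estimates, the sketch below turns each weight and its standard error from the qpAdm tables above into an approximate 95% interval. It assumes the estimates are roughly normally distributed, so the interval is the weight plus or minus 1.96 standard errors. This is an illustration, not part of the qpAdm output, and the helper name `normal_interval` is hypothetical.

```python
# (weight, standard error) of the Progress-2 component, taken from the tables above.
estimates = {
    "Arslantepe LC": (0.134, 0.0255),
    "ART038": (0.115, 0.0460),
    "Tilbeşar BA I14649": (0.168, 0.0622),
    "Kalehöyük Old Hittite": (0.183, 0.0315),
}


def normal_interval(weight, se, z=1.96):
    """Approximate 95% interval under a normal approximation."""
    return weight - z * se, weight + z * se


intervals = {name: normal_interval(w, se) for name, (w, se) in estimates.items()}
for name, (low, high) in intervals.items():
    print(f"{name}: {low:.1%} to {high:.1%}")
```

The two single-individual targets have visibly wider intervals than the two population averages, which is consistent with treating them as the more tentative results.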

F4 PCA of Relevant Populations

F4 PCA of relevant ancient Anatolian, Caucasus, Levantine, Mesopotamian, and steppe populations

As a visual addition, the relevant Anatolian samples used above fall within the broader Anatolian Chalcolithic and Bronze Age cluster and are noticeably separated from the Kura-Araxes-related groups. A PCA is only useful for showing broad affinities along the variance-maximizing components of the $f_4$-statistics, not for proving a specific ancestry model. Still, the broad pattern is clear, and explaining the Progress-2-related ancestry therefore does not seem to require a mass Kura-Araxes migration into Anatolia, as one might object.

This separation is also supported by a direct $f_4$-test:

$$f_4(\text{Chimp}, \text{Turkey\_Cayonu\_PPN}; \text{Armenia\_DzhoghazBerkaber\_EBA\_KuraAraxes}, \text{Arslantepe Late Chalcolithic})$$

The statistic is effectively zero and non-significant ($f_4=-0.000029,\ Z=-0.09,\ p=0.931$).

Thus, Arslantepe does not show detectable excess affinity to Kura-Araxes groups relative to the local eastern Anatolian Çayönü baseline, while the same Arslantepe average does show excess affinity to Progress-2 in the earlier test, so the signal appears to be steppe-related rather than simply Kura-Araxes-related.


Conclusion

These results point to a consistent pattern. Eneolithic steppe-related ancestry is first detectable south of the Caucasus and in eastern Anatolia, and later remains visible in several Chalcolithic and Bronze Age Anatolian contexts. The contribution is not large, and in some cases it is clearly diluted, but it is repeatedly detectable and often statistically required. Nor does the contribution need to be large.

If early Proto-Anatolian speakers first existed as a subculture within a mostly local Anatolian environment, rather than as a mass population already spreading across Anatolia, then even a modest ancestry component could be historically meaningful. This is worth keeping in mind, especially against exaggerated maps of Luwian or Anatolian-speaking territory that project much later distributions too far back, often with an excessive focus on western and southwestern Anatolia.

This does not mean that every individual carrying such ancestry was necessarily an Anatolian speaker, nor that a simple two-way qpAdm model is the final word on the ancestry of these populations. More intermediate sources and more regionally specific models may improve individual fits in some cases. A separate supporting clue comes from IBD evidence: Ovaören MA2213 has been reported to share a 15.2 cM segment with Vonyuchka-1 from the North Caucasus steppe, pointing to a genealogical connection across the same broad northern Caucasus and Anatolian interaction zone.

The eastern route is supported by a trail of genetic evidence moving from the Eneolithic steppe and northern Caucasus zone, through the Southern Caucasus, and into Anatolia.

By contrast, the Balkan route remains harder to reconcile with the genetic evidence. It would require the relevant ancestry to enter Anatolia from the west or northwest, yet the clearest indications discussed here appear first in the Southern Caucasus and eastern Anatolia, and only later in central and southeastern Anatolia.