SmartPCA Tutorial: How to Run PCA on Genetic Data (EIGENSOFT)

This post continues the previous one, where I demonstrated how to perform PCA with PLINK. While PLINK’s PCA is great for quick exploratory analysis, smartpca (part of the EIGENSOFT toolset) is more commonly used in published genetic studies. It has to be compiled, which works on Linux or macOS. I covered how to install and prepare the toolchain on Linux in this earlier post: From EIGENSTRAT to PACKEDPED. As before, I’ll use a small subset. The focus here is on the technical process, not on interpreting the results. One key difference in this post is that I’ll perform linkage disequilibrium (LD) pruning, which reduces SNP redundancy and improves the detection of population structure in PCA. ...

July 30, 2025

Converting EIGENSTRAT to PACKEDPED

The files downloaded in the previous blog post are in EIGENSTRAT format. In this post, we’ll look at how to convert them to PACKEDPED format. PACKEDPED format allows for easier downstream processing using the PLINK toolset. With PLINK, it becomes straightforward to extract sample subsets, filter SNPs, and perform a wide range of analyses.

Downloading PLINK

I use PLINK 1.9. While there is a newer version (2.0), I prefer 1.9 because it includes several features that were deprecated or removed in the newer release. ...

July 29, 2025