Lines of Code Breakdown: A Compositional Analysis
Results of analyzing 26,542,872 lines of code across the largest open source projects
Many tools report lines-of-code (LoC) statistics. Conventional wisdom has long held that these metrics are fraught, but without hard data, critics could only gesture at the drawbacks of relying on LoC rather than demonstrate them statistically.
GitClear has previously asserted that only 5% of lines of code meaningfully evolve a repo's codebase. Because the claim that 95% of LoC is noise is an extraordinary one, it is incumbent upon us to substantiate it with data. That is the purpose of this page.
The funnel below aggregates real-world lines-of-code measurements across 264,117 commits in open source repositories between January 1, 2026 and March 31, 2026.
The funnel has five steps, each narrowing the set of lines kept by the step before it:
1. All changed code lines: every line changed across the distinct commits analyzed.
2. Distinct: ignore duplicated fragments.
3. Effecting: remove semantic lines.
4. Substantive: negate batch operations.
5. Purposeful: rinse commit artifacts.
💎 Result: important code line changes.
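Structurally, a funnel like this is a chain of filters: each step keeps a subset of the lines that survived the previous one, and the count after each step is what the funnel reports. The Python sketch below illustrates only that structure. Every rule in it is a hypothetical stand-in, because this page does not describe how GitClear implements any step. That includes the `ChangedLine` record, the `ARTIFACT_PATTERNS` and `BATCH_THRESHOLD` values, and the crude tests for duplicates, "semantic" lines, batch edits, and artifacts.

```python
from collections import Counter
from dataclasses import dataclass
from fnmatch import fnmatch

# Hypothetical heuristics only; GitClear's actual rules are not given on this page.
ARTIFACT_PATTERNS = ("*.lock", "*package-lock.json", "dist/*", "*.min.js")
BATCH_THRESHOLD = 10  # same edit in >= this many lines of one commit = batch operation
NON_EFFECTING = {"", "{", "}", "(", ")", "[", "]", "end", "};", ");"}
COMMENT_PREFIXES = ("#", "//", "/*")  # crude: would also catch C's #include


@dataclass(frozen=True)
class ChangedLine:
    commit: str  # commit SHA
    path: str    # file the line belongs to
    op: str      # "+" added, "-" removed
    text: str    # raw line content


def distinct(lines):
    """Ignore duplicated fragments: count the same line in the same file once
    (e.g. when a change reappears in a cherry-picked commit)."""
    seen, kept = set(), []
    for ln in lines:
        key = (ln.path, ln.op, ln.text.strip())
        if key not in seen:
            seen.add(key)
            kept.append(ln)
    return kept


def effecting(lines):
    """Drop lines assumed to carry no behavior: blanks, lone delimiters, comments."""
    return [ln for ln in lines
            if ln.text.strip() not in NON_EFFECTING
            and not ln.text.strip().startswith(COMMENT_PREFIXES)]


def substantive(lines):
    """Negate batch operations: drop an edit repeated many times within one commit."""
    counts = Counter((ln.commit, ln.op, ln.text.strip()) for ln in lines)
    return [ln for ln in lines
            if counts[(ln.commit, ln.op, ln.text.strip())] < BATCH_THRESHOLD]


def purposeful(lines):
    """Rinse commit artifacts: drop lines in lockfiles and build output."""
    return [ln for ln in lines
            if not any(fnmatch(ln.path, pat) for pat in ARTIFACT_PATTERNS)]


def funnel(lines):
    """Return (step name, surviving line count) for each funnel step, in order."""
    steps = [("All changed code lines", None), ("Distinct", distinct),
             ("Effecting", effecting), ("Substantive", substantive),
             ("Purposeful", purposeful)]
    report = []
    for name, step in steps:
        if step is not None:
            lines = step(lines)
        report.append((name, len(lines)))
    return report


SAMPLE = [
    ChangedLine("a1", "src/app.py", "+", "def total(xs):"),
    ChangedLine("a1", "src/app.py", "+", "    return sum(xs)"),
    ChangedLine("a1", "src/app.py", "+", ""),                 # blank line
    ChangedLine("b2", "src/app.py", "+", "def total(xs):"),   # cherry-picked copy
    ChangedLine("c3", "yarn.lock", "+", "lodash@4.17.21"),    # lockfile artifact
] + [ChangedLine("d4", f"src/m{i}.py", "-", "import six") for i in range(12)]

if __name__ == "__main__":
    for name, count in funnel(SAMPLE):
        print(f"{name}: {count}")
```

On the made-up `SAMPLE`, the counts fall from 17 to 16, 15, 3, and finally 2. The sweeping removal of `import six` across twelve files accounts for most of the drop, which mirrors the page's argument that raw LoC totals are dominated by changes a reviewer would not count as meaningful work. Note that the order of the steps matters here: deduplicating by file path first leaves cross-file repetitions intact so the batch step can still detect them.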
How much noise does your analysis tool let through?
Since other git stat tools (including those that profess to offer "Engineering Insights") neglect to process some or all of the steps above, the "insights" that they offer are as likely as not to be false positives or commit artifacts.
If you would like to extract the fraction of lines of code that corresponds to meaningful developer work, consider signing up for a free GitClear trial or requesting a demo.