A Summary of Census II: Open Source Software Application Libraries the World Depends On
Jason Perlow | 07 March 2022
It has been estimated that Free and Open Source Software (FOSS) constitutes 70-90% of any given piece of modern software solutions. FOSS is an increasingly vital resource in nearly all industries, public and private sectors, among tech and non-tech companies alike. Therefore, ensuring the health and security of FOSS is critical to the future of nearly all industries in the modern economy.
In March of 2022, The Linux Foundation, in partnership with the Laboratory for Innovation Science at Harvard (LISH), released the final results of an ongoing study, “Census II of Free and Open Source Software – Application Libraries.” This follows the preliminary release, “Vulnerabilities in the Core,’ a Preliminary Report and Census II of Open Source Software” in February 2020 and now identifies more than one thousand of the most widely deployed open source application libraries found from scans of commercial and enterprise applications. This study informs what open source projects are commonly used in applications warrant proactive analysis of operations and security support.
The completed report from the Census II study identifies the most commonly used free and open source software (FOSS) components in production applications. It begins to examine the components’ open source communities, which can inform actions to sustain FOSS’s long-term security and health. The stated objectives were:
- Identify the most commonly used free and open source software components in production applications.
- Examine for potential vulnerabilities in these projects due to:
- Widespread use of outdated versions;
- Understaffed projects
- Use this information to prioritize investments and other resources needed to support the security and health of FOSS
What did the Linux Foundation and Harvard learn from the Census II study?
The study was the first to analyze the security risks of open source software used in production applications. It is in contrast to the earlier Census I study that primarily relied on Debian’s public repository package data and factors that would identify the profile of each package as a potential security risk.
To better understand the commonality, distribution, and usage of open source software within organizations, the study used software composition analysis (SCA) data supplied by Snyk, Synopsys, and FOSSA. SCA is the process of automating visibility into any software, and these tools are often used for risk management, security, and license compliance. SCA solution providers routinely scan codebases used by private and public sector organizations. The scans and audits provide a deep insight into what open source is being used in production applications.
With this data, the study created a baseline and unique identifiers for common packages and software components used by large organizations, which were then tied to a specific project. This baselining effort allowed the study to identify which packages and components were the most widely deployed.
Census II includes eight rankings of the 500 most used FOSS packages among those reported in the private usage data contributed by SCA partners. The analysis performed is based on 500,000 observations of FOSS usage in 2020.
These include different slices of the data based on versions, structure, and packaging system. For example, this research enables identification of the top 10 version-agnostic packages available on the npm package manager that were called directly in applications:
Other slices of the data examined in the study include versioned versus version agnostic, npm versus non-npm, direct versus indirect (and direct) packages. All eight top 500 lists are included in an open data repository on Data.World.
Observations and analysis of these specific metrics led the study to come to certain conclusions. These were:
- Software components need to be named in a standardized schema for security strategies to be effective. The study determined that a lack of naming conventions used by packages and components across repositories was highly inconsistent. Thus, any ongoing effort to create software security and transparency strategies without industry participation would have limited effect and slow such efforts.
- The complexities associated with package versioning. In addition to the need for standardized naming schema mentioned above, Software Bill of Materials (SBOM) guidance will need to reflect versioning information consistent with the public “main” repository for that package, rather than private repositories. Many of the versions that our data partners reported did not exist in the public repositories for those packages because developers maintained internal forks of the code.
- Developer accounts must be secured. The analysis of the software packages with the highest levels of usage found that many were hosted on individual (personal) developer accounts. Lax developer security practices have considerable implications for large organizations that use these software packages because they have fewer protections and less granularity of associated permissions. The OpenSSF encourages MFA tokens or organizational accounts to achieve greater account security.
- Legacy open source is pervasive in commercial solutions. Many production applications are being deployed that incorporate legacy open source packages. This prevalence of legacy packages is an issue as they are often no longer supported or maintained by the developers or have known security vulnerabilities. They often lack updates for known security issues both in their codebase or in the codebase of dependencies they require to operate.
- Apache log4j, version 1.x, for example, was ten times more prevalent than log4j 2.x (the version requiring recent remediation), and 1.x still has known unpatched disclosed vulnerabilities because the software was declared end-of-life (EOL) in 2015.
- Legacy packages present a vulnerability to the companies deploying them in their environments — it means they will need to know what open source packages they have deployed and where to maintain and update these codebases over time.
- The prevalence of “supercoders” in the FOSS community. Much of the most widely used FOSS is developed by only a handful of contributors – results in one dataset show that 136 developers were responsible for more than 80% of the lines of code added to the top 50 packages. Additionally, as stated in the Census II preliminary results in 2020, project atrophy and contributor abandonment is a known issue with legacy open source software. The number of developer contributors who work on projects to ensure updates for feature improvements, security, and stability decreases over time as they prioritize other software development work in their professional lives or decide to leave the project for any number of reasons. Therefore, it is much more likely that these communities may face challenges without sufficient developers to act as maintainers as time goes by.
What resources exist to better understand and mitigate potential problem areas in Open Source Software development?
The Linux Foundation’s community and other open source projects initiatives offer important standards, tooling, and guidance that will help organizations and the overall open source community gain better insight into and directly address potential issues in their software supply chain.
Software Bill of Materials: Adopt the ISO/IEC 5962:2021 SPDX SBOM Standard
An actionable recommendation from Census II is to adopt Software Bill of Materials (SBOM) within your organization. SBOMs serve as a record that delineates the composition of software systems. Software Package Data Exchange (SPDX) is an open international standard for communicating SBOM information that supports accurate identification of software components, explicit mapping of relationships between components, and the association of security and licensing information with each component.
Many enterprises concerned about software security are making SBOMs a cornerstone of their cybersecurity strategy. The Linux Foundation recently published a separate study on SBOM readiness within organizations, The State of Software Bill of Materials (SBOM) and Cybersecurity Readiness. The report offers fresh insight into the state of SBOM readiness by enterprises across the globe, identifying patterns from innovators, early adopters, and procrastinators.
Differentiated by region and revenue, these organizations identified current SBOM production and consumption levels and the motivations and challenges regarding their present and future adoption. This report is for organizations looking to better understand SBOMs as an important tool in securing software supply chains and why it is now time to adopt them.
Take the free training on secure software development
The Open Source Security Foundation (OpenSSF) has developed a trio of free courses on how to develop secure software. These courses are part of the Secure Software Development Fundamentals Professional Certificate program. There’s a fee if you want to try to earn a certificate (to prove that you learned the material). However, if you just want to learn the material without earning a certificate, that’s free; simply audit the course. All three courses are available on the edX platform.
The courses included in the program are:
- Secure Software Development: Requirements, Design, and Reuse (LFD104x)
- Secure Software Development: Implementation (LFD105x)
- Secure Software Development: Verification and More Specialized Topics (LFD106x)
Focus on building security best practices into your open source projects
The OpenSSF develops and hosts its Best Practices badging program for open source software developers. This initiative was one of the first outputs produced as a result of the Census I, completed in 2015. Since then, over 4,000 open source software projects have engaged, started, or completed obtaining a Best Practices Badge.
Projects that conform to OpenSSF best practices can display a badge on their GitHub page or their own web pages and other material. In contrast, consumers of the badge can quickly assess which FLOSS projects are following best practices and, as a result, are more likely to produce higher-quality and secure software. Additionally, a Badge API exists that allows developers and organizations to query the practice score of a specific project, such as Silver, Gold, and Passing. This means any organization can do an API check within their workflow to check against the open source packages they’re using and see if that project’s community has obtained a badge.
More information on the OpenSSF Best Practices Badging program, including background and criteria, is available on GitHub. The projects page shows participating projects and supports queries (such as a list of projects that have a passing badge). Project statistics and criteria statistics are available.
Understand the vulnerability vectors of your software supply chain
In addition to reviewing the Census II findings, we encourage you to read the Linux Foundation’s Open Source Supply Chain Security Whitepaper. This publication explores vulnerabilities in the open source software ecosystem through historical examples of weaknesses in known infrastructure components (such as lax developer security practices and end-user behavior, poorly secured dependency package repositories, package managers, and incomplete vulnerability databases). It provides a set of recommendations for organizations to navigate potential problem areas.
The Census II study shows that even the most widely deployed open source software packages can have issues with security practices, developer engagement, contributor exodus, and code abandonment. Therefore, open source projects require supporting toolsets, infrastructure, staffing, and proper governance to act as a stable and healthy upstream project for your organization.
Cloud Computing Compliance and Security 2023 Projects Linux How-To Diversity & Inclusion Events Open Source Best Practices 2022 Cross Technology Open Source Training and Certification LFX Blockchain Research Legal Networking and Edge Blog Data Governance LF Energy LF Research OpenChain System Administration