
Charting a path forward for global collaboration in open source AI: Key takeaways from the GOSIM open source AI strategy forum

Written by Cailean Osborne | Jul 18, 2025 7:14:42 PM

The open source AI ecosystem has reached a pivotal moment. There are now almost 2 million models on the Hugging Face Hub, and open models, including a growing number of small but mighty ones, are rapidly catching up to proprietary alternatives in performance. Beyond models, researchers and developers across the world are collectively democratizing AI by sharing and collaboratively developing a range of open technologies, from open source frameworks and open standards for building AI agents and robots to open pretraining datasets and benchmarks for specialized domains and underrepresented languages.

Yet significant challenges remain that could hinder open collaboration in the global open source AI community, from geopolitical tensions and regulatory fragmentation to release and licensing practices that fall short of full openness, as well as enterprise concerns about trust and safety.

Recently, the GOSIM Open Source AI Strategy Forum brought together experts from industry, academia, civil society, and open source communities to address these challenges, among others, head-on. These discussions, captured in our new report, reveal both the potential and critical barriers facing global collaboration in open source AI.

N.B. As the forum discussions took place under the Chatham House Rule, all participants have been anonymized in the report and this post.

The challenges facing the open source AI ecosystem

The forum participants discussed major challenges facing the open source AI community today, which include the following:

Limited openness in “open source” AI: Practices of openness diverge widely when it comes to model releases and licensing. Many model releases are limited to sharing weights, and some models are released under restrictive licenses that contradict open source principles. Participants highlighted that access to documentation about a model’s training process, as well as to training code and data, is crucial for enabling reproducibility, safety auditing, iterative development, and adoption.

Regulatory fragmentation: Divergent regulations risk creating barriers to global collaboration and fragmenting the open source ecosystem. While the EU's AI Act provides the first comprehensive framework, uncertainty remains about regulatory alignment elsewhere. In addition, awareness of the implications of AI regulations like the AI Act for open source AI developers in both Europe and elsewhere is low.

Enterprise adoption gaps: Despite the lower costs and customization benefits of open models, enterprises remain hesitant to adopt them due to concerns about trust and safety. Open models often lack the detailed documentation that enterprises require to conduct rigorous due diligence, and there is a major gap between the performance of models on benchmarks in controlled research settings and their performance in real-world deployment scenarios in regulated industries.

Resource bottlenecks: Limited access to public research infrastructure, in particular compute, is a major bottleneck for academic researchers and grassroots initiatives seeking to participate in open source AI development.

Strategic recommendations for key stakeholders

The forum participants identified priorities and pathways forward across five critical areas:

Promoting standards of openness in open source AI

The open source AI community should rally around standards of openness and transparency that uphold the four freedoms of open source: to use, study, modify, and redistribute. Alongside open release practices that uphold these freedoms, the community should promote permissive licenses such as the new OpenMDW license for models, Apache 2.0 or MIT for software, and the CDLA v2 or CC BY licenses for datasets.
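In practice, these licensing choices are declared in a release's metadata. As one illustration, a model card on the Hugging Face Hub carries a YAML front matter block; a minimal sketch of a release following these recommendations might look like the following (the organization, dataset name, and license identifiers here are hypothetical placeholders, not a verified Hub configuration):

```yaml
---
# Hypothetical model card front matter for an openly released model.
# License identifiers are illustrative; consult the Hub's accepted
# license list for the exact strings it recognizes.
license: openmdw                  # permissive license for the model weights
datasets:
  - example-org/open-pretraining-corpus   # dataset released under CDLA-Permissive-2.0
language:
  - en
  - de
---
```

Declaring the license and training datasets up front in this way is one concrete step toward the documentation and transparency practices the forum participants called for.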

Strengthening digital sovereignty through openness in AI

Participants highlighted how open source represents a tool for digital sovereignty. By embracing and investing in openness in AI, governments can simultaneously strengthen their AI capabilities and sovereignty, while also fostering participation and collaboration in the global open source AI community. Initiatives in Europe like OpenLLM Europe and OpenGPT-X exemplify how the collaborative development of open multilingual models can build up regional know-how in AI and provide sovereign alternatives that are specialized in regional languages and cultures. 

Advancing research and reproducibility

Open source is a pillar of AI research and reproducibility, yet the general lack of transparency in AI R&D severely limits scientific progress. Research institutes like BAAI and Ai2 and grassroots initiatives like EleutherAI, the BigScience Workshop, and LAION demonstrate how open science principles can be implemented in open source AI development, from documenting their processes to sharing weights, code, and datasets that others can use, scrutinize, learn from, and build on. Looking forward, participants advocated for designing public research grants for AI R&D that incentivize, or even require, open science practices, as well as investing in public research infrastructure and open source AI applications in public benefit domains.

Facilitating enterprise adoption of open source AI solutions

Participants highlighted the need to bridge the gap between how well models perform on benchmarks in controlled research settings and how they perform in real-world, regulated contexts. This includes developing evaluation tools and benchmarks, as well as raising the minimum level of transparency provided about model training processes to facilitate due diligence. In addition, companies should collaborate on the development of open source frameworks and open standards in emerging domains like AI agents, rather than being prematurely locked into proprietary systems. Open source foundations were highlighted as organizations whose neutral platforms and open governance can facilitate such collaborations.

Promoting responsible practices in the open source community

Open source provides inherent advantages through transparency and distributed auditing, yet the speed of AI innovation demands proactive approaches to responsible development. This includes developing open source evaluation frameworks and benchmarks for AI safety, creating educational resources on responsible practices and regulatory compliance, and leading by example as practitioners. For example, the BigCode project trained StarCoder2, an open LLM for code, on Software Heritage's code archives in compliance with Software Heritage's ethical principles for training LLMs on the archive. BigCode then released the ethically trained model, the filtered Stack v2 training dataset, and a research paper detailing its code filtering process for others to use, study, learn from, and build on.

A collaborative path forward

The forum's most important insight may be that these challenges require collaborative solutions. No single stakeholder group can address the challenges facing the open source AI ecosystem alone. Instead, success depends on building bridges between researchers and industry, governments and open source communities, local initiatives and global platforms. Towards this end, the recommendations in the report provide signposts that can guide the global community towards fostering a vibrant, collaborative, and sustainable open source AI ecosystem. 

Cailean Osborne, PhD, is a Senior Researcher at the Linux Foundation, where he conducts strategic research and advocacy for promoting openness in AI. He has a PhD in Social Data Science from the University of Oxford, where he wrote his thesis on collaboration dynamics in the open source AI ecosystem.