Some thoughts on real open source Artificial Intelligence

We are in the midst of a hype around Artificial Intelligence (AI), and companies in the market are trying to get ahead of each other in all sorts of ways. One way is to claim that their AI is open source. So far, there has been a lot of open washing, meaning that they claim to be open but fail to apply common practices, like using licenses approved by the Open Source Initiative (OSI), to make that clear. This is so common that some weekly newsletters even have recurring segments listing all perpetrators.

Adding to this, there is an ongoing discussion about what open source for AI should mean, and OSI is even drafting a new definition for this. As late as today at the OSPOs for Good conference, some big companies tried to claim that there is a gradient of open source, and threatened that, if this is not accepted, they would not support open source at all. That seems just disingenuous to me. Through various venues, webinars and chats, I have tried to make a point of something that seems obvious to me, which leans back on the original four software freedoms. So before going further, let’s just remind ourselves of these.

The four freedoms of free software

  • The freedom to run the program as you wish, for any purpose (freedom 0).
  • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.
  • The freedom to redistribute copies so you can help your neighbor (freedom 2).
  • The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

These freedoms are not a gradient: all four are needed, or it is not free and open source software.

It might be worth mentioning that OSI’s definition of open source is quite aligned with the spirit of these, and in some of their points even clearer. For example, their second point about source code includes the following clarification:

The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.

The Open Source Definition – Source Code

These two definitions have led me to the following arguments.

My arguments about the freedoms

In various venues, I have had arguments essentially like the one below.


Me: So to be able to study the AI and to modify it in any way, I need access to the data that the models have been trained on. If I can’t see, remove, add, or modify the data, I don’t have full freedom to study the AI or change it to suit me. Essentially, it is a black box, and it doesn’t matter if the weights are free because I can’t change what is weighted.

Them: But we cannot open source the data because we don’t own the copyright of the data we trained the models on.

Me: …


So they openly admit that the system as a whole is not free.

In my opinion, such AI systems by definition are not, and cannot be, viewed as free and open source software.

Another similar discussion starts with me doing the same rant, but is followed in a slightly different way.


Them: But we cannot publish the data because even though we made sure we own the copyright of the data, it includes private data and personal information.

Me: …


Now, I have to give it to them that we are a bit closer. But I still don’t have freedom 1: I cannot, for example, clean the data of biased or erroneous private data to make the model better. There is also a risk that distributing the model would distribute access to this private data if I change other parts of the system, and thus the risk of me breaking laws means I don’t really have freedom 3.

Where will we end up?

In conclusion, the essence of truly open source AI lies not only in the accessibility of the code or the weights but also in the freedom to access, modify, and distribute the data upon which these models are built. Without freely licensed and fully accessible training data, the promises of transparency, collaboration, and improvement inherent in the open source ethos remain unfulfilled. The four freedoms that define open source software are undermined when data remains proprietary or restricted. As the discourse around open source AI continues to evolve, it is imperative that we push for these clear values of freedom and openness, or we risk heading into a future where we rely on black boxes, blind trust and reduced possibilities to shape the software as we wish.

In addition, we should perhaps find a term that can be used for other types of AI systems. Reusable, gratis, or shareware might be terms to use for that. If you have ideas, please let me know.

My sustainability June 2024

June was busy and fun! Just check all the things that happened.

Vodcast

WikiAfrica Hour had the theme: #36: Does the Wikimedia movement contribute to the SDGs? and I was a guest representing the user group. It went well in my opinion and I think it might be an inspiring episode for people who see it.

User group meeting

We had a good and productive meeting, and another member of the user group organized it. That was a lovely feeling. Minutes are published.

Affiliate health

The Affiliations Committee published new criteria for judging the health of the affiliates, and based on that I made a table to see how well Wikimedians for Sustainable Development meet them. The table makes it clear that we have some room for improvement, and makes it very actionable what we need to do.

Goals and strategy

One very concrete thing we are missing is measurable goals. So I started a page for us to collect them. When doing that, I thought it would be necessary to connect them to the movement strategy, and set up a strategy page for the user group to make that connection. Of course, both of these are just empty placeholders for now, but at least we have some concrete things for the agenda for our upcoming meetings.

Voting on the Movement Charter

The user group may vote on the adoption of the Movement Charter, so I started a page for our vote and got nominated as the person to submit it on behalf of the user group.

Newsletter

I sent another monthly newsletter, and this one was full of stuff, both from the user group and from around the movement.

This is the first half of my sixth monthly report of my New Year’s resolutions.

Course in Climate Leadership in Politics and Public Administration

This spring I took a short course in Climate Leadership in Politics and Public Administration (archived), 3.0 ECTS credits, remotely at Uppsala University. The course was inspiring, and some of the professors were excellent. The grades have now been reported to the central system, LADOK, and I passed!

It feels good to know a bit about possible regulatory instruments, and even if this may not be directly applicable in any of my current assignments in Open By Default, I would love to help on issues in this topic space in the future.

Re-launching Open By Default

In 2016, I started a sole proprietorship in Sweden that I called Open By Default. It was a lot of fun, but when I got full-time employment in the European Parliament and moved to Brussels, I closed it down.

But recently, on 15 May, I started it up again, now as an “eenmanszaak” in the Netherlands. So if you need any help with anything related to openness, you can hire me.

My Fediverse May 2024

This month has been my least active so far this year. Even so, I have continued to boost information about Fediverse apps, news and tips and tricks. But besides that, I do not have much to report.

What I am currently hesitating about is setting up a PeerTube place to stream to. I am still not sure if I should host one myself, or just join an existing instance. A bit more research is needed.

This is the second half of my fifth monthly report of my New Year’s resolutions.

My sustainability May 2024

Podcast episode

Surprisingly, the tables were turned when, after an interview at the Wikimedia Summit in April, Eva Martin offered to interview me. I agreed, and here is the episode where I elaborate on the experience of the summit from the perspective of being there as a representative for the Wikimedians for Sustainable Development. While it is a bit specific to the summit, it is still a good introduction to the user group.

User group meeting

I announced a user group meeting, but there were not many attendees. I still took some notes and published the minutes.

On a more positive note, another user group member offered to help announce the meetings and already scheduled one for 16 June after a bit of coaching. This is precisely what I hoped for, and with just three or four more members taking on small tasks like this, it will turn into a lively group quickly.

Newsletter

There were also plenty of cool things happening in the community, and the newsletter for May was fun to write.

Grant

Last month, I reported that I had submitted a grant application for starting a secretariat. Unfortunately, it was declined.

This is the first half of my fifth monthly report of my New Year’s resolutions.

Thoughts on “finding” open source to contribute to

I was watching Open Source Fridays streamed on GitHub’s YouTube channel a little more than a week ago and was struck by how they went about recommending that people find projects to contribute to. They were discussing metrics about projects, so I left a comment in the chat.

I would not recommend that way of selecting a project to contribute to. Much better is to contribute to something that you use and where you like to see an improvement.

44:15 Open Source Friday with OpenSauced – redefining the meaning of open source

Even though my pushback was well-received, I feel my point was missed. The host only went so far as to define “use” as cloning it and getting it running.

Is there a right way to contribute to Open Source?

Yesterday, Edoardo Dusi published a needed blog post on opensource.net with thoughts aligned with mine. He titled it There is a right way to contribute to Open Source and delves deep into the hype surrounding stars and likes. He also provides a great list of other ways of contributing that are not reflected in the most common metrics. Go read it; it is well-written and what sparked this blog post.

Where to contribute

While Dusi touches upon contributing to projects he is familiar with, I want to emphasize the point more clearly. Perhaps it felt so obvious to him, he didn’t feel the need to state it. But as a Wikipedian, I am used to stating the obvious so let me delve a bit deeper into it.

The point that I tried to make in the livestream, and what came so naturally to Dusi, is that it is much easier to contribute to an open source project if you are familiar with it, because whatever they build is part of your workflow, and that workflow depends on it working. Knowing what the software is trying to do and the aims of at least one end user (that being you) can really help you along the way when making contributions.

But we are not there yet. Because if you are anything like me, and mostly rely on open source tools, it may not narrow it down much. In that case, I think there are basically three strategies to pick from: your need, your joy, and their need. These are ordered, and I’ll delve into each of them and explain why I think this is the order to consider. I will also mention some, in my opinion important, properties of codebases related to this.

Your need for a change in the codebase

I believe this is partly what Dusi was talking about when he mentioned business motivation. But whereas he described a larger tit-for-tat scenario that would lead to long-time gains in the codebase, I really mean something more direct, as referred to in “scratch your own itch”. By solving a problem in a workflow or tool that you are experiencing yourself, not only do you have in-depth knowledge of the problem, you are also properly motivated to solve it. The reward becomes inherently tangible because you will reap it yourself. And while sometimes it may not actually be worth the time, the joy of seeing your improvement every time you are in that workflow may be very satisfying.

Example – Wikidata SPARQL service

I often use the Wikidata SPARQL service to create map queries that I later use on Wikipedia. But to add them to an article, there was always a step of reformatting the query, as MediaWiki did not accept line breaks the way the query service does. Therefore, in a hackathon, I wrote a tiny conversion tool and got it added to the code snippets export functionality, so that I now just need to copy and paste every time I do a new query.

The mapframe code snippet.
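To illustrate the kind of reformatting involved, here is a minimal sketch in Python. It is not the code that was added to the query service; the function name `query_to_mapframe`, the default sizes, and the choice to drop full-line `#` comments are my own assumptions for illustration. It assumes the query has no `#` comments at the end of a line and no multi-line string literals, since both would break when the query is collapsed onto one line.

```python
import json


def query_to_mapframe(query: str, service: str = "geoshape",
                      width: int = 400, height: int = 300) -> str:
    """Wrap a multi-line SPARQL query in a mapframe wikitext snippet.

    Hypothetical helper: the name, defaults and output layout are
    illustrative assumptions, not the query service's actual export.
    """
    # Strip indentation, then drop empty lines and full-line comments,
    # because a '#' comment would swallow everything after it once the
    # query sits on a single line.
    lines = [line.strip() for line in query.splitlines()]
    one_line = " ".join(
        line for line in lines if line and not line.startswith("#")
    )

    # Kartographer reads the query from an ExternalData JSON object.
    data = {"type": "ExternalData", "service": service, "query": one_line}
    body = json.dumps(data, ensure_ascii=False)

    # Width and height are placeholder values; adjust them per article.
    return f'<mapframe width="{width}" height="{height}">\n{body}\n</mapframe>'


if __name__ == "__main__":
    example = """
    # All countries
    SELECT ?id
    WHERE {
      ?id wdt:P31 wd:Q6256 .
    }
    """
    print(query_to_mapframe(example))
```

The output can be pasted into an article as-is. Building the body with `json.dumps` rather than string concatenation means any quotes inside the query are escaped correctly.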

Your joy of making a contribution

This motivation may be a variation of the former, but I treat it separately because often there might not be a direct reward in some of your workflows. What I group in this category are projects that you are charmed by and just want to exist in the world. It could be because they are just fun ideas that tickle your mind, or a civictech project that you feel is important to the world somehow. In this group, I would also place most of the motivations that Dusi mentioned, the long-term view of improving some part knowing that others will improve other parts down the line, making the entire project better.

Example – Weeklypedia

Weeklypedia is an automated weekly statistics generator, showing which articles on a language version of Wikipedia got the most edits last week. I don’t really use this knowledge for anything, but I think it is a fun tool, and it gives me a peek into what is on my fellow editors’ minds this week. Here, I could translate the interface to Swedish, and now I get the newsletter delivered to my inbox in my native language. Easier to read for me, and it feels great that it might also lower the barriers for others.

Their need of help

Perhaps surprisingly, the next option in my recommended order is to look at young or small communities rather than the big and “healthy” ones among the software you are using. My reasons are two-fold.

First, in a small community, even a tiny contribution can have a lot of impact. Not only because you might actually be accelerating the development by a considerable amount, but also because in a smaller community, someone else caring might raise the spirits in the community by orders of magnitude.

Secondly, if your plan was to start an “open source career”, in a small community it may be a shorter step to be delegated more responsibilities and have an impact on the direction of development. Now, keep in mind that not all small communities are looking for contributions and collaboration; it might be a single person’s pet project, so check that your help is wanted before you get started.

Example – OpenRefine

I have been an OpenRefine user for many years, and even did a few video tutorials showing my workflows with the tool. Two years ago, the advisory committee needed a new member, and since one of the staff knew that I was showcasing the tool, I was asked if I would consider helping. Now, OpenRefine is neither small nor new, but there was clearly a need from their side. Since I had experience from being on NGO boards from before, it felt like an excellent way for me to help the community, even though technical contributions here are beyond my skills.

Other properties to consider

Even within these groups, you might have several codebases that you are considering contributing to, and then I think there are some properties that make sense to review.

Ease of collaboration

It might not be surprising that I think that ease of collaboration is an important property of a codebase; after all, for the last five years I was working on the Standard for Public Code, which is all about making it easier to contribute to a codebase. Besides the obvious benefit of making it easier for yourself when contributing, it is also a strong signal of a community that wants more people to join them. So if a codebase has a well-crafted contributing file or other ways that guide a new contributor into the community and make them (and you!) feel welcome, I would suggest it is a codebase well worth investing your time in.

External rewards

Only lastly, I want to acknowledge the external rewards. Especially if you are looking for a professional career, there are signals that a future employer might be looking for. Now, I want to emphasize that I believe that these in themselves are poor criteria to start with when you are looking for a project to help. But if you are looking at two projects that are equal in all other aspects, it would be naive to suggest that these “fame metrics” would not matter, whether we like it or not.

To be fair, I admit to having participated in Hacktoberfest and gotten the t-shirt, and to having submitted codebases I am working in to it and other campaigns. But today, I see these phenomena more as a way to explore new tools and to do outreach, rather than a path for impactful contributions and fame.

Conclusion

In conclusion, contributing to open source is not merely about following metrics or seeking external validation. It’s about finding alignment between your own needs, passions, and the needs of the projects you engage with. As Edoardo Dusi suggested, there is indeed a right way to contribute to open source—one that transcends the superficial measures of popularity and, in my opinion, starts from within you and your needs.

Whether you’re addressing your own pain points, finding joy in nurturing projects close to your heart, or answering the call for help in communities in need, the essence of open source contribution lies in the depth of your engagement and the authenticity of your motivations.

Let that be your guiding star. 🌠

Guest in my own podcast

I have been podcasting for Wikipediapodden for almost five years now. The show is a weekly news run down with recurring segments. However, often when I go to Wikimedia events, I bring my gear and record special episodes, interviews with some of the attendees of the event. I did this again for the Wikimedia Summit in Berlin this year too, when I participated as the representative for Wikimedians for Sustainable Development.

After I recorded the interview with Eva Martin, from the organizing team in Wikimedia Deutschland, she asked if she could interview me with the same questions I was using. I accepted, and we recorded straight away. The result was published today.

Image: Matthias Wörle, CC BY-SA 4.0

My sustainability April 2024

April didn’t see much on-wiki activity for the Wikimedians for Sustainable Development user group, but I did do two large activities. Plus, I got my act together and sent a newsletter for March and April.

Wikimedia Summit 2024

First, I went to the Wikimedia Summit as a representative for the user group. It was a lot of work, but really focused on the Wikimedia Movement Charter, which is so generic it won’t have much direct impact on the user group activities (but possibly on the governance). I was interviewed for a podcast in my role as a user group representative, but that episode will be published in early May.

Grant writing

Second, I wrote an application to the O’Shaughnessy Fellowships to work on setting up a secretariat for the user group. While the likelihood that it is approved is fairly low, if successful, it would give me the opportunity to work full-time on it, so keep your fingers crossed.

As it was a fellowship, much of the application is focused on me, and not relevant to share, but I also made an action plan that might be useful for someone thinking about similar progress for their affiliate. I posted that in my own Ideas repository, along with the video pitch I recorded.

This is the second half of my fourth monthly report of my New Year’s resolutions.