IBM bets on open source with DataStax acquisition
Plus: Shock-horror! Most commercial codebases still contain open source vulnerabilities.
In issue #6 of Forkable, we look at Big Blue’s continued investment in the commercial open source ecosystem with Apache Cassandra vendor and contributor DataStax.
Elsewhere, 86% of commercial codebases analyzed by Black Duck contained open source vulnerabilities; AI chatbots can expose private GitHub repositories if they were once public; and an introspective perspective on “open source AI” and the “illusion of openness.”
If you haven’t subscribed to Forkable already, please do so now to receive new posts direct to your inbox each week.
Paul
Open issue
‘Big Blue’ buys DataStax — what that means for Apache Cassandra
IBM is one of several legacy software companies riding the AI and cloud computing wave to great riches, currently sitting at an all-time high valuation of $240 billion.
Open source has been a key component of IBM’s efforts to reinvent itself over the past 25 years: the company threw its weight behind Linux at the turn of the century before going on to acquire enterprise open source company Red Hat for a cool $34 billion six years back.
Just this week, IBM’s $6.4 billion bid for Terraform-creator HashiCorp crossed the line (note: HashiCorp abandoned its open source foundations in 2023). In the same week, IBM also announced it was buying Apache Cassandra services vendor DataStax. Terms of the deal weren’t disclosed at the time of writing, but DataStax was last valued at $1.6 billion three years ago.
DataStax helps businesses such as Macy’s, Audi, and Intuit deploy and manage Apache Cassandra, an open source NoSQL database designed for handling massive amounts of data across disparate locations with minimal downtime. The company offers enterprise-grade features spanning security, performance enhancements, analytics, multi-cloud support, and more.
IBM will of course be putting this acquisition to use across its own suite of products, including its cloud-based AI and data platform watsonx, but the deal naturally raises questions about the future of Cassandra itself.
While Cassandra is a fully independent open source project under the auspices of the Apache Software Foundation (ASF), DataStax is a core contributor. DataStax executive and Cassandra committer Patrick McFadin says that the deal is actually a positive indicator for the open source project.
“IBM’s planned acquisition of DataStax signals a strategic bet on the future of Cassandra,” McFadin wrote. “IBM’s resources, combined with the incredible momentum already happening in the Cassandra project, will open up new possibilities.”
IBM, for its part, stressed that it would “continue to support, engage, and innovate” with the various open source communities that DataStax currently contributes to, including Apache Cassandra, Langflow, Apache Pulsar, and OpenSearch.
Read more: IBM to Acquire DataStax
The rundown
Codebase chaos
It’s no secret that much of the modern tech stack is built from open source, and that the software supply chain is somewhat porous.
With that in mind, a new report from application security company Black Duck found that 86% of the 965 commercial codebases it analyzed across 16 sectors contained some form of open source vulnerability.
More specifically, 81% had high- or critical-risk vulnerabilities.
GitHub exposed
It turns out that AI chatbots such as Microsoft Copilot can expose private GitHub repositories to the internet.
As per TechCrunch, security researchers at Lasso found that if a repository was once public, data from that repository can linger in chatbots even after the repository is made private. The reason, according to the report, is that Bing indexed and cached the data.
Lasso found that more than 20,000 private GitHub repositories that were once public were still accessible through Copilot, affecting organizations such as Amazon, Google, and Microsoft itself.
Then again, some might argue that if a repository was ever public, anyone could have legitimately forked it and kept it public for eternity anyway.
Open source AI and the “illusion of openness”
A couple of weeks back, Forkable pontificated about how “open source” can be something of an illusion with regard to vendor-led projects, and how the “spirit of open source” means much more than a license.
Ted Shelton, chief operating officer at Inflection AI, published similar sentiments in a long-read where he compares open source AI to the “cargo cults” of indigenous South Pacific Islanders in the 1940s, who tried to replicate the “miracle” of military planes by hand-crafting, out of wood, all the airbase infrastructure (runways, radios, etc.) that enabled it. Needless to say, it was all just a facade and didn’t bring the planes back.
Shelton draws parallels between this and two “open source AI” purveyors: Meta and DeepSeek.
“Both are touted as open-source triumphs of AI, freely available to all,” Shelton writes. “The reality, however, is that these projects are less a gift to the community and more strategic competitive maneuvers – a modern cargo cult, inspiring an imitation of the form of grassroots open source without the same spirit or intent.”
It’s an interesting perspective, and you can read all about it here: Cargo Cults and the Illusion of Openness.
Patch notes
French privacy-focused tech firm Murena launched a “de-Googled” tablet, which ships with its own custom /e/OS operating system that’s forked from LineageOS (which is itself an Android fork). Costing €539, the Murena Pixel Tablet bundles many “open source alternative” apps out of the box.
The Open Source Security Foundation (OpenSSF) announced the Open Source Project Security Baseline, which amounts to a set of best practices (“aligned with international cybersecurity frameworks, standards, and regulations”) to improve open source security posture.
Electronic Arts open sourced the code for some old Command & Conquer games, following a similar move involving two games back in 2020. The games are available for anyone to “mod” under the copyleft GPL.