DeepSeek deepens the "open source AI" debate
Plus: Google gives Pebble smartwatch a lifeline; Jack Dorsey's Block toots its open source trumpet; and a fork in the road for Semgrep.
In issue #2 of Forkable, we look at OpenAI rival DeepSeek’s rise to the top of the AI tree, with its open source credentials sharply in focus.
Elsewhere, Google open sourced the Pebble smartwatch source code to (potentially) spur a new generation of devices; Jack Dorsey’s Block is going big on open source; and Semgrep’s recent licensing changes spawn a new community fork.
If you haven’t subscribed already, please do so below — and share far and wide.
Happy weekend (when it comes).
Paul
Open issue
The “open source AI” debate deepens: A Sputnik or Google moment?
Although AI isn’t Forkable’s core focus, it has been impossible to ignore the viral sensation that is DeepSeek, which unseated ChatGPT at the top of the app stores this week.
Spawned by Chinese hedge fund High-Flyer, DeepSeek has been bubbling away for a while already, gaining a reputation as a very capable AI developer that’s lowering costs through a different approach to training its models. At the start of last week, DeepSeek released its R1 reasoning model under a permissive MIT license, sending Silicon Valley and Wall Street into meltdown. The reason(ing) was DeepSeek’s cost-to-performance ratio: lower data requirements and cheaper running costs, with little compromise versus the incumbents (and reportedly better results in some cases).
The work that DeepSeek is doing is important, as it addresses concerns around the viability of “scaling laws,” which in this context means that to improve an AI model you need to throw more data and compute at it. DeepSeek has seemingly upended that notion, leaning substantively on reinforcement learning — this means a model interacting with its environment, going through feedback loops, and making adjustments. Trial and error, rinse and repeat.
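For anyone unfamiliar with the term, the feedback loop described above boils down to: try an action, observe a reward, and nudge your estimates toward whatever paid off. The toy multi-armed bandit below illustrates that general loop only; it has nothing to do with DeepSeek’s actual training pipeline, and every name and number in it is made up.

```python
import random

# Toy reinforcement-learning loop (illustrative only, not DeepSeek's setup):
# the agent repeatedly acts, gets feedback from the environment, and adjusts.

TRUE_REWARDS = [0.2, 0.5, 0.8]   # hidden payoff of each action (the "environment")
estimates = [0.0, 0.0, 0.0]      # the agent's current belief about each action
counts = [0, 0, 0]
EPSILON = 0.1                    # how often to explore rather than exploit

for step in range(10_000):
    # Explore occasionally; otherwise pick the action that looks best so far.
    if random.random() < EPSILON:
        action = random.randrange(len(estimates))
    else:
        action = max(range(len(estimates)), key=lambda a: estimates[a])

    # The environment responds with a noisy reward signal.
    reward = TRUE_REWARDS[action] + random.gauss(0, 0.1)

    # Adjust the estimate for the chosen action toward the observed reward.
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # converges toward TRUE_REWARDS through feedback alone
```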
Marc Andreessen called DeepSeek’s R1 launch “AI’s Sputnik moment,” a reference to the Soviet Union’s 1957 satellite launch, which catalyzed the subsequent “space race” and led the U.S. to invest heavily in a domestic space program that put a human on the moon before any other nation.
However, Yishan Wong, ex-CEO of Reddit and a former engineer at PayPal and Facebook, countered the Sputnik analogy by comparing the R1 launch to something Google did two decades ago. In a 2004 S-1 filing, Google teased an in-house distributed system it had built from a “combination of off-the-shelf and custom software running on clusters of commodity computers,” enabling it to generate “substantial computing resources at low cost.” Google eventually published an extensive blueprint of this work via its MapReduce (2004) and BigTable (2006) papers.
Wong reckons the Google comparison is more apt because the Soviet Union never really explained how it built and launched Sputnik. DeepSeek, for its part, published a technical paper explaining its methodology and rationale, allowing others to go about replicating the work, much as Google did two decades ago.
“DeepSeek is MUCH more like the Google moment, because Google essentially described what it did and told everyone else how they could do it too,” Wong wrote. “In Google's case, a fair bit of time elapsed between when they revealed to the world what they were doing and when they published papers showing everyone how to do it. Deepseek, in contrast, published their paper alongside the model release.”
Much has also been said about DeepSeek’s Chinese roots, and how this signals some sort of power-shift in the global AI wars. But Meta’s chief AI scientist, Yann LeCun, called this interpretation poppycock (paraphrased), noting that the real story here is the power of open source AI.
“To people who see the performance of DeepSeek and think ‘China is surpassing the U.S. in AI,’ you are reading this wrong. The correct reading is: ‘Open source models are surpassing proprietary ones’,” LeCun wrote.
He added that DeepSeek itself has benefited greatly from other open source AI tools — including Meta’s own PyTorch (machine learning library) and Llama (large language models).
“They [DeepSeek] came up with new ideas and built them on top of other people's work,” LeCun continued. “Because their work is published and open source, everyone can profit from it. That is the power of open research and open source.”
This doesn’t mean that everyone will agree that DeepSeek is truly open source, as per the “official” definition. In many ways, DeepSeek underscores the problem of transposing a software licensing paradigm onto AI: sure, it released its models under a recognized, permissive open source license (MIT) for anyone to use, but it hasn’t divulged the underlying training data, among other components. This is why researchers at Hugging Face are already trying to create an even more open version of DeepSeek R1, which they’re calling “Open-R1.”
“The R1 model is impressive, but there’s no open data set, experiment details, or intermediate models available, which makes replication and further research difficult,” Hugging Face engineer Elie Bakouch told TechCrunch. “Fully open-sourcing R1’s complete architecture isn’t just about transparency — it’s about unlocking its potential.”
The rundown
Google gives Pebble a new lease of life… sort of
Remember Pebble, the smartwatch that smashed all manner of crowdfunding records back in 2012? Well, it’s getting a new lease of life as an open source project.
With Pebble facing insolvency and struggling against a slew of well-financed tech titans, Fitbit swooped in back in 2016 and bought most of Pebble’s assets in a fire sale. Fast-forward five years, and Google acquired Fitbit, including Pebble’s assets.
Earlier this week, Google announced it was open sourcing the original Pebble software. The code is available on GitHub under an Apache 2.0 license, though Google has stripped out some of the proprietary codebase “for licensing reasons,” including most of the Bluetooth stack and the heart-rate monitor driver. For this reason, Google concedes it will require a “non-trivial amount of work” to get PebbleOS up and running again as a functional piece of firmware.
However, Pebble’s original founder and CEO, Eric Migicovsky, has revealed that he wants to have another stab at a Pebble-like product, while the existing Rebble community is also delighted by the Google release.
Block’s golden goose
Jack Dorsey’s Block, parent company of Square and Cash App, is tooting its open source trumpet with the launch of its Open Source Program Office (OSPO) and a bunch of related initiatives, such as an open source donation program hosted on Thanks.dev.
However, the bigger news is that Block has also formally launched an “interoperable AI agent framework” dubbed Goose, available on GitHub under an Apache 2.0 license. Block says it worked with Anthropic to build the open Model Context Protocol, which helps users connect AI assistants to systems where data lives, and which underpins its new framework.
With Goose, users can connect any large language model (LLM) to real-world actions. The first use case lies in the software engineering domain, with Block noting that it has some 1,000 engineers using it internally already to “reduce time spent on maintenance and repetitive tasks.”
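Conceptually, an agent framework of this kind sits between a model and a registry of callable “tools” that map onto real-world actions. The sketch below is a hypothetical illustration of that pattern, not Goose’s or MCP’s actual API: a function is registered under a name the model can reference, and a stubbed-out model call decides which one runs.

```python
import os
from typing import Callable, Dict

# Hypothetical sketch of the agent pattern described above; this is NOT
# Goose's or the Model Context Protocol's real API.

TOOLS: Dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Register a function as an action the agent may invoke."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("list_files")
def list_files(path: str) -> str:
    return "\n".join(os.listdir(path or "."))

def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; a real agent would parse structured
    # output from the LLM to pick a tool and its arguments.
    return "list_files ."

def run_agent(prompt: str) -> str:
    decision = fake_llm(prompt)
    name, _, arg = decision.partition(" ")
    return TOOLS[name](arg) if name in TOOLS else decision

print(run_agent("What's in the current directory?"))
```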
License to fork
Code security company Semgrep is the latest VC-backed startup to switch up its open source licensing, prompting a consortium of companies to pool their resources and launch a fork.
Semgrep had raised some $90 million in VC cash for a static code analysis tool that helps companies such as Snowflake and Lyft identify vulnerabilities in their software. The underlying Semgrep engine, which the company says is something akin to Google Search for code, remains LGPL-licensed. However, Semgrep had also previously open sourced a set of “rules”: configurations that help identify security vulnerabilities and related issues by defining what to look for, which languages a rule applies to, and how the results should be reported.
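For context, a Semgrep rule is a small YAML file. The made-up example below (not one of the rules affected by the change) shows the general shape: the pattern to match, the languages it applies to, and how findings are reported.

```yaml
rules:
  - id: hardcoded-password        # what the rule is looking for
    languages: [python]           # which languages it applies to
    severity: ERROR               # how findings should be reported
    message: Possible hardcoded password assigned to a variable.
    pattern: password = "..."     # "..." matches any string literal
```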
Semgrep shifted these rules to its own license, as per a company announcement last month. At the same time, Semgrep migrated some key features from the open source project over to the commercial edition, much to the chagrin of 10 security companies, which last week launched a Semgrep fork called Opengrep.
“Open-source license changes by private vendors can disrupt contributors and communities that help build these projects,” the Opengrep manifesto reads. “Semgrep's rebranding and license shift signal a departure from its commitment to democratize code security for developers.”
Semgrep founder and chief product officer Luke O'Malley took to LinkedIn to clarify matters, noting that only those bundling and reselling Semgrep would be affected by its changes.
Patch notes
Passbolt, an open source password manager for teams, raised an $8 million Series A round of funding.
PayPal Ventures co-led a $21 million investment into Formance, a French startup building open source infrastructure for financial applications.
Y Combinator (YC) alum Pipeshift raised $2.5 million to help enterprises build and deploy applications using open source AI components.
Openvibe, an app that integrates “decentralized” open social networks such as Bluesky, Mastodon, Nostr, and Threads, received an $800,000 investment from the likes of WordPress.com parent Automattic.
Microsoft last week open-sourced DocumentDB, a document database platform built atop PostgreSQL and designed for NoSQL workloads.
Cloud infrastructure stalwart Mirantis launched a new open source project called Rockoon, designed to simplify the management of OpenStack on Kubernetes.