Anthropic’s announcement of Claude 3.7 Sonnet notwithstanding, the breakneck pace of major AI announcements seemed to slow down through February. That gave us some time to look at some other topics. Two important posts about programming appeared: Salvatore Sanfilippo’s “We Are Destroying Software” and Rob Pike’s slide deck “On Bloat.” They’re unsurprisingly similar. Neither mentions AI; both address the question of why our hardware is getting faster and faster while our applications aren’t. We’ve also noted the return of Pebble, the first smartwatch, and an AI-driven table lamp from Apple Research that looks like it stepped out of Pixar’s logo. Fun, perhaps, but don’t look for it in Apple Stores.
Artificial Intelligence
- Anthropic has released Claude 3.7 Sonnet, the company’s first reasoning model. It’s a “hybrid model”; you can tell it whether you want to enable its reasoning capability. You can also control its thinking “budget” by limiting the number of tokens it generates for the reasoning process.
- The Computer Agent Arena is a platform for crowdsourced agent testing. It allows anyone to run an agent using two different AI models, observe what the agent is doing, and rate the results. Results are summarized on a leaderboard; right now, Claude 3.5 Sonnet is at the top.
- Google is developing a “co-scientist” that suggests hypotheses for scientists to investigate. The hypotheses are based on the scientist’s goals, ideas, and past research. The company’s looking for researchers to help with testing.
- GitHub has upgraded agent mode for Copilot. It will now iterate on buggy code until it delivers correct results, and can add new subtasks to the original if they’re needed to accomplish the user’s goal.
- Open-R1 is a new project that intends to create a fully open reproduction of DeepSeek R1. In addition to code and weights, this project will release all tools and synthetic data used to train the model.
- Moshi is a new conversational (speech-to-speech) language model that is constantly listening and can handle interjections like “uh huh” without getting confused.
- Codename Goose is a new open source framework for developing agentic AI applications. It uses Anthropic’s Model Context Protocol for communicating with systems that have data, and can discover new data sources on the fly.
- The University of Surrey will be building a language model for sign language. One focus will be translating between spoken language and sign language. The goal is to ensure that the deaf community isn’t left behind by the explosion of AI tools.
- Galileo is an agentic toolset for detecting when an AI model is hallucinating. It’s particularly important for agentic systems, where an error by one agent leads to misbehavior by others downstream.
- A group of researchers released s1, a 32B reasoning model with near state-of-the-art performance. s1 cost only $6 to train. A very small set of training data (only 1,000 reasoning samples) proved sufficient when the model was forced to take extra time for reasoning.
- Some researchers published How to Scale Your Model, a book on how to scale large language models. The book is apparently internal documentation from Google DeepMind.
- OpenAI has released o3-mini, a small and cost-efficient language model based on its (still unreleased) o3 reasoning model.
- Anthropic has deployed its Constitutional Classifier for adversarial testing by the public. The classifier is a system that protects Claude models from jailbreaks and attempts to get Claude to answer questions that aren’t allowed. Early results look very good.
- The lesson to learn from DeepSeek R1 is that, given a good foundation model, it’s less difficult than many thought to develop a reasoning model. In the coming months, expect many open alternatives.
- OpenAI has introduced Deep Research, an application based on its o3 model that claims the ability to synthesize large amounts of information and perform multistep research tasks.
- Sam Altman has acknowledged that OpenAI is on the “wrong side of history” as far as open source AI but also said that addressing the issues was not a high priority.
- Alibaba has launched Qwen2.5-Max, another large language model with performance on the same level as GPT-4 and Claude 3.5 Sonnet. It can be accessed through Qwen Chat or Alibaba’s cloud.
- Transformer Lab is a tool for experimenting with, training, fine-tuning, and programming LLMs locally. We’re still installing it, but it looks like Ollama on steroids.
- smolGPT is “a minimal PyTorch implementation for training your own small LLM from scratch.”
- Yes, Microsoft is complaining that DeepSeek used OpenAI to generate synthetic training data. Those objections didn’t stop it from making DeepSeek available on Azure.
- Two composers collaborated with Google’s Gemini to create The Twin Paradox, a work for a classical symphony orchestra.
- Alibaba has released two “checkpoints” of its models, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M. These models have large 1M-token context windows. Alibaba has also open-sourced its inference framework, which the company claims is three to seven times faster at long-context inference.
- TinyZero reproduces DeepSeek’s R1 Zero, a reasoning model with 3B parameters. Training TinyZero cost under US$30. You could download TinyZero, but you could also make your own for less than the cost of an evening out. Do we need expensive models?
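One note on Codename Goose, above: the Model Context Protocol it uses is JSON-RPC 2.0 under the hood, so an agent invoking a tool on a data source sends a message along these lines (the tool name and arguments here are hypothetical):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "query_orders",
    "arguments": { "customer_id": "C-1234" }
  }
}
```

The server replies with a result (or an error) carrying the same id, and an agent can list a server’s tools at runtime by first calling tools/list, which is what makes on-the-fly discovery of new capabilities possible.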
Programming
- Tanagram is promising a toolset for helping developers understand and work with complex codebases. So far, there are only demos, but it sounds interesting.
- Harper Reed describes his workflow for programming with AI. Developing a workflow is essential to using AI effectively, and Harper has given the most thorough description we’ve seen.
- Like Linux, Ruby on Rails can run in the browser. This hack uses WebAssembly.
- Linux booting inside a PDF in Chrome. PDF implementations support JavaScript, and C can be compiled to asm.js, a subset of JavaScript. That means a RISC-V emulator written in C can be compiled to JavaScript, embedded in a PDF, and used to boot Linux in the browser. An amazing hack.
- OCR4all provides free and open source optical character recognition software. Should you need it.
- Why does software run no faster than it did 20 or 30 years ago, despite much faster computers? Rob Pike has some thoughts on controlling bloat.
- As the name implies, Architectural Decision Records (ADRs) capture a decision about software architecture along with the reasoning behind it. All too frequently, this information isn’t captured at all. The practice is likely to become even more important in the era of AI-assisted software development.
- Jank is a new general-purpose programming language. It’s a dialect of Clojure that incorporates ideas from many other languages, including C++ and Rust, and is built on top of LLVM.
- Here’s a set of patterns for building real-time features into applications.
- Salvatore “antirez” Sanfilippo’s post, “We Are Destroying Software,” is a must-read. (It says nothing about AI.) It starts “We are destroying software by no longer taking complexity into account.”
- Script is a Go library that makes shell-like programming possible in Go. Its biggest contribution is the ability to create pipes; it also provides Go functions similar to grep, find, head, tail, and other common shell commands.
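If you haven’t seen an ADR, a minimal record in the widely used Nygard format looks something like this (the project details are invented for illustration):

```markdown
# ADR 0007: Use PostgreSQL for the orders service

## Status
Accepted, 2025-02-20

## Context
The orders service needs transactional guarantees across order and
payment records, and the team already operates PostgreSQL elsewhere.

## Decision
We will store orders in PostgreSQL rather than adding a separate
document database.

## Consequences
One less system to operate; schema migrations become part of the
deployment pipeline. Revisit if write volume outgrows a single primary.
```

The records live in the repository next to the code, so the reasoning travels with the system it describes.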
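The asm.js piece of the Linux-in-a-PDF chain is just a strict, statically typed subset of JavaScript. Here’s a minimal hand-written sketch of what an asm.js module looks like (purely illustrative; not code from the actual hack):

```javascript
// A minimal asm.js-style module. "use asm" marks a strict subset of
// JavaScript that some engines can validate and compile ahead of time;
// engines without asm.js support simply run it as ordinary JavaScript.
function AsmAdd(stdlib) {
  "use asm";
  function add(a, b) {
    a = a | 0; // coerce arguments to 32-bit integers
    b = b | 0;
    return (a + b) | 0;
  }
  return { add: add };
}

const mod = AsmAdd(globalThis);
console.log(mod.add(20, 22)); // 42
```

The `| 0` coercions are what make the subset statically typable: every value is pinned to a 32-bit integer, which is why a compiler like Emscripten can emit this style of code from C and an engine can turn it into native code before running it.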
Security
- Threat actors aligned with Russia are targeting Signal, the secure messaging application, with phishing attacks that link users’ accounts to hostile devices. One group sends QR codes that look legitimate but link to a device under their control; another impersonates an application used by Ukraine’s military. The best protection is to update to the latest version of Signal.
- Two new vulnerabilities in OpenSSH have been found. One exposes OpenSSH servers to man-in-the-middle attacks; the other can lead to denial-of-service attacks. An update has been released; install it.
- DarkMind is a new attack against reasoning language models. It’s possible to build custom applications (like those in the GPT Store) with “hidden triggers” that modify the reasoning process.
- A new kind of supply chain attack involves obtaining abandoned AWS S3 buckets that still hold libraries that are frequently downloaded. The new owner can insert malware into the libraries; the original owner, who abandoned the bucket, can’t patch the corrupted libraries.
- Security is blocking AI adoption, particularly in heavily regulated industries. That’s understandable; many of the questions we ask of secure systems can’t be adequately answered for AI.
- Microsoft’s AI Red Team has published Lessons from Red Teaming 100 Generative AI Products. It’s essential reading for anyone interested in building a secure AI system.
- AI is being used to submit fake feature requests and bug reports on open source projects. Many of these may be inadvertent, but regardless of cause, it’s generating problems for software maintainers.
- Linux has a number of tools for detecting rootkits and other malware. Chkrootkit and LMD (Linux Malware Detect) are worth your attention.
- Time Bandit is a new jailbreak for the GPT models. The attack causes the model to lose track of past, present, and future. Essentially, you ask GPT how someone in the past would do something that can only be done in the present. It’s unclear whether this attack works on other models.
- When the price of bitcoin goes up, so does the frequency of cryptojacking: hijacking computers to form crypto-mining botnets. It’s claimed that for every dollar of crypto that’s mined, the victim incurs $53 in cloud costs.
- A new backdoor to VPNs has been discovered in the wild, giving attackers access to corporate networks. These backdoors stay dormant until they are triggered by a specially constructed “magic packet,” making them difficult to detect.
Web
- As more people ask AI for product recommendations, marketers will need to optimize how language models perceive their products. LLM optimization (LLMO) may well be the next generation of SEO.
- This article tells you how to opt out of Gemini features in Gmail and other Google Workspace applications. It’s possible to disable Gemini selectively. Unfortunately, it requires you to have access to the administrator’s console.
- JavaScript’s Temporal object is starting to appear in browsers! Temporal is a replacement for the inadequate Date object. It allows programmers to work effectively with dates and times.
- Marginalia is an open source search engine that prioritizes noncommercial results.
Quantum Computing
- Microsoft has created a topological qubit on a new quantum chip. While its chip currently has only 8 qubits, Microsoft claims it can scale to millions of qubits. Putting this many qubits on a chip would go a long way to solving the problem of moving quantum data between chips.
- Canadian startup Xanadu has built a quantum computer using photonics. It currently has 12 qubits, but the company believes it can scale to larger systems.
Robotics
- Robotic models of extinct animals are helping paleontologists discover how those animals might have lived: how they walked, swam, and flew in their environments.
Gadgets
- Pebble returns? Remember the crowdfunded Pebble smartwatch, available long before the Apple Watch? It’s coming back (maybe), and it will be hackable.
- Something we all need: An engineering team at Apple developed an AI-driven table lamp. Not available in an Apple Store near you.