AI - Thursday, October 24, 2024: Commentary with Notable and Interesting News, Articles, and Papers
Commentary and a selection of the most important recent news, articles, and papers about AI.
Today’s Brief Commentary
Do tech vendors think the general public wants to know the in-the-weeds details of their products? This seems to be the case with both AI and quantum providers. While I understand the details are important to implementors, engineers, and some users, I think most people don't care about the umpteen variations of and algorithms within your latest LLM or the fidelity (whatever that is, they wonder) of your 2-qubit gate (again, what?) operations on your esoteric quantum hardware.
Some providers certainly do this well: a high-level press release that discusses the value of the innovations, a blog with an overview of the technical achievements, and a research paper with the exact details. Regarding that paper, it's more impressive if it is peer-reviewed.
To help with one technical phrase, I've included several links today related to "Mixture of Experts" for machine learning. From its name alone, it's hard to understand what it is. It's not a group of people nodding sagely about how to do something. In fact, it's a technique that goes back 33 years (that is, a third of a century) for dividing and conquering in an internal AI architecture. Rather than using a monolithic, dense neural network, use several smaller, faster components and combine results at the end. It's a nice concept once you get past the historic scientific name.
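For readers who prefer code to prose, here is a minimal sketch of the idea in Python. It is a toy illustration, not how any production model is built: the two "experts" and the `gate` function are hypothetical stand-ins I made up, and a real system would learn both from data.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy experts: each is a small, simple function.
def expert_a(x):
    return 2.0 * x


def expert_b(x):
    return x + 10.0


EXPERTS = [expert_a, expert_b]


def gate(x):
    """Hypothetical gating rule: prefer expert_a for negative x, expert_b for positive x."""
    return softmax([-x, x])


def mixture_of_experts(x):
    """Run every expert, then combine the results using the gate's weights."""
    weights = gate(x)
    outputs = [expert(x) for expert in EXPERTS]
    return sum(w * o for w, o in zip(weights, outputs))
```

At `x = 0.0` the gate weighs both experts equally, so the result is the average of their outputs. For large positive `x`, nearly all the weight goes to `expert_b`.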
Packt is running a special sale until November 1 on my quantum computing book Dancing with Qubits, Second Edition.
The book has twenty 5-star ratings, and the paperback will be 20% off between now and the end of the month.
General News, Articles, and Analyses
Denmark Launches Leading Sovereign AI Supercomputer to Solve Scientific Challenges With Social Impact | NVIDIA Blog
https://blogs.nvidia.com/blog/denmark-sovereign-ai-supercomputer/
Author: David Hogan
Commentary: Researchers will also use the system to simulate quantum computing circuits.
(Wednesday, October 23, 2024) “NVIDIA founder and CEO Jensen Huang joined the king of Denmark to launch the country’s largest sovereign AI supercomputer, aimed at breakthroughs in quantum computing, clean energy, biotechnology and other areas serving Danish society and the world.”
Welcome to State of AI Report 2024
https://www.stateof.ai/
“The State of AI Report analyses the most interesting developments in AI. We aim to trigger an informed conversation about the state of AI and its implication for the future. The Report is produced by AI investor Nathan Benaich and Air Street Capital.”
AI Chipsets
Qualcomm brings laptop-class CPU cores to phones with Snapdragon 8 Elite - Ars Technica
(Tuesday, October 22, 2024) “And no 2024 chip announcement would be complete without some kind of AI mention: Qualcomm's image signal processor is now an “AI ISP,” which Qualcomm says “recognizes and enhances virtually anything in the frame, including faces, hair, clothing, objects, backgrounds, and beyond.” These capabilities can allow it to remove objects from the background of photos, among other things, using the on-device processing power of the chip's Hexagon neural processing unit (NPU). The NPU is 45 percent faster than the one in the Snapdragon 8 Gen 3.”
Generative AI and Models
What is mixture of experts? | IBM
https://www.ibm.com/topics/mixture-of-experts
Author: Dave Bergmann
Commentary: This is a good tutorial on the Mixture of Experts technique or architecture for neural networks. It assumes some technical knowledge of machine learning, at least two levels beyond what a layperson might understand.
(Friday, April 5, 2024) “Mixture of experts (MoE) is a machine learning approach, dividing an AI model into multiple “expert” models, each specializing in a subset of the input data.”
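Many recent MoE models go one step further and run only a few experts per input, which is where much of the efficiency comes from. The following is a rough sketch of that "top-k" routing in Python. The function names `top_k_route` and `sparse_moe` are my own, the experts and router in the usage note are hypothetical, and none of this is taken from the IBM article.

```python
import math


def top_k_route(scores, k):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are a softmax over the chosen experts only, so they sum to 1.
    """
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in chosen)  # for numerical stability
    exps = {i: math.exp(scores[i] - m) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sparse_moe(x, experts, router, k=2):
    """Evaluate only the k selected experts and combine their outputs."""
    weights = top_k_route(router(x), k)
    return sum(w * experts[i](x) for i, w in weights.items())
```

As a usage example, with four hypothetical experts `[lambda x: x, lambda x: x * x, lambda x: -x, lambda x: 0.0]` and a router that always returns the scores `[1.0, 3.0, 3.0, 0.0]`, only the second and third experts are run, each with weight 0.5. The other two are skipped entirely.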
IBM’s New Granite 3.0 Generative AI Models Are Small, Yet Highly Accurate and Efficient | NVIDIA Technical Blog
Authors: Maryam Ashoori and Chintan Patel
(Monday, October 21, 2024) “Today, IBM released the third generation of IBM Granite, a collection of open language models and complementary tools. Prior generations of Granite focused on domain-specific use cases; the latest IBM Granite models meet or exceed the performance of leading similarly sized open models across both academic and enterprise benchmarks.”
Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku \ Anthropic
https://www.anthropic.com/news/3-5-models-and-computer-use
Commentary: What do you think about generative AI controlling your computer? Can you use it to increase productivity and improve the accuracy of tasks? Compare with what WalkMe is doing with AI (also linked below in the Related section).
(Tuesday, October 22, 2024) “Today, we’re announcing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. The upgraded Claude 3.5 Sonnet delivers across-the-board improvements over its predecessor, with particularly significant gains in coding—an area where it already led the field. Claude 3.5 Haiku matches the performance of Claude 3 Opus, our prior largest model, on many evaluations for the same cost and similar speed to the previous generation of Haiku. We’re also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta.”
Technical Papers, Articles, and Preprints
Adaptive Mixtures of Local Experts | MIT Press Journals & Magazine | IEEE Xplore
https://ieeexplore.ieee.org/document/6797059
Authors: Robert A. Jacobs; Michael I. Jordan; Steven J. Nowlan; and Geoffrey E. Hinton
Commentary: This is the classic paper from 1991 that started the discussion about the Mixture of Experts technique for machine learning.
(Friday, March 1, 1991) “We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.”
[2410.05080] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
https://arxiv.org/abs/2410.05080
Authors: Chen, Ziru; Chen, Shijie; Ning, Yuting; Zhang, Qianheng; Wang, Boshi; Yu, Botao; Li, Yifei; Liao, Zeyi; Wei, Chen; ...; and Sun, Huan
(Monday, October 7, 2024) “The advancements of large language models (LLMs) have piqued growing interest in developing LLM-based language agents to automate scientific discovery end-to-end, which has sparked both excitement and skepticism about the true capabilities of such agents. In this work, we argue that for an agent to fully automate scientific discovery, it must be able to complete all essential tasks in the workflow. Thus, we call for rigorous assessment of agents on individual tasks in a scientific workflow before making bold claims on end-to-end automation. To this end, we present ScienceAgentBench, a new benchmark for evaluating language agents for data-driven scientific discovery. To ensure the scientific authenticity and real-world relevance of our benchmark, we extract 102 tasks from 44 peer-reviewed publications in four disciplines and engage nine subject matter experts to validate them. We unify the target output for every task to a self-contained Python program file and employ an array of evaluation metrics to examine the generated programs, execution results, and costs. Each task goes through multiple rounds of manual validation by annotators and subject matter experts to ensure its annotation quality and scientific plausibility. We also propose two effective strategies to mitigate data contamination concerns. Using our benchmark, we evaluate five open-weight and proprietary LLMs, each with three frameworks: direct prompting, OpenHands, and self-debug. Given three attempts for each task, the best-performing agent can only solve 32.4% of the tasks independently and 34.3% with expert-provided knowledge. These results underscore the limited capacities of current language agents in generating code for data-driven discovery, let alone end-to-end automation for scientific research.”
Related Articles and Papers
WalkMeˣ | The only context-aware AI copilot
“WalkMeˣ is the only AI copilot with the context to meet every user with the next best action for any workflow, across any application.”