The Batch | DeepLearning.AI (Page 2)

Aerial view of a hedge maze with a large black spider at the center, surrounded by trees, benches, and paths.

Building a model for vision and speech: How Cloudflare thwarts unauthorized AI crawlers… by using AI

Nvidia’s Nemotron adds reasoning to Llama models. Does ChatGPT make frequent users more lonely? OpenAI’s o1-pro costs a pretty penny. Mistral Small 3.1 gives Gemma 3 27B some competition.

Children learning programming in a modern classroom with laptops, robots, and screens displaying code, promoting STEM education.

This Aardvark predicts the weather: GPT-4o meets Whisper; OpenAI’s new models

Nvidia gives Project DIGITS a new name. AI models compete to build Minecraft items. Claude chatbot now includes search. A Moore’s law-like regularity for AI agents.

AYA Vision architecture diagram showing vision encoder, multimodal merging, and LLM backbone for AI-powered image processing.

Inside Google’s Co-Scientist, Copyright Office Weighs Generated Works, Multilingual (and Good at All of Them), Diffusion for Materials Design

The Batch AI News and Insights: Last Friday on Pi Day, we held AI Dev 25, a new conference for AI Developers.

Top left: attendees watch a presentation. Top right: crowd at a developer booth. Bottom left: fortune cookie says ‘Build baby build!’ Bottom right: staff check in attendees.

Lessons From Our First AI Dev Conference: How our learners began building their way to AI in everything at AI Dev 25

Last Friday on Pi Day, we held AI Dev 25, a new conference for AI Developers.

Scientific diagram of a denoising model generating stable materials from random elements based on chemistry and symmetry

Designer Materials: MatterGen, a diffusion model that designs new materials with specified properties

Materials that have specific properties are essential to progress in critical technologies like solar cells and batteries. A machine learning model designs new materials to order.

GIF of AI-assisted art: A landscape is edited, a cyborg sketch turns photorealistic, and a cat reads a newspaper, showing human input for copyright

Some AI-Generated Works Are Copyrightable: U.S. Copyright Office says that no new laws are needed for AI-generated works

The United States Copyright Office determined that existing laws are sufficient to decide whether a given AI-generated work is protected by copyright, making additional legislation unnecessary.

AI co-scientist workflow diagram showing a research goal assigned to specialized AI agents for hypothesis testing and ranking

Science Research Proposals Made to Order: AI Co-Scientist, an agent that generates research hypotheses, aiding drug discovery

An AI agent synthesizes novel scientific research hypotheses. It's already making an impact in biomedicine.

AYA Vision architecture diagram showing vision encoder, multimodal merging, and LLM backbone for image processing

Equally Fluent in Many Languages: Cohere’s Aya Vision beats multilingual rivals in text & image understanding

Multilingual AI models often suffer uneven performance across languages, especially in multimodal tasks. A pair of lean models counters this trend with consistent understanding of text and images across major languages.

A man pastes “AI GENERATED” posters on a graffiti-covered wall in an urban alley, suggesting a street art or guerrilla marketing act.

ERNIE checks competitors with low prices: AI2’s OLMo2 32B may be the top fully open model

Google’s two new Gemini vision-language-action robotics models. Cohere’s Command A, another lightweight LLM. New China regulations require mandatory labels for AI content. Monitoring reasoning models for reward hacking or unwanted behavior.

A therapy session in a modern office where a patient lies on a couch talking to an AI-powered computer therapist.

AI giants’ U.S. policy proposals: Gemma 3 beats bigger open weight rivals

OpenAI’s new SDK and APIs for agentic workflows. Olympic Coder, two powerful open coding models. Alibaba applies RL to emotion detection. GPT-4.5 and Claude Sonnet 3.7 top a new agent leaderboard.

Illustration of a programmer at a computer displaying PyTorch code, while a smiling colleague gives a thumbs-up in approval.

Letters

Learn the Language of Software: AI won’t kill programming. There has never been a better time to start coding.

Some people today are discouraging others from learning programming on the grounds that AI will automate it.

The Batch Newsletter

DeepSeek-R1 Uncensored, QwQ-32B Puts Reasoning in Smaller Model, Phi-4-multimodal Takes Spoken Input, Training AI May Not Be Fair Use

The Batch AI News and Insights: Some people today are discouraging others from learning programming on the grounds that AI will automate it.

AI model performance benchmark comparing R1 1776 and DeepSeek-R1 across MMLU, DROP, MATH-500, and AIME 2024 tests.

Tech & Society

DeepSeek-R1 Uncensored: Perplexity launches uncensored version of DeepSeek-R1

Large language models built by developers in China may, in some applications, be less useful outside that country because they avoid topics its government deems politically sensitive. A developer fine-tuned DeepSeek-R1 to widen its scope without degrading its overall performance.

Gavel striking a neural network, symbolizing legal decisions impacting AI and machine learning technologies.

Tech & Society

Judge Upholds Copyright in AI Training Case: U.S. court rejects fair use defense in Thomson Reuters AI lawsuit

A United States court delivered a major ruling that begins to answer the question of whether, and under what conditions, training an AI system on copyrighted material is considered fair use that doesn’t require permission.

Phi-4 Mini multimodal architecture integrating vision, audio, and text with token merging and LoRA-adapted weights for AI processing.

Machine Learning Research

Microsoft Tackles Voice-In, Text-Out: Microsoft’s Phi-4 Multimodal model can process text, images, and speech simultaneously

Microsoft debuted its first official large language model that responds to spoken input.

QwQ-32B vs DeepSeek-R1 AI model performance benchmark across AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL datasets.

Machine Learning Research

Compact Reasoning: QwQ-32B challenges DeepSeek-R1 and other larger reasoning models

Most models that have learned to reason via reinforcement learning have been huge. A much smaller model now competes with them.

Futuristic nightclub with neon lights, a dancing crowd, and a supercomputer DJ booth glowing amid fog and lasers.

Data Points

EAGLE-3 speeds up language models: And the 2024 Turing Award goes to…

Music and lyrics in one diffusion model. Manus AI’s impressive demos spark excitement and backlash. OpenAI sees AGI as a gradual evolution. Google unveils its first Gemini-branded embedding models.

A man sitting side by side with his computer at a bar as if they are having a friendly conversation.

Data Points

Qwen’s mid-sized reasoning model scores big: Sesame moves through speech models’ “uncanny valley”

Cohere’s open vision models support many languages. Jamba 1.6’s two hybrid MoE models promise more speed. Anthropic overhauls its developer console for Claude Sonnet 3.7. Mistral brings its multilingual/multimedia skills to OCR.

Diagram of an RQ-Transformer speech system with Helium and Depth Transformers for audio processing.

Letters

Wait Your Turn! Conversation by Voice Versus Text: Text interactions require taking turns, but voices may interrupt or overlap. Here’s how AI is evolving for voice interactions.

Continuing our discussion on the Voice Stack, I’d like to explore an area that today’s voice-based systems mostly struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.

The Batch Newsletter

GPT-4.5 Goes Big, Claude 3.7 Reasons, Alexa+ Goes Agentic, Generating Text Like an Image

The Batch AI News and Insights: Continuing our discussion on the Voice Stack, I’d like to explore an area that today’s voice-based systems mostly struggle with: Voice Activity Detection (VAD) and the turn-taking paradigm of communication.

Amazon smart display with widgets for recipes, calendar, weather, events, and streaming (Prime Video, Netflix, Disney+).

Tech & Society

Amazon’s Next-Gen Voice Assistant: Alexa+ adds generative AI and agents, using Claude and other models

Amazon announced Alexa+, a major upgrade to its long-running voice assistant.

Table comparing Claude 3.7, 3.5, o1, o3-mini, DeepSeek R1, and Grok 3 Beta on reasoning, coding, tools, visuals, and math.

Machine Learning Research

Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.
