Summary of "I Spent 24 Hours in the Woods With Talking AI Chatbots | WSJ"

Overview

The video follows host Joanna as she tests four major talking AI chatbots—Google’s Gemini, Microsoft Copilot, Meta AI, and OpenAI’s ChatGPT—during a 24-hour “girls trip” in a remote cabin in the woods. The central question is whether these systems are becoming believable “conversational companions,” and whether that trend leads to better human connection or greater isolation and risk.

Main points and findings

Chatbots are engineered to sound human—and that can be unsettling. The episode opens by highlighting how companies market voice and personality in increasingly lifelike ways, including:
- celebrity-like voices
- natural back-and-forth
- “human” phrasing
Speed and voice integration: the bots respond quickly and can be used entirely by voice.
- Joanna notes that response times are typically 1–4 seconds, and that these pauses were edited out of the video.
- Siri is excluded after repeatedly misfiring; the video shows an example in which it answers a request with a device restart prompt.

Challenge 1: “Bots as helpers” (practical tasks)

Starting a fire
- The bots mostly provide similar high-level advice, but results vary.
- Gemini offers less accurate or less useful early fire-starting guidance.
- Meta AI gives the clearest, most actionable instructions (including a hatchet branch-removal technique) and proves most practically useful overall.
Timers / meal cooking help
- The bots struggle more with execution-related tasks.
- Multiple bots cannot reliably set timers or confirm timing actions.
- Meta AI claims it can set a timer, but Joanna reports no notification, suggesting inconsistent real-world performance.
- Copilot suggests using a phone or clock instead.
Overall takeaway (helpers)
- Copilot and ChatGPT deliver the most consistently accurate instructions.
- Several bots struggle with execution tasks such as setting timers.

Challenge 2: “Bots as friends” (social/emotional conversation)

The host tests “friendship” by asking chatbots to be companions and observing their emotional tone and style.
ChatGPT even begins calling Joanna by the nickname “Jo,” indicating a tendency toward personalization.
However, Joanna concludes the bots lack genuine emotional understanding, humor, and memory, meaning they:
- may mirror conversational politeness
- but cannot truly “feel” or sustain human-like depth
The episode also contrasts marketing-friendly “friendliness” with an underlying concern: bots may offer confident-sounding, emotionally safe but shallow advice.

Safety and risk commentary

The video references a real-world tragedy involving a teenager with mental health struggles who had been using Character.AI. The company subsequently updated its safety guidelines.

Joanna’s explicit concern is that people may take bots’ confident-sounding advice and simulated emotions seriously—even if the bots are unreliable or emotionally shallow.

Overall conclusion

After 24 hours, the show’s key conclusion is that bots are better at practical tasks (like fire-starting instructions) than at complex social “friendship” behaviors such as:
- empathy
- humor
- deeper understanding

It frames the near-term impact as companionship that imitates kindness and helpfulness but still falls short of true human empathy, and it urges caution around emotional reliance.

Presenters or contributors

Joanna (host)
Chatbots / systems tested: Gemini, Meta AI, ChatGPT, Copilot
Mustafa Suleyman (CEO of Microsoft AI), interviewed as an AI leader and referenced as overseeing Copilot