Video summary

I Let AI Analyze 5 Years of My Journals… Here's what it found

Main summary

Key takeaways

Technology

Goal / Concept

The creator wants to understand themselves better by analyzing 5 years of personal journals using an AI workflow, rather than relying on memory (which they describe as biased).

The proposed approach:

  • Convert handwritten journal pages into machine-readable text
  • Run an LLM Q&A workflow over the extracted content to ask reflective questions

Input Formats

  • Regular handwritten journal entries (not daily; cadence varies weekly/monthly)
  • Sketch journaling (drawings plus short text)

Vision-Language Model (VLM) Approach

What they plan to do with VLMs

Use a vision-language model to:

  • Read handwritten text (transcribe it)
  • Interpret drawings and convert sketches/diagrams into textual descriptions

Why not “older OCR”

They contrast this with OCR, arguing:

  • OCR historically performs best on typed text
  • Handwriting generally requires modern deep learning–based vision models / VLMs rather than classic OCR

Privacy / Deployment Strategy

  • They avoid using hosted commercial APIs for private journal data (they mention providers like OpenAI and Anthropic but reject them for privacy reasons).
  • Instead, they run open-source models locally on a laptop so the journal content stays on-device.
  • They use Ollama to download and run models locally.
  • Model size is constrained by hardware; they note that smaller variants are needed to fit memory limits on an M1 MacBook Pro.

Implementation Pipeline

  1. Scan journal pages into images (total scan time: ~1 hour)
  2. Create a local Python project with a virtual environment
  3. Use a structured prompt to force consistent, parseable output:
    • The model must return exactly two sections:
      • “Transcription”
        • Copy every word exactly
        • Use “illegible” for unreadable words
      • “Description”
        • List drawings/sketches/diagrams
        • Use “none” if there are no drawings
  4. Save per-image results into extracted_content.json
  5. Concatenate extracted results into a single large context file for Q&A (e.g., journal context.txt)

Model Testing & Results (Key Analysis)

Attempt 1: LLaVA (large model)

  • Worked poorly due to resource constraints
  • The laptop crashed repeatedly

Attempt 2: LLaVA 53 (smaller/faster model)

  • Produced hallucinations
  • It could “make up” content, reducing trust (e.g., “Did I write that?”)

Attempt 3: Qwen 3 VL

  • Stable and accurate
  • Successfully transcribed and interpreted pages

Performance notes

  • Roughly 2–3 minutes per image, depending on text density
  • Handwriting quality declined over the years, but the chosen model still handled it reliably

Q&A Layer (LLM Over Extracted Journal Content)

  • They build a terminal app in ask_questions.py with a streaming UI
  • They use a local Llama 3.2 model (open-source from Meta) for Q&A

System prompt behavior

  • Answer using only the journal content
  • If the answer isn’t present, explicitly say so

Context handling

  • They increase the context window to reduce truncation
  • Truncation would risk incomplete answers

Examples of Output Themes

  • Motivations

    • Personal growth/self-improvement/learning
    • Creativity/artistic expression
    • Financial independence/freedom
    • Helping others and positive impact
    • Relationships and meaningful connections
  • Recurring struggles

    • Overthinking / analysis paralysis
    • Self-doubt / negative self-talk
    • Balancing work and personal life
  • Ikigai-style purpose exercise (ikigai = framework for purpose)

    • The model suggests directions such as:
      • Teaching/coaching/writing/content creation
      • Artistic sharing (potentially later in life)
      • AI/data science consulting

Overall Takeaway

The creator finds the results reassuring and “weirdly refreshing” because:

  • The model reflects patterns from the journal text itself
  • There’s no emotional attachment
  • It reduces the effects of memory editing or external opinions

They also provide the code in a GitHub repo (link mentioned in the video description).

Main Speakers / Sources

  • Main speaker: The video creator (first-person narrator), who scanned journals and built the scripts

Technical sources / models mentioned

  • Ollama (local model runner)
  • VLMs: LLaVA (including “LLaVA 53”), Qwen 3 VL
  • LLM for Q&A: Llama 3.2 (Meta; used locally)
  • Optional hosted providers mentioned: OpenAI, Anthropic

Original video