Summary of "Prompt Engineering Overview"

Core purpose and scope

Prompt engineering is the practice of designing, structuring, and optimizing prompts (instructions + context + input + output indicators) to steer pre-trained models (language and multimodal) to perform tasks reliably without changing model weights.

Lecture goals:

Introduce prompt engineering and core concepts.
Present practical techniques from basic to advanced.
Demonstrate code examples and tools.
Discuss applications, safety concerns, and future directions.

Recommended prerequisites

Basic Python
Familiarity with language models
Basic deep learning / ML concepts

Key concepts and building blocks

Prompt elements:
- Instruction
- Context / background
- Input data
- Output indicator (format expectation)
Determinism controls:
- Sampling parameters such as temperature and top-p.
- Guideline: keep them low for precise/exact outputs (QA, factual answers); increase for creative tasks (poetry).
- Change one parameter at a time when experimenting.
Prompt format:
- Consistent formatting and explicit output indicators improve reliability.

Common tasks demonstrated

Text summarization (explicit instruction, e.g., “explain above in one sentence”)
Question answering (provide context + instruction + question)
Text classification (pass text + label options in instruction)
Role playing / persona control (instruct desired tone/behavior)
Code generation (SQL, Python, JS — translating natural language to executable code)
Reasoning tasks (arithmetic, logic — may require special prompting to be reliable)

Practical / coding notes

Use official API clients (example: OpenAI Python client).
Do not hardcode API keys — load from environment variables or use secret managers.
Keep prompt examples and utility functions flexible to vary parameters easily.
Test and iterate: adjust prompts, output indicators, sampling settings, and add context or exemplars when performance is poor.

Advanced prompting techniques

Few-shot prompting (in‑context learning)
- Provide several input→output exemplars (same format) in the prompt to teach the model the task.
- Append the new query; the model generalizes from examples without fine-tuning.
Chain-of-Thought (CoT) prompting
- Encourage step-by-step reasoning by including intermediate reasoning steps in examples.
- Demonstrations show the chain of reasoning followed by the final answer.
- Useful for multi-step arithmetic and logic.
Zero-shot Chain-of-Thought
- Instead of exemplars, explicitly instruct the model to “think through the problem step by step” (e.g., “Let’s think step by step”).
- Elicits reasoning without example traces.
Self-consistency
- Procedure:
  1. Use CoT-style prompting but sample multiple reasoning paths.
  2. Collect multiple final answers from these diverse traces.
  3. Aggregate (e.g., majority vote) to pick the most consistent answer.
- Helps correct stochastic errors from single decoding runs.
Knowledge-generation / knowledge-augmented prompting
- Two-stage approach:
  1. Use an LM to generate structured background knowledge relevant to the question.
  2. Augment the original question with that knowledge and ask the LM to answer/justify with confidence.
- Can be combined with external retrieval.
Program-Aided Language models (PAL)
- Pattern:
  1. LM generates programmatic steps (e.g., Python code) as intermediate reasoning.
  2. Execute the generated program with an interpreter for precise computation.
  3. Return the program output as the final answer.
- Advantage: deterministic interpreters handle exact computation, improving reliability for numeric or algorithmic tasks.
ReAct (Reasoning + Acting) / Agent frameworks
- Interleaves internal reasoning traces (“Thought: …”) with explicit actions that call tools/APIs (search, calculator, knowledge DB).
- Loop: Thought → Action (tool call) → Observation → Thought … until final answer.
- Good for tasks requiring up-to-date facts or external computation.
- Components: agent (LM-driven decision-maker), tools (APIs, search, calculators, DBs), environment.
Data-augmented generation / retrieval-augmented prompting
- Pattern:
  1. Retrieve relevant documents/data (similarity search, document store).
  2. Insert retrieved context into the prompt as grounding evidence.
  3. Ask the LM to generate an answer, citing sources when needed.
- Produces more factual responses and enables source attribution.

Practical guidelines & tips

Keep prompt format consistent and include explicit output indicators (e.g., “Answer: ”).
Provide good exemplars for few-shot prompts — same format and diverse examples.
For precise outputs (QA, code, SQL): set low temperature and restrict sampling.
For creative outputs: increase temperature / top-p for diversity.
Change only one sampling parameter at a time to observe effects.
Use environment variables or secret managers for API keys; use client libraries and helper utilities.
For production: incorporate retrieval, tool usage, and execution of generated code to improve correctness.

Tools, demos, and applications covered

Models & platforms: OpenAI (text-davinci-003, ChatGPT), GitHub Copilot, Anthropic (Claude), Bing Chat.
Libraries/frameworks: OpenAI Python client; LangChain-style agent chaining for orchestrating tools and chains.
Tools used in demos: search APIs, math API / interpreter (Python), document stores + similarity search, SQL generation examples.
Example apps: natural-language → SQL, document question-answering with source attribution, multi-step web-backed QA (ReAct), code generation and execution.

Model safety, vulnerabilities, and mitigations

Common failure modes:
- Hallucination (fabricated facts)
- Biases / stereotypes
- Incorrect arithmetic
- Brittle outputs
Prompt injection and jailbreaking:
- Prompt injection: attacker-controlled input can override system instructions (e.g., “ignore above” attacks).
- Prompt leaking: model reveals hidden prompts or sensitive strings (keys).
- Jailbreaking: crafting prompts to bypass safety/moderation filters.
Mitigations:
- Treat untrusted input as data, not imperative instructions to the model.
- Use sanitizer layers, response filters, and control which parts of the prompt are user-controllable.
- Include red-team testing and prompt-injection tests; monitor and patch known jailbreak techniques.
RLHF (Reinforcement Learning from Human Feedback):
- Used to align LM output with human preferences and safety constraints.
- Requires high-quality prompt-response datasets and human labels — prompt engineering helps generate and curate these datasets.

Future directions & research frontiers

Augmenting LMs with tools, retrieval, and execution (agents that plan and act).
Studying and exploiting emergent capabilities as models scale (e.g., reasoning that arises with scale).
Multimodal and graph prompting: extending techniques across images, audio, graphs, and structured data.
Continued research on safety, prompt robustness, and evaluation benchmarks.
Growing ecosystem: repositories, guides, and active research (frequent new papers).

Recommended exercises

Reproduce notebook examples: summarization, QA, classification, role-play, SQL generation.
Experiment with temperature / top-p and observe behavior.
Build few-shot and CoT prompts and compare performance.
Implement self-consistency: sample multiple CoT traces and aggregate answers.
Build a small retrieval-augmented QA pipeline (document store + similarity search + LM).
Try PAL: have the LM generate Python, execute it, and compare accuracy vs. text-only reasoning.
Test prompt injection / leaking attacks against a constrained conversational setup and design mitigations.

Speakers and sources featured

Speakers:
- Dr TI (lecture author)
- Elvis (presenter / narrator)
Models, tools, papers, and platforms referenced:
- OpenAI (text-davinci-003, ChatGPT; OpenAI Python client)
- Anthropic (Claude)
- GitHub Copilot
- DALL·E, Stable Diffusion (multimodal prompt examples)
- LangChain (chaining and agents)
- ReAct framework
- Program-Aided Language models (PAL)
- Chain-of-Thought (CoT) prompting papers
- Self-consistency method (sampling + majority voting)
- RLHF methods
- Web search APIs, math/calculator APIs, document stores / similarity search
- Prompt Engineering Guide repository (lecture notebook and resources)