Summary of "#6 Machine Learning Specialization [Course 1, Week 1, Lesson 2]"

Summary — main ideas and lessons

Unsupervised learning: a branch of machine learning where the algorithm is given data without output labels (no “right answer” y for each example) and must discover structure or patterns in the unlabeled data.

Definition and contrast with supervised learning

Unsupervised learning works with inputs only (no input–label pairs).
Supervised learning trains on input–label pairs (for example, patient data labeled benign vs. malignant).
In unsupervised learning the goal is exploratory: find structure, patterns, or groupings in the data rather than predict a provided label.
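
The contrast can be seen in the shape of the data itself. The following is a minimal sketch; the patient values and variable names are invented for illustration and do not come from the lesson.

```python
# Hypothetical patient records: (tumor size in cm, age in years).
# All values are invented for illustration only.
features = [(1.2, 34), (3.8, 61), (0.9, 29), (4.1, 58)]

# Supervised learning: every input x comes paired with a label y.
labels = ["benign", "malignant", "benign", "malignant"]
supervised_data = list(zip(features, labels))

# Unsupervised learning: the same inputs, but no labels at all.
unsupervised_data = list(features)

for x, y in supervised_data:
    print(f"supervised:   x={x} -> y={y}")
for x in unsupervised_data:
    print(f"unsupervised: x={x} -> no label")
```

In the unsupervised case there is nothing to predict, so the only thing an algorithm can do is look for structure among the x values.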

Core concept: clustering as an example

Clustering algorithms automatically group unlabeled examples into clusters of similar items.
The algorithm determines which features or signals indicate similarity without being told what to look for in advance.
Clusters can correspond to meaningful categories (for example, patient subtypes, news topics, or customer segments).
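
The summary does not name a specific clustering algorithm. As an assumption, the sketch below uses k-means, one common choice, to show what "grouping by similarity" can look like in practice. The function name, the sample points, and the choice to start from the first k points are all simplifications made for this example.

```python
import math

def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    # Simplification: use the first k points as the starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Invented 2-D points forming two visibly separate groups.
points = [(1.0, 1.0), (5.0, 5.2), (1.2, 0.8), (5.1, 4.9), (0.8, 1.1), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

No labels were supplied; the cluster ids 0 and 1 are arbitrary names the algorithm assigns, and it is up to a person to decide whether they correspond to a meaningful category.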

Illustrated examples and applications

Medical patient data: given features such as tumor size and patient age but no benign/malignant labels, clustering can reveal groups of patients with similar profiles.
Google News: clustering groups related news articles automatically each day by finding co-occurring words (e.g., “panda,” “twin,” “zoo”) and grouping those articles without human-curated rules.
DNA microarray / genomics: with each column representing one individual and each row one gene's expression measurements, clustering can group people into biological subtypes (e.g., type 1, type 2, type 3) based on expression patterns.
Market segmentation / customer databases: companies cluster customers into market segments to serve different groups more effectively. Example: DeepLearning.AI found clusters of learners motivated by (a) skill growth/knowledge, (b) career development, or (c) staying updated on AI in their field.
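
The news example can be illustrated with a toy sketch that groups headlines by the words they share. This is not how Google News works internally; the similarity measure (Jaccard overlap), the threshold, the greedy grouping rule, and the headlines are all assumptions chosen to keep the example small.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_headlines(headlines, threshold=0.2):
    """Greedy grouping: a headline joins the first group whose
    first headline is similar enough, else it starts a new group."""
    groups = []  # each group is a list of (headline, word_set) pairs
    for text in headlines:
        words = set(text.lower().split())
        for group in groups:
            if jaccard(words, group[0][1]) >= threshold:
                group.append((text, words))
                break
        else:
            groups.append([(text, words)])
    return [[text for text, _ in group] for group in groups]

# Invented headlines for illustration.
headlines = [
    "giant panda gives birth to twin cubs at zoo",
    "zoo celebrates rare panda twin birth",
    "central bank raises interest rates again",
    "interest rates rise as central bank acts",
]
groups = group_headlines(headlines)
for i, group in enumerate(groups):
    print(i, group)
```

Nobody told the code that "panda" or "rates" were important words; the grouping falls out of which words happen to co-occur in the input.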

Key takeaways

Unsupervised learning is useful when labels are unavailable or impractical to obtain; its purpose is exploratory discovery.
Clustering is a common unsupervised technique with wide real-world uses (news grouping, bioinformatics, customer segmentation).
The algorithm has to work out for itself, on every run, which features signal similarity (e.g., news topics change daily, so the clustering must adapt without human supervision).
There are other types of unsupervised learning beyond clustering (to be covered in subsequent lessons).

Practical procedure for applying a clustering method (implied by the examples rather than stated as explicit steps)

Start with unlabeled data (examples with features but no y labels).
Represent each example by appropriate features (e.g., patient measurements, word-occurrence vectors for articles, per-person gene-expression profiles).
Run a clustering algorithm that groups examples by similarity in feature space.
Inspect and interpret clusters to determine whether they correspond to useful categories (topics, patient subtypes, market segments).
Use identified clusters for downstream tasks (grouping news stories, targeted outreach, biological discovery).
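
The inspection step in the procedure above can be sketched as a small helper that reports the size and average feature values of each cluster. The helper name, the customer features, and the cluster assignments are hypothetical; the labels are assumed to come from whatever clustering algorithm was run beforehand.

```python
from collections import defaultdict

def summarize_clusters(rows, labels):
    """Return {cluster_id: (count, per-feature means)} for inspection."""
    members = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)
    summary = {}
    for label, group in sorted(members.items()):
        # zip(*group) turns a list of rows into one tuple per feature column.
        means = tuple(round(sum(col) / len(group), 2) for col in zip(*group))
        summary[label] = (len(group), means)
    return summary

# Hypothetical learners: (courses completed, study hours per week).
rows = [(2, 1.0), (3, 1.5), (12, 6.0), (10, 5.0)]
labels = [0, 0, 1, 1]  # assumed output of an earlier clustering step

summary = summarize_clusters(rows, labels)
for cluster_id, (count, means) in summary.items():
    print(f"cluster {cluster_id}: {count} members, feature means {means}")
```

A summary like this is only a starting point: deciding whether a cluster represents a real market segment, patient subtype, or topic still requires human judgment.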