Summary of "Week 3: Storage Services in AWS and Azure"


Main ideas and concepts

Analogy: S3/Blob storage behaves like the files and folders on your laptop's disk — objects have no fixed schema the way SQL tables do. For table-like operations you load the data into analytics or database tools.
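The analogy can be made concrete with a toy model: an object store is essentially a mapping from string keys to raw bytes, and any schema has to be applied by the tool that reads the object, not by the store. This sketch uses a plain Python dict as the stand-in store (the keys and contents are made up for illustration):

```python
import csv
import io

# Toy model: an object store maps string keys to raw bytes -- no schema.
bucket = {}

# "Upload": the store accepts any bytes under any key.
bucket["data/users.csv"] = b"id,name\n1,Ada\n2,Linus\n"
bucket["images/logo.png"] = b"\x89PNG..."  # binary payloads are equally valid

# Table-like work means pulling the object out and parsing it yourself --
# the store never interprets the contents.
rows = list(csv.DictReader(io.StringIO(bucket["data/users.csv"].decode())))
print(rows[0]["name"])
```

This is why services like Athena or Azure Synapse exist: they supply the schema-on-read layer that the object store deliberately omits.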


AWS-specific concepts and linked services

S3 basics

Linked services and how they interact with S3

Typical AWS workflow (lab)

  1. Create an S3 bucket (unique name, choose region, configure access).
  2. Upload CSVs or other files.
  3. Use Athena / S3 Select / Glue (and optionally EMR or Redshift) to query and process data.
  4. Perform object operations (upload, download, delete) and use SQL-like queries to simulate CRUD.
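Step 3 of the workflow hinges on telling Athena how the CSVs in S3 are laid out. A minimal sketch of building that DDL as a string — the bucket, database, table, and column names below are placeholders, and nothing is sent to AWS:

```python
# Compose the CREATE EXTERNAL TABLE statement Athena uses to put a schema
# over comma-separated files under an S3 prefix. All names are placeholders.
def athena_csv_ddl(database, table, columns, s3_prefix):
    """Build an Athena DDL statement for header-bearing CSV files."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_prefix}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )

ddl = athena_csv_ddl(
    "lab_db", "sales",
    [("order_id", "INT"), ("amount", "DOUBLE")],
    "s3://my-lab-bucket/sales/",   # LOCATION is a prefix, not a single file
)
print(ddl)
```

Note that `LOCATION` points at a prefix (a "folder"), so Athena treats every object under it as part of the table — one reason consistent file layout matters when uploading in step 2.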

Azure-specific concepts and linked services

Storage account and Blob storage

Linked services and how they interact with Blob storage

Typical Azure workflow (lab)

  1. Create a Storage Account → create a Blob container → upload CSV/dataset.
  2. Load data into Azure SQL, Synapse, or Databricks depending on processing needs (relational SQL vs. big-data Spark).
  3. Run SQL or SparkSQL queries and perform CRUD operations; perform object operations (upload/download/delete).
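The CRUD portion of step 3 can be rehearsed locally before touching Azure SQL: the statements have the same shape either way. This sketch loads a tiny made-up CSV into an in-memory sqlite3 database (stdlib only, no cloud connection) and runs all four operations:

```python
import csv
import io
import sqlite3

# Stand-in for the Azure SQL step: parse a CSV, then run the four CRUD
# statements against an in-memory database.
csv_text = "id,name,score\n1,Ada,90\n2,Linus,85\n"
rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # drop the header row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT, score INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", rows)            # Create
top = conn.execute("SELECT name FROM students WHERE score > 86").fetchall()  # Read
conn.execute("UPDATE students SET score = 95 WHERE id = 2")                # Update
conn.execute("DELETE FROM students WHERE id = 1")                          # Delete
print(top, conn.execute("SELECT COUNT(*) FROM students").fetchone())
```

Against Azure SQL Database the only change is the connection (e.g. an ODBC driver instead of sqlite3); the SQL itself carries over.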

Practical lab instructions / methodology

  1. Preparation

    • Create or use cloud accounts for AWS and/or Azure.
    • Obtain or create sample data (CSV files are preferred for SQL-like work). You can download dummy CSVs or create one locally.
  2. AWS lab steps

    • Create an S3 bucket:
      • Choose a globally unique bucket name.
      • Select region and configure access controls (public/private, IAM), encryption, and versioning if needed.
    • Upload files (CSV, images, videos, code, etc.).
    • Explore querying options:
      • Use Amazon Athena to define a schema over CSVs in S3 and query them with standard SQL.
      • Use AWS Glue crawler to infer schema and create a Data Catalog table.
      • Use S3 Select to query individual objects and extract subsets of data.
    • Optional: use Redshift, EMR, and QuickSight for warehousing, big-data processing, and visualization.
    • Perform object operations: upload, download, delete; use SQL queries to simulate CRUD.
  3. Azure lab steps

    • Create a Storage Account.
    • Create a Blob container and upload CSVs/datasets.
    • Depending on goals:
      • Use Azure SQL Database to import CSVs and run SQL (select, insert, update, delete).
      • Use Azure Synapse Analytics for large-scale ingestion and SQL-like analytics.
      • Use Azure Databricks to run SparkSQL for large-scale processing or transformations.
    • Perform object operations (upload/download/delete) and run SQL/SparkSQL queries.
  4. Tasks to perform and verify

    • Upload different file types and sizes to confirm object storage behavior.
    • Create tables or catalog entries from CSVs (Glue for AWS; import into Azure SQL or Synapse).
    • Run SQL or SparkSQL queries that perform select, insert, update, delete (or equivalents).
    • Download objects to verify retrieval; delete objects to verify removal and access controls.
  5. Notes, tips, and caveats

    • Regions and availability: not all services/features are available in every region — choose regions carefully.
    • Naming and access: follow bucket/container naming best practices and manage access with IAM/Azure RBAC.
    • For large datasets, prefer big-data tools (EMR, Databricks, Synapse) over single-node SQL engines.
    • Monitor costs (storage, queries, data transfer) when using cloud resources.
    • Document the services you explore and their behaviors (permissions, performance, region limits).
    • Practice end-to-end flows: upload CSV → register schema/catalog → query and modify data → visualize or export results.
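For the "query" stage of the end-to-end flow, S3 Select takes SQL scoped to a single object. This sketch only assembles the request parameters as a dict — the bucket, key, and column names are placeholders, and with boto3 you would pass the dict to `s3.select_object_content(**params)` rather than print it:

```python
# Build the parameter set an S3 Select call expects. Nothing is sent here.
def s3_select_params(bucket, key, expression):
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": expression,           # SQL over one object, aliased s3object
        "ExpressionType": "SQL",
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},  # header row names columns
        "OutputSerialization": {"CSV": {}},
    }

params = s3_select_params(
    "my-lab-bucket", "sales/2024.csv",
    "SELECT s.order_id FROM s3object s WHERE CAST(s.amount AS FLOAT) > 100",
)
print(params["ExpressionType"])
```

Because S3 Select filters inside the storage service, only the matching subset of rows crosses the network — one of the cost levers mentioned in the tips above.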
