Summary of "GCI World 2026 April Session2 - During Lecture"
High-level purpose
- Teach basic data manipulation for data science using NumPy, with emphasis on 1D and 2D arrays.
- Prepare raw data for later analysis (fits into a CRISP‑DM workflow).
- Emphasize practical skills: creating arrays, element‑wise operations, broadcasting, indexing/slicing, aggregation, reshaping, and simple data cleaning using a real NOAA dataset.
Key concepts and lessons
What NumPy is and why to use it
- NumPy provides `numpy.ndarray`, a homogeneous, multi-dimensional array optimized for numerical computation.
- Many NumPy routines are implemented in C, so they are typically much faster than equivalent pure-Python loops.
- Universal functions (ufuncs) perform element-wise operations efficiently (e.g., `A + B` calls `np.add(A, B)`).
- Use the common alias `import numpy as np` to access NumPy functions.
Python lists vs ndarray
- Python lists are flexible and can hold mixed types, but they are inefficient for numeric calculations: `list + list` concatenates rather than adding element-wise.
- ndarrays store elements of a single type, which enables fast element-wise arithmetic, broadcasting, and many other array operations.
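A minimal sketch of the contrast above. The values are arbitrary and chosen only for illustration:

```python
import numpy as np

# Plain Python lists: + concatenates the two lists
list_a = [1, 2, 3]
list_b = [10, 20, 30]
concatenated = list_a + list_b   # [1, 2, 3, 10, 20, 30]

# ndarrays: + adds element by element (it calls the np.add ufunc)
arr_a = np.array(list_a)
arr_b = np.array(list_b)
summed = arr_a + arr_b           # [11 22 33]

# np.array converts mixed numeric input to one common dtype
mixed = np.array([1, 2.5, 3])
print(concatenated)
print(summed)
print(mixed.dtype)               # float64
```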
Creating and converting arrays
- Create an ndarray: `array = np.array(list_like)` (a list or tuple can be converted).
- Convert back to a list: `array.tolist()`.
- Check the type with `type(array)`.
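The round trip between Python sequences and ndarrays can be sketched as follows. The integer dtype NumPy picks is platform-dependent, so it is printed but not assumed:

```python
import numpy as np

# Convert a list (or a tuple) into an ndarray
a = np.array([0, 1, 2, 3, 4])
b = np.array((5, 6, 7))

print(type(a))           # <class 'numpy.ndarray'>
print(a.dtype, a.shape)  # dtype depends on the platform; shape is (5,)

# Convert back to a plain Python list of Python ints
back = a.tolist()
print(type(back), back)  # <class 'list'> [0, 1, 2, 3, 4]
```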
Universal functions (ufuncs)
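- Arithmetic operators and math functions operate element-wise without explicit loops (e.g., `+`, `-`, `*`, `/`, `np.log`, `np.sin`).
- Advantage: concise code and better performance.
- Note: dividing by zero does not raise `ZeroDivisionError`. NumPy emits a `RuntimeWarning` and returns `inf` or `-inf` (or `nan` for `0/0`), so check results for infinities and NaNs (e.g., with `np.isinf` or `np.isfinite`).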
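Here is a short sketch of ufuncs and of the division-by-zero behavior described above. `np.errstate` is used only to silence the warnings in this example. In real analysis you may prefer to let them show:

```python
import numpy as np

A = np.array([1.0, 2.0, 4.0])
B = np.array([2.0, 0.0, 4.0])

total = A + B        # same as np.add(A, B)
logs = np.log(A)     # natural log of every element, no loop needed

# Division by zero gives inf (and 0/0 gives nan) instead of raising an error
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = A / B
    undefined = np.array([0.0]) / np.array([0.0])

print(ratio)
print(np.isinf(ratio))
clean = ratio[np.isfinite(ratio)]   # drop inf/nan before further analysis
```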
Broadcasting
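- Broadcasting automatically stretches scalars and axes of size 1 to match a larger array's shape for element-wise operations.
- Shapes are compared from the trailing (rightmost) dimension. Two sizes are compatible when they are equal or one of them is 1, and missing leading dimensions are treated as 1.
- Works for 1D and N-D arrays (e.g., adding a scalar to an array; multiplying a 3×1 array by a 1×2 array yields a 3×2 array).
- Limitation: broadcasting fails with a `ValueError` for incompatible shapes (e.g., a length-5 array with a length-2 array).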
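The three cases above (scalar, 3×1 by 1×2, and an incompatible pair) can be checked directly. This is a minimal sketch with arbitrary values:

```python
import numpy as np

x = np.array([1, 2, 3])
shifted = x + 10                 # scalar is broadcast: [11 12 13]

col = np.array([[1], [2], [3]])  # shape (3, 1)
row = np.array([[10, 20]])       # shape (1, 2)
grid = col * row                 # size-1 axes stretch -> shape (3, 2)
print(grid.shape)                # (3, 2)

# Trailing sizes 5 and 2 are neither equal nor 1, so this fails
failed = False
try:
    np.arange(5) + np.arange(2)
except ValueError as err:
    failed = True
    print("broadcast failed:", err)
```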
Aggregation functions
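- Use `np.max`, `np.min`, `np.sum`, `np.mean`, `np.std`, etc., to compute summary values over a whole array or along a dimension.
- For multi-dimensional arrays, use the `axis` argument to aggregate along a specific axis. In 2D, `axis=0` collapses the rows and gives one result per column; `axis=1` collapses the columns and gives one result per row.
- `keepdims=True` keeps the aggregated axis with size 1, preserving the number of dimensions (useful for keeping shapes consistent and broadcastable against the original array).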
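A small sketch of `axis` and `keepdims` on a 2×3 array. The row-centering line shows why `keepdims` matters: without it, the row means have shape `(2,)`, which does not broadcast against `(2, 3)`.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

overall_max = np.max(x)                 # 6
col_sums = np.sum(x, axis=0)            # one value per column: [5 7 9]
row_means = np.mean(x, axis=1)          # one value per row: [2. 5.]
col_sums_2d = np.sum(x, axis=0, keepdims=True)  # shape (1, 3) instead of (3,)

# Row means with keepdims have shape (2, 1), so they broadcast against x
centered = x - np.mean(x, axis=1, keepdims=True)
```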
Indexing and slicing (1D)
- Index like lists: `array[i]` (0-based). Negative indexing is supported: `array[-1]` is the last element.
- Slicing: `array[start:end:step]`, where `end` is exclusive. Omitting `start`/`end` defaults to the beginning/end of the array; omitting `step` uses the default of 1.
- A slice with a step extracts elements at regular intervals, e.g., every nth element with `array[::n]`.
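The 1D indexing rules above, applied to a small example array:

```python
import numpy as np

a = np.arange(10, 20)   # [10 11 ... 19]

first = a[0]            # 10
last = a[-1]            # 19
middle = a[2:5]         # indices 2, 3, 4 (end is exclusive): [12 13 14]
tail = a[-3:]           # last three: [17 18 19]
every_third = a[::3]    # [10 13 16 19]
reversed_a = a[::-1]    # a negative step walks backwards
```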
Indexing and slicing (2D and N‑D)
- For 2D arrays: `array[row_index, col_index]` (a single pair of brackets with a comma).
- Slice along each axis: `array[row_start:row_end, col_start:col_end]`. Use a bare colon `:` to take everything along an axis.
- Negative indices work per axis.
- Reshape with `array.reshape(rows, cols)` (the total number of elements must match). Use `-1` to let NumPy infer one dimension.
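A sketch combining reshape with 2D indexing and slicing. The 3×4 example array is arbitrary:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)   # rows: 0..3, 4..7, 8..11

elem = x[1, 2]              # row 1, column 2 -> 6
first_two_rows = x[:2, :]   # shape (2, 4)
last_col = x[:, -1]         # [ 3  7 11]
block = x[1:, 1:3]          # rows 1-2, columns 1-2 -> [[5 6] [9 10]]

y = x.reshape(2, -1)        # -1 is inferred as 6 because 12 / 2 = 6
print(y.shape)              # (2, 6)
```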
Advanced (fancy) indexing rules
- Pass one index list per axis, e.g., `rows = [0, 2]`, `cols = [1, 3]`; `x[rows, cols]` pairs them up element by element.
- Output shape follows the shape of the index arrays provided.
- Index arrays can be broadcast to align shapes and produce desired output shapes.
- Combining basic and advanced indexing is allowed, but shapes/behavior can be unintuitive — keep axis‑wise rules in mind.
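The difference between paired indices and a broadcast grid of indices can be sketched like this. `np.ix_` is a NumPy helper that builds the same open mesh as the manual column-vector/row-vector trick:

```python
import numpy as np

x = np.arange(16).reshape(4, 4)

rows = [0, 2]
cols = [1, 3]
picked = x[rows, cols]      # pairs (0, 1) and (2, 3) -> [ 1 11], shape (2,)

# rows as a column vector (2, 1) and cols as shape (2,) broadcast to (2, 2),
# giving every row/column combination
grid = x[np.array(rows)[:, np.newaxis], np.array(cols)]
grid_ix = x[np.ix_(rows, cols)]   # same result via np.ix_
```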
Boolean indexing
- Create a Boolean mask by applying a condition, e.g., `mask = (array % 3 == 0)` or `array > threshold`.
- Use the mask to select the elements that satisfy the condition: `array[mask]` returns a 1D array of the matching elements.
- Boolean indexing is useful for filtering and for handling sentinel/missing values (example: `PRCP == 9999` meaning missing).
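Filtering and sentinel handling can be sketched with a mask. The precipitation values below are hypothetical. Only the 9999 missing-value code comes from the lecture:

```python
import numpy as np

a = np.arange(10)
mask = (a % 3 == 0)
multiples_of_3 = a[mask]          # [0 3 6 9]

# Hypothetical precipitation readings; 9999 marks a missing value
prcp = np.array([0, 25, 9999, 3, 9999])
missing = (prcp == 9999)
valid = prcp[~missing]            # keep only real readings: [ 0 25  3]
print(valid.mean())

# Alternative: keep the array length and mark missing entries as nan
prcp_f = prcp.astype(float)
prcp_f[missing] = np.nan
print(np.nanmean(prcp_f))         # nan-aware mean, same value as above
```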
Practical instructions / step‑by‑step usage
Importing
- `import numpy as np`
- Or selectively: `from numpy import random, linalg`
Create arrays
- Convert a list/tuple: `a = np.array([0, 1, 2, 3, 4])`
- Convert back: `a.tolist()`
Basic arithmetic and ufunc examples
- Element-wise operations: `C = A + B`, `D = A * 2`
- Math functions: `np.log(A)`, `np.exp(A)`, `np.sin(A)`
- Beware: division by zero returns `np.inf`; check for infinities before downstream analysis.
Broadcasting rules (practical)
- Scalar with array: `result = array + scalar` (the scalar is implicitly broadcast).
- Dimensions of size 1 can be broadcast across that dimension.
- If shapes are incompatible (sizes that are neither equal nor 1), the operation raises an error.
Aggregation and axis
- Overall: `np.max(x)`, `np.sum(x)`
- Column-wise (axis 0): `np.sum(x, axis=0)`
- Row-wise (axis 1): `np.sum(x, axis=1)`
- Preserve dims: `np.sum(x, axis=0, keepdims=True)`
Indexing and slicing examples
- Single element: `x[2]` (1D) or `x[0, 1]` (2D)
- Negative indexing: `x[-1]`, `x[-2:]`
- Every nth element: `x[::n]`
- 2D slices: `x[:5, :]` (first five rows, all columns), `x[:, -2:]` (last two columns)
Advanced indexing examples
- Select scattered elements: `rows = [0, 2]; cols = [1, 3]; x[rows, cols]` returns `x[0, 1]` and `x[2, 3]`, not a 2×2 block.
- Use broadcast index shapes to get a grid of combinations (e.g., make the row indices a column vector and the column indices a row vector).
Boolean indexing and masking
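- Example: `mask = (x % 3 == 0); x[mask]` returns the elements divisible by 3.
- Detect sentinel/missing codes (e.g., `PRCP == 9999` or `999.9`) and filter or replace them before analysis.
- Combine conditions with the element-wise operators `|` (or), `&` (and), and `~` (not), wrapping each condition in parentheses: `mask = (A % 2 == 1) | (A % 4 == 2)`. Python's `and`/`or` do not work element-wise on arrays.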
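Combining masks can be sketched on a small range of integers. This example also covers the "odd or remainder 2 when divided by 4" exercise. The parentheses are required because `|` and `&` bind more tightly than `==` in Python:

```python
import numpy as np

A = np.arange(1, 13)   # 1 .. 12

mask = (A % 2 == 1) | (A % 4 == 2)
selected = A[mask]     # odd numbers plus 2, 6, 10, in original order

both = A[(A > 4) & (A % 2 == 0)]   # even AND greater than 4
not_odd = A[~(A % 2 == 1)]         # ~ negates a Boolean mask
```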
Reshaping arrays
- Create a 1D array and reshape it to 2D: `arr = np.arange(N).reshape(rows, cols)`
- Use `-1` to infer a dimension: `arr.reshape(3, -1)`
- The total element count must match.
Loading real data (NOAA example)
- Load tabular data: `raw = np.loadtxt(filename_or_url, ...)` to get a 2D `ndarray`.
- Subset rows (e.g., the first week) and columns (TMAX, TMIN, PRCP) as separate arrays: `week1Tmax = raw[rows, col_index]`.
- Convert units (the NOAA example stores values in tenths): `week1Tmax / 10`.
- Inspect dimensions with `raw.shape` (preferred, since it reports every axis) or `len(array)` (which gives only the length of the first axis).
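A minimal sketch of this loading workflow, assuming a hypothetical layout: an in-memory CSV with columns TMAX, TMIN, PRCP in tenths, and 9999 as the missing-value code. The real NOAA file's columns and delimiter may differ. `io.StringIO` stands in for the file name or URL.

```python
import io
import numpy as np

# Hypothetical stand-in for the NOAA file: TMAX, TMIN, PRCP in tenths,
# with 9999 marking a missing value
csv_text = """\
256,180,0
270,175,12
9999,169,9999
281,190,0
"""
raw = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(raw.shape)              # (4, 3); len(raw) would only report 4

TMAX, TMIN, PRCP = 0, 1, 2    # hypothetical column positions
week1 = slice(0, 7)           # first seven rows (truncated if fewer exist)

week1Tmax = raw[week1, TMAX]
# Flag missing values as nan, then convert tenths to whole units
week1Tmax = np.where(week1Tmax == 9999, np.nan, week1Tmax / 10)
print(week1Tmax)
```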
Best practices & tips
- Understand core concepts; look up specific functions in the documentation rather than memorizing everything.
- Use the common alias `np` for readability and consistency.
- Work through hands-on notebooks and practice exercises to build intuition.
Warnings, caveats, and special notes
- Division by zero: NumPy yields `np.inf` instead of raising `ZeroDivisionError`; detect and handle `np.inf`.
- Sentinel values in datasets (e.g., `9999` or `999.9`) must be detected and treated as missing before statistical analysis.
- Broadcasting only works when dimensions are compatible; incompatible shapes raise errors.
- Advanced/fancy indexing behavior can be unintuitive — remember axis‑wise indexing rules and that index array shapes determine output shape.
Tip: When in doubt about shapes or behavior, print the `.shape` of intermediate results and consult the official NumPy docs.
Practical exercises referenced
- Create two 1D `ndarray`s with chosen values; confirm the type with `type()`.
- Perform element-wise arithmetic (add, subtract, multiply, divide) on those arrays.
- Indexing/slicing exercises: retrieve the first and last elements, slices, and periodic subsamples (e.g., every 60 minutes).
- Boolean indexing exercise: return the elements that are odd or leave a remainder of 2 when divided by 4 (use `%` and Boolean masks).
- Reshape exercise: create a 2D array by reshaping a 1D array (specify both dimensions or use `-1`).
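For the periodic-subsample and reshape exercises, here is a sketch on hypothetical minute-by-minute data for one day. The exercise's actual dataset is not specified here:

```python
import numpy as np

# Hypothetical minute-level readings for one day: 24 * 60 = 1440 values
minutes = np.arange(24 * 60)

hourly = minutes[::60]               # every 60th minute: 0, 60, 120, ...
print(hourly.shape)                  # (24,)

by_hour = minutes.reshape(24, -1)    # 24 rows of 60 minutes; -1 is inferred as 60
hourly_means = by_hour.mean(axis=1)  # one mean per hour
```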
Sources and speakers featured
- Lecturer / instructor who presented the NumPy lecture and guided the notebook/practice.
- Students / participants who asked questions during the hands‑on session.
- NOAA (National Oceanic and Atmospheric Administration): source of the example daily temperature and precipitation dataset.
- NumPy library (`np`), including modules like `numpy.random` and `numpy.linalg`.
- CRISP-DM framework: the data-mining workflow referenced.
- Course notebook / practice materials and the official documentation (recommended reference).
Category
Educational