Exam 1 Review - Database Design
Database Design
- Overview
- Plain Text Data Formats
- Data Munging
- Spreadsheets
- SQL
- Normalization & Entity-Relationship Diagrams
- Conclusions
Overview
Format
The exam will be composed of two parts:
- A Google Form, similar to a Quiz (30%)
- A GitHub Classroom repository, similar to an assignment (70%)
Time
Start at any time and submit at any time within a 24-hour window.
Topics covered
The topics covered on the exam:
- Do you really need a slide about this?
Entering responses in GitHub repository
The GitHub repository asks you to enter your responses to specific questions into a README.md file.
- A placeholder code block is given for each question where you must enter your responses.
- In the raw Markdown code, this code block looks like this:
- You must enter your response within the code block.
- Do not modify anything outside of the code blocks.
Plain Text Data Formats
Key points
Main topics:
- common formats: CSV, JSON, XML, HTML, fixed-width column
- structured data vs unstructured data
- fixed schema vs loose schema
- nesting values within values
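Example (a minimal sketch, not taken from the course materials): the same made-up record stored as JSON and as CSV, using Python's standard json and csv modules. The record, its field names, and its values are hypothetical. It is meant to show how JSON can nest values within values, while CSV keeps a flat, fixed set of columns.

```python
import csv
import io
import json

# Hypothetical record: one student with two course enrollments.
student = {
    "name": "Ada",
    "year": 2,
    "courses": [
        {"code": "DB101", "grade": "A"},
        {"code": "CS102", "grade": "B"},
    ],
}

# JSON can nest values within values (a list of objects inside an object).
json_text = json.dumps(student, indent=2)

# CSV is flat: every row shares the same columns, so the nested list
# is flattened into one row per enrollment and the name/year repeat.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["name", "year", "code", "grade"],
    lineterminator="\n",
)
writer.writeheader()
for course in student["courses"]:
    writer.writerow({"name": student["name"], "year": student["year"], **course})
csv_text = buffer.getvalue()

print(json_text)
print(csv_text)
```

Note that the CSV version repeats the name and year on every row, which is one reason nested formats are sometimes preferred for this kind of data.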
Data Munging
Key points
Main topics:
- common data problems
- reading/writing text files in standard Python
- character encoding
- useful modules: csv, json, Beautiful Soup, pandas (it exists)
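Example (a minimal sketch under assumed data): writing and reading a text file with an explicit character encoding, then cleaning two common data problems (stray whitespace and a missing value) with the csv module. The file name, column names, and values are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical messy data: stray whitespace, a missing score, a non-ASCII name.
raw_text = "name,score\n  José ,90\nMei,\n"

path = os.path.join(tempfile.mkdtemp(), "scores.csv")

# Write with an explicit encoding so non-ASCII characters round-trip.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(raw_text)

# Read back with the same encoding and clean each row.
cleaned = []
with open(path, "r", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f):
        name = row["name"].strip()  # remove stray whitespace
        score = int(row["score"]) if row["score"].strip() else None  # missing -> None
        cleaned.append({"name": name, "score": score})

print(cleaned)
```

Opening the file with a different encoding than it was written with is a typical source of garbled characters, which is why the encoding is stated on both the write and the read.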
Spreadsheets
Key points
Main topics:
- row/column structure
- import/export CSV
- formulas
- simple aggregate statistics
- more complex statistics with one or more criteria
- advanced: pivot tables and charts
SQL
Key points
Main topics:
- terminology: tables, records, fields
- dot functions
- primary key/foreign key
- SQL CRUD tasks
- simple aggregate statistics
- more complex grouping, ordering
- joins: inner and outer
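Example (a minimal sketch, assuming SQLite accessed through Python's standard sqlite3 module): CRUD statements, an aggregate with grouping and ordering, and an inner versus a left outer join. The authors and books tables and all their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two tables linked by a primary key / foreign key pair.
cur.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE books ("
    "id INTEGER PRIMARY KEY, title TEXT, price REAL, "
    "author_id INTEGER REFERENCES authors(id))"
)

# Create
cur.executemany(
    "INSERT INTO authors (id, name) VALUES (?, ?)",
    [(1, "Ng"), (2, "Diaz"), (3, "Okoro")],
)
cur.executemany(
    "INSERT INTO books (title, price, author_id) VALUES (?, ?, ?)",
    [("A", 10.0, 1), ("B", 20.0, 1), ("C", 30.0, 2)],
)

# Update and delete
cur.execute("UPDATE books SET price = 25.0 WHERE title = 'B'")
cur.execute("DELETE FROM books WHERE title = 'C'")

# Read: aggregate statistics with grouping and ordering
totals = cur.execute(
    "SELECT author_id, COUNT(*), AVG(price) FROM books "
    "GROUP BY author_id ORDER BY author_id"
).fetchall()

# Inner join: only authors that have at least one matching book
inner = cur.execute(
    "SELECT authors.name, books.title FROM authors "
    "INNER JOIN books ON books.author_id = authors.id "
    "ORDER BY authors.name, books.title"
).fetchall()

# Left outer join: every author, even those with no books
outer = cur.execute(
    "SELECT authors.name, COUNT(books.id) FROM authors "
    "LEFT OUTER JOIN books ON books.author_id = authors.id "
    "GROUP BY authors.id, authors.name ORDER BY authors.name"
).fetchall()

print(totals)
print(inner)
print(outer)
conn.close()
```

The inner join drops authors without books, while the left outer join keeps them with a count of zero.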
Normalization & Entity-Relationship Diagrams
Key points
Main topics:
- the purpose of normalization and E-R diagramming
- first through fourth normal forms
- symbols in E-R Diagrams
- cardinality
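Example (a rough sketch with made-up data): splitting a flat table, in which a customer's city repeats on every order, into a customers table and an orders table linked by a key. This is in the spirit of removing a transitive dependency (third normal form), and it yields a one-to-many relationship between customers and orders. All names and values are hypothetical.

```python
# Hypothetical denormalized rows: the customer's city repeats on every order.
orders_flat = [
    {"order_id": 1, "customer": "Ada", "city": "Paris", "item": "pen"},
    {"order_id": 2, "customer": "Ada", "city": "Paris", "item": "ink"},
    {"order_id": 3, "customer": "Bo", "city": "Oslo", "item": "pad"},
]

customers = {}  # customer name -> customer record (each city stored once)
orders = []     # each order points at a customer through customer_id

for row in orders_flat:
    if row["customer"] not in customers:
        customers[row["customer"]] = {
            "customer_id": len(customers) + 1,  # surrogate primary key
            "name": row["customer"],
            "city": row["city"],
        }
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": customers[row["customer"]]["customer_id"],  # foreign key
            "item": row["item"],
        }
    )

print(list(customers.values()))
print(orders)
```

After the split, changing a customer's city means updating one record instead of every order that customer has placed.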
Conclusions
Thank you. Good luck.