knowledge-kitchen

SQL - Joins (in SQLite)

Database Design

  1. Overview
  2. Example Data Set
  3. Primary Keys and Foreign Keys
  4. Inner Joins
  5. Left Joins
  6. Conclusions

Overview

Data Set

Raw data

Everyone’s loves employment. Government jobs are rumoured to be stable, non-demanding, and flush with benefits.

Let’s take a look at New York City job openings.

Data munging

We will explore job postings and salary ranges. As usual, the data set needs a bit of munging before we can import it into a database and do our analysis.

Create tables

Start by setting up the two tables within SQLite to match the fields in the CSVs. First for jobs:

DROP TABLE IF EXISTS jobs;
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL,
    agency_id INTEGER NOT NULL,
    title TEXT NOT NULL,
    salary_range_from INTEGER NOT NULL,
    salary_range_to INTEGER NOT NULL
);

… and then for agencies:

DROP TABLE IF EXISTS agencies;
CREATE TABLE agencies (
	 id INTEGER PRIMARY KEY,
	 title TEXT NOT NULL
);

Import data

Import the data from each CSV file into its corresponding table.

.import /Users/foobar/some_directory/jobs_data.csv jobs
.import /Users/foobar/some_directory/agencies_data.csv agencies

Check the data

Before going any further, check that the data has imported successfully into SQLite.

Inspect the jobs tables:

.headers on
select * from jobs;

The output should appear something like:

id job_id agency_id title salary_range_from salary_range_to
0 454437 0 Executive Front Office Executive Operations Analyst 46856 62480
1 386903 1 Public Health Nurse, Bureau of School Health 40.79 40.79
2 424454 2 Desktop Support Technician 60550 69632
3 373602 3 HR PAYROLL ANALYST 75000 115000

Check the data (continued)

Inspect the agencies tables:

select * from agencies;

The output should appear something like:

id title
0 OFFICE OF MANAGEMENT & BUDGET
1 DEPT OF HEALTH/MENTAL HYGIENE
2 DEPARTMENT OF CORRECTION
3 FINANCIAL INFO SVCS AGENCY
4 TAXI & LIMOUSINE COMMISSION
5 LAW DEPARTMENT

Primary Keys and Foreign Keys

Primary keys

Every table has a primary key - a field (or fields) whose value act as a unique identifier for each record.

Primary key terminology

There are a few terms to describe the different types of unique values that can be chosen as primary key:

Relationships among multiple tables

In this example, as is often the case, we have multiple tables: jobs and agencies. And there is a relationship between the them.

Foreign keys

In order to perform a join to merge data from multiple tables, we must somehow inform the database how to match a record from one table with the related record in the other table.

Inner Joins

Merging records from two tables

There are several ways to merge - or join - records from one table to related records from another table. An inner join is the most common and perhaps simplest technique.

select jobs.title, agencies.title from jobs inner join agencies on jobs.agency_id=agencies.id;

Results

The results of the inner join should look like a merged table including the fields we requested from both tables.

jobs.title agencies.title
Executive Front Office Executive Operations Analyst OFFICE OF MANAGEMENT & BUDGET
Public Health Nurse, Bureau of School Health DEPT OF HEALTH/MENTAL HYGIENE
Desktop Support Technician DEPARTMENT OF CORRECTION
HR PAYROLL ANALYST FINANCIAL INFO SVCS AGENCY
Administrative Assistant to the Assistant Commissioner of Safety & Emissions TAXI & LIMOUSINE COMMISSION

Referential integrity

An inner join maintains referential integrity. In practical terms, this means that if a particular job record refers to an agency that does not exist in the agencies table, then that job will not be included in our merged results.

select jobs.title, agencies.title from jobs inner join agencies on jobs.agency_id=agencies.id where jobs.id=1;
update jobs set agency_id=999 where id=1;
select jobs.title, agencies.title from jobs inner join agencies on jobs.agency_id=agencies.id where jobs.id=1;

Left Joins

Concept

Left joins are similar to inner joins - they merge two related records from two different tables. However, whereas inner joins maintain referential integrity and do not show records with invalid relationships, left joins have no such integrity.

select jobs.id, jobs.title, agencies.title from jobs left join agencies on jobs.agency_id=agencies.id;
select jobs.id, jobs.title, agencies.title from jobs left join agencies on jobs.agency_id=agencies.id where jobs.id=1;

Checking for nothingness

This feature of left joins - that they do not maintain referential integrity - allows us to do something we cannot do with inner joins: we can check for non-existent relationships.

select jobs.id, jobs.title, agencies.title from jobs left join agencies on jobs.agency_id=agencies.id WHERE agencies.id IS NULL;

Conclusions

Thank you. Bye.