knowledge-kitchen

MongoDB - Aggregation Pipelines

Database Design

  1. Overview
  2. Example Data Set
  3. Basics
  4. Aggregation
  5. Conclusions

Overview

Concept

Lorem ipsum.

Example Data Set

Save the data file

This set of slides use data about NYC city government jobs, sourced from the official government jobs site. To follow along, export the data into a CSV file named NYC_Jobs.csv.

Job ID,Agency,Posting Type,# Of Positions,Business Title,Civil Service Title,Title Classification,Title Code No,Level,Job Category,Full-Time/Part-Time indicator,Career Level,Salary Range From,Salary Range To,Salary Frequency,Work Location,Division/Work Unit,Job Description,Minimum Qual Requirements,Preferred Skills,Additional Information,To Apply,Hours/Shift,Work Location 1,Recruitment Contact,Residency Requirement,Posting Date,Post Until,Posting Updated,Process Date
415211,DEPARTMENT OF CORRECTION,External,1,Data Analyst,CITY RESEARCH SCIENTIST,Non-Competitive-5,21744,02,"Technology, Data & Innovation Policy, Research & Analysis",F,Experienced (non-manager),73305,84301,Annual,75-20 Astoria Blvd,Information Systems-Admin,"The New York City Department of Correction (DOC) is an integral part of the City’s evolving criminal justice system, participating in reform initiatives and strategies aimed to move the City towards the smallest jail system possible without compromising public safety. The DOC is solely responsible for maintaining a safe and secure environment for staff, visitors, volunteers and people in custody inside our jails. Importantly, safe jails enable DOC to providing people in custody with the tools and opportunities they need to successfully re-enter their communities. The DOC operates facilities and court commands across the five boroughs with over 12,000 diverse professionals and knowledge experts.  The DOC seeks to recruit a Data Analyst, to assist with all of the aspects of the Data Analytics and Research Unit. Under the direct supervision of the Director of Data Analytics and Research Unit; the selected candidate will be responsible for supporting the analytic and research efforts of the department to further policy, planning and strategic initiatives, including but not limited to:  •	Supporting quantitative analytic and research projects related to program and policy evaluation;  •	Providing technical assistance to units throughout the Department in analysis and reporting;  •	Analyzing databases to assess accessibility and accuracy and identifying    opportunities for new data sources;  •	Performing and automating ad hoc requests and recurring reports;  •	Identifying areas of concern and developing strategies to correct deficiencies or problems;  •	Developing tools and methods to support reproducible and automated reports and analyses.","1.  For Assignment Level I (only physical, biological and environmental sciences and public health) A master's degree from an accredited college or university with a specialization in an appropriate field of physical, biological or environmental science or in public health.  To be appointed to Assignment Level II and above, candidates must have:  1. A doctorate degree from an accredited college or university with specialization in an appropriate field of physical, biological, environmental or social science and one year of full-time experience in a responsible supervisory, administrative or research capacity in the appropriate field of specialization; or  2. A master's degree from an accredited college or university with specialization in an appropriate field of physical, biological, environmental or social science and three years of responsible full-time research experience in the appropriate field of specialization; or  3. Education and/or experience which is equivalent to 1 or 2 above. However, all candidates must have at least a master's degree in an appropriate field of specialization and at least two years of experience described in 2 above. Two years as a City Research Scientist Level I can be substituted for the experience required in 1 and 2 above.    NOTE:  Probationary Period  Appointments to this position are subject to a minimum probationary period of one year.","•	Applied knowledge of R, Python, and related applications/languages; familiarity with SPSS, Stata  •	Practical experience with Tableau and SQL (applications and scripting)  •	Proficiency Microsoft Office Suite (PowerPoint, Word, Excel, Outlook, Access etc.)  •	Knowledge of research concepts, processes, and applied statistics  •	Research, reporting, and analytic experience within NYC government setting  •	Excellent writing, communication, interpersonal, research, problem-solving, and organizational skills  •	Demonstrated ability to initiate and manage complex and interdisciplinary projects; think creatively,    embrace new approaches; prepare and deliver informative and well-organized presentations  •	Experience in a high paced environment, with the ability to manage information and distribute    appropriately.",,For City employees: Go to Employee Self-Service (ESS) - www.nyc.gov/ess and search for Job ID# 415211  For all other applicants: Go to https://a127-jobs.nyc.gov and search for Job ID# 415211  Submission of a resume is not a guarantee that you will receive an interview.,,,,"New York City residency is generally required within 90 days of appointment. However, City Employees in certain titles who have worked for the City for 2 continuous years may also be eligible to reside in Nassau, Suffolk, Putnam, Westchester, Rockland, or Orange County. To determine if the residency requirement applies to you, please discuss with the agency representative at the time of interview.",10/02/2019,,02/07/2020,03/23/2021
435648,TAXI & LIMOUSINE COMMISSION,Internal,1,Programs Data Manager,COMMUNITY COORDINATOR,Non-Competitive-5,56058,00,"Policy, Research & Analysis",F,Experienced (non-manager),54100,83981,Annual,"33 Beaver St, New York Ny",Programs & Operations,"The New York City Taxi and Limousine Commission (TLC) establishes and enforces professional and uniform standards of for-hire transportation service and ensures public safety. TLC licenses and regulates all aspects of New York City’s medallion (yellow) taxicabs, for-hire vehicles (Boro Taxis, community-based liveries, black cars, including app-based services, and luxury limousines), commuter vans, and paratransit vehicles. With over 115,000 licensed vehicles and approximately 170,000 drivers, TLC is the most active taxi and limousine licensing regulatory agency in the United States. To learn more about the TLC, please visit: www.nyc.gov/taxi.  Under the leadership of the Director of Programs, this position will manage quantitative, data projects for the agency’s programs. This position will interface across all internal divisions, including Management Information Systems (MIS), Legal, Fiscal/Accounting, Licensing and Policy, as well as with partner agencies, TLC licensees, and vendors. A successful candidate has a strategic lens and an ability to anticipate oncoming program and data challenges. The candidate must have a background in using large data sets to manage public funds. Responsibilities may include but are not limited to:   •	Project planning and implementation: develop work plans and timelines for data projects, identify milestones, complete tasks, and manage projects to completion, working closely with program managers and project analysts. •	New project development: work with the Director of Programs to develop a programs data agenda, identify new research topics, and explore opportunities for collaboration on data and reporting. •	Programs policy research: design and perform quantitative and qualitative research projects using program data, which may include financial modeling, industry trend analysis, literature review, and conducting outreach and focus groups with driver and owner communities. •	Programs policy development: participate in planning and problem-solving sessions, work with key staff and stakeholders to seek feedback on policy ideas, and develop requirements. •	Overall support: writing reports and policy memos, researching initiatives, preparing PowerPoint presentations, giving presentations to the Commission, licensees, and members of the general public, meeting planning and management, managing day-to-day relationships and communications with industry stakeholders, and taking an active role in supporting the activities of the agency. Respond to telephone and correspondence requests for information.","1. A baccalaureate degree from an accredited college and two years of experience in community work or community centered activities in an area related to the duties described above; or  2. High school graduation or equivalent and six years of experience in community work or community centered activities in an area related to the duties as described above; or  3. Education and/or experience which is equivalent to 1 or 2 above. However, all candidates must have at least one year of experience as described in 1 above.","Candidates should be highly motivated and able to work well independently and as part of a team. Interested candidates should have excellent written and verbal communication skills, effective problem-solving and analytic skills, and the ability to prioritize, manage time, and engage in multiple tasks in a fast-paced environment.   Ability to analyze and manipulate data is a must. Demonstrated experience analyzing and manipulating data using Access/SQL queries and R/Python models is strongly desired.  Although not required, the successful applicant will likely have one or more of the following academic or professional experiences: urban planning, transportation policy, public administration, public policy, business, economics, statistics, political science, environmental studies, or in a related degree program. B.A. or B.S. required. Project management experience (especially management of research projects) strongly desired.",,"Click, APPLY NOW Current city employees must apply via Employee Self-Service (ESS)",,"33 Beaver St, New York Ny",,"New York City residency is generally required within 90 days of appointment. However, City Employees in certain titles who have worked for the City for 2 continuous years may also be eligible to reside in Nassau, Suffolk, Putnam, Westchester, Rockland, or Orange County. To determine if the residency requirement applies to you, please discuss with the agency representative at the time of interview.",02/27/2020,,02/27/2020,03/23/2021
458782,CONSUMER AFFAIRS,Internal,1,Community Affairs Associate,COMMUNITY COORDINATOR,Non-Competitive-5,56058,00,Communications & Intergovernmental Affairs Legal Affairs,F,Experienced (non-manager),54100,62215,Annual,"42 Broadway, N.Y.",External  Affairs,"Job Description The NYC Department of Consumer Affairs (DCA) protects and enhances the daily economic lives of New Yorkers to create thriving communities. DCA licenses more than 81,000 businesses in more than 50 industries and enforces key consumer protection, licensing, and workplace laws that apply to countless more. By supporting businesses through equitable enforcement and access to resources and, by helping to resolve complaints, DCA protects the marketplace from predatory practices and strives to create a culture of compliance. Through its community outreach and the work of its offices of Financial Empowerment and Labor Policy & Standards, DCA empowers consumers and working families by providing the tools and resources they need to be educated consumers and to achieve financial health and work-life balance. DCA also conducts research and advocates for public policy that furthers its work to support New York City’s communities. For more information about DCA and its work, call 311 or visit DCA at nyc.gov/dca or on its social media sites, Twitter, Facebook, Instagram and YouTube. External Affairs (“EA”) is the intergovernmental division of the agency and custodian of DCA’s relationship with City Hall, elected officials, government entities, community and faith-based organizations, and stakeholders with broad policy and public affairs jurisdiction. EA is responsible for these relationships to ensure that the agency speaks with a singular and consistent voice to officials and entities that have touchpoints across DCA’s wide jurisdiction. EA is seeking an individual pursuing a career in public relations, government relations, community relations, or related fields to serve as a Community Affairs Associate. Under the supervision of the Director of Community Affairs the Community Affairs Associate will work with colleagues to advance the agency’s work in consumer and worker protection and education. All candidates should have a strong academic record, be highly organized with a keen attention to detail, possess the ability to manage multiple tasks and most importantly have a commitment to excellence. Responsibilities will include, but are not be limited to: • Develop and maintain external relationships with elected officials, government agencies, industry and community stakeholders, and members of the public; • Accurately and concisely represent the work and policies of the agency in an engaging manner; • Coordinate event logistics, locations, materials, staffing, and trainings; • Effectively link with other departmental divisions to ensure project success; • Work closely with colleagues to help inform the Director of Community Affairs and Executive Director for External Affairs in their formulation and implementation of outreach strategies; and • Provide administrative support and assist in special or ad-hoc projects as needed.","1. A baccalaureate degree from an accredited college and two years of experience in community work or community centered activities in an area related to the duties described above; or  2. High school graduation or equivalent and six years of experience in community work or community centered activities in an area related to the duties as described above; or  3. Education and/or experience which is equivalent to 1 or 2 above. However, all candidates must have at least one year of experience as described in 1 above.","Preferred Skills • Experience cultivating partnerships with local community stakeholders; • Ability to successfully manage multiple tasks simultaneously; • Attention to detail and ability to work quickly under pressure; • Excellent written and oral communication skills; • Proficiency in Microsoft Office, major internet search engines and databases; • Aptitude to quickly learn new computer programs and technology; • Ability to capture and analyze data; • Ability to understand laws and rules; • Multilingual capacity (Spanish or Mandarin) a plus; • Driver’s license a plus; • Flexibility in hours.",,"To Apply For Non-City/External Candidates: Visit the External Applicant NYC Careers site and type “Consumer Affairs” on the search line. Then locate the Job ID number. For Current City Employees: Visit Employee Self Service (ESS) to view and click on Recruiting Activities, Careers, and search by Job ID number. A RESUME AND COVER LETTER ARE REQUIRED. PLEASE INDICATE IN YOUR COVER LETTER HOW YOU HEARD ABOUT THIS POSITION. INCOMPLETE APPLICATIONS WILL NOT BE CONSIDERED.  NO PHONE CALLS, FAXES, E-MAILS OR PERSONAL INQUIRIES PERMITTED. NOTE: ONLY THOSE CANDIDATES UNDER CONSIDERATION WILL BE CONTACTED.  *Appointments are subject to Office of Management and Budget (OMB) approval.  *55-a Candidates: NYC Department of Consumer Affairs is committed to recruiting and retaining a diverse and culturally responsive workforce. We strongly encourage 55-A candidates to apply to our positions in order to be given the opportunity.  **LOAN FORGIVENESS: The federal government provides student loan forgiveness through its Public Service Loan Forgiveness Program (PSLF) to all qualifying public service employees. Working with DCA qualifies you as a public service employee and you may be able to take advantage of this program while working full-time and meeting the program’s other requirements. Please visit the Public Service Loan Forgiveness Program site to view the eligibility requirements: https://studentaid.ed.gov/sa/repay-loans/forgiveness-cancellation/public-service",,,,"New York City residency is generally required within 90 days of appointment. However, City Employees in certain titles who have worked for the City for 2 continuous years may also be eligible to reside in Nassau, Suffolk, Putnam, Westchester, Rockland, or Orange County. To determine if the residency requirement applies to you, please discuss with the agency representative at the time of interview.",02/25/2021,26-APR-2021,02/25/2021,03/23/2021
458064,NYC HOUSING AUTHORITY,Internal,1,Deputy Director Records Management,ADMINISTRATIVE PUBLIC RECORDS,Competitive-1,10041,M3,"Legal Affairs Policy, Research & Analysis",F,Manager,72038,192152,Annual,Records Management,VP of Compliance,"The New York City Housing Authority (NYCHA) is the nation’s largest public housing authority, with an operating budget of $3.3 billion and over 11,000 employees who manage and maintain 302 developments that house about 400,000 residents.  NYCHA also operates the country’s largest Section 8 program, which provides rental assistance to about 200,000 additional people.  Reporting to the Vice President in the Compliance Department, the Deputy Director for the Records Management Office is responsible for work relating to the overall management, planning and operation of an archival or records management program or in an information science and library management program; or other assignments equivalent to that described herein.  Responsibilities include, but are not limited to the following:  •	Direct a unit responsible for significant policy implementation.  Supervise a staff of nine (9) and assist with day-to-day records storage, retention, and retrieval management responsibilities and notify the Vice President and/or Chief Compliance Officer when/if issues arise that require senior management intervention. •	Draft policies and procedures related to record retention. •	Categorize and disseminate descriptive guides to records, collections, and other significant historical documents. •	Permanently preserve records and related materials having historical, research, cultural or other important value; or develop and/or review and evaluate general and specialized professional library programs of considerable scope and complexity; or other assignments equivalent to that described herein. •	Evaluate current internal controls built around records storage, records retention, and records retrieval to ensure that processes are efficient, sufficient, and have the appropriate levels of approval from departments throughout the authority. •	Advise and make recommendations to the agency’s Chief Compliance Officer and other executive staff on questions of law, fact and administrative policy. •	Perform complex computerized research, enter data and generate reports using custom databases. •	Collaborate with business liaisons from designated departments to develop and implement a records retention schedule.  Ensure that affected parties follow governing agency rules, regulations and guidelines as it relates to records storage and retention.  •	Assist in responding to audits and other external compliance reviews of NYCHA.  NOTE:  IF THIS APPOINTMENT IS MADE ON A PROVISIONAL BASIS PURSUANT TO §65 OF THE NYS CIVIL SERVICE LAW, NO TENURE OR PERMANENCE ACCRUES TO AN INCUMBENT IN THIS POSITION BY VIRTUE OF SUCH APPOINTMENT.   NOTE:  This position is open to qualified persons with a disability who are eligible for the 55-a Program.  Please indicate in your cover letter that you would like to be considered for the position under the 55-a Program.  For detailed information regarding the 55-a Program, please visit the following link:  http://www.nyc.gov/html/dcas/downloads/pdf/psb/100_1.pdf   Please read this posting carefully to make certain you meet the qualification requirements before applying to this position.","Qualification Requirements  1. A master's degree from an accredited college in Library Science, Archival Science, American History or Political Science or a related area, plus four years of full-time professional experience in a governmental archives, records management center or library complex, 18 months of which must have been in a supervisory, administrative, consultative, managerial or executive capacity; or    2. Education and/or experience equivalent to 1 above. However, all candidates must have a master's degree and the 18 months supervisory, administrative, consultative, managerial or executive experience as described in 1 above.","1.	7 – 10 years effectively managing a records management program. 2.	Extensive research and writing skills with ability to communicate in different styles and to varied audiences (e.g., memoranda, policies, abstracts, manuscripts, grants, proposals, protocols, presentations, outreach materials, and social media). 3.	Excellent interpersonal and conflict resolution skills. 4.	Ability to manage relationships, balance competing priorities, and manage up and down. 5.	Experience managing high-priority projects and tracking project life-cycles. 6.	Experience working collaboratively to develop and execute project plans. 7.	Ability to make timely and effective decisions and produce results through strategic planning. 8.	Excellent communication skills with ability to engage and facilitate cooperation among multiple stakeholders. 9.	Experience managing vendors. 10.	Firm working knowledge of Microsoft Word, Excel, Outlook, PowerPoint, and Publisher. 11.	Project management skills with experience in managing and supervising administrative projects.  Familiarity with Smartsheet, Basecamp, Lucidchart, Visio, and Projects.","1.	NYCHA employees applying for promotional, title or level change opportunities must have served a period of one year in their current title and level (if applicable). 2.	NYCHA residents are encouraged to apply.",Click the Apply Now button.,,,,NYCHA has no residency requirements.,02/09/2021,,02/09/2021,03/23/2021
458663,PRESIDENT BOROUGH OF MANHATTAN,External,1,Information Technology Manager,COMPUTER SYSTEMS MANAGER,Competitive-1,1005D,00,"Technology, Data & Innovation",F,Experienced (non-manager),80000,100000,Annual,"1 Centre St., N.Y.",Administration,"Please Note: Only candidates serving permanently in the title of Computer Systems Manager or have passed the latest Computer Systems Manager exam (# 9011) and be reachable for appointment from the resulting list will be considered for the position.   Manhattan Borough President Gale A. Brewer seeks an Information Technology Manager to be part of her energetic and collaborative Operations team.  The position reports to the Director of Human Resources and Operations.  The person in this position is responsible for the management of the computers, printers and telephone system within the two locations of the Manhattan Borough President’s Office.   This is a hands-on position that requires the candidate to have demonstrated successful skills in working in a LAN/WAN environment. This is a one-person shop. The IT Manager will provide help desk support to staff, as well as monitor network performance, including software, hardware, power, and communications; work with consultants, vendors and with the City’s Department of Information Technology and Telecommunications (DoITT), the Citywide IT agency -- to which our network is connected and through which we receive our Internet connection -- to ensure that the network is secure; troubleshoot problems with the network and with desktop software, including maintaining security updates for operating system desktop software and security-specific software, such as McAfee and CrowdStrike; coordinate with DoITT for the implementation and use of all Internet, Intranet and Extranet activities; monitor, troubleshoot and upgrade our office Internet connection and WiFi from third parties such as Spectrum on a separate internal network that is retained to provide City Charter-mandated live streaming of meetings that are hosted by and held in the office; work with DoITT to connect to City-provided WiFi for access to online video meetings (for example, Zoom, WebEx).  The IT Manager will also be responsible for: the performance of desktop computers and peripherals; maintain inventory of IT desktop and network equipment; liaison with outside vendors regarding the maintenance and security of our Cisco switches; research problems or errors within the networking system and subsystems; documenting all facets of the IT network operations; recommending and making necessary improvements to network configurations to achieve operating efficiencies; and performing all other duties as assigned. Required familiarity with the Microsoft Office Suite and Microsoft 365; PC hardware/software, desktop and laptop network configuration connectivity troubleshooting, including DHCP, and DNS configurations; some familiarity with Windows Server.  This position will also be responsible for monitoring and troubleshooting the VoIP telephone connections.  In addition, the office has a number of MacBooks and iMacs for which this position will need to provide support in conjunction with DoITT’s Apple Support Team.","1. A master's degree in computer science from an accredited college or university and three (3) years of progressively more responsible, full-time, satisfactory experience in Information Technology (IT) including applications development, systems development, data communications and networking, database administration, data processing, or user services. At least eighteen (18) months of this experience must have been in an administrative, managerial or executive capacity in the areas of applications development, systems development, data communications and networking, database administration, data processing or in the supervision of staff performing these duties; or  2. A baccalaureate degree from an accredited college or university and four (4) years of progressively more responsible, full-time, satisfactory experience as described in 1 above; or  3. A four-year high school diploma or its educational equivalent, and six (6) years of progressively more responsible, full-time, satisfactory experience as described in 1 above; or  4. A satisfactory combination of education and experience equivalent to 1, 2 or 3 above. However, all candidates must have at least a four-year high school diploma or its educational equivalent and must possess at least three (3) years of experience as described in 1 above, including the eighteen (18) months of administrative, managerial, executive or supervisory experience as described in 1 above.   In the absence of a baccalaureate degree, undergraduate credits may be substituted for a maximum of two (2) years of the required experience in IT on the basis of 30 semester credits for six (6) months of the required experience. Graduate credits in computer science may be substituted for a maximum of one (1) year of the required experience in IT on the basis of 30 graduate semester credits in computer science for one (1) year of the required IT experience. However, undergraduate and/or graduate credits may not be substituted for the eighteen (18) months of experience in an administrative, managerial, executive, or supervisory capacity as described in 1 above.","•	Experience with Microsoft Windows Server and Microsoft Office Suite. MCSE certification a plus. Four to six years’ experience in the installation and support of LAN and WAN technologies required. Experience in training end-users is required.  •	Experience with CISCO routers and firewalls, 3COM switches, and related protocols.  Experience with installing, troubleshooting, and maintaining Server and Desktop computer configurations. Administering a BES server and related devices. Strong organizational skills required.  •	Experience required in network design, deployment and troubleshooting. Capable of delivering summary reports of network analysis, including security audits, hardware/software patch updates, and network monitoring; good research and consulting skills necessary; capable of leading meetings and giving presentations to customers; responsible for performing network architecture reviews, developing detailed implementation plans and detailed engineering packages for site implementations. Good documentation skills are a must. 5-7 yrs of network engineering/administration/management, 3-5 yrs LAN/WAN design, 3-5 yrs Cisco LAN/WAN router/switch/hub configuration and troubleshooting skills; excellent written/verbal communication skills; and capable of extensive interaction with team members, program managers and clients.   •	Skilled in project planning, performance analysis and fault management; self-motivated; close interaction with users - must be a team player; results oriented and able to respond quickly; timeliness of response to service requests and thoroughness in problem solving required; professional, courteous handling of employee requests, “Customer Service” attitude; excellent analytical, problem-solving and communication skills.",,Please Note: Only candidates serving permanently in the title of Computer Systems Manager or have passed the latest Computer Systems Manager exam (# 9011) and be reachable for appointment from the resulting list will be considered for the position.   1.	Email a cover letter and resume to resumes@manhattanbp.nyc.gov with  Information Technology Manager  in the subject line AND 						  2.     Apply to this posting,,,,New York City Residency is not required for this position,02/23/2021,,02/23/2021,03/23/2021

Import the data into MongoDB

Import the CSV data file into a MongoDB collection named jobs:

mongoimport --headerline --type=csv --host=database_host --db=your_database_name --collection=jobs --file=./NYC_Jobs.csv --username=your_username --password=your_password

Basics

Connect to the database

Before going into aggregation, let’s check basic queries on the database to ensure they produce the expected results.

mongosh your_database_name --host database_host -u your_username -p

View a list of collections

Let’s verify that a jobs collection has indeed been created in the database.

use your_database_name
show collections

Retrieve and counting documents

Let’s try a few simple retrievals of documents from the jobs collection.

db.jobs.findOne()

Retrieve 10 documents:

db.jobs.find().limit(10)

Retrieve all documents:

db.jobs.find()

Count documents in the result set:

db.jobs.countDocuments()

Find distinct values

The db.collection.distinct() function allows us to retrieve distinct values for a given field.

db.jobs.distinct("Agency")
db.jobs.distinct("Full-Time/Part-Time indicator")

Sorting results

The db.collection.find().sort() function sorts results. For example, sort all documents by the Job ID field, and show just the top 3 results.

db.jobs.find().sort({ "Job ID": 1 }).limit(3)
let fields = { _id: 0, "Job ID": 1, Agency: 1, "Business Title": 1 }
db.jobs.find({}, fields).sort({ "Job ID": 1 }).limit(3)
let filter = { Agency: "POLICE DEPARTMENT" }
db.jobs.find(filter, fields).sort({ "Job ID": 1 })

Sorting results (continued)

Let’s put our database skills to good use and find the 20 top-paying jobs in city government:

let fields = {
  _id: 0,
  "Job ID": 1,
  "Business Title": 1,
  "Posting Date": 1,
  "Salary Range To": 1,
}

let orderBy = { "Salary Range To": -1 }
db.jobs.find({}, fields).sort(orderBy).limit(20)

Aggregation

Concept

As with relational SQL databases, MongoDB supports aggregate functions and grouping of data. There are a few ways to do this in MongoDB.

We will focus on the aggregation pipeline technique.

Aggregation pipeline

The aggregation pipeline is a multi-stage process that transforms documents into an aggregated result.

To start an aggregation, we call the aggregate() function on a collection:

db.collection.aggregate()

Aggregation pipeline (continued)

Into the db.collection.aggregate() function, we pass in an array containing a sequence of aggregation pipeline stages. These stages could include:

These pipeline stages have counterparts in the simple MongoDB Read queries - the criteria argument, the projection, the countDocuments() method, and so on.

Aggregation pipeline (continued again)

There are more aggregation pipeline stage operators, although these are less often used.

A full list of aggregation pipeline operators is available in the docs.

“$fieldName” Syntax

You’ll often see syntax that looks like "$fieldName" - a field name prefixed with dollar, and quoted as a string.

These dollar filed names represent the value at that fieldName.

$match

The $match operator filters documents in the pipeline. Here is the operator unattached to any particular pipeline.

{
  $match: {
    city: "Brooklyn"
  }
}

$count

As the name implies, $count counts the number of documents incoming from the previous stage.

{
  $count: "countPolicyAnalysts"
}

For example, if the $match and $count operators were placed within a pipeline, it might look like:

db.jobs.aggregate([
  { $match: { "Business Title": "Policy Analyst" } },
  { $count: "countPolicyAnalysts" },
])

$project

The $project stage can modify the shape of a document by adding, suppressing and calculating fields.

For example,

let fields = { _id: 0, "Job ID": 1, "Business Title": 1 }
db.jobs.aggregate([{ $project: fields }])

$project (continued)

For example, uppercasing a field called business title →

let projection = {
  _id: 0,
  "Job ID": 1,
  title: { $toUpper: "$Business Title" },
}
db.jobs.aggregate([{ $project: projection }])

$project (continued again)

For example, combining $match and $project to show jobs that have “External” Posting Type with a subset of fields.

First, a projection for only the fields Job ID, Posting Type as ptype, and an uppercased Business Title as title:

var projection = {
  _id: 0,
  "Job ID": 1,
  ptype: "$Posting Type",
  title: { $toUpper: "$Business Title" },
}

Pipe the external job postings to this projection:

db.jobs.aggregate([
  { $match: { "Posting Type": "External" } },
  { $project: projection },
])

More operators

Because a projection can result in calculated fields, there are a large number of aggregation pipeline operators for manipulating strings, dates, performing simple arithmetic, etc.

Some examples:

More operators (continued)

Additional operations… and expressions, include:

{
  $subtract: ["$field1", "$field2"] // i.e. field1 - field2
}

and many more, like $toUpper, $trim, {$arrayElemAt: someArray, someIndex}, etc.

$group

The $group operator creates distinct groups as separate documents.

// counts the number of documents per Agency
{$group: {
  _id: "$Agency",
  listingCount: {$sum: 1}
}}

Pipeline Examples

Match only

Filter such that the minimum Salary Range From in our results is $200,000:

let min_from = { $match: { "Salary Range From": { $gt: 200000 } } }

db.jobs.aggregate([min_from])

Match and project

For all positions that are paid annually (see Salary Frequency), show the top 10 positions that have the largest range of possible salaries.

let salaryGap = { $subtract: ["$Salary Range To", "$Salary Range From"] }

let fields = {
  $project: {
    _id: 0,
    "Business Title": 1,
    "Salary Gap": salaryGap,
  },
}

let annualExternal = {
  $match: {
    "Salary Frequency": "Annual",
    "Posting Type": "External",
  },
}

let sortGapDesc = { $sort: { "Salary Gap": -1 } }

db.jobs.aggregate([annualExternal, fields, sortGapDesc, { $limit: 10 }])

Match, project, and match

Same as previous example, but only show the positions where the salary range is lower than $5,000, but not $0.

let salaryGap = { $subtract: ["$Salary Range To", "$Salary Range From"] }

let fields = {
  $project: {
    _id: 0,
    "Business Title": 1,
    "Salary Gap": salaryGap,
  },
}

let annualExternal = {
  $match: {
    "Salary Frequency": "Annual",
    "Posting Type": "External",
  },
}

let maxDiff = { $match: { "Salary Gap": { $lt: 5000, $gt: 0 } } }

db.jobs.aggregate([annualExternal, fields, maxDiff])

Group

Let’s calculate the average each city agency pays for the low end of the salary range for Career Level “Entry-Level”, without including agencies with an average lower than $30,000.

let matchEntryLevel = { $match: { "Career Level": "Entry-Level" } }

let avgSalaryAgencyGroup = {
  $group: {
    _id: "$Agency",
    avgSalaryFrom: { $avg: "$Salary Range From" },
  },
}

let match30K = {
  $match: {
    avgSalaryFrom: { $gte: 30000 },
  },
}

db.jobs.aggregate([matchEntryLevel, avgSalaryAgencyGroup, match30K])

Group

Let’s see how many jobs were posted for every year

let extractYear = { $arrayElemAt: [{ $split: ["$Posting Date", "/"] }, -1] }

let fields = { $project: { year: extractYear } }

let countByYear = { $group: { _id: "$year", count: { $sum: 1 } } }

let orderByYear = { $sort: { _id: -1 } }

db.jobs.aggregate([fields, countByYear, orderByYear])

Conclusions

Thank you. Bye.