Affiliate Disclosure: This blog may contain affiliate links, which means I may earn a small commission if you click on the link and make a purchase. This comes at no additional cost to you. I only recommend products or services that I believe will add value to my readers. Your support helps keep this blog running and allows me to continue providing you with quality content. Thank you for your support!
These resources provide a comprehensive pathway for aspiring database engineers and software developers. They cover fundamental database concepts like data modeling, SQL for data manipulation and management, database optimization, and data warehousing. Furthermore, they explore essential software development practices including Python programming, object-oriented principles, version control with Git and GitHub, software testing methodologies, and preparing for technical interviews with insights into data structures and algorithms.
Introduction to Database Engineering
This course provides a comprehensive introduction to database engineering. Put simply, a database is a form of electronic storage in which data is held. However, this simple explanation doesn’t fully capture the impact of database technology on global industry, government, and organizations. Almost everyone has used a database, and it’s likely that information about us is present in many databases worldwide.
In real-world contexts, databases are used in many scenarios:
Banks use databases to store data for customers, bank accounts, and transactions.
Hospitals store patient data, staff data, and laboratory data.
Online stores retain profile information, shopping history, and accounting transactions.
Social media platforms store uploaded photos.
Workplaces use databases to manage the files that staff download.
Online games rely on databases.
In basic terms, data is facts and figures about anything. For example, data about a person might include their name, age, email, and date of birth, while data about an online purchase might include the order number and description.
A database holds data organized systematically, often in a form resembling a spreadsheet or a table. Because of this organization, every piece of data has attributes by which it can be identified. For example, a person can be identified by attributes like name and age.
Data stored in a database cannot exist in isolation; it must have a relationship with other data to be processed into meaningful information. Databases establish relationships between pieces of data, for example, by retrieving a customer’s details from one table and their order recorded against another table. This is often achieved through keys. A primary key uniquely identifies each record in a table, while a foreign key is a primary key from one table that is used in another table to establish a link or relationship between the two. For instance, the customer ID in a customer table can be the primary key and then become a foreign key in an order table, thus relating the two tables.
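The customer/order relationship can be made concrete with a small example. This is a minimal sketch that uses Python’s built-in sqlite3 module with an in-memory database, chosen only so it runs without installing a database server. The table and column names (customers, orders, customer_id) are illustrative assumptions, and SQLite’s syntax differs in small ways from MySQL’s.

```python
import sqlite3

# In-memory SQLite database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- primary key: uniquely identifies each customer
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        description TEXT NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)  -- foreign key
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob')")
conn.execute(
    "INSERT INTO orders VALUES (101, 'Desk lamp', 1), (102, 'Notebook', 1), (103, 'Pen set', 2)"
)

# The foreign key lets each order be related back to the customer who placed it.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.description
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

Neither table alone can answer “who ordered the desk lamp?”. The customer_id value shared between the two tables is what turns the separate records into that information.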
While relational databases, which organize data into tables with relationships, are common, there are other types of databases. An object-oriented database stores data in the form of objects instead of tables or relations. An example could be an online bookstore where authors, customers, books, and publishers are rendered as classes, and the individual entries are objects or instances of these classes.
To work with data in databases, database engineers use Structured Query Language (SQL). SQL is a standard language supported by virtually all relational databases, such as MySQL, PostgreSQL, Oracle, and Microsoft SQL Server, although each system adds its own dialect features. Database engineers use it to create, read, update, and delete (CRUD) data.
SQL can be divided into several sub-languages:
Data Definition Language (DDL) helps define data in the database and includes commands like CREATE (to create databases and tables), ALTER (to modify database objects), and DROP (to remove objects).
Data Manipulation Language (DML) is used to manipulate data and includes operations like INSERT (to add data), UPDATE (to modify data), and DELETE (to remove data).
Data Query Language (DQL) is used to read or retrieve data, primarily using the SELECT command.
Data Control Language (DCL) is used to control access to the database, with commands like GRANT and REVOKE to manage user privileges.
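The sketch below runs statements from the first three sub-languages against a hypothetical books table. It again uses Python’s sqlite3 module and an in-memory database as a stand-in for a full database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL)")

# DML: add, change, and remove rows.
conn.execute("INSERT INTO books (book_id, title, price) VALUES (1, 'SQL Basics', 20.0)")
conn.execute("INSERT INTO books (book_id, title, price) VALUES (2, 'Python Basics', 25.0)")
conn.execute("UPDATE books SET price = 22.5 WHERE book_id = 1")
conn.execute("DELETE FROM books WHERE book_id = 2")
conn.commit()

# DQL: read the data back.
remaining = conn.execute("SELECT book_id, title, price FROM books").fetchall()
print(remaining)  # [(1, 'SQL Basics', 22.5)]

# DDL again: remove the table entirely.
conn.execute("DROP TABLE books")
conn.close()
```

SQLite has no user accounts, so the DCL commands GRANT and REVOKE are not available there. They apply to multi-user database servers such as MySQL or PostgreSQL.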
SQL offers several advantages:
It requires little coding skill to use, since it consists mainly of keywords.
Its interactivity allows developers to write complex queries quickly.
It is a standard language supported by virtually all relational databases, leading to extensive support and information availability.
It is portable across operating systems.
Before developing a database, planning the organization of data is crucial, and this plan is called a schema. A schema is an organization or grouping of information and the relationships among them. In MySQL, schema and database are often interchangeable terms, referring to how data is organized. However, the definition of schema can vary across different database systems. A database schema typically comprises tables, columns, relationships, data types, and keys. Schemas provide logical groupings for database objects, simplify access and manipulation, and enhance database security by allowing permission management based on user access rights.
Database normalization is the process of structuring tables to reduce data duplication and avoid data inconsistencies (anomalies). It typically involves splitting a large table into multiple smaller, related tables. A series of normal forms (1NF, 2NF, 3NF) defines progressively stricter rules for table structure that lead to better database design.
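The core move behind normalization is to store repeated details once and reference them by key. The sketch below shows that move in SQLite. It is a simplified illustration, not a step-by-step pass through 1NF, 2NF, and 3NF, and the table names (orders_flat, customers, orders) and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: customer details repeat on every order row, so changing
# Alice's email would mean editing several rows (an update anomaly).
conn.execute(
    "CREATE TABLE orders_flat (order_id INTEGER, customer_name TEXT, customer_email TEXT, item TEXT)"
)
conn.executemany(
    "INSERT INTO orders_flat VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", "alice@example.com", "Lamp"),
        (2, "Alice", "alice@example.com", "Desk"),
        (3, "Bob", "bob@example.com", "Chair"),
    ],
)

# Normalized: each customer is stored once, and orders point to it by key.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(customer_id), item TEXT)"
)
conn.execute(
    "INSERT INTO customers (name, email) "
    "SELECT DISTINCT customer_name, customer_email FROM orders_flat"
)
conn.execute(
    "INSERT INTO orders (order_id, customer_id, item) "
    "SELECT f.order_id, c.customer_id, f.item FROM orders_flat AS f "
    "JOIN customers AS c ON c.name = f.customer_name AND c.email = f.customer_email"
)

# The email now lives in exactly one row, so one UPDATE keeps every order consistent.
conn.execute("UPDATE customers SET email = 'alice@new.example' WHERE name = 'Alice'")

customer_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
joined = conn.execute(
    "SELECT o.order_id, c.name, c.email, o.item "
    "FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id "
    "ORDER BY o.order_id"
).fetchall()
print(customer_rows)  # 2
for row in joined:
    print(row)
```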
As databases have evolved, they now must be able to store ever-increasing amounts of unstructured data, which poses difficulties. This growth has also led to concepts like big data and cloud databases.
Furthermore, databases play a crucial role in data warehousing, which involves a centralized data repository that loads, integrates, stores, and processes large amounts of data from multiple sources for data analysis. Dimensional data modeling, based on dimensions and facts, is often used to build databases in a data warehouse for data analytics. Databases also support data analytics, where collected data is converted into useful information to inform future decisions.
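A dimensional model is easiest to see in a small star schema. In a star schema, one fact table of measurable values is surrounded by dimension tables that give those values context. The sketch below is a toy version built in SQLite. The jewelry categories, table names, and figures are invented for illustration, and a real data warehouse would hold far more data on a dedicated platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables hold descriptive context.
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT)")
conn.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER)")

# The fact table holds measurable values plus keys pointing at each dimension.
conn.execute("""
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        quantity   INTEGER,
        revenue    REAL
    )
""")

conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Rings"), (2, "Necklaces")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2024, 1), (2, 2024, 2)])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 3, 300.0), (2, 1, 1, 150.0), (1, 2, 2, 200.0)],
)

# A typical analytical query: join the facts to their dimensions, then aggregate.
report = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date    AS d ON d.date_id    = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

for row in report:
    print(row)
```

Each analytical question becomes a join from the central fact table out to whichever dimensions it needs. This is the shape that star schemas are designed to make fast.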
Tools like MySQL Workbench provide a unified visual environment for database modeling and management, supporting the creation of data models, forward and reverse engineering of databases, and SQL development.
Finally, interacting with databases can also be done through programming languages like Python using connectors or APIs (Application Programming Interfaces). This allows developers to build applications that interact with databases for various operations.
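As a minimal sketch of the connector pattern, the code below uses Python’s built-in sqlite3 module, which follows the standard Python database API (PEP 249): connect, execute, fetch, commit. The helper functions add_customer and find_customer are hypothetical. Server drivers such as MySQL Connector/Python follow the same general pattern but use %s placeholders instead of ?.

```python
import sqlite3


def add_customer(conn: sqlite3.Connection, name: str, email: str) -> int:
    """Insert a customer and return the generated primary key."""
    # Placeholders (?) keep values separate from the SQL text, which guards against SQL injection.
    cur = conn.execute("INSERT INTO customers (name, email) VALUES (?, ?)", (name, email))
    conn.commit()
    return cur.lastrowid


def find_customer(conn: sqlite3.Connection, email: str):
    """Return (customer_id, name) for the given email, or None if there is no match."""
    cur = conn.execute("SELECT customer_id, name FROM customers WHERE email = ?", (email,))
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT UNIQUE)"
)

new_id = add_customer(conn, "Alice", "alice@example.com")
alice = find_customer(conn, "alice@example.com")
missing = find_customer(conn, "nobody@example.com")
print(new_id, alice, missing)  # 1 (1, 'Alice') None
conn.close()
```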
Understanding SQL: Language for Database Interaction
SQL (Structured Query Language) is a standard language used to interact with databases. It is commonly pronounced “sequel” or spelled out letter by letter (S-Q-L). Database engineers use SQL to establish interactions with databases.
Here’s a breakdown of SQL:
Role of SQL: SQL acts as the interface or bridge between a relational database and its users. It allows database engineers to create, read, update, and delete (CRUD) data. These operations are fundamental when working with a database.
Interaction with Databases: As a web developer or data engineer, you execute SQL instructions on a database using a Database Management System (DBMS). The DBMS is responsible for transforming SQL instructions into a form that the underlying database understands.
Applicability: SQL is particularly useful when working with relational databases, which require a language that can interact with structured data. Examples of relational databases that SQL can interact with include MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
SQL Sub-languages: SQL is divided into several sub-languages:
Data Definition Language (DDL): Helps you define data in your database. DDL commands include:
CREATE: Used to create databases and related objects like tables. For example, you can use the CREATE DATABASE command followed by the database name to create a new database. Similarly, CREATE TABLE followed by the table name and column definitions is used to create tables.
ALTER: Used to modify already created database objects, such as modifying the structure of a table by adding or removing columns (ALTER TABLE).
DROP: Used to remove objects like tables or entire databases. The DROP DATABASE command followed by the database name removes a database, and DROP TABLE followed by the table name removes a table. To remove a single column, use the DROP COLUMN clause of ALTER TABLE (ALTER TABLE table_name DROP COLUMN column_name).
Data Manipulation Language (DML): DML commands are used to manipulate data in the database, and most CRUD operations fall under DML. DML commands include:
INSERT: Used to add or insert data into a table. The INSERT INTO syntax is used to add rows of data to a specified table.
UPDATE: Used to edit or modify existing data in a table. The UPDATE command uses SET to specify the new values and an optional WHERE clause to choose which rows change.
DELETE: Used to remove data from a table. The DELETE FROM syntax followed by the table name and an optional WHERE clause is used to remove data.
Data Query Language (DQL): Used to read or retrieve data from the database. The primary DQL command is:
SELECT: Used to select and retrieve data from one or multiple tables, allowing you to specify the columns you want and apply filter criteria using the WHERE clause. You can select all columns using SELECT *.
Data Control Language (DCL): Used to control access to the database. DCL commands include:
GRANT: Used to give users access privileges to data.
REVOKE: Used to revert access privileges already given to users.
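The sketch below exercises the commands above that change existing structures and rows: ALTER TABLE, UPDATE with WHERE, and DELETE FROM with WHERE, followed by a filtered SELECT. It uses Python’s sqlite3 module with a hypothetical products table. SQLite only supports ALTER TABLE ... DROP COLUMN from version 3.35.0 onward, so the sketch checks the bundled library version before using that clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO products (product_id, name) VALUES (?, ?)",
    [(1, "Ring"), (2, "Bracelet"), (3, "Necklace")],
)

# ALTER TABLE ... ADD COLUMN changes the structure of an existing table (DDL).
conn.execute("ALTER TABLE products ADD COLUMN price REAL")

# UPDATE ... SET ... WHERE modifies only the matching rows (DML).
conn.execute("UPDATE products SET price = 99.0 WHERE product_id IN (1, 3)")

# DELETE FROM ... WHERE removes matching rows; without WHERE it would remove every row.
conn.execute("DELETE FROM products WHERE price IS NULL")

# SELECT with explicit columns and a WHERE filter (DQL).
cheap = conn.execute(
    "SELECT name, price FROM products WHERE price < 100 ORDER BY product_id"
).fetchall()
print(cheap)  # [('Ring', 99.0), ('Necklace', 99.0)]

# ALTER TABLE ... DROP COLUMN requires SQLite 3.35.0 or later.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE products DROP COLUMN price")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(products)")]
print(columns)  # ['product_id', 'name'] when DROP COLUMN is supported
```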
Advantages of SQL: SQL is a popular language choice for databases due to several advantages:
Low coding skills required: It uses a set of keywords and requires very little coding.
Interactivity: Allows developers to write complex queries quickly.
Standard language: Supported by virtually all relational databases, such as MySQL, leading to extensive support and information availability.
Portability: Once written, SQL code can be used on any hardware and any operating system or platform where the database software is installed.
Comprehensive: Covers all areas of database management and administration, including creating databases, manipulating data, retrieving data, and managing security.
Efficiency: Allows database users to process large amounts of data quickly and efficiently.
Basic SQL Operations: SQL enables various operations on data, including:
Creating databases and tables using DDL.
Populating and modifying data using DML (INSERT, UPDATE, DELETE).
Reading and querying data using DQL (SELECT) with options to specify columns and filter data using the WHERE clause.
Sorting data using the ORDER BY clause with ASC (ascending) or DESC (descending) keywords.
Filtering data using the WHERE clause with various comparison operators (=, <, >, <=, >=, !=) and logical operators (AND, OR). Other filtering operators include BETWEEN, LIKE, and IN.
Removing duplicate rows using the SELECT DISTINCT clause.
Performing arithmetic operations using operators like +, -, *, /, and % (modulus) within SELECT statements.
Using comparison operators to compare values in WHERE clauses.
Using aggregate functions such as COUNT, SUM, and AVG, often together with the GROUP BY clause.
Joining data from multiple tables when related data is stored in separate entities, using clauses such as INNER JOIN, LEFT JOIN, and RIGHT JOIN.
Creating aliases for tables and columns to make queries simpler and more readable.
Using subqueries (a query within another query) for more complex data retrieval.
Creating views (virtual tables based on the result of a SQL statement) to simplify data access and combine data from multiple tables.
Using stored procedures (pre-prepared SQL code that can be saved and executed).
Working with functions (numeric, string, date, comparison, control flow) to process and manipulate data.
Implementing triggers (stored programs that automatically execute in response to certain events).
Managing database transactions to ensure data integrity.
Optimizing queries for better performance.
Performing data analysis using SQL queries.
Interacting with databases using programming languages like Python through connectors and APIs.
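Several of these operations can be combined in one small session. The sketch below is built on a hypothetical orders table in an in-memory SQLite database. It shows DISTINCT, arithmetic with an alias, WHERE with IN, BETWEEN, and LIKE, ORDER BY, aggregate functions with GROUP BY, a view, and a subquery. The data is invented so that every result has a single, predictable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT, "
    "quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alice", "Paris", 2, 10.0),
        (2, "Bob", "London", 1, 45.0),
        (3, "Alice", "Paris", 5, 6.0),
        (4, "Chen", "Berlin", 3, 20.0),
    ],
)

# DISTINCT removes duplicate rows; ORDER BY ... ASC sorts the result.
cities = conn.execute("SELECT DISTINCT city FROM orders ORDER BY city ASC").fetchall()

# Arithmetic in SELECT with an alias (AS), filtering with IN and BETWEEN, sorting with DESC.
big_orders = conn.execute("""
    SELECT order_id, quantity * unit_price AS total
    FROM orders
    WHERE city IN ('Paris', 'Berlin') AND quantity BETWEEN 2 AND 5
    ORDER BY total DESC
""").fetchall()

# LIKE with the % wildcard: customers whose name starts with 'A'.
a_names = conn.execute("SELECT DISTINCT customer FROM orders WHERE customer LIKE 'A%'").fetchall()

# Aggregate functions with GROUP BY.
per_customer = conn.execute("""
    SELECT customer, COUNT(*) AS n_orders, SUM(quantity * unit_price) AS spent
    FROM orders
    GROUP BY customer
    ORDER BY spent DESC
""").fetchall()

# A view saves a query under a name; a subquery nests one query inside another.
conn.execute(
    "CREATE VIEW order_totals AS "
    "SELECT order_id, customer, quantity * unit_price AS total FROM orders"
)
above_avg = conn.execute("""
    SELECT order_id FROM order_totals
    WHERE total > (SELECT AVG(total) FROM order_totals)
    ORDER BY order_id
""").fetchall()

print(cities)        # [('Berlin',), ('London',), ('Paris',)]
print(big_orders)    # [(4, 60.0), (3, 30.0), (1, 20.0)]
print(a_names)       # [('Alice',)]
print(per_customer)  # [('Chen', 1, 60.0), ('Alice', 2, 50.0), ('Bob', 1, 45.0)]
print(above_avg)     # [(2,), (4,)]
```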
In essence, SQL is a powerful and versatile language that is fundamental for anyone working with relational databases, enabling them to define, manage, query, and manipulate data effectively. The knowledge of SQL is a valuable skill for database engineers and is crucial for various tasks, from building and maintaining databases to extracting insights through data analysis.
Data Modeling Principles: Schema, Types, and Design
Data modeling principles revolve around creating a blueprint of how data will be organized and structured within a database system. This plan, often referred to as a schema, is essential for efficient data storage, access, updates, and querying. A well-designed data model ensures data consistency and quality.
Here are some key data modeling principles:
Understanding Data Requirements: Before creating a database, it’s crucial to have a clear idea of its purpose and the data it needs to store. For example, a database for an online bookshop needs to record book titles, authors, customers, and sales. Mangata and Gallo (mng), a jewelry store, needed to store data on customers, products, and orders.
Visual Representation: A data model provides a visual representation of data elements (entities) and their relationships. This is often achieved using an Entity Relationship Diagram (ERD), which helps in planning entity-relational databases.
Different Levels of Abstraction: Data modeling occurs at different levels:
Conceptual Data Model: Provides a high-level, abstract view of the entities and their relationships in the database system. It focuses on “what” data needs to be stored (e.g., customers, products, orders as entities for mng) and how these relate.
Logical Data Model: Builds upon the conceptual model by providing a more detailed overview of the entities, their attributes, primary keys, and foreign keys. For mng, this would involve defining attributes for customers (like client ID as primary key), products, and orders, and specifying foreign keys to establish relationships (e.g., client ID in the orders table referencing the clients table).
Physical Data Model: Represents the internal schema of the database and is specific to the chosen Database Management System (DBMS). It outlines details like data types for each attribute (e.g., varchar for full name, integer for contact number), constraints (e.g., not null), and other database-specific features. SQL is often used to create the physical schema.
Choosing the Right Data Model Type: Several types of data models exist, each with its own advantages and disadvantages:
Relational Data Model: Represents data as a collection of tables (relations) with rows and columns, known for its simplicity.
Entity-Relationship Model: Similar to the relational model but presents each table as a separate entity with attributes and explicitly defines different types of relationships between entities (one-to-one, one-to-many, many-to-many).
Hierarchical Data Model: Organizes data in a tree-like structure with parent and child nodes, primarily supporting one-to-many relationships.
Object-Oriented Model: Translates objects into classes with characteristics and behaviors, supporting complex associations like aggregation and inheritance, suitable for complex projects.
Dimensional Data Model: Based on dimensions (context of measurements) and facts (quantifiable data), optimized for faster data retrieval and efficient data analytics, often using star and snowflake schemas in data warehouses.
Database Normalization: This is a crucial process for structuring tables to minimize data redundancy, avoid data modification implications (insertion, update, deletion anomalies), and simplify data queries. Normalization involves applying a series of normal forms (First Normal Form – 1NF, Second Normal Form – 2NF, Third Normal Form – 3NF) to ensure data atomicity, eliminate repeating groups, address functional and partial dependencies, and resolve transitive dependencies.
Establishing Relationships: Data in a database should be related to provide meaningful information. Relationships between tables are established using keys:
Primary Key: A value that uniquely identifies each record in a table and prevents duplicates.
Foreign Key: One or more columns in one table that reference the primary key in another table, used to connect tables and create cross-referencing.
Defining Domains: A domain is the set of legal values that can be assigned to an attribute, ensuring data in a field is well-defined (e.g., only numbers in a numerical domain). This involves specifying data types, length values, and other relevant rules.
Using Constraints: Database constraints limit the type of data that can be stored in a table, ensuring data accuracy and reliability. Common constraints include NOT NULL (ensuring fields are always completed), UNIQUE (preventing duplicate values), CHECK (enforcing specific conditions), and FOREIGN KEY (maintaining referential integrity).
Importance of Planning: Designing a data model before building the database system allows for planning how data is stored and accessed efficiently. A poorly designed database can make it hard to produce accurate information.
Considerations at Scale: For large-scale applications like those at Meta, data modeling must prioritize user privacy, user safety, and scalability. It requires careful consideration of data access, encryption, and the ability to handle billions of users and evolving product needs. Thoughtfulness about future changes and the impact of modifications on existing data models is crucial.
Data Integrity and Quality: Well-designed data models, including the use of data types and constraints, are fundamental steps in ensuring the integrity and quality of a database.
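The physical model and constraints described above can be sketched for the mng clients and orders tables. The column names, sizes, and CHECK rule below are assumptions made for illustration, not the store’s actual design. SQLite is used only so the example runs without a server; it accepts VARCHAR(n) but does not enforce the length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key enforcement off by default

conn.execute("""
    CREATE TABLE clients (
        client_id INTEGER PRIMARY KEY,
        full_name VARCHAR(100) NOT NULL,
        email     VARCHAR(255) UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL,
        quantity  INTEGER NOT NULL CHECK (quantity > 0),
        FOREIGN KEY (client_id) REFERENCES clients(client_id)
    )
""")
conn.execute("INSERT INTO clients VALUES (1, 'Ada Lovelace', 'ada@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1, 2)")  # satisfies every constraint


def try_insert(sql: str) -> str:
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    "NOT NULL": try_insert("INSERT INTO clients VALUES (2, NULL, 'x@example.com')"),
    "UNIQUE": try_insert("INSERT INTO clients VALUES (3, 'Copycat', 'ada@example.com')"),
    "CHECK": try_insert("INSERT INTO orders VALUES (11, 1, 0)"),
    "FOREIGN KEY": try_insert("INSERT INTO orders VALUES (12, 99, 1)"),  # no client 99
}
print(results)
```

Each bad row is stopped by the database itself rather than by application code. This is why constraints in the physical model are a first line of defense for data integrity.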
Data modeling is an iterative process that requires a deep understanding of the data, the business requirements, and the capabilities of the chosen database system. It is a crucial skill for database engineers and a fundamental aspect of database design. Tools like MySQL Workbench can aid in creating, visualizing, and implementing data models.
Understanding Version Control: Git and Collaborative Development
Version Control Systems (VCS), also known as Source Control or Source Code Management, are systems that record all changes and modifications to files for tracking purposes. The primary goal of any VCS is to keep track of changes by giving developers access to the entire change history, with the ability to revert or roll back to a previous state or point in time. These systems track different types of changes, such as adding new files, modifying or updating files, and deleting files. The version control system serves as the team’s single source of truth for all code assets.
There are many benefits associated with Version Control, especially for developers working in a team. These include:
Revision history: Provides a record of all changes in a project and the ability for developers to revert to a stable point in time if code edits cause issues or bugs.
Identity: All changes made are recorded with the identity of the user who made them, allowing teams to see not only when changes occurred but also who made them.
Collaboration: A VCS allows teams to submit their code and keep track of any changes that need to be made when working towards a common goal. It also facilitates peer review where developers inspect code and provide feedback.
Automation and efficiency: Version Control helps keep track of all changes and plays an integral role in DevOps, increasing an organization’s ability to deliver applications or services with high quality and velocity. It aids in software quality, release, and deployments. By having Version Control in place, teams following agile methodologies can manage their tasks more efficiently.
Managing conflicts: Version Control helps developers fix any conflicts that may occur when multiple developers work on the same code base. The history of revisions can aid in seeing the full life cycle of changes and is essential for merging conflicts.
There are two main types or categories of Version Control Systems: centralized Version Control Systems (CVCS) and distributed Version Control Systems (DVCS).
Centralized Version Control Systems (CVCS) contain a server that houses the full history of the code base and clients that pull down the code. Developers need a connection to the server to perform any operations. Changes are pushed to the central server. An advantage of CVCS is that they are considered easier to learn and offer more access controls to users. A disadvantage is that they can be slower due to the need for a server connection.
Distributed Version Control Systems (DVCS) are similar, but every user is essentially a server and has the entire history of changes on their local system. Users don’t need to be connected to the server to add changes or view history, only to pull down the latest changes or push their own. DVCS offer better speed and performance and allow users to work offline. Git is an example of a DVCS.
Popular Version Control Technologies include git and GitHub. Git is a Version Control System designed to help users keep track of changes to files within their projects. It offers better speed and performance, reliability, free and open-source access, and an accessible syntax. Git is used predominantly via the command line. GitHub is a cloud-based hosting service that lets you manage git repositories from a user interface. It incorporates Git Version Control features and extends them with features like Access Control, pull requests, and automation. GitHub is very popular among web developers and acts like a social network for projects.
Key Git concepts include:
Repository: Used to track all changes to files in a specific folder and keep a history of all those changes. Repositories can be local (on your machine) or remote (e.g., on GitHub).
Clone: To copy a project from a remote repository to your local device.
Add: To stage changes in your local repository, preparing them for a commit.
Commit: To save a snapshot of the staged changes in the local repository’s history. Each commit is recorded with the identity of the user.
Push: To upload committed changes from your local repository to a remote repository.
Pull: To retrieve changes from a remote repository and apply them to your local repository.
Branching: Creating separate lines of development from the main codebase to work on new features or bug fixes in isolation. The main branch is often the source of truth.
Forking: Creating a copy of someone else’s repository on a platform like GitHub, allowing you to make changes without affecting the original.
Diff: A command to compare changes across files, branches, and commits.
Blame: A command to look at changes of a specific file and show the dates, times, and users who made the changes.
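Git’s diff output uses the unified diff format. As a rough illustration of that format, Python’s standard difflib module can produce the same kind of output for two versions of a hypothetical greet.py file. This only mimics the presentation. difflib uses its own matching algorithm, so for larger changes its hunks may not match Git’s exactly.

```python
import difflib

# Two versions of the same (hypothetical) file, as lists of lines.
old = ["def greet(name):\n", "    print('Hello ' + name)\n"]
new = ["def greet(name):\n", "    print(f'Hello, {name}!')\n"]

# Lines starting with '-' were removed, '+' were added, ' ' are unchanged context.
diff = list(difflib.unified_diff(old, new, fromfile="a/greet.py", tofile="b/greet.py"))
print("".join(diff))
```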
The typical Git workflow involves three states: modified, staged, and committed. Files are modified in the working directory, then added to the staging area, and finally committed to the local repository. These local commits are then pushed to a remote repository.
Branching workflows like feature branching are commonly used. This involves creating a new branch for each feature, working on it until completion, and then merging it back into the main branch after a pull request and peer review. Pull requests allow teams to review changes before they are merged.
At Meta, Version Control is very important. They use a giant monolithic repository for all of their backend code, which means code changes are shared with every other Instagram team. While this can be risky, it allows for code reuse. Meta encourages engineers to improve any code, emphasizing that “nothing at meta is someone else’s problem”. Due to the monolithic repository, merge conflicts happen a lot, so they try to write smaller changes and add gatekeepers to easily turn off features if needed. git blame is used daily to understand who wrote specific lines of code and why, which is particularly helpful in a large organization like Meta.
Version Control is also relevant to database development. It’s easy to overcomplicate data modeling and storage, and Version Control can help track changes and potentially revert to earlier designs. Planning how data will be organized (schema) is crucial before developing a database.
Learning to use git and GitHub for Version Control is part of the preparation for coding interviews in a final course, alongside practicing interview skills and refining resumes. Effective collaboration, which is enhanced by Version Control, is a crucial skill for software developers.
Python Programming Fundamentals: An Introduction
Here’s an overview of Python programming basics:
Introduction to Python:
Python is a versatile, high-level programming language available on multiple platforms. It was created by Guido van Rossum and released in 1991. It was designed to be readable, with syntax that resembles English and mathematics, which makes it intuitive and easy for beginners to pick up. Experienced programmers also appreciate its power and adaptability. Since its release, Python has gained significant popularity and a rich selection of frameworks and libraries. Today it is widely used in web development, artificial intelligence, machine learning, data analytics, business forecasting, and many other programming applications. Python often requires less code than languages like C or Java. Its simplicity lets developers focus on the task at hand, which can shorten the time it takes to get a product to market.
Setting up a Python Environment:
To start using Python, it’s essential to ensure it works correctly on your operating system with your chosen Integrated Development Environment (IDE), such as Visual Studio Code (VS Code). This involves making sure the right version of Python is used as the interpreter when running your code.
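Installation Verification: You can verify that Python is installed by opening the terminal (or Command Prompt on Windows) and typing python --version. This should display the installed Python version. On some systems, such as many macOS and Linux installations, the command is python3 --version instead.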
VS Code Setup: VS Code offers a walkthrough guide for setting up Python. This includes installing Python (if needed) and selecting the correct Python interpreter.
Running Python Code: Python code can be run in a few ways:
Python Shell: Useful for running and testing small scripts without creating .py files. You can access it by typing python in the terminal.
Directly from Command Line/Terminal: Any file with the .py extension can be run by typing python followed by the file name (e.g., python hello.py).
Within an IDE (like VS Code): IDEs provide features like auto-completion, debugging, and syntax highlighting, making coding a better experience. VS Code has a run button to execute Python files.
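The interpreter VS Code selects may differ from the one your terminal runs. A tiny script like the hypothetical check_env.py below can confirm which Python is actually executing your code. The 3.8 minimum version in it is an arbitrary example, not a course requirement.

```python
# check_env.py: report which interpreter is running this file.
import platform
import sys


def describe_interpreter() -> str:
    """Return the interpreter's path and version as one line."""
    return f"{sys.executable} (Python {platform.python_version()})"


if __name__ == "__main__":
    print(describe_interpreter())
    # Stop with a message if an unexpectedly old interpreter was selected.
    # The 3.8 threshold is an example value only.
    if sys.version_info < (3, 8):
        sys.exit("Python 3.8 or newer expected")
```

Run it once with python check_env.py in the terminal and once with the VS Code run button. If the two reported paths differ, the IDE is using a different interpreter than your terminal.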
Basic Syntax and Concepts:
Print Statement: The print() function is used to display output to the console. It can print different types of data and allows for formatting.
Variables: Variables store data that can change throughout the program’s lifecycle. In Python, you declare a variable by assigning a value to a name (e.g., x = 5), and Python infers the data type behind the scenes. There are conventions for naming variables. The Python style guide (PEP 8) recommends snake case (e.g., my_name), although camel case (e.g., myName) is also seen. You can assign a single value to multiple variables (e.g., a = b = c = 10) or perform multiple assignments on one line (e.g., name, age = "Alice", 30). You can also delete a variable using the del keyword.
Data Types: A data type indicates how a computer system should interpret a piece of data. Python offers several built-in data types:
Numeric: Includes int (integers), float (decimal numbers), and complex numbers.
Sequence: Ordered collections of items, including:
Strings (str): Sequences of characters enclosed in single or double quotes (e.g., "hello", 'world'). Individual characters in a string can be accessed by their index (starting from 0) using square brackets (e.g., name[0]). The len() function returns the number of characters in a string.
Lists: Ordered and mutable sequences of items enclosed in square brackets (e.g., [1, 2, "three"]).
Tuples: Ordered and immutable sequences of items enclosed in parentheses (e.g., (1, 2, "three")).
Dictionary (dict): Collections of key-value pairs enclosed in curly braces (e.g., {"name": "Bob", "age": 25}). Values are accessed using their keys. Since Python 3.7, dictionaries preserve the order in which keys were inserted.
Boolean (bool): Represents truth values: True or False.
Set (set): Unordered collections of unique elements enclosed in curly braces (e.g., {1, 2, 3}). Sets do not support indexing.
Typecasting: The process of converting one data type to another. Python supports implicit (automatic) and explicit (using functions like int(), float(), str()) type conversion.
Input: The input() function is used to take input from the user. It displays a prompt to the user and returns their input as a string.
Operators: Symbols used to perform operations on values.
Math Operators: Used for calculations (e.g., + for addition, - for subtraction, * for multiplication, / for division).
Logical Operators: Used in conditional statements to determine true or false outcomes (and, or, not).
Control Flow: Determines the order in which instructions in a program are executed.
Conditional Statements: Used to make decisions based on conditions (if, else, elif).
Loops: Used to repeatedly execute a block of code. Python has for loops (for iterating over sequences) and while loops (repeating a block until a condition is met). Nested loops are also possible.
Functions: Modular pieces of reusable code that take input and return output. You define a function using the def keyword. You can pass data into a function as arguments and return data using the return keyword. Python has different scopes for variables: local, enclosing, global, and built-in (LEGB rule).
Data Structures: Ways to organize and store data. Python includes lists, tuples, sets, and dictionaries.
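The short script below pulls several of these basics together. It covers variables, indexing, the built-in data types, typecasting, conditionals, both loop types, and the enclosing scope from the LEGB rule. The names and values are arbitrary examples. input() is left out so the script runs without waiting for the user.

```python
# Variables: assignment creates the variable, and Python infers the type.
x = 5
a = b = c = 10            # one value, several names
name, age = "Alice", 30   # several values on one line
temp = "scratch"
del temp                  # removes the name 'temp'

# Sequences and indexing (indexes start at 0).
first_letter = name[0]            # 'A'
letters = len(name)               # 5
scores = [90, 75, 82]             # list: mutable
scores.append(68)
point = (3, 4)                    # tuple: immutable
person = {"name": "Bob", "age": 25}
unique_ids = {1, 2, 2, 3}         # set keeps only unique elements: {1, 2, 3}

# Typecasting: implicit (int + float gives float) and explicit.
total = x + 2.5                   # 7.5, a float
as_int = int("42")                # explicit str -> int
as_text = str(age)                # explicit int -> str


# Control flow: if / elif / else inside a function.
def grade(score: int) -> str:
    if score >= 85:
        return "A"
    elif score >= 70:
        return "B"
    else:
        return "C"


# for loop: iterate over a sequence.
grades = []
for s in scores:
    grades.append(grade(s))

# while loop: repeat until the condition becomes false.
countdown = 3
while countdown > 0:
    countdown -= 1


# Scope (LEGB): the inner function reads 'greeting' from the enclosing scope.
def make_greeter(greeting: str):
    def greet(who: str) -> str:
        return f"{greeting}, {who}!"
    return greet


hello = make_greeter("Hello")
print(hello(person["name"]))   # Hello, Bob!
print(grades)                  # ['A', 'B', 'B', 'C']
```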
This overview provides a foundation in Python programming basics. As you continue learning, you will delve deeper into these concepts and explore more advanced topics.
Database and Python Fundamentals Study Guide
Quiz
What is a database, and what is its typical organizational structure? A database is a systematically organized collection of data. This organization commonly resembles a spreadsheet or a table, with data containing elements and attributes for identification.
Explain the role of a Database Management System (DBMS) in the context of SQL. A DBMS acts as an intermediary between SQL instructions and the underlying database. It takes responsibility for transforming SQL commands into a format that the database can understand and execute.
Name and briefly define at least three sub-languages of SQL. DDL (Data Definition Language) is used to define data structures in a database, such as creating, altering, and dropping databases and tables. DML (Data Manipulation Language) is used to insert, update, and delete data. DQL (Data Query Language) is used to retrieve data from the database.
Describe the purpose of the CREATE DATABASE and CREATE TABLE DDL statements. The CREATE DATABASE statement is used to create a new, empty database within the DBMS. The CREATE TABLE statement is used within a specific database to define a new table, including specifying the names and data types of its columns.
What is the function of the INSERT INTO DML statement? The INSERT INTO statement is used to add new rows of data into an existing table in the database. It requires specifying the table name and the values to be inserted into the table’s columns.
Explain the purpose of the NOT NULL constraint when defining table columns. The NOT NULL constraint ensures that a specific column in a table cannot contain a null value. If an attempt is made to insert a new record or update an existing one with a null value in a NOT NULL column, the operation will be aborted.
List and briefly define three basic arithmetic operators in SQL. The addition operator (+) is used to add two operands. The subtraction operator (-) is used to subtract the second operand from the first. The multiplication operator (*) is used to multiply two operands.
What is the primary function of the SELECT statement in SQL, and how can the WHERE clause be used with it? The SELECT statement is used to retrieve data from one or more tables in a database. The WHERE clause is used to filter the rows returned by the SELECT statement based on specified conditions.
Explain the difference between running Python code from the Python shell and running a .py file from the command line. The Python shell provides an interactive environment where you can execute Python code snippets directly and see immediate results without saving to a file. Running a .py file from the command line executes the entire script contained within the file non-interactively.
Define a variable in Python and provide an example of assigning it a value. In Python, a variable is a named storage location that holds a value. Variables are implicitly declared when a value is assigned to them. For example: x = 5 declares a variable named x and assigns it the integer value of 5.
Answer Key
A database is a systematically organized collection of data. This organization commonly resembles a spreadsheet or a table, with data containing elements and attributes for identification.
A DBMS acts as an intermediary between SQL instructions and the underlying database. It takes responsibility for transforming SQL commands into a format that the database can understand and execute.
DDL (Data Definition Language) helps you define data structures. DML (Data Manipulation Language) allows you to work with the data itself. DQL (Data Query Language) enables you to retrieve information from the database.
The CREATE DATABASE statement establishes a new database, while the CREATE TABLE statement defines the structure of a table within a database, including its columns and their data types.
The INSERT INTO statement adds new rows of data into a specified table. It requires indicating the table and the values to be placed into the respective columns.
The NOT NULL constraint enforces that a particular column must always have a value and cannot be left empty or contain a null entry when data is added or modified.
The + operator performs addition, the - operator performs subtraction, and the * operator performs multiplication between numerical values in SQL queries.
The SELECT statement retrieves data from database tables. The WHERE clause filters the results of a SELECT query, allowing you to specify conditions that rows must meet to be included in the output.
The Python shell is an interactive interpreter for immediate code execution, while running a .py file executes the entire script from the command line without direct interaction during the process.
A variable in Python is a name used to refer to a memory location that stores a value; for instance, name = "Alice" assigns the string value "Alice" to the variable named name.
Essay Format Questions
Discuss the significance of SQL as a standard language for database management. In your discussion, elaborate on at least three advantages of using SQL as highlighted in the provided text and provide examples of how these advantages contribute to efficient database operations.
Compare and contrast the roles of Data Definition Language (DDL) and Data Manipulation Language (DML) in SQL. Explain how these two sub-languages work together to enable the creation and management of data within a relational database system.
Explain the concept of scope in Python and discuss the LEGB rule. Provide examples to illustrate the differences between local, enclosed, global, and built-in scopes and explain how Python resolves variable names based on this rule.
Discuss the importance of modules in Python programming. Explain the advantages of using modules, such as reusability and organization, and describe different ways to import modules, including the use of import, from … import …, and aliases.
Imagine you are designing a simple database for a small online bookstore. Describe the tables you would create, the columns each table would have (including data types and any necessary constraints like NOT NULL or primary keys), and provide example SQL CREATE TABLE statements for two of your proposed tables.
Glossary of Key Terms
Database: A systematically organized collection of data that can be easily accessed, managed, and updated.
Table: A structure within a database used to organize data into rows (records) and columns (fields or attributes).
Column (Field): A vertical set of data values of a particular type within a table, representing an attribute of the entities stored in the table.
Row (Record): A horizontal set of data values within a table, representing a single instance of the entity being described.
SQL (Structured Query Language): A standard programming language used for managing and manipulating data in relational databases.
DBMS (Database Management System): Software that enables users to interact with a database, providing functionalities such as data storage, retrieval, and security.
DDL (Data Definition Language): A subset of SQL commands used to define the structure of a database, including creating, altering, and dropping databases, tables, and other database objects.
DML (Data Manipulation Language): A subset of SQL commands used to manipulate data within a database, including inserting, updating, and deleting data. Some references also classify SELECT as DML; this guide treats retrieval as DQL.
DQL (Data Query Language): A subset of SQL commands, primarily the SELECT statement, used to query and retrieve data from a database.
Constraint: A rule or restriction applied to data in a database to ensure its accuracy, integrity, and reliability. Examples include NOT NULL.
Operator: A symbol or keyword that performs an operation on one or more operands. In SQL, this includes arithmetic operators (+, -, *, /), logical operators (AND, OR, NOT), and comparison operators (=, >, <, etc.).
Schema: The logical structure of a database, including the organization of tables, columns, relationships, and constraints.
Python Shell: An interactive command-line interpreter for Python, allowing users to execute code snippets and receive immediate feedback.
.py file: A file containing Python source code, which can be executed as a script from the command line.
Variable (Python): A named reference to a value stored in memory. Variables in Python are dynamically typed, meaning their data type is determined by the value assigned to them.
Data Type (Python): The classification of data that determines the possible values and operations that can be performed on it (e.g., integer, string, boolean).
String (Python): A sequence of characters enclosed in single or double quotes, used to represent text.
Scope (Python): The region of a program where a particular name (variable, function, etc.) is accessible. Python has four main scopes: local, enclosed, global, and built-in (LEGB).
Module (Python): A file containing Python definitions and statements. Modules provide a way to organize code into reusable units.
Import (Python): A statement used to load and make the code from another module available in the current script.
Alias (Python): An alternative name given to a module or function during import, often used for brevity or to avoid naming conflicts.
Briefing Document: Review of “01.pdf”
This briefing document summarizes the main themes and important concepts discussed in the provided excerpts from “01.pdf”. The document covers fundamental database concepts using SQL, basic command-line operations, an introduction to Python programming, and related software development tools.
I. Introduction to Databases and SQL
The document introduces the concept of databases as systematically organized data, often resembling spreadsheets or tables. It highlights the widespread use of databases in various applications, providing examples like banks storing account and transaction data, and hospitals managing patient, staff, and laboratory information.
“well a database looks like data organized systematically and this organization typically looks like a spreadsheet or a table”
The core purpose of SQL (Structured Query Language) is explained as a language used to interact with databases. Key operations that can be performed using SQL are outlined:
“operational terms create add or insert data read data update existing data and delete data”
SQL is further divided into several sub-languages:
DDL (Data Definition Language): Used to define the structure of the database and its objects like tables. Commands like CREATE (to create databases and tables) and ALTER (to modify existing objects, e.g., adding a column) are part of DDL.
“ddl as the name says helps you define data in your database but what does it mean to define data before you can store data in the database you need to create the database and related objects like tables in which your data will be stored for this the ddl part of SQL has a command named create then you might need to modify already created database objects for example you might need to modify the structure of a table by adding a new column you can perform this task with the ddl alter command you can remove an object like a table from a”
DML (Data Manipulation Language): Used to manipulate the data within the database, including inserting (INSERT INTO), updating, and deleting data.
“now we need to populate the table of data this is where I can use the data manipulation language or DML subset of SQL to add table data I use the insert into syntax this inserts rows of data into a given table I just type insert into followed by the table name and then a list of required columns or Fields within a pair of parentheses then I add the values keyword”
DQL (Data Query Language): Primarily used for querying or retrieving data from the database (SELECT statements fall under this category).
DCL (Data Control Language): Used to control access and security within the database.
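To see how DDL, DML, and DQL fit together, here is a minimal sketch using Python's built-in sqlite3 module, where SQLite acts as the DBMS that interprets each SQL string. The student table and its rows are hypothetical. SQLite has no CREATE DATABASE statement (connecting to a file, or to :memory:, creates the database), and it has no user accounts, so DCL statements such as GRANT and REVOKE are not shown; those apply to server databases such as MySQL.

```python
import sqlite3

# An in-memory database; SQLite plays the role of the DBMS here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define a table, then alter its structure by adding a column.
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("ALTER TABLE student ADD COLUMN grade INTEGER")

# DML: insert, update, and delete rows.
cur.execute("INSERT INTO student (id, name, grade) VALUES (1, 'Ana', 85)")
cur.execute("INSERT INTO student (id, name, grade) VALUES (2, 'Ben', 72)")
cur.execute("INSERT INTO student (id, name, grade) VALUES (3, 'Cai', 90)")
cur.execute("UPDATE student SET grade = 78 WHERE id = 2")
cur.execute("DELETE FROM student WHERE id = 3")
conn.commit()

# DQL: retrieve the remaining rows.
rows = cur.execute("SELECT id, name, grade FROM student ORDER BY id").fetchall()
print(rows)  # [(1, 'Ana', 85), (2, 'Ben', 78)]

# DDL again: remove the table entirely.
cur.execute("DROP TABLE student")
conn.close()
```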
The document emphasizes that a DBMS (Database Management System) is crucial for interpreting and executing SQL instructions, acting as an intermediary between the SQL commands and the underlying database.
“a database interprets and makes sense of SQL instructions with the use of a database management system or dbms as a web developer you’ll execute all SQL instructions on a database using a dbms the dbms takes responsibility for transforming SQL instructions into a form that’s understood by the underlying database”
The advantages of using SQL are highlighted, including its simplicity, standardization, portability, comprehensiveness, and efficiency in processing large amounts of data.
“you now know that SQL is a simple standard portable comprehensive and efficient language that can be used to delete data retrieve and share data among multiple users and manage database security this is made possible through subsets of SQL like ddl or data definition language DML also known as data manipulation language dql or data query language and DCL also known as data control language and the final advantage of SQL is that it lets database users process large amounts of data quickly and efficiently”
Examples of basic SQL syntax are provided, such as creating a database (CREATE DATABASE College;) and creating a table (CREATE TABLE student ( … );). The INSERT INTO syntax for adding data to a table is also introduced.
Constraints like NOT NULL are mentioned as ways to enforce data integrity during table creation.
“the creation of a new customer record is aborted the not null default value is implemented using a SQL statement a typical not null SQL statement begins with the creation of a basic table in the database I can write a create table clause followed by customer to define the table name followed by a pair of parentheses within the parentheses I add two columns customer ID and customer name I also define each column with relevant data types INT for customer ID as it stores”
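The abort behaviour described above can be reproduced with sqlite3. The customer table follows the transcript's two columns, but the exact column types, the sample values, and the add_customer helper are assumptions. SQLite surfaces the violation through Python's sqlite3.IntegrityError; MySQL reports it with its own error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INT NOT NULL,"
    " customer_name VARCHAR(100) NOT NULL)"
)

def add_customer(customer_id, customer_name):
    """Hypothetical helper: returns True if the insert succeeded."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)",
                (customer_id, customer_name),
            )
        return True
    except sqlite3.IntegrityError as err:
        print("Insert aborted:", err)
        return False

ok = add_customer(1, "Alice")      # succeeds
rejected = add_customer(2, None)   # None is sent as SQL NULL, so the insert is aborted
count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(count)  # 1
```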
SQL arithmetic operators (+, -, *, /, %) are introduced with examples. Logical operators (NOT, OR) and special operators (IN, BETWEEN) used in the WHERE clause for filtering data are also explained. The concept of JOIN clauses, including SELF-JOIN, for combining data from tables is briefly touched upon.
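A minimal sketch of these operators, run through sqlite3 so it needs nothing beyond the standard library. The employee table, its data, and the manager_id column used for the self-join are hypothetical. One portability note: SQLite's / performs integer division when both operands are integers (10 / 3 gives 3), whereas MySQL returns a decimal result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, "
    "salary INTEGER, dept TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Mia", 5000, "IT", None),
        (2, "Leo", 3000, "IT", 1),
        (3, "Zoe", 4000, "HR", 1),
        (4, "Raj", 2500, "Sales", 3),
    ],
)

# Arithmetic operators in the SELECT list.
arith = conn.execute("SELECT 10 + 3, 10 - 3, 10 * 3, 10 / 3, 10 % 3").fetchone()
print(arith)  # (13, 7, 30, 3, 1)

# BETWEEN is inclusive at both ends.
mid = conn.execute(
    "SELECT name FROM employee WHERE salary BETWEEN 3000 AND 4000 ORDER BY id"
).fetchall()

# NOT inverts a condition.
outside = conn.execute(
    "SELECT name FROM employee WHERE NOT (salary BETWEEN 3000 AND 4000) ORDER BY id"
).fetchall()

# IN matches any value in a list.
in_list = conn.execute(
    "SELECT name FROM employee WHERE dept IN ('HR', 'Sales') ORDER BY id"
).fetchall()

# OR keeps rows where at least one condition is true.
either = conn.execute(
    "SELECT name FROM employee WHERE dept = 'HR' OR salary > 4500 ORDER BY id"
).fetchall()

# SELF JOIN: the table is joined to itself through the aliases e and m.
pairs = conn.execute(
    "SELECT e.name, m.name FROM employee AS e "
    "JOIN employee AS m ON e.manager_id = m.id ORDER BY e.id"
).fetchall()
print(pairs)  # [('Leo', 'Mia'), ('Zoe', 'Mia'), ('Raj', 'Zoe')]
conn.close()
```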
Subqueries (inner queries within outer queries) and Views (virtual tables based on the result of a query) are presented as advanced SQL concepts. User-defined functions and triggers are also introduced as ways to extend database functionality and automate actions. Prepared statements are mentioned as a more efficient way to execute SQL queries repeatedly. Date and time functions in MySQL are briefly covered.
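These advanced features can also be tried in sqlite3, with the caveat that SQLite's versions differ from MySQL's. User-defined functions are registered from Python with create_function rather than written with CREATE FUNCTION. "Prepared statements" correspond to parameterized queries with ? placeholders rather than PREPARE and EXECUTE. Date arithmetic uses date() modifiers rather than functions such as DATE_ADD. The orders and audit_log tables, the with_tax function, and the 10% tax rate are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, amount REAL);
    CREATE TABLE audit_log (order_id INTEGER, note TEXT);

    -- Trigger: automatically record every new order in audit_log.
    CREATE TRIGGER log_new_order AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit_log (order_id, note) VALUES (NEW.id, 'created');
    END;

    -- View: a virtual table defined by a stored query.
    CREATE VIEW large_orders AS
        SELECT id, item, amount FROM orders WHERE amount > 12;
""")

# Parameterized statement: the SQL text stays fixed and only the bound values change.
insert_sql = "INSERT INTO orders (item, amount) VALUES (?, ?)"
for item, amount in [("coffee", 2.5), ("cake", 12.0), ("juice", 4.0), ("tart", 15.0)]:
    conn.execute(insert_sql, (item, amount))

# Subquery: the inner query computes the average used by the outer query.
above_avg = conn.execute(
    "SELECT item FROM orders WHERE amount > (SELECT AVG(amount) FROM orders) ORDER BY id"
).fetchall()

# A view is queried like an ordinary table.
large = conn.execute("SELECT item FROM large_orders ORDER BY id").fetchall()

# User-defined function registered from Python (SQLite-specific mechanism).
conn.create_function("with_tax", 1, lambda value: round(value * 1.1, 2))
taxed = conn.execute("SELECT with_tax(amount) FROM orders WHERE item = 'cake'").fetchone()[0]

# The trigger fired once per inserted order.
audit_count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]

# SQLite date function: add one month to a date string.
next_month = conn.execute("SELECT date('2024-01-15', '+1 month')").fetchone()[0]
conn.close()

print(above_avg, large, taxed, audit_count, next_month)
```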
II. Introduction to Command Line/Bash Shell
The document provides a basic introduction to using the command line or bash shell. Fundamental commands are explained:
pwd (print working directory): Prints the full path of the current directory. Shell commands and flags are case-sensitive, so they are typed in lowercase.
“to do that I run the PWD command PWD is short for print working directory I type PWD and press the enter key the command returns a forward slash which indicates that I’m currently in the root directory”
ls (list): Displays the contents of the current directory. The -l flag produces a detailed, long-format listing.
“if I want to check the contents of the root directory I run another command called LS which is short for list I type LS and press the enter key and now notice I get a list of different names of directories within the root level in order to get more detail of what each of the different directories represents I can use something called a flag flags are used to set options to the commands you run use the list command with a flag called L which means the format should be printed out in a list format I type LS space Dash l press enter and this Returns the results in a list structure”
cd (change directory): Navigates between directories using relative or absolute paths; cd .. moves up one directory.
“to step back into etc type cd etc to confirm that I’m back there type pwd and enter if I want to use the other alternative you can do an absolute path type in cd forward slash and press enter then I type pwd and press enter you can verify that I am back at the root again to step through multiple directories use the same process type cd etc and press enter check the contents of the files by typing ls and pressing enter”
mkdir (make directory): Creates a new directory.
“now I will create a new directory called submissions I do this by typing mkdir which stands for make directory and then the word submissions this is the name of the directory I want to create and then I hit the enter key I then type in ls -l for list so that I can see the list structure and now notice that a new directory called submissions has been created I can then go into this”
touch: Creates a new empty file; if the file already exists, touch updates its timestamps instead.
“the Parent Directory next is the touch command which makes a new file of whatever type you specify for example to build a brand new file you can run touch followed by the new file’s name for instance example dot txt note that the newly created file will be empty”
history: Shows a numbered list of recently used commands.
“to view a history of the most recently typed commands you can use the history command”
File Redirection (>, >>, <): Redirects a command's output to a file or reads its input from a file. > overwrites the target file, >> appends to it, and < supplies a file as the command's input.
“if you want to control where the output goes you can use a redirection how do we do that enter the ls command enter Dash L to print it as a list instead of pressing enter add a greater than sign redirection now we have to tell it where we want the data to go in this scenario I choose an output.txt file the output dot txt file has not been created yet but it will be created based on the command I’ve set here with a redirection flag press enter type LS then press enter again to display the directory the output file displays to view the”
grep: Searches files (or another command's output) for lines that match a pattern.
“grep stands for global regular expression print and it’s used for searching across files and folders as well as the contents of files on my local machine I enter the command ls -l and see that there’s a file called”
cat: Displays the contents of a file.
less: Views file contents one page at a time; press q to exit.
“press the q key to exit the less environment the other file is the bash profile file so I can run the last command again this time with dot profile this tends to be used more for environment variables for example I can use it for setting”
vim: A text editor used for creating and editing files.
“now I will create a simple shell script for this example I will use Vim which is an editor that I can use which accepts input so type vim and”
chmod: Changes file permissions, including making a file executable (chmod +x filename).
“but I want it to be executable which requires that I have an X being set on it in order to do that I have to use another command which is called chmod after using this them executable within the bash shell”
The document also briefly mentions shell scripts (files containing a series of commands) and environment variables (dynamic named values that can affect the way running processes will behave on a computer).
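These commands are meant to be typed in a real shell. Because the material then moves on to Python, the sketch below shows rough Python standard-library equivalents for several of them, working inside a temporary directory. The file names are hypothetical. The functions approximate the commands rather than replace them; for example, the chmod call here sets the user, group, and other execute bits explicitly.

```python
import os
import re
import stat
import tempfile
from pathlib import Path

# Work in a fresh temporary directory so nothing else is touched.
workdir = Path(tempfile.mkdtemp())
os.chdir(workdir)                          # roughly: cd <workdir>
print(Path.cwd())                          # roughly: pwd

Path("submissions").mkdir()                # roughly: mkdir submissions
Path("example.txt").touch()                # roughly: touch example.txt

listing = sorted(p.name for p in Path(".").iterdir())   # roughly: ls
print(listing)                             # ['example.txt', 'submissions']

# Redirection: mode "w" truncates the file like >, mode "a" appends like >>.
with open("output.txt", "w") as f:
    f.write("first\n")
with open("output.txt", "w") as f:
    f.write("second\n")                    # replaces "first"
with open("output.txt", "a") as f:
    f.write("third\n")                     # added after "second"

content = Path("output.txt").read_text()   # roughly: cat output.txt
print(content, end="")

# grep-like search: keep lines matching a regular expression.
matches = [line for line in content.splitlines() if re.search(r"^th", line)]
print(matches)                             # ['third']

# chmod +x: add execute permission bits to a hypothetical script.
script = Path("hello.sh")
script.write_text("#!/bin/bash\necho hello\n")
script.chmod(script.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
is_executable = os.access(script, os.X_OK)
```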
III. Introduction to Git and GitHub
Git is introduced as a free, open-source distributed version control system used to manage source code history, track changes, revert to previous versions, and collaborate with other developers. Key Git commands mentioned include:
git clone: Creates a local copy of a remote repository (e.g., from GitHub).
“to do this I type the command git clone and paste the https URL I copied earlier finally I press enter on my keyboard notice that I receive a message stating”
ls -la: Lists all files in a directory in long format, including hidden ones (like the .git directory, which contains the Git repository metadata).
“the ls -la command another file is listed which is just named dot git you will learn more about this later when you explore how to use this for source control”
cd .git: Changes the current directory to the .git folder.
“first open the dot git folder on your terminal type cd dot git and press enter”
cat HEAD: Displays the contents of the HEAD file, which usually holds a reference to the current branch (for example, ref: refs/heads/main) rather than a commit hash.
“next type cat HEAD and press enter in git we only work on a single branch at a time this file also exists inside the dot git folder under the refs forward slash heads path”
cat refs/heads/main: Displays the hash of the last commit on the main branch (run from inside the .git folder).
“type cd dot git and press enter next type cat forward slash refs forward slash heads forward slash main press enter after you”
git pull: Fetches changes from a remote repository and integrates them into the local branch.
“I am now going to explain to you how to pull the repository to your local device”
GitHub is described as a cloud-based hosting service for Git repositories, offering a user interface for managing Git projects and facilitating collaboration.
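The HEAD and refs/heads/main files described above form a short chain: HEAD usually names the current branch, and the branch file holds a commit hash. This sketch follows that chain in Python. So that it runs without Git installed, it builds a fake .git layout in a temporary directory with a made-up hash. Real repositories may also keep refs in .git/packed-refs, which this sketch does not handle.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path) -> str:
    """Return the commit hash that HEAD points to (loose refs only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Symbolic ref such as "ref: refs/heads/main": read the branch file.
        ref_path = head[len("ref: "):]
        return (git_dir / ref_path).read_text().strip()
    # Detached HEAD: the file already contains a commit hash.
    return head

# Build a minimal, hypothetical .git layout for demonstration.
git_dir = Path(tempfile.mkdtemp()) / ".git"
(git_dir / "refs" / "heads").mkdir(parents=True)
(git_dir / "HEAD").write_text("ref: refs/heads/main\n")
fake_hash = "3f2a9c1e0b7d4a6f8e5c2b1a0d9e8f7c6b5a4d3e"
(git_dir / "refs" / "heads" / "main").write_text(fake_hash + "\n")

print(resolve_head(git_dir))
```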
IV. Introduction to Python Programming
The document introduces Python as a versatile programming language and outlines different ways to run Python code:
Python Shell: An interactive environment for running and testing small code snippets without creating separate files.
“the python shell is useful for running and testing small scripts for example it allows you to run code without the need for creating new DOT py files you start by adding Snippets of code that you can run directly in the shell”
Running Python Files: Executing Python code stored in files with the .py extension using the python filename.py command.
“running a python file directly from the command line or terminal note that any file that has the file extension of dot py can be run by the following command for example type python then a space and then type the file”
Basic Python concepts covered include:
Variables: Declaring and assigning values to variables (e.g., x = 5, name = "Alice"). Python automatically infers data types. Multiple variables can be assigned the same value (e.g., a = b = c = 10).
“all I have to do is name the variable for example if I type x equals 5 I have declared a variable and assigned it a value I can also print out the value of the variable by calling the print statement and passing in the variable name which in this case is x so I type print x when I run the program I get the value of 5 which is the assignment since I gave the initial variable let me clear my screen again you have several options when it comes to declaring variables you can declare any different type of variable in terms of value for example x could equal a string called hello to do this I type x equals hello I can then print the value again run it and I find the output is the word hello behind the scenes python automatically assigns the data type for you”
Data Types: Basic data types like integers, floats (decimal numbers), complex numbers, strings (sequences of characters enclosed in single or double quotes), lists, and tuples (ordered, immutable sequences) are introduced.
“X could equal a string called hello to do this I type x equals hello I can then print the value again run it and I find the output is the word hello behind the scenes python automatically assigns the data type for you you’ll learn more about this in an upcoming video on data types you can declare multiple variables and assign them to a single value as well for example making a b and c all equal to 10. I do this by typing a equals b equals C equals 10. I print all three… sequence types are classed as container types that contain one or more of the same type in an ordered list they can also be accessed based on their index in the sequence python has three different sequence types namely strings lists and tuples let’s explore each of these briefly now starting with strings a string is a sequence of characters that is enclosed in either a single or double quotes strings are represented by the string class or Str for”
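A short sketch of the variable and data type points above; the names and values are illustrative.

```python
# Variables are created by assignment; Python infers the type from the value.
x = 5
print(x, type(x).__name__)        # 5 int
x = "hello"                       # the same name can be rebound to a str
print(x, type(x).__name__)        # hello str

# Several names bound to one value in a single statement.
a = b = c = 10

# Other basic numeric types.
price = 4.55                      # float
z = 2 + 3j                        # complex

# Sequence types: str, list, and tuple all support indexing.
word = "looping"
desserts = ["cake", "pie"]        # list: mutable
sizes = ("small", "large")        # tuple: immutable
desserts[0] = "tart"              # allowed for a list, not for a tuple
print(word[0], desserts[0], sizes[-1])   # l tart large
```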
Operators: Arithmetic operators (+, -, *, /, **, %, //) and logical operators (and, or, not) are explained with examples.
“example 7 multiplied by four okay now let’s explore logical operators logical operators are used in Python on conditional statements to determine a true or false outcome let’s explore some of these now first logical operator is named and this operator checks for all conditions to be true for example a is greater than five and a is less than 10. the second logical operator is named or this operator checks for at least one of the conditions to be true for example a is greater than 5 or B is greater than 10. the final operator is named not this”
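The operator examples above, written out as a minimal sketch. The values 7 and 4 echo the transcript's "7 multiplied by four" example, and the logical checks mirror its a > 5 and a < 10 wording; the rest is illustrative.

```python
a = 7
b = 4

# Arithmetic operators
print(a + b)    # 11
print(a - b)    # 3
print(a * b)    # 28
print(a / b)    # 1.75  true division always returns a float
print(a // b)   # 1     floor division
print(a % b)    # 3     remainder (modulus)
print(a ** 2)   # 49    exponentiation

# Logical operators return True or False.
both = a > 5 and a < 10     # and: every condition must be true
either = a > 5 or b > 10    # or: at least one condition must be true
negated = not a > 5         # not: inverts the result
print(both, either, negated)  # True True False
```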
Conditional Statements: if, elif (else if), and else statements are introduced for controlling the flow of execution based on conditions.
“The Logical operators are and or and not let’s cover the different combinations of each in this example I declare two variables a equals true and B also equals true from these variables I use an if statement I type if a and b colon and on the next line I type print and in parentheses in double quotes”
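A minimal sketch of if, elif, and else. The classify function and its score thresholds are hypothetical. The a and b check mirrors the transcript's example, but the printed messages are invented because the transcript cuts off before them.

```python
def classify(score):
    """Return a label for a score using if / elif / else (thresholds are hypothetical)."""
    if score >= 90:
        return "excellent"
    elif score >= 50:
        return "pass"
    else:
        return "fail"

# Logical operators inside a condition.
a = True
b = True
if a and b:
    print("All True")
elif a or b:
    print("At least one True")
else:
    print("Both False")

print(classify(95), classify(70), classify(20))  # excellent pass fail
```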
Loops: for loops (for iterating over sequences) and while loops are introduced with examples, including nested loops.
“now let’s break apart the for loop and discover how it works the variable item is a placeholder that will store the current letter in the sequence you may also recall that you can access any character in the sequence by its index the for loop is accessing it in the same way and assigning the current value to the item variable this allows us to access the current character to print it for output when the code is run the outputs will be the letters of the word looping each letter on its own line now that you know about looping constructs in Python let me demonstrate how these work further using some code examples to output an array of tasty desserts python offers us multiple ways to do loops or looping you’ll now cover the for loop as well as the while loop let’s start with the basics of a simple for loop to declare a for loop I use the for keyword I now need a variable to put the value into in this case I am using i I also use the in keyword to specify where I want to loop over I add a new function called range to specify the number of items in a range in this case I’m using 10 as an example next I do a simple print statement by pressing the enter key to move to a new line I select the print function and within the brackets I enter the name looping and the value of i then I click on the Run button the output indicates the iteration loops through the range of 0 to 9.”
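The loops described above, as a runnable sketch. The word looping and range(10) come from the transcript; the desserts list and the nested-loop ranges are hypothetical.

```python
# for loop over a string: item holds each character in turn.
for item in "looping":
    print(item)

# for loop over range(10): i takes the values 0 through 9.
for i in range(10):
    print("looping", i)

# while loop: repeats until the condition becomes False.
desserts = ["cake", "pie", "tart"]
index = 0
while index < len(desserts):
    print(desserts[index])
    index += 1

# Nested loops: the inner loop runs completely for each outer iteration.
pairs = []
for x in range(2):
    for y in range(3):
        pairs.append((x, y))
print(pairs)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```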
Functions: Defining and calling functions using the def keyword. Functions can take arguments and return values. Examples of using *args (for variable positional arguments) and **kwargs (for variable keyword arguments) are provided.
“I now write a function to produce a string out of this information I type def contents and then self in parentheses on the next line I write a print statement for the string the plus self dot dish plus has plus self dot items plus and takes plus self dot time plus min to prepare here we’ll use the backslash character to force a new line and continue the string on the following line for this to print correctly I need to convert the self dot items and self dot time… let’s say for example you wanted to calculate a total bill for a restaurant a user got a cup of coffee that was 2.99 then they also got a cake that was 4.55 and also a juice for 2.99 the first thing I could do is change the for loop let’s change the argument to kwargs by”
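A minimal sketch of the restaurant-bill idea with *args and **kwargs. The coffee, cake, and juice prices come from the transcript; the function names sum_of and bill are hypothetical. Rounding to two decimals hides the small binary floating-point error that summing prices can produce.

```python
def sum_of(*args):
    """Add any number of positional arguments; args arrives as a tuple."""
    total = 0
    for price in args:
        total += price
    return round(total, 2)

def bill(**kwargs):
    """Add keyword arguments; kwargs arrives as a dict of name -> price."""
    total = 0
    for item, price in kwargs.items():
        print(item, price)
        total += price
    return round(total, 2)

print(sum_of(2.99, 4.55, 2.99))                  # 10.53
print(bill(coffee=2.99, cake=4.55, juice=2.99))  # prints each item, then 10.53
```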
File Handling: Opening, reading (using read, readline, readlines), and writing to files. The importance of closing files is mentioned.
“the third method to read files in Python is readlines let me demonstrate this method the readlines method reads the entire contents of the file and then returns it in an ordered list this allows you to iterate over the list or pick out specific lines based on a condition if for example you have a file with four lines of text and pass a length condition the readlines function will return all the lines in your file in the correct order files are stored in directories and they have”
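A minimal sketch of the three read methods and of closing files. The file name and contents are hypothetical. The with statement is used because it closes the file automatically, even if an error occurs.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")  # hypothetical file

# Write four lines; "with" closes the file when the block ends.
with open(path, "w") as f:
    f.write("line one\nline two\nline three\nline four\n")

with open(path) as f:
    whole = f.read()         # the entire file as one string

with open(path) as f:
    first = f.readline()     # a single line, including its trailing "\n"

with open(path) as f:
    lines = f.readlines()    # a list of all lines, in order

print(first.strip())         # line one
print(len(lines))            # 4
```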
Recursion: The concept of a function calling itself is briefly illustrated.
“the else statement will recursively call the slice function but with a modified string every time on the next line I add else and a colon then on the next line I type return string reverse Str but before I close the parentheses I add a slice function by typing open square bracket the number 1 and a colon followed by”
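The transcript's recursive code is cut off, so the following is a common way to write it, assuming the function reverses a string by slicing off the first character with [1:]. The name string_reverse is an assumption based on the transcript wording. Because Python's default recursion limit is about 1000 calls, slicing with text[::-1] is the practical choice for long strings.

```python
def string_reverse(text):
    """Reverse a string recursively."""
    if len(text) == 0:
        return text  # base case: an empty string is its own reverse
    else:
        # Recursive case: reverse everything after the first character,
        # then append the first character at the end.
        return string_reverse(text[1:]) + text[0]

print(string_reverse("reverse"))  # esrever
```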
Object-Oriented Programming (OOP): Basic concepts of classes (using the class keyword), objects (instances of classes), attributes (data associated with an object), and methods (functions associated with an object, with self as the first parameter) are introduced. Inheritance (creating new classes based on existing ones) is also mentioned.
“method inside this class I want this one to contain a new function called leave request so I type def leave request and then self and days as the variables in parentheses the purpose of the leave request function is to return a line that specifies the number of days requested to write this I type return the string may I take a leave for plus str open parenthesis the word days close parenthesis plus another string days now that I have all the classes in place I’ll create a few instances from these classes one for a supervisor and two others for…”
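A minimal sketch of classes, attributes, methods, and inheritance along the lines the transcript describes. The class names, attributes, and instances are hypothetical; only the leave_request wording follows the transcript.

```python
class Employee:
    """Base class: every employee has a first and last name."""
    def __init__(self, first, last):
        # Attributes: data stored on each instance.
        self.first = first
        self.last = last

class Supervisor(Employee):
    """Subclass that extends the parent's __init__ with one more attribute."""
    def __init__(self, first, last, password):
        super().__init__(first, last)  # reuse Employee's setup
        self.password = password

class Chef(Employee):
    """Subclass that inherits __init__ unchanged and adds a method."""
    def leave_request(self, days):
        # Method: self is the instance the method was called on.
        return "May I take a leave for " + str(days) + " days"

# Instances (objects) created from the classes.
adrian = Supervisor("Adrian", "A", "apple")
emily = Chef("Emily", "E")
juno = Chef("Juno", "J")

print(emily.leave_request(3))                      # May I take a leave for 3 days
print(adrian.first, isinstance(adrian, Employee))  # Adrian True
```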
Modules: The concept of modules (reusable blocks of code in separate files) and how to import them using the import statement (e.g., import math, from math import sqrt, import math as m). The benefits of modular programming (scope, reusability, simplicity) are highlighted. The search path for modules (sys.path) is mentioned. Note that from math import sqrt still loads and caches the whole math module; it only changes which names are bound in the importing file.
“so a file like sample.py can be a module named sample and can be imported modules in Python can contain both executable statements and functions but before you explore how they are used it’s important to understand their value purpose and advantages modules come from modular programming this means that the functionality of code is broken down into parts or blocks of code these parts or blocks have great advantages which are scope reusability and simplicity let’s delve deeper into these everything in… to import and execute modules in Python the first important thing to know is that modules are imported only once during execution if for example you import a module that contains print statements print open brackets close brackets you can verify it only executes the first time you import the module even if the module is imported multiple times since modules are built to help you standalone… I will now import the built-in math module by typing import math just to make sure that this code works I’ll use a print statement I do this by typing print importing the math module after this I’ll run the code the print statement has executed most of the modules that you will come across especially the built-in modules will not have any print statements and they will simply be loaded by the interpreter now that I’ve imported the math module I want to use a function inside of it let’s choose the square root function sqrt to do this I type the words math dot sqrt when I type the word math followed by the dot a list of functions appears in a drop down menu and you can select sqrt from this list I passed 9 as the argument to the math.sqrt function assign this to a variable called root and then I print it the number three the square root of nine has been printed to the terminal which is the correct answer instead of importing the entire math module as we did above there is a better way to handle this by directly importing the square root function inside the scope of the project this will prevent overloading the interpreter by importing the entire math module to do this I type from math import sqrt when I run this it displays an error now I remove the word math from the variable declaration and I run the code again this time it works next let’s discuss something called an alias which is an excellent way of importing different modules here I assign an alias called m to the math module I do this by typing import math as m then I type cosine equals m dot I”
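A minimal sketch of the three import styles and the one-time loading behaviour described above. The alias m and the cosine variable follow the transcript; everything else is illustrative.

```python
import math                # whole module; names are accessed as math.<name>
import math as m           # alias: m refers to the same module object
import sys
from math import sqrt      # binds only the name sqrt in this file

root = math.sqrt(9)
print(root)                # 3.0 (math.sqrt returns a float)
print(sqrt(9))             # 3.0
cosine = m.cos(0)
print(cosine)              # 1.0
print(m is math)           # True: an alias is just another name for the module

# Modules are loaded once and cached in sys.modules; later imports reuse the entry.
print("math" in sys.modules)   # True
# sys.path holds the directories searched when a module is imported.
print(sys.path[:3])
```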
Scope: The concepts of local, enclosed, global, and built-in scopes in Python (LEGB rule) and how variable names are resolved. Keywords global and nonlocal for modifying variable scope are mentioned.
“names of different attributes defined inside it. In this way, modules are a type of namespace. Namespaces and scopes can become very confusing very quickly, and so it is important to get as much practice with scopes as possible to ensure a standard of quality. There are four main types of scopes that can be defined in Python: local, enclosed, global, and built-in. The practice of trying to determine in which scope a certain variable belongs is known as scope resolution. Scope resolution follows what is commonly known as the LEGB rule. Let’s explore these. Local: this is where the first search for a variable happens. Enclosed: this is defined inside an enclosing or nested function. Global: this is defined at the uppermost level, or simply outside functions. Built-in: these are the names present in the built-in module. In simpler terms, a variable declared inside a function is local, and the ones outside the scope of any function generally are global. Here is an example: the outputs for the code on screen show the same variable name, Greek, in different scopes… keywords that can be used to change the scope of the variables: global and nonlocal. The global keyword helps us access the global variables from within the function. nonlocal is a special type of scope defined in Python that is used within nested functions only, on the condition that the variable has been defined earlier in the enclosing function. Now you can write a piece of code that will better help you understand the idea of scope for attributes. You have already created a file called animalfarm.py. You will be defining a function called d, inside which you will be creating another nested function, e. Let’s write the rest of the code. You can start by defining a couple of variables, both of which will be called animal, the first one inside the d function and the second one inside the e function. Note how you had to first declare the variable inside the e function as nonlocal. You will now add a few more print statements for clarification for when you see the outputs. Finally, you have called the e function here, and you can add one more variable animal outside the d function. This”
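A minimal reconstruction of the animalfarm.py exercise, assuming the structure the transcript describes: a function d with a nested function e, a variable named animal in each, and one more animal at module level. The string values and the extra set_global helper are hypothetical; the helper is added only to show the global keyword alongside nonlocal.

```python
# animalfarm.py (sketch): the same name "animal" in three scopes.
animal = "camel"                  # hypothetical value; global scope

def d():
    animal = "elephant"           # local to d(), and the enclosing scope for e()

    def e():
        nonlocal animal           # rebind d()'s variable instead of creating a new local
        animal = "giraffe"
        print("inside e():", animal)

    print("before calling e():", animal)
    e()
    print("after calling e():", animal)   # d()'s variable was changed by e()
    return animal

def set_global():
    global animal                 # rebind the module-level (global) name
    animal = "zebra"

result = d()
print("global after d():", animal)        # still "camel": d() and e() did not touch it
set_global()
print("global after set_global():", animal)
```

Without the nonlocal line, the assignment inside e() would create a new local variable and d()'s animal would still be "elephant" after the call.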
Reloading Modules: The reload() function from the importlib module, used for re-importing and re-executing modules that have already been loaded.
“statement is only loaded once by the Python interpreter, but the reload function lets you import and reload it multiple times. I’ll demonstrate that. First, I create a new file, sample.py, and I add a simple print statement that prints hello world. Remember that any file in Python can be used as a module. I’m going to use this file inside another new file, and the new file is named using reloads.py. Now I import the sample.py module. I can add the import statement multiple times, but the interpreter only loads it once. If it had been reloaded, we”
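The load-once behavior and importlib.reload() can be checked in a self-contained script. This sketch writes a throwaway module to a temporary folder; the module name sample_demo is a hypothetical stand-in for the transcript’s sample.py. The module counts its own executions by reading its previous value from globals(), which works because reload() re-executes the code inside the existing module object and keeps its namespace.

```python
import importlib
import pathlib
import sys
import tempfile

# Write a throwaway module so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
source = 'print("Hello world")\nRUNS = globals().get("RUNS", 0) + 1\n'
(pathlib.Path(tmp_dir) / "sample_demo.py").write_text(source)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the import system sees the new file

import sample_demo  # first import executes the file and prints "Hello world"
import sample_demo  # already cached in sys.modules, so nothing runs or prints
print(sample_demo.RUNS)  # 1

# reload() re-executes the module's code inside the existing module object.
importlib.reload(sample_demo)  # prints "Hello world" again
print(sample_demo.RUNS)  # 2
```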
Testing: Introduction to writing test cases using the assert keyword and the pytest framework. The convention of naming test functions with the test_ prefix is mentioned. Test-Driven Development (TDD) is briefly introduced.
“another file called test_addition.py, in which I’m going to write my test cases. Now I import the file that consists of the functions that need to be tested. Next, I’ll also import the pytest module. After that, I define a couple of test cases with the addition and subtraction functions. Each test case should be named test_ followed by the name of the function to be tested; in our case, we’ll have test_add and test_sub. I’ll use the assert keyword inside these functions because tests primarily rely on this keyword. It… contrary to the conventional approach of writing code, I first write test_findstring.py, and then I add the test function named test_is_present. In accordance with the test, I create another file named findstring.py, in which I’ll write the is_present function. I define the function named is_present and I pass an argument called person in it. Then I make a list of names written as values. After that, I create a simple if-else condition to check if the passed argument”
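A minimal sketch of both testing examples in one file, assuming simple bodies for add, sub, and is_present (the transcript does not show them) and a hypothetical list of names. In the course the code under test and the tests live in separate files; they are combined here so the sketch is self-contained. Saved under a name starting with test_ (for example test_demo.py, a hypothetical name), running pytest in that folder collects every test_-prefixed function.

```python
def add(a, b):
    return a + b

def sub(a, b):
    return a - b

def is_present(person):
    names = ["Al", "Bo", "Chi"]  # hypothetical list of names
    if person in names:
        return True
    else:
        return False

# pytest discovers functions whose names start with test_ and reports a
# failing assert as a test failure.
def test_add():
    assert add(4, 5) == 9

def test_sub():
    assert sub(4, 5) == -1

def test_is_present():
    assert is_present("Al")
    assert not is_present("Zed")
```

Plain assert statements need no import pytest; the import only matters when you use pytest helpers such as pytest.raises. In a TDD workflow, test_is_present would be written and seen to fail before is_present exists.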
V. Software Development Tools and Concepts
The document mentions several tools and concepts relevant to software development:
Python Installation and Version: Checking the installed Python version using python --version.
“prompt, type python --version to identify which version of Python is running on your machine. If Python is correctly installed, then Python 3 should appear in your console; this means that you are running Python 3. There should also be several numbers after the 3 to indicate which version of Python 3 you are running. Make sure these numbers match the most recent version on the python.org website. If you see a message that states python not found, then review your Python installation or relevant document on”
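The same version check can be done from inside a running interpreter, which is handy in a notebook where there is no command prompt. A small sketch using the standard platform and sys modules; the printed numbers depend on your installation.

```python
import platform
import sys

# The same information that "python --version" prints, read from inside Python.
print(platform.python_version())   # for example 3.12.1, depending on your install

major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")

# Fail early if the script is accidentally started with Python 2.
if sys.version_info < (3,):
    raise SystemExit("Python 3 is required")
```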
Jupyter Notebook: An interactive development environment (IDE) for Python. Installation using python -m pip install jupyter and running using jupyter notebook are mentioned.
“course, you’ll use the Jupyter IDE to demonstrate Python. To install Jupyter, type python -m pip install jupyter within your Python environment, then follow the Jupyter installation process. Once you’ve installed Jupyter, type jupyter notebook to open a new instance of Jupyter Notebook in your default browser.”
MySQL Connector: A Python library used to connect Python applications to MySQL databases.
“the next task is to connect Python to your MySQL database. You can create the connection using a purpose-built Python library called MySQL Connector. This library is an API that provides useful”
Datetime Library: Python’s built-in module for working with dates and times. Functions like datetime.now(), datetime.date(), datetime.time(), and timedelta are introduced.
“Python, so you can import it without requiring pip. Let’s review the functions that Python’s datetime library offers. The datetime.now() function is used to retrieve today’s date. You can also use datetime.date() to retrieve just the date, or datetime.time() to call the current time, and the timedelta function calculates the difference between two values. Now let’s look at the syntax for implementing datetime. To import the datetime Python class, use the import code followed by the library name, then use the as keyword to create an alias of… Let’s look at a slightly more complex function, timedelta. When making plans, it can be useful to project into the future; for example, what date is this same day next week? You can answer questions like this using the timedelta function to calculate the difference between two values and return the result in a Python-friendly format. So to find the date in seven days’ time, you can create a new variable called week, type the dt module and access the timedelta function as an object instance, then pass through seven days as an argument. Finally”
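A short sketch of the datetime functions described above. It assumes the alias dt refers to the datetime module (the transcript is cut off before naming the alias target) and reads the transcript’s date and time calls as the .date() and .time() methods of the datetime object returned by now(). The fixed start date is an illustrative choice so that the last result is deterministic.

```python
import datetime as dt

# Current date and time (the values depend on when you run this).
now = dt.datetime.now()
print(now)          # full timestamp
print(now.date())   # just the date part
print(now.time())   # just the time part

# timedelta represents a duration; adding it to a date projects forward.
week = dt.timedelta(days=7)
print(now + week)   # the same moment, seven days later

# With a fixed start date the result is deterministic.
start = dt.date(2024, 1, 1)
next_week = start + week
print(next_week)                 # 2024-01-08
print((next_week - start).days)  # 7
```

Subtracting two dates also returns a timedelta, which is how "the difference between two values" is computed.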
MySQL Workbench: A graphical tool for working with MySQL databases, including creating schemas.
“MySQL server instance and select the schema menu. To create a new schema, select the create schema option from the menu pane in the schema toolbar. This action opens a new window. Within this new window, enter mg_schema in the database name text field and select Apply. This generates a SQL script called CREATE SCHEMA mg_schema. You are then asked to review the SQL script to be applied to your new database. Click on the Apply button within the review window if you’re satisfied with the script. A new window”
Data Warehousing: Briefly introduces the concept of a centralized data repository for integrating and processing large amounts of data from multiple sources for analysis. Dimensional data modeling is mentioned.
“in the next module, you’ll explore the topic of data warehousing. In this module, you’ll learn about the architecture of a data warehouse and build a dimensional data model. You’ll begin with an overview of the concept of data warehousing. You’ll learn that a data warehouse is a centralized data repository that loads, integrates, stores, and processes large amounts of data from multiple sources. Users can then query this data to perform data analysis. You’ll then”
Binary Numbers: A basic explanation of the binary number system (base-2) is provided, highlighting its use in computing.
“binary has many uses in computing. It is a very convenient way of… Consider that you have a lock with four different digits, and each digit can be a zero or a one. How many potential passcodes can you have for the lock? The answer is 2 to the power of 4, or two times two times two times two, which equals sixteen. You are working with a binary lock, therefore each digit can only be either zero or one, so you multiply by two once for each of the four digits, and the total is 16. Each time you add a potential digit, you increase the”
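The binary lock can be checked by listing every possible code. This sketch uses itertools.product to generate each 4-digit combination of 0s and 1s and confirms there are 2 ** 4 of them; the int and bin calls show the same base-2 idea from the other direction.

```python
from itertools import product

# Base-2 basics: each binary digit (bit) stands for a power of two.
print(int("1011", 2))   # 11, because 8 + 0 + 2 + 1
print(bin(16))          # 0b10000

# Every passcode for a 4-digit lock where each digit is 0 or 1.
DIGITS = 4
codes = ["".join(bits) for bits in product("01", repeat=DIGITS)]
print(len(codes))       # 16, the same as 2 ** 4
print(codes[:3])        # ['0000', '0001', '0010']

# Each extra binary digit doubles the number of possible codes.
for n in range(1, 6):
    print(f"{n} digits -> {2 ** n} codes")
```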
Knapsack Problem: A brief overview of this optimization problem is given as a computational concept.
“three kilograms. Additionally, each item has a value: the torch equals one, water equals two, and the tent equals three. In short, the knapsack problem outlines a list of items that weigh different amounts and have different values. You can only carry so many items in your knapsack. The problem requires calculating the optimum combination of items you can carry if your backpack can carry a certain weight. The goal is to find the best return for the weight capacity of the knapsack. To compute a solution for this problem, you must select all items”
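A brute-force sketch of the knapsack problem. The item values come from the transcript; the weights (1, 2, and 3 kg) and the 4 kg capacity are hypothetical because the transcript is cut off before stating them. The function tries every subset of items, so its running time grows as 2 ** n, the same doubling seen with the binary lock.

```python
from itertools import combinations

ITEMS = {              # name: (weight_kg, value); weights are hypothetical
    "torch": (1, 1),
    "water": (2, 2),
    "tent": (3, 3),
}
CAPACITY_KG = 4        # hypothetical knapsack capacity

def best_combination(items, capacity):
    """Try every subset of items and keep the most valuable one whose
    total weight fits within the capacity."""
    best_value, best_names = 0, ()
    names = list(items)
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            weight = sum(items[name][0] for name in combo)
            value = sum(items[name][1] for name in combo)
            if weight <= capacity and value > best_value:
                best_value, best_names = value, combo
    return best_value, best_names

print(best_combination(ITEMS, CAPACITY_KG))  # (4, ('torch', 'tent'))
```

Checking all subsets is fine for three items but becomes slow quickly; for integer weights, dynamic programming is the usual faster alternative.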
This document provides a foundational overview of databases and SQL, command-line basics, version control with Git and GitHub, and introductory Python programming concepts, along with essential development tools. The content suggests a curriculum aimed at individuals learning about software development, data management, and related technologies.
A database holds data organized systematically, often resembling a spreadsheet or a table. This systematic organization means that all data has elements, features, or attributes by which it can be identified. For example, a person can be identified by attributes like name and age.
Data stored in a database cannot exist in isolation; it must have a relationship with other data to be processed into meaningful information. Databases establish relationships between pieces of data, for example, by retrieving a customer’s details from one table and their orders recorded in another table. This is often achieved through keys. A primary key uniquely identifies each record in a table, while a foreign key is a column in one table that refers to the primary key of another table, establishing a link or relationship between the two. For instance, the customer ID in a customer table can be the primary key and then become a foreign key in an order table, thus relating the two tables.
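The customer and order relationship can be sketched with Python’s built-in sqlite3 module, used here as a stand-in for a relational DBMS such as MySQL. The table names, columns, and rows are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- primary key: unique per customer
        full_name   TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        description TEXT
    )""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada Lovelace')")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(100, 1, "Notebook"), (101, 1, "Pen")])

# The shared customer_id relates each order back to its customer.
rows = conn.execute("""
    SELECT c.full_name, o.order_id, o.description
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id""").fetchall()
print(rows)  # [('Ada Lovelace', 100, 'Notebook'), ('Ada Lovelace', 101, 'Pen')]

# An order for a customer that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (102, 99, 'Ghost order')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```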
While relational databases, which organize data into tables with relationships, are common, there are other types of databases. An object-oriented database stores data in the form of objects instead of tables or relations. An example could be an online bookstore where authors, customers, books, and publishers are rendered as classes, and the individual entries are objects or instances of these classes.
To work with data in databases, database engineers use Structured Query Language (SQL). SQL is a standard language that can be used with all relational databases like MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. Database engineers establish interactions with databases to create, read, update, and delete (CRUD) data.
SQL can be divided into several sub-languages:
Data Definition Language (DDL) helps define data in the database and includes commands like CREATE (to create databases and tables), ALTER (to modify database objects), and DROP (to remove objects).
Data Manipulation Language (DML) is used to manipulate data and includes operations like INSERT (to add data), UPDATE (to modify data), and DELETE (to remove data).
Data Query Language (DQL) is used to read or retrieve data, primarily using the SELECT command.
Data Control Language (DCL) is used to control access to the database, with commands like GRANT and REVOKE to manage user privileges.
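The DDL, DML, and DQL commands listed above can be run end to end with Python’s built-in sqlite3 module as a stand-in for MySQL; the products table and its rows are hypothetical. SQLite has no user accounts, so DCL cannot be shown here; in MySQL a statement such as GRANT SELECT ON mg_schema.* TO 'analyst'@'localhost' (with a hypothetical user) would grant read access, and REVOKE would take it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and change structure.
cur.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)")
cur.execute("ALTER TABLE products ADD COLUMN stock INTEGER DEFAULT 0")

# DML: add, change, and remove rows.
cur.executemany("INSERT INTO products (name, price, stock) VALUES (?, ?, ?)",
                [("Ring", 200.0, 5), ("Necklace", 250.0, 2), ("Bracelet", 120.0, 0)])
cur.execute("UPDATE products SET price = 180.0 WHERE name = 'Ring'")
cur.execute("DELETE FROM products WHERE stock = 0")
conn.commit()

# DQL: read rows back with a filter and a sort order.
cur.execute("SELECT name, price FROM products WHERE price < ? ORDER BY name", (300,))
result = cur.fetchall()
print(result)  # [('Necklace', 250.0), ('Ring', 180.0)]

# DDL again: remove the table entirely.
cur.execute("DROP TABLE products")
```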
SQL offers several advantages:
It requires very little coding skill to use, since it consists mainly of keywords.
Its interactivity allows developers to write complex queries quickly.
It is a standard language usable with all relational databases, leading to extensive support and information availability.
It is portable across operating systems.
Before developing a database, planning the organization of data is crucial, and this plan is called a schema. A schema is an organization or grouping of information and the relationships among them. In MySQL, schema and database are often interchangeable terms, referring to how data is organized. However, the definition of schema can vary across different database systems. A database schema typically comprises tables, columns, relationships, data types, and keys. Schemas provide logical groupings for database objects, simplify access and manipulation, and enhance database security by allowing permission management based on user access rights.
Database normalization is an important process used to structure tables in a way that minimizes challenges by reducing data duplication and avoiding data inconsistencies (anomalies). This involves converting a large table into multiple tables to reduce data redundancy. There are different normal forms (1NF, 2NF, 3NF) that define rules for table structure to achieve better database design.
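A small sketch of the idea behind normalization, using sqlite3 and hypothetical jewelry-store data. The flat table repeats each customer’s details on every order; splitting it into customers and orders stores those details once, so an email change becomes a one-row update instead of several that could get out of sync (an update anomaly). It illustrates the redundancy problem only and does not walk through each normal form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: customer details repeat on every order row (redundancy).
conn.execute("CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, "
             "customer_name TEXT, customer_email TEXT, product TEXT)")
conn.executemany("INSERT INTO orders_flat VALUES (?, ?, ?, ?)", [
    (1, "Ada", "ada@example.com", "Ring"),
    (2, "Ada", "ada@example.com", "Necklace"),
    (3, "Bob", "bob@example.com", "Ring"),
])

# Normalized: customer facts are stored once and referenced by key.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            name TEXT, email TEXT UNIQUE);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id),
                         product TEXT);
    INSERT INTO customers (name, email)
        SELECT DISTINCT customer_name, customer_email FROM orders_flat;
    INSERT INTO orders (order_id, customer_id, product)
        SELECT f.order_id, c.customer_id, f.product
        FROM orders_flat AS f JOIN customers AS c ON c.email = f.customer_email;
""")

# Changing Ada's email now touches a single row in one place.
conn.execute("UPDATE customers SET email = 'ada@new.example' WHERE name = 'Ada'")
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```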
As databases have evolved, they now must be able to store ever-increasing amounts of unstructured data, which poses difficulties. This growth has also led to concepts like big data and cloud databases.
Furthermore, databases play a crucial role in data warehousing, which involves a centralized data repository that loads, integrates, stores, and processes large amounts of data from multiple sources for data analysis. Dimensional data modeling, based on dimensions and facts, is often used to build databases in a data warehouse for data analytics. Databases also support data analytics, where collected data is converted into useful information to inform future decisions.
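A dimensional model is often laid out as a star schema: one fact table of measurements surrounded by dimension tables that give them context. This sketch uses sqlite3 with hypothetical table names and sales figures, and ends with a typical analytical query that joins facts to dimensions and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive context for the measurements.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: quantifiable measures plus a key for each dimension.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key    INTEGER REFERENCES dim_date(date_key),
        quantity    INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_product VALUES (1, 'Rings'), (2, 'Necklaces');
    INSERT INTO dim_date    VALUES (20240115, 2024, 1), (20240210, 2024, 2);
    INSERT INTO fact_sales  VALUES (1, 20240115, 2, 400.0),
                                   (2, 20240115, 1, 250.0),
                                   (1, 20240210, 3, 600.0);
""")

# Join the facts to their dimensions, then aggregate per category and month.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue) AS total_revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month""").fetchall()
print(rows)  # [('Necklaces', 1, 250.0), ('Rings', 1, 400.0), ('Rings', 2, 600.0)]
```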
Tools like MySQL Workbench provide a unified visual environment for database modeling and management, supporting the creation of data models, forward and reverse engineering of databases, and SQL development.
Finally, interacting with databases can also be done through programming languages like Python using connectors or APIs (Application Programming Interfaces). This allows developers to build applications that interact with databases for various operations.
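Most Python database connectors follow the same DB-API pattern: connect, get a cursor, execute statements, commit, and close. This sketch uses the standard-library sqlite3 module so it runs without a server; with MySQL Connector/Python you would instead call mysql.connector.connect(host=..., user=..., password=..., database=...) and write %s rather than ? as the parameter placeholder. The clients table and the function name are hypothetical.

```python
import sqlite3

def save_and_count(names):
    """Insert client names into a fresh table and return how many rows exist."""
    conn = sqlite3.connect(":memory:")
    try:
        cursor = conn.cursor()
        cursor.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT)")
        # Parameter placeholders keep user data out of the SQL text.
        cursor.executemany("INSERT INTO clients (name) VALUES (?)",
                           [(name,) for name in names])
        conn.commit()
        cursor.execute("SELECT COUNT(*) FROM clients")
        return cursor.fetchone()[0]
    finally:
        conn.close()

print(save_and_count(["Ada", "Bob"]))  # 2
```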
Understanding SQL: Language for Database Interaction
SQL (Structured Query Language) is a standard language used to interact with databases. It’s also commonly pronounced “sequel”. Database engineers use SQL to establish interactions with databases.
Here’s a breakdown of SQL based on the provided source:
Role of SQL: SQL acts as the interface or bridge between a relational database and its users. It allows database engineers to create, read, update, and delete (CRUD) data. These operations are fundamental when working with a database.
Interaction with Databases: As a web developer or data engineer, you execute SQL instructions on a database using a Database Management System (DBMS). The DBMS is responsible for transforming SQL instructions into a form that the underlying database understands.
Applicability: SQL is particularly useful when working with relational databases, which require a language that can interact with structured data. Examples of relational databases that SQL can interact with include MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
SQL Sub-languages: SQL is divided into several sub-languages:
Data Definition Language (DDL): Helps you define data in your database. DDL commands include:
CREATE: Used to create databases and related objects like tables. For example, you can use the CREATE DATABASE command followed by the database name to create a new database. Similarly, CREATE TABLE followed by the table name and column definitions is used to create tables.
ALTER: Used to modify already created database objects, such as modifying the structure of a table by adding or removing columns (ALTER TABLE).
DROP: Used to remove objects like tables or entire databases. The DROP DATABASE command followed by the database name removes a database. To remove a specific column from a table, DROP COLUMN is used inside an ALTER TABLE statement (ALTER TABLE ... DROP COLUMN).
Data Manipulation Language (DML): Commands are used to manipulate data in the database and most CRUD operations fall under DML. DML commands include:
INSERT: Used to add or insert data into a table. The INSERT INTO syntax is used to add rows of data to a specified table.
UPDATE: Used to edit or modify existing data in a table. The UPDATE command allows you to specify data to be changed.
DELETE: Used to remove data from a table. The DELETE FROM syntax followed by the table name and an optional WHERE clause is used to remove data.
Data Query Language (DQL): Used to read or retrieve data from the database. The primary DQL command is:
SELECT: Used to select and retrieve data from one or multiple tables, allowing you to specify the columns you want and apply filter criteria using the WHERE clause. You can select all columns using SELECT *.
Data Control Language (DCL): Used to control access to the database. DCL commands include:
GRANT: Used to give users access privileges to data.
REVOKE: Used to revert access privileges already given to users.
Advantages of SQL: SQL is a popular language choice for databases due to several advantages:
Low coding skills required: It uses a set of keywords and requires very little coding.
Interactivity: Allows developers to write complex queries quickly.
Standard language: Can be used with all relational databases like MySQL, leading to extensive support and information availability.
Portability: Once written, SQL code can be used on any hardware and any operating system or platform where the database software is installed.
Comprehensive: Covers all areas of database management and administration, including creating databases, manipulating data, retrieving data, and managing security.
Efficiency: Allows database users to process large amounts of data quickly and efficiently.
Basic SQL Operations: SQL enables various operations on data, including:
Creating databases and tables using DDL.
Populating and modifying data using DML (INSERT, UPDATE, DELETE).
Reading and querying data using DQL (SELECT) with options to specify columns and filter data using the WHERE clause.
Sorting data using the ORDER BY clause with ASC (ascending) or DESC (descending) keywords.
Filtering data using the WHERE clause with various comparison operators (=, <, >, <=, >=, !=) and logical operators (AND, OR). Other filtering operators include BETWEEN, LIKE, and IN.
Removing duplicate rows using the SELECT DISTINCT clause.
Performing arithmetic operations using operators like +, -, *, /, and % (modulus) within SELECT statements.
Using comparison operators to compare values in WHERE clauses.
Using aggregate functions such as COUNT, SUM, and AVG, typically in conjunction with the GROUP BY clause.
Joining data from multiple tables when related data exists in separate entities, using INNER JOIN, LEFT JOIN, and RIGHT JOIN clauses.
Creating aliases for tables and columns to make queries simpler and more readable.
Using subqueries (a query within another query) for more complex data retrieval.
Creating views (virtual tables based on the result of a SQL statement) to simplify data access and combine data from multiple tables.
Using stored procedures (prepared SQL code that can be saved and executed repeatedly).
Working with functions (numeric, string, date, comparison, control flow) to process and manipulate data.
Implementing triggers (stored programs that automatically execute in response to certain events).
Managing database transactions to ensure data integrity.
Optimizing queries for better performance.
Performing data analysis using SQL queries.
Interacting with databases using programming languages like Python through connectors and APIs.
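Several of the querying operations listed above fit in one short script. This sketch runs them through sqlite3 on hypothetical clients and orders tables: DISTINCT with ORDER BY, WHERE with BETWEEN, AND, and LIKE, arithmetic with a column alias, a LEFT JOIN with aggregate functions and GROUP BY, and a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, client_id INTEGER, amount REAL);
    INSERT INTO clients VALUES (1, 'Ada', 'London'), (2, 'Bob', 'Paris'), (3, 'Cy', 'London');
    INSERT INTO orders  VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 150.0);
""")

def q(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# SELECT DISTINCT removes duplicate rows; ORDER BY ... ASC sorts ascending.
cities = q("SELECT DISTINCT city FROM clients ORDER BY city ASC")
print(cities)  # [('London',), ('Paris',)]

# WHERE combining BETWEEN with AND, and LIKE pattern matching.
print(q("SELECT order_id FROM orders "
        "WHERE amount BETWEEN 50 AND 150 AND client_id = 1 ORDER BY order_id"))  # [(10,), (11,)]
print(q("SELECT name FROM clients WHERE name LIKE 'A%'"))  # [('Ada',)]

# Arithmetic inside SELECT, with a column alias.
print(q("SELECT order_id, amount - 20 AS discounted FROM orders WHERE order_id = 12"))  # [(12, 130.0)]

# LEFT JOIN keeps clients with no orders; COUNT and SUM aggregate per group.
totals = q("""
    SELECT c.name, COUNT(o.order_id) AS n_orders, COALESCE(SUM(o.amount), 0) AS total
    FROM clients AS c
    LEFT JOIN orders AS o ON o.client_id = c.client_id
    GROUP BY c.client_id, c.name
    ORDER BY total DESC""")
print(totals)  # [('Ada', 2, 200.0), ('Bob', 1, 150.0), ('Cy', 0, 0)]

# Subquery: orders larger than the average order amount.
print(q("SELECT order_id FROM orders "
        "WHERE amount > (SELECT AVG(amount) FROM orders) ORDER BY order_id"))  # [(10,), (12,)]
```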
In essence, SQL is a powerful and versatile language that is fundamental for anyone working with relational databases, enabling them to define, manage, query, and manipulate data effectively. The knowledge of SQL is a valuable skill for database engineers and is crucial for various tasks, from building and maintaining databases to extracting insights through data analysis.
Data Modeling Principles: Schema, Types, and Design
Data modeling principles revolve around creating a blueprint of how data will be organized and structured within a database system. This plan, often referred to as a schema, is essential for efficient data storage, access, updates, and querying. A well-designed data model ensures data consistency and quality.
Here are some key data modeling principles discussed in the sources:
Understanding Data Requirements: Before creating a database, it’s crucial to have a clear idea of its purpose and the data it needs to store. For example, a database for an online bookshop needs to record book titles, authors, customers, and sales. Mangata and Gallo (mng), a jewelry store, needed to store data on customers, products, and orders.
Visual Representation: A data model provides a visual representation of data elements (entities) and their relationships. This is often achieved using an Entity Relationship Diagram (ERD), which helps in planning entity-relational databases.
Different Levels of Abstraction: Data modeling occurs at different levels:
Conceptual Data Model: Provides a high-level, abstract view of the entities and their relationships in the database system. It focuses on “what” data needs to be stored (e.g., customers, products, orders as entities for mng) and how these relate.
Logical Data Model: Builds upon the conceptual model by providing a more detailed overview of the entities, their attributes, primary keys, and foreign keys. For mng, this would involve defining attributes for customers (like client ID as primary key), products, and orders, and specifying foreign keys to establish relationships (e.g., client ID in the orders table referencing the clients table).
Physical Data Model: Represents the internal schema of the database and is specific to the chosen Database Management System (DBMS). It outlines details like data types for each attribute (e.g., varchar for full name, integer for contact number), constraints (e.g., not null), and other database-specific features. SQL is often used to create the physical schema.
Choosing the Right Data Model Type: Several types of data models exist, each with its own advantages and disadvantages:
Relational Data Model: Represents data as a collection of tables (relations) with rows and columns, known for its simplicity.
Entity-Relationship Model: Similar to the relational model but presents each table as a separate entity with attributes and explicitly defines different types of relationships between entities (one-to-one, one-to-many, many-to-many).
Hierarchical Data Model: Organizes data in a tree-like structure with parent and child nodes, primarily supporting one-to-many relationships.
Object-Oriented Model: Translates objects into classes with characteristics and behaviors, supporting complex associations like aggregation and inheritance, suitable for complex projects.
Dimensional Data Model: Based on dimensions (context of measurements) and facts (quantifiable data), optimized for faster data retrieval and efficient data analytics, often using star and snowflake schemas in data warehouses.
Database Normalization: This is a crucial process for structuring tables to minimize data redundancy, avoid data modification implications (insertion, update, deletion anomalies), and simplify data queries. Normalization involves applying a series of normal forms (First Normal Form – 1NF, Second Normal Form – 2NF, Third Normal Form – 3NF) to ensure data atomicity, eliminate repeating groups, address functional and partial dependencies, and resolve transitive dependencies.
Establishing Relationships: Data in a database should be related to provide meaningful information. Relationships between tables are established using keys:
Primary Key: A value that uniquely identifies each record in a table and prevents duplicates.
Foreign Key: One or more columns in one table that reference the primary key in another table, used to connect tables and create cross-referencing.
Defining Domains: A domain is the set of legal values that can be assigned to an attribute, ensuring data in a field is well-defined (e.g., only numbers in a numerical domain). This involves specifying data types, length values, and other relevant rules.
Using Constraints: Database constraints limit the type of data that can be stored in a table, ensuring data accuracy and reliability. Common constraints include NOT NULL (ensuring fields are always completed), UNIQUE (preventing duplicate values), CHECK (enforcing specific conditions), and FOREIGN KEY (maintaining referential integrity).
Importance of Planning: Designing a data model before building the database system allows for planning how data is stored and accessed efficiently. A poorly designed database can make it hard to produce accurate information.
Considerations at Scale: For large-scale applications like those at Meta, data modeling must prioritize user privacy, user safety, and scalability. It requires careful consideration of data access, encryption, and the ability to handle billions of users and evolving product needs. Thoughtfulness about future changes and the impact of modifications on existing data models is crucial.
Data Integrity and Quality: Well-designed data models, including the use of data types and constraints, are fundamental steps in ensuring the integrity and quality of a database.
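The constraints described above can be seen rejecting bad rows. This sketch uses sqlite3 with hypothetical clients and orders tables that echo the mng example. One caveat: SQLite accepts VARCHAR(n) but does not enforce the length, so that part of the domain is only documentation here, whereas MySQL enforces it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE clients (
        client_id INTEGER PRIMARY KEY,
        full_name VARCHAR(100) NOT NULL,          -- must always be supplied
        email     VARCHAR(255) UNIQUE              -- no two clients share an email
    );
    CREATE TABLE orders (
        order_id  INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES clients(client_id),
        quantity  INTEGER CHECK (quantity > 0)     -- domain rule: positive only
    );
    INSERT INTO clients VALUES (1, 'Ada', 'ada@example.com');
""")

def try_insert(sql):
    """Run one INSERT and report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert("INSERT INTO clients VALUES (2, NULL, 'x@example.com')"))   # NOT NULL
print(try_insert("INSERT INTO clients VALUES (3, 'Bo', 'ada@example.com')")) # UNIQUE
print(try_insert("INSERT INTO orders VALUES (10, 1, 0)"))                     # CHECK
print(try_insert("INSERT INTO orders VALUES (11, 99, 1)"))                    # FOREIGN KEY
print(try_insert("INSERT INTO orders VALUES (12, 1, 2)"))                     # valid row
```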
Data modeling is an iterative process that requires a deep understanding of the data, the business requirements, and the capabilities of the chosen database system. It is a crucial skill for database engineers and a fundamental aspect of database design. Tools like MySQL Workbench can aid in creating, visualizing, and implementing data models.
Understanding Version Control: Git and Collaborative Development
Version Control Systems (VCS), also known as Source Control or Source Code Management, are systems that record all changes and modifications to files for tracking purposes. The primary goal of any VCS is to keep track of changes by allowing developers access to the entire change history with the ability to revert or roll back to a previous state or point in time. These systems track different types of changes such as adding new files, modifying or updating files, and deleting files. The version control system is the source of truth across all code assets and the team itself.
There are many benefits associated with Version Control, especially for developers working in a team. These include:
Revision history: Provides a record of all changes in a project and the ability for developers to revert to a stable point in time if code edits cause issues or bugs.
Identity: All changes made are recorded with the identity of the user who made them, allowing teams to see not only when changes occurred but also who made them.
Collaboration: A VCS allows teams to submit their code and keep track of any changes that need to be made when working towards a common goal. It also facilitates peer review where developers inspect code and provide feedback.
Automation and efficiency: Version Control helps keep track of all changes and plays an integral role in DevOps, increasing an organization’s ability to deliver applications or services with high quality and velocity. It aids in software quality, release, and deployments. By having Version Control in place, teams following agile methodologies can manage their tasks more efficiently.
Managing conflicts: Version Control helps developers fix any conflicts that may occur when multiple developers work on the same code base. The history of revisions shows the full life cycle of changes and is essential when resolving merge conflicts.
There are two main types or categories of Version Control Systems: centralized Version Control Systems (CVCS) and distributed Version Control Systems (DVCS).
Centralized Version Control Systems (CVCS) contain a server that houses the full history of the code base and clients that pull down the code. Developers need a connection to the server to perform any operations. Changes are pushed to the central server. An advantage of CVCS is that they are considered easier to learn and offer more access controls to users. A disadvantage is that they can be slower due to the need for a server connection.
Distributed Version Control Systems (DVCS) are similar, but every user is essentially a server and has the entire history of changes on their local system. Users don’t need to be connected to the server to add changes or view history, only to pull down the latest changes or push their own. DVCS offer better speed and performance and allow users to work offline. Git is an example of a DVCS.
Popular Version Control Technologies include git and GitHub. Git is a Version Control System designed to help users keep track of changes to files within their projects. It offers better speed and performance, reliability, free and open-source access, and an accessible syntax. Git is used predominantly via the command line. GitHub is a cloud-based hosting service that lets you manage git repositories from a user interface. It incorporates Git Version Control features and extends them with features like Access Control, pull requests, and automation. GitHub is very popular among web developers and acts like a social network for projects.
Key Git concepts include:
Repository: Used to track all changes to files in a specific folder and keep a history of all those changes. Repositories can be local (on your machine) or remote (e.g., on GitHub).
Clone: To copy a project from a remote repository to your local device.
Add: To stage changes in your local repository, preparing them for a commit.
Commit: To save a snapshot of the staged changes in the local repository’s history. Each commit is recorded with the identity of the user.
Push: To upload committed changes from your local repository to a remote repository.
Pull: To retrieve changes from a remote repository and apply them to your local repository.
Branching: Creating separate lines of development from the main codebase to work on new features or bug fixes in isolation. The main branch is often the source of truth.
Forking: Creating a copy of someone else’s repository on a platform like GitHub, allowing you to make changes without affecting the original.
Diff: A command to compare changes across files, branches, and commits.
Blame: A command to look at changes of a specific file and show the dates, times, and users who made the changes.
The typical Git workflow involves three states: modified, staged, and committed. Files are modified in the working directory, then added to the staging area, and finally committed to the local repository. These local commits are then pushed to a remote repository.
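To make the three states concrete, here is a toy model in Python. It is only a conceptual sketch under simplifying assumptions: the ToyRepo class and its edit, add, commit, and status methods are hypothetical names, files are plain strings held in dictionaries, and pushing to a remote is not modeled. Real Git stores content as objects inside the .git directory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Conceptual model of Git's three states; not how Git stores data internally."""
    working: dict[str, str] = field(default_factory=dict)        # files as edited on disk
    staged: dict[str, str] = field(default_factory=dict)         # the staging area
    commits: list[dict[str, str]] = field(default_factory=list)  # committed snapshots

    def edit(self, path: str, content: str) -> None:
        # Modified: the change exists only in the working directory.
        self.working[path] = content

    def add(self, path: str) -> None:
        # Staged: copy the current working version into the staging area.
        self.staged[path] = self.working[path]

    def commit(self) -> dict[str, str]:
        # Committed: snapshot = previous commit plus everything staged.
        previous = self.commits[-1] if self.commits else {}
        snapshot = {**previous, **self.staged}
        self.commits.append(snapshot)
        self.staged.clear()
        return snapshot

    def status(self, path: str) -> str:
        last = self.commits[-1] if self.commits else {}
        if path in self.staged:
            return "staged"
        if self.working.get(path) != last.get(path):
            return "modified"
        return "committed"


if __name__ == "__main__":
    repo = ToyRepo()
    repo.edit("readme.txt", "v1")
    print(repo.status("readme.txt"))  # modified
    repo.add("readme.txt")
    print(repo.status("readme.txt"))  # staged
    repo.commit()
    print(repo.status("readme.txt"))  # committed
```

The status check mirrors what git status reports in the simple case; real Git can show a file as both staged and modified when it is edited again after git add.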
Branching workflows like feature branching are commonly used. This involves creating a new branch for each feature, working on it until completion, and then merging it back into the main branch after a pull request and peer review. Pull requests allow teams to review changes before they are merged.
At Meta, Version Control is very important. Meta uses a giant monolithic repository for all of its backend code, which means code changes are shared with every other Instagram team. While this can be risky, it allows for code reuse. Meta encourages engineers to improve any code, emphasizing that “nothing at Meta is someone else’s problem”. Because of the monolithic repository, merge conflicts happen often, so engineers try to write smaller changes and add gatekeepers that make it easy to turn off features if needed. git blame is used daily to understand who wrote specific lines of code and why, which is particularly helpful in an organization as large as Meta.
Version Control is also relevant to database development. It’s easy to overcomplicate data modeling and storage, and Version Control can help track changes and potentially revert to earlier designs. Planning how data will be organized (schema) is crucial before developing a database.
Learning to use Git and GitHub for Version Control is part of the preparation for coding interviews in a final course, alongside practicing interview skills and refining resumes. Effective collaboration, which is enhanced by Version Control, is a crucial skill for software developers.
Python Programming Fundamentals: An Introduction
Introduction to Python:
Python is a versatile, high-level programming language available on multiple platforms. Created by Guido van Rossum and released in 1991, it was designed to be readable, with syntax that resembles English and mathematics. That makes it intuitive for beginners, while experienced programmers appreciate its power and adaptability. Since its release it has gained significant popularity and built up a rich selection of frameworks and libraries, and it is widely used in web development, artificial intelligence, machine learning, data analytics, business forecasting, and many other programming applications. Python often requires less code than languages like C or Java, and its simplicity lets developers focus on the task at hand, which can shorten the time it takes to get a product to market.
Setting up a Python Environment:
To start using Python, it’s essential to ensure it works correctly on your operating system with your chosen Integrated Development Environment (IDE), such as Visual Studio Code (VS Code). This involves making sure the right version of Python is used as the interpreter when running your code.
Installation Verification: You can verify that Python is installed by opening the terminal (or Command Prompt on Windows) and typing python --version (on some systems the command is python3 --version). This should display the installed Python version.
VS Code Setup: VS Code offers a walkthrough guide for setting up Python. This includes installing Python (if needed) and selecting the correct Python interpreter.
Running Python Code: Python code can be run in a few ways:
Python Shell: Useful for running and testing small scripts without creating .py files. You can access it by typing python in the terminal.
Directly from Command Line/Terminal: Any file with the .py extension can be run by typing python followed by the file name (e.g., python hello.py).
Within an IDE (like VS Code): IDEs provide features like auto-completion, debugging, and syntax highlighting, making coding a better experience. VS Code has a run button to execute Python files.
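A minimal script for trying these options, assuming you save it as hello.py as in the example above. Printing the version and sys.executable is one way to confirm which interpreter the terminal or VS Code actually used; describe_interpreter is a hypothetical helper name.

```python
# hello.py: run with `python hello.py` (or `python3 hello.py` on some systems)
import sys


def describe_interpreter() -> str:
    """Report the version and location of the Python interpreter running this file."""
    major, minor, micro = sys.version_info[:3]
    return f"Python {major}.{minor}.{micro} at {sys.executable}"


if __name__ == "__main__":
    print("Hello, world!")
    print(describe_interpreter())
```

If the reported path is not the interpreter you expected, selecting a different one in VS Code (or calling python3 explicitly) is usually the fix.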
Basic Syntax and Concepts:
Print Statement: The print() function is used to display output to the console. It can print different types of data and allows for formatting.
Variables: Variables store data that can change throughout the program’s lifecycle. In Python, you declare a variable by assigning a value to a name (e.g., x = 5), and Python assigns the data type behind the scenes. There are conventions for naming variables: PEP 8, Python’s style guide, recommends snake case (e.g., my_name), though you will also see camel case (e.g., myName). You can assign a single value to multiple variables (e.g., a = b = c = 10) or perform multiple assignments on one line (e.g., name, age = "Alice", 30). You can also delete a variable using the del keyword.
Data Types: A data type indicates how a computer system should interpret a piece of data. Python offers several built-in data types:
Numeric: Includes int (integers), float (decimal numbers), and complex numbers.
Sequence: Ordered collections of items, including:
Strings (str): Sequences of characters enclosed in single or double quotes (e.g., "hello", 'world'). Individual characters in a string can be accessed by their index (starting from 0) using square brackets (e.g., name[0]). The len() function returns the number of characters in a string.
Lists: Ordered and mutable sequences of items enclosed in square brackets (e.g., [1, 2, "three"]).
Tuples: Ordered and immutable sequences of items enclosed in parentheses (e.g., (1, 2, "three")).
Dictionary (dict): Collections of key-value pairs enclosed in curly braces (e.g., {"name": "Bob", "age": 25}). Values are accessed using their keys. Since Python 3.7, dictionaries preserve insertion order.
Boolean (bool): Represents truth values: True or False.
Set (set): Unordered collections of unique elements enclosed in curly braces (e.g., {1, 2, 3}). Sets do not support indexing.
Typecasting: The process of converting one data type to another. Python supports implicit (automatic) and explicit (using functions like int(), float(), str()) type conversion.
Input: The input() function is used to take input from the user. It displays a prompt to the user and returns their input as a string.
Operators: Symbols used to perform operations on values.
Math Operators: Used for calculations (e.g., + for addition, - for subtraction, * for multiplication, / for division).
Logical Operators: Used in conditional statements to determine true or false outcomes (and, or, not).
Control Flow: Determines the order in which instructions in a program are executed.
Conditional Statements: Used to make decisions based on conditions (if, else, elif).
Loops: Used to repeatedly execute a block of code. Python has for loops (for iterating over sequences) and while loops (repeating a block until a condition is met). Nested loops are also possible.
Functions: Modular pieces of reusable code that take input and return output. You define a function using the def keyword. You can pass data into a function as arguments and return data using the return keyword. Python has different scopes for variables: local, enclosing, global, and built-in (LEGB rule).
Data Structures: Ways to organize and store data. Python includes lists, tuples, sets, and dictionaries.
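The following sketch exercises the variables, data types, and typecasting described above in one script. The variable names and values are illustrative only, and the call to input() is replaced by a fixed string so the example runs without waiting for the keyboard.

```python
# Variables: assigning a value creates the name; Python infers the type.
x = 5
name, age = "Alice", 30   # multiple assignment on one line
a = b = c = 10            # several names bound to one value
scratch = "temporary"
del scratch               # the name 'scratch' no longer exists

# Numeric types, checked with type()
print(type(x).__name__, type(3.14).__name__, type(2 + 3j).__name__)  # int float complex

# Strings are sequences: index from 0; len() counts characters
print(name[0], len(name))  # A 5

# Lists are mutable; tuples are not
items = [1, 2, "three"]
items.append(4)
point = (1, 2, "three")
try:
    point[0] = 99
except TypeError as exc:
    print("tuples are immutable:", exc)

# Dictionaries map keys to values (insertion order is preserved since Python 3.7)
person = {"name": "Bob", "age": 25}
print(person["name"])  # Bob

# Sets keep only unique elements and do not support indexing
unique = {1, 2, 2, 3}
print(unique)  # {1, 2, 3}

# Booleans
print(age > 18, type(True).__name__)  # True bool

# Typecasting: implicit (int + float gives float) and explicit conversions
print(x + 2.5)            # 7.5
typed = "42"              # input() always returns a str, like this value
print(int(typed) + 1)     # 43
print(str(x) + "!")       # 5!
print(float("3.5"))       # 3.5
```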
This overview provides a foundation in Python programming basics. As you continue learning, you will explore these concepts in more depth and move on to more advanced topics.
Database and Python Fundamentals Study Guide
Quiz
What is a database, and what is its typical organizational structure? A database is a systematically organized collection of data. This organization commonly resembles a spreadsheet or a table, with data containing elements and attributes for identification.
Explain the role of a Database Management System (DBMS) in the context of SQL. A DBMS acts as an intermediary between SQL instructions and the underlying database. It takes responsibility for transforming SQL commands into a format that the database can understand and execute.
Name and briefly define at least three sub-languages of SQL. DDL (Data Definition Language) is used to define data structures in a database, such as creating, altering, and dropping databases and tables. DML (Data Manipulation Language) is used for operational tasks like creating, reading, updating, and deleting data. DQL (Data Query Language) is used for retrieving data from the database.
Describe the purpose of the CREATE DATABASE and CREATE TABLE DDL statements. The CREATE DATABASE statement is used to create a new, empty database within the DBMS. The CREATE TABLE statement is used within a specific database to define a new table, including specifying the names and data types of its columns.
What is the function of the INSERT INTO DML statement? The INSERT INTO statement is used to add new rows of data into an existing table in the database. It requires specifying the table name and the values to be inserted into the table’s columns.
Explain the purpose of the NOT NULL constraint when defining table columns. The NOT NULL constraint ensures that a specific column in a table cannot contain a null value. If an attempt is made to insert a new record or update an existing one with a null value in a NOT NULL column, the operation will be aborted.
List and briefly define three basic arithmetic operators in SQL. The addition operator (+) is used to add two operands. The subtraction operator (-) is used to subtract the second operand from the first. The multiplication operator (*) is used to multiply two operands.
What is the primary function of the SELECT statement in SQL, and how can the WHERE clause be used with it? The SELECT statement is used to retrieve data from one or more tables in a database. The WHERE clause is used to filter the rows returned by the SELECT statement based on specified conditions.
Explain the difference between running Python code from the Python shell and running a .py file from the command line. The Python shell provides an interactive environment where you can execute Python code snippets directly and see immediate results without saving to a file. Running a .py file from the command line executes the entire script contained within the file non-interactively.
Define a variable in Python and provide an example of assigning it a value. In Python, a variable is a named storage location that holds a value. Variables are implicitly declared when a value is assigned to them. For example: x = 5 declares a variable named x and assigns it the integer value of 5.
Answer Key
A database is a systematically organized collection of data. This organization commonly resembles a spreadsheet or a table, with data containing elements and attributes for identification.
A DBMS acts as an intermediary between SQL instructions and the underlying database. It takes responsibility for transforming SQL commands into a format that the database can understand and execute.
DDL (Data Definition Language) helps you define data structures. DML (Data Manipulation Language) allows you to work with the data itself. DQL (Data Query Language) enables you to retrieve information from the database.
The CREATE DATABASE statement establishes a new database, while the CREATE TABLE statement defines the structure of a table within a database, including its columns and their data types.
The INSERT INTO statement adds new rows of data into a specified table. It requires indicating the table and the values to be placed into the respective columns.
The NOT NULL constraint enforces that a particular column must always have a value and cannot be left empty or contain a null entry when data is added or modified.
The + operator performs addition, the - operator performs subtraction, and the * operator performs multiplication between numerical values in SQL queries.
The SELECT statement retrieves data from database tables. The WHERE clause filters the results of a SELECT query, allowing you to specify conditions that rows must meet to be included in the output.
The Python shell is an interactive interpreter for immediate code execution, while running a .py file executes the entire script from the command line without direct interaction during the process.
A variable in Python is a name used to refer to a memory location that stores a value; for instance, name = "Alice" assigns the string value "Alice" to the variable named name.
Essay Format Questions
Discuss the significance of SQL as a standard language for database management. In your discussion, elaborate on at least three advantages of using SQL as highlighted in the provided text and provide examples of how these advantages contribute to efficient database operations.
Compare and contrast the roles of Data Definition Language (DDL) and Data Manipulation Language (DML) in SQL. Explain how these two sub-languages work together to enable the creation and management of data within a relational database system.
Explain the concept of scope in Python and discuss the LEGB rule. Provide examples to illustrate the differences between local, enclosed, global, and built-in scopes and explain how Python resolves variable names based on this rule.
Discuss the importance of modules in Python programming. Explain the advantages of using modules, such as reusability and organization, and describe different ways to import modules, including the use of import, from … import …, and aliases.
Imagine you are designing a simple database for a small online bookstore. Describe the tables you would create, the columns each table would have (including data types and any necessary constraints like NOT NULL or primary keys), and provide example SQL CREATE TABLE statements for two of your proposed tables.
Glossary of Key Terms
Database: A systematically organized collection of data that can be easily accessed, managed, and updated.
Table: A structure within a database used to organize data into rows (records) and columns (fields or attributes).
Column (Field): A vertical set of data values of a particular type within a table, representing an attribute of the entities stored in the table.
Row (Record): A horizontal set of data values within a table, representing a single instance of the entity being described.
SQL (Structured Query Language): A standard programming language used for managing and manipulating data in relational databases.
DBMS (Database Management System): Software that enables users to interact with a database, providing functionalities such as data storage, retrieval, and security.
DDL (Data Definition Language): A subset of SQL commands used to define the structure of a database, including creating, altering, and dropping databases, tables, and other database objects.
DML (Data Manipulation Language): A subset of SQL commands used to manipulate data within a database, including inserting, updating, deleting, and retrieving data.
DQL (Data Query Language): A subset of SQL commands, primarily the SELECT statement, used to query and retrieve data from a database.
Constraint: A rule or restriction applied to data in a database to ensure its accuracy, integrity, and reliability. Examples include NOT NULL.
Operator: A symbol or keyword that performs an operation on one or more operands. In SQL, this includes arithmetic operators (+, -, *, /), logical operators (AND, OR, NOT), and comparison operators (=, >, <, etc.).
Schema: The logical structure of a database, including the organization of tables, columns, relationships, and constraints.
Python Shell: An interactive command-line interpreter for Python, allowing users to execute code snippets and receive immediate feedback.
.py file: A file containing Python source code, which can be executed as a script from the command line.
Variable (Python): A named reference to a value stored in memory. Variables in Python are dynamically typed, meaning their data type is determined by the value assigned to them.
Data Type (Python): The classification of data that determines the possible values and operations that can be performed on it (e.g., integer, string, boolean).
String (Python): A sequence of characters enclosed in single or double quotes, used to represent text.
Scope (Python): The region of a program where a particular name (variable, function, etc.) is accessible. Python has four main scopes: local, enclosed, global, and built-in (LEGB).
Module (Python): A file containing Python definitions and statements. Modules provide a way to organize code into reusable units.
Import (Python): A statement used to load and make the code from another module available in the current script.
Alias (Python): An alternative name given to a module or function during import, often used for brevity or to avoid naming conflicts.
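The Scope, Module, Import, and Alias entries can be seen together in one short script. This is a sketch using only standard-library modules (math, statistics, datetime, collections); outer, inner, inner_reads_enclosing, and count_chars are hypothetical names chosen to show how Python resolves a name by searching the local, enclosing, global, and built-in scopes in that order.

```python
import math                                # import: refer to names as math.pi
from statistics import mean                # from ... import ...: use the name directly
import datetime as dt                      # alias for a module
from collections import Counter as Tally   # alias for an imported name

x = "global x"  # G: module-level (global) scope


def outer() -> list:
    x = "enclosing x"  # E: enclosing scope for the nested functions
    seen = []

    def inner() -> None:
        x = "local x"   # L: local to inner(), so it wins
        seen.append(x)

    def inner_reads_enclosing() -> None:
        seen.append(x)  # no local x here, so Python uses the enclosing one

    inner()
    inner_reads_enclosing()
    seen.append(x)
    return seen


def count_chars(text: str) -> int:
    # B: len is not defined locally, in an enclosing function, or globally,
    # so Python finds it in the built-in scope.
    return len(text)


if __name__ == "__main__":
    print(outer())                            # ['local x', 'enclosing x', 'enclosing x']
    print(x)                                  # global x
    print(count_chars("LEGB"))                # 4
    print(round(math.pi * 2.0 ** 2, 2))       # 12.57
    print(mean([3, 4, 5]))                    # 4
    print(dt.date(2024, 1, 15).isoformat())   # 2024-01-15
    print(Tally("banana")["a"])               # 3
```

Any .py file you write is itself a module, so the same three import forms work for your own code.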
Briefing Document: Review of “01.pdf”
This briefing document summarizes the main themes and important concepts discussed in the provided excerpts from “01.pdf”. The document covers fundamental database concepts using SQL, basic command-line operations, an introduction to Python programming, and related software development tools.
I. Introduction to Databases and SQL
The document introduces the concept of databases as systematically organized data, often resembling spreadsheets or tables. It highlights the widespread use of databases in various applications, providing examples like banks storing account and transaction data, and hospitals managing patient, staff, and laboratory information.
“well a database looks like data organized systematically and this organization typically looks like a spreadsheet or a table”
The core purpose of SQL (Structured Query Language) is explained as a language used to interact with databases. Key operations that can be performed using SQL are outlined:
“operational terms create add or insert data read data update existing data and delete data”
SQL is further divided into several sub-languages:
DDL (Data Definition Language): Used to define the structure of the database and its objects like tables. Commands like CREATE (to create databases and tables) and ALTER (to modify existing objects, e.g., adding a column) are part of DDL.
“ddl as the name says helps you define data in your database but what does it mean to Define data before you can store data in the database you need to create the database and related objects like tables in which your data will be stored for this the ddl part of SQL has a command named create then you might need to modify already created database objects for example you might need to modify the structure of a table by adding a new column you can perform this task with the ddl alter command you can remove an object like a table from a”
DML (Data Manipulation Language): Used to manipulate the data within the database, including inserting (INSERT INTO), updating, and deleting data.
“now we need to populate the table of data this is where I can use the data manipulation language or DML subset of SQL to add table data I use the insert into syntax this inserts rows of data into a given table I just type insert into followed by the table name and then a list of required columns or Fields within a pair of parentheses then I add the values keyword”
DQL (Data Query Language): Primarily used for querying or retrieving data from the database (SELECT statements fall under this category).
DCL (Data Control Language): Used to control access and security within the database.
The document emphasizes that a DBMS (Database Management System) is crucial for interpreting and executing SQL instructions, acting as an intermediary between the SQL commands and the underlying database.
“a database interprets and makes sense of SQL instructions with the use of a database management system or dbms as a web developer you’ll execute all SQL instructions on a database using a dbms the dbms takes responsibility for transforming SQL instructions into a form that’s understood by the underlying database”
The advantages of using SQL are highlighted, including its simplicity, standardization, portability, comprehensiveness, and efficiency in processing large amounts of data.
“you now know that SQL is a simple standard portable comprehensive and efficient language that can be used to delete data retrieve and share data among multiple users and manage database security this is made possible through subsets of SQL like ddl or data definition language DML also known as data manipulation language dql or data query language and DCL also known as data control language and the final advantage of SQL is that it lets database users process large amounts of data quickly and efficiently”
Examples of basic SQL syntax are provided, such as creating a database (CREATE DATABASE College;) and creating a table (CREATE TABLE student ( … );). The INSERT INTO syntax for adding data to a table is also introduced.
Constraints like NOT NULL are mentioned as ways to enforce data integrity during table creation.
“the creation of a new customer record is aborted the not null default value is implemented using a SQL statement a typical not null SQL statement begins with the creation of a basic table in the database I can write a create table Clause followed by customer to define the table name followed by a pair of parentheses within the parentheses I add two columns customer ID and customer name I also Define each column with relevant data types end for customer ID as it stores”
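The document’s examples target MySQL (for instance, CREATE DATABASE College;). To try the same DDL and DML ideas without installing a server, the sketch below uses Python’s built-in sqlite3 module as a stand-in. Assumptions: SQLite has no CREATE DATABASE statement (connecting creates the database), and the column types INTEGER and VARCHAR(100) are chosen for illustration, following the customer ID and customer name columns in the quote.

```python
import sqlite3

# SQLite has no CREATE DATABASE statement: connecting creates the database.
# ":memory:" keeps it in RAM; a file name such as "college.db" would persist it.
conn = sqlite3.connect(":memory:")

# DDL: define the table; NOT NULL forbids missing values in both columns.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id   INTEGER NOT NULL,
        customer_name VARCHAR(100) NOT NULL
    )
    """
)

# DML: INSERT INTO <table> (<columns>) VALUES (<values>)
conn.execute(
    "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)",
    (1, "Ada Lovelace"),
)

# A row that leaves customer_name NULL violates the constraint and is aborted.
try:
    conn.execute(
        "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)",
        (2, None),
    )
except sqlite3.IntegrityError as exc:
    print("insert rejected:", exc)

conn.commit()
rows = conn.execute("SELECT customer_id, customer_name FROM customer").fetchall()
print(rows)  # [(1, 'Ada Lovelace')]
```

Parameter placeholders (?) keep values separate from the SQL text, which is safer than building the statement with string formatting.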
SQL arithmetic operators (+, -, *, /, %) are introduced with examples. Logical operators (NOT, OR) and special operators (IN, BETWEEN) used in the WHERE clause for filtering data are also explained. The concept of JOIN clauses, including SELF-JOIN, for combining data from tables is briefly touched upon.
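A sketch of the operators and self-join mentioned above, again using sqlite3 as a stand-in for MySQL. The employee table, its rows, and the manager_id column are invented for illustration. One difference to keep in mind: with two integer operands, SQLite’s / performs integer division, whereas MySQL’s / returns a decimal result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept       TEXT NOT NULL,
        salary     INTEGER NOT NULL,
        manager_id INTEGER  -- NULL for the top manager
    );
    INSERT INTO employee VALUES
        (1, 'Maya', 'Sales', 5000, NULL),
        (2, 'Omar', 'Sales', 3000, 1),
        (3, 'Lena', 'IT',    4000, 1),
        (4, 'Kofi', 'HR',    2500, 3);
    """
)

# Arithmetic operators in the SELECT list (here +, *, and %)
arith = conn.execute(
    "SELECT name, salary + 500, salary * 12, salary % 1000 FROM employee WHERE id = 2"
).fetchone()
print(arith)  # ('Omar', 3500, 36000, 0)

# NOT combined with IN: departments outside the listed set
not_in = conn.execute(
    "SELECT name FROM employee WHERE dept NOT IN ('Sales', 'HR') ORDER BY id"
).fetchall()
print(not_in)  # [('Lena',)]

# BETWEEN (inclusive on both ends) combined with OR
between_or = conn.execute(
    "SELECT name FROM employee "
    "WHERE salary BETWEEN 2500 AND 3000 OR dept = 'IT' ORDER BY id"
).fetchall()
print(between_or)  # [('Omar',), ('Lena',), ('Kofi',)]

# SELF JOIN: join the table to itself to pair each employee with their manager
pairs = conn.execute(
    """
    SELECT e.name, m.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.id
    ORDER BY e.id
    """
).fetchall()
print(pairs)  # [('Omar', 'Maya'), ('Lena', 'Maya'), ('Kofi', 'Lena')]
```

Maya does not appear in the self-join output because her manager_id is NULL, so the inner join finds no matching manager row.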
Subqueries (inner queries within outer queries) and Views (virtual tables based on the result of a query) are presented as advanced SQL concepts. User-defined functions and triggers are also introduced as ways to extend database functionality and automate actions. Prepared statements are mentioned as a more efficient way to execute SQL queries repeatedly. Date and time functions in MySQL are briefly covered.
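Subqueries, views, triggers, user-defined functions, and repeated parameterized statements can all be sketched with sqlite3. This uses SQLite syntax, not MySQL: MySQL triggers use FOR EACH ROW and MySQL has its own PREPARE/EXECUTE statements, while SQLite registers user-defined functions from the host language with create_function. The table names, the 1.2 tax multiplier, and the add_tax function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        id       INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        amount   REAL NOT NULL
    );
    CREATE TABLE audit_log (order_id INTEGER, note TEXT);

    -- Trigger: runs automatically after each row is inserted into orders
    CREATE TRIGGER log_new_order AFTER INSERT ON orders
    BEGIN
        INSERT INTO audit_log (order_id, note) VALUES (NEW.id, 'order created');
    END;

    -- View: a virtual table whose rows come from a stored query
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer;
    """
)

# Repeated parameterized statement: the SQL text stays fixed and only the
# bound values change (the Python DB-API counterpart of a prepared statement).
new_orders = [("Ana", 40.0), ("Ben", 150.0), ("Ana", 220.0)]
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)", new_orders)
conn.commit()

# Subquery: the inner query computes the average; the outer query filters on it.
above_avg = conn.execute(
    "SELECT customer, amount FROM orders "
    "WHERE amount > (SELECT AVG(amount) FROM orders) ORDER BY amount"
).fetchall()
print(above_avg)  # [('Ben', 150.0), ('Ana', 220.0)]

# Querying the view works like querying a table.
totals = conn.execute("SELECT customer, total FROM customer_totals ORDER BY customer").fetchall()
print(totals)  # [('Ana', 260.0), ('Ben', 150.0)]

# The trigger wrote one audit row per inserted order.
log_count = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(log_count)  # 3

# User-defined function registered from Python (hypothetical 1.2 tax multiplier).
conn.create_function("add_tax", 1, lambda amount: round(amount * 1.2, 2))
taxed = conn.execute("SELECT add_tax(amount) FROM orders WHERE id = 1").fetchone()
print(taxed)  # (48.0,)
```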
II. Introduction to Command Line/Bash Shell
The document provides a basic introduction to using the command line or bash shell. Fundamental commands are explained:
pwd (print working directory): Shows the current directory.
“to do that I run the PWD command PWD is short for print working directory I type PWD and press the enter key the command returns a forward slash which indicates that I’m currently in the root directory”
ls (list): Displays the contents of the current directory. The -l flag prints a long listing with details such as permissions, owner, size, and modification time.
“if I want to check the contents of the root directory I run another command called LS which is short for list I type LS and press the enter key and now notice I get a list of different names of directories within the root level in order to get more detail of what each of the different directories represents I can use something called a flag flags are used to set options to the commands you run use the list command with a flag called L which means the format should be printed out in a list format I type LS space Dash l press enter and this Returns the results in a list structure”
cd (change directory): Navigates between directories using relative or absolute paths; cd .. moves up one directory.
“to step back into Etc type CD Etc to confirm that I’m back there type PWD and enter if I want to use the other alternative you can do an absolute path type in CD forward slash and press enter Then I type PWD and press enter you can verify that I am back at the root again to step through multiple directories use the same process type CD Etc and press enter check the contents of the files by typing LS and pressing enter”
mkdir (make directory): Creates a new directory.
“now I will create a new directory called submissions I do this by typing MKDIR which stands for make directory and then the word submissions this is the name of the directory I want to create and then I hit the enter key I then type in ls -l for list so that I can see the list structure and now notice that a new directory called submissions has been created I can then go into this”
touch: Creates a new empty file.
“the Parent Directory next is the touch command which makes a new file of whatever type you specify for example to build a brand new file you can run touch followed by the new file’s name for instance example dot txt note that the newly created file will be empty”
history: Shows a history of recently used commands.
“to view a history of the most recently typed commands you can use the history command”
File redirection (>, >>, <): Redirects the input or output of commands to or from files. > writes output to a file and overwrites it, >> appends to the file, and < feeds a file to a command as input.
“if you want to control where the output goes you can use a redirection how do we do that enter the ls command enter Dash L to print it as a list instead of pressing enter add a greater than sign redirection now we have to tell it where we want the data to go in this scenario I choose an output.txt file the output dot txt file has not been created yet but it will be created based on the command I’ve set here with a redirection flag press enter type LS then press enter again to display the directory the output file displays to view the”
grep: Searches for patterns within files.
“grep stands for Global regular expression print and it’s used for searching across files and folders as well as the contents of files on my local machine I enter the command ls-l and see that there’s a file called”
cat: Displays the content of a file.
less: Views file content one page at a time.
“press the q key to exit the less environment the other file is the bash profile file so I can run the last command again this time with DOT profile this tends to be used more for environment variables for example I can use it for setting”
vim: A text editor used for creating and editing files.
“now I will create a simple shell script for this example I will use Vim which is an editor that I can use which accepts input so type vim and”
chmod: Changes file permissions, including making a file executable (chmod +x filename).
“but I want it to be executable which requires that I have an X being set on it in order to do that I have to use another command which is called chmod after using this them executable within the bash shell”
The document also briefly mentions shell scripts (files containing a series of commands) and environment variables (dynamic named values that can affect the way running processes will behave on a computer).
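For comparison, several of these shell operations have close Python equivalents in the standard library. This is a sketch under stated assumptions: shell_like_demo and child_sees_variable are hypothetical helper names, the demo runs inside a temporary directory so no real files are touched, and the chmod step relies on POSIX permission bits (on Windows it has little effect).

```python
import os
import re
import stat
import subprocess
import sys
import tempfile
from pathlib import Path


def shell_like_demo(base: Path) -> dict:
    """Mirror a few shell commands with pathlib inside the directory `base`."""
    (base / "submissions").mkdir()                     # mkdir submissions
    (base / "example.txt").touch()                     # touch example.txt (empty file)
    listing = sorted(p.name for p in base.iterdir())   # ls

    output = base / "output.txt"
    output.write_text("\n".join(listing) + "\n")       # ls > output.txt (overwrite)
    with output.open("a") as fh:                       # ... >> output.txt (append)
        fh.write("appended line\n")

    # grep sub output.txt: keep only the lines that match a regular expression
    matches = [line for line in output.read_text().splitlines() if re.search(r"sub", line)]

    # Roughly chmod +x script.sh: add the execute bits for user, group, and others
    script = base / "script.sh"
    script.write_text("#!/bin/bash\necho hi\n")
    script.chmod(script.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    return {
        "listing": listing,
        "matches": matches,
        "contents": output.read_text(),                # cat output.txt
        "executable": bool(script.stat().st_mode & stat.S_IXUSR),
    }


def child_sees_variable(name: str, value: str) -> str:
    """Set an environment variable, then read it back from a child process."""
    os.environ[name] = value  # similar to `export NAME=value` in bash
    code = f"import os; print(os.environ[{name!r}])"
    child = subprocess.run([sys.executable, "-c", code], capture_output=True, text=True, check=True)
    return child.stdout.strip()


if __name__ == "__main__":
    print(Path.cwd())  # pwd
    with tempfile.TemporaryDirectory() as tmp:
        print(shell_like_demo(Path(tmp)))
    print(child_sees_variable("GREETING", "hello"))  # hello
```

The environment-variable helper shows why exported variables matter: a child process inherits its parent’s environment, which is how settings in files like .profile reach the programs you start.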
III. Introduction to Git and GitHub
Git is introduced as a free, open-source distributed version control system used to manage source code history, track changes, revert to previous versions, and collaborate with other developers. Key Git commands mentioned include:
git clone: Used to create a local copy of a remote repository (e.g., from GitHub).
“to do this I type the command git clone and paste the https URL I copied earlier finally I press enter on my keyboard notice that I receive a message stating”
ls -la: Lists all files in a directory, including hidden ones (like the .git directory, which contains the Git repository metadata).
“the ls -la command another file is listed which is just named dot git you will learn more about this later when you explore how to use this for Source control”
cd .git: Changes the current directory to the .git folder.
“first open the dot git folder on your terminal type CD dot git and press enter”
cat HEAD: Displays the contents of the HEAD file, which normally holds a reference to the currently checked-out branch (for example, ref: refs/heads/main) rather than a commit hash.
“next type cat head and press enter in git we only work on a single Branch at a time this file also exists inside the dot git folder under the refs forward slash heads path”
cat refs/heads/main: Displays the hash of the last commit on the main branch.
“type CD dot git and press enter next type cat forward slash refs forward slash heads forward slash main press enter after you”
git pull: Fetches changes from a remote repository and integrates them into the local branch.
“I am now going to explain to you how to pull the repository to your local device”
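The cat HEAD and cat refs/heads/main steps can be automated. The sketch below is a simplified, hypothetical helper (current_commit) that reads the same files; it assumes a standard .git directory and ignores cases such as worktrees. In practice, git rev-parse HEAD is the supported way to get this hash.

```python
import tempfile
from pathlib import Path


def current_commit(repo: Path) -> str:
    """Resolve HEAD to a commit hash by reading files under repo/.git (simplified)."""
    git_dir = repo / ".git"
    head = (git_dir / "HEAD").read_text().strip()   # like: cat HEAD

    if not head.startswith("ref: "):
        return head  # detached HEAD: the file already holds a commit hash

    refname = head[len("ref: "):]                   # e.g. "refs/heads/main"
    ref_file = git_dir / refname
    if ref_file.exists():
        return ref_file.read_text().strip()         # like: cat refs/heads/main

    # Git may also store refs in .git/packed-refs as "<hash> <refname>" lines
    packed = git_dir / "packed-refs"
    if packed.exists():
        for line in packed.read_text().splitlines():
            if line and not line.startswith(("#", "^")):
                sha, _, name = line.partition(" ")
                if name == refname:
                    return sha
    raise FileNotFoundError(f"cannot resolve {refname} (no commits yet?)")


if __name__ == "__main__":
    # Build a fake .git layout so the sketch runs anywhere; the hash is a placeholder.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp)
        (demo / ".git" / "refs" / "heads").mkdir(parents=True)
        (demo / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
        (demo / ".git" / "refs" / "heads" / "main").write_text("0123abcd" * 5 + "\n")
        print(current_commit(demo))  # 0123abcd repeated five times
```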
GitHub is described as a cloud-based hosting service for Git repositories, offering a user interface for managing Git projects and facilitating collaboration.
IV. Introduction to Python Programming
The document introduces Python as a versatile programming language and outlines different ways to run Python code:
Python Shell: An interactive environment for running and testing small code snippets without creating separate files.
“the python shell is useful for running and testing small scripts for example it allows you to run code without the need for creating new DOT py files you start by adding Snippets of code that you can run directly in the shell”
Running Python Files: Executing Python code stored in files with the .py extension using the python filename.py command.
“running a python file directly from the command line or terminal note that any file that has the file extension of dot py can be run by the following command for example type python then a space and then type the file”
Basic Python concepts covered include:
Variables: Declaring and assigning values to variables (e.g., x = 5, name = "Alice"). Python automatically infers data types. Multiple variables can be assigned the same value (e.g., a = b = c = 10).
“all I have to do is name the variable for example if I type x equals 5 I have declared a variable and assigned it a value I can also print out the value of the variable by calling the print statement and passing in the variable name which in this case is X so I type print X when I run the program I get the value of 5 which is the assignment since I gave the initial variable let me clear my screen again you have several options when it comes to declaring variables you can declare any different type of variable in terms of value for example X could equal a string called hello to do this I type x equals hello I can then print the value again run it and I find the output is the word hello behind the scenes python automatically assigns the data type for you”
Data Types: Basic data types like integers, floats (decimal numbers), complex numbers, strings (sequences of characters enclosed in single or double quotes), lists, and tuples (ordered, immutable sequences) are introduced.
“X could equal a string called hello to do this I type x equals hello I can then print the value again run it and I find the output is the word hello behind the scenes python automatically assigns the data type for you you’ll learn more about this in an upcoming video on data types you can declare multiple variables and assign them to a single value as well for example making a b and c all equal to 10. I do this by typing a equals b equals C equals 10. I print all three… sequence types are classed as container types that contain one or more of the same type in an ordered list they can also be accessed based on their index in the sequence python has three different sequence types namely strings lists and tuples let’s explore each of these briefly now starting with strings a string is a sequence of characters that is enclosed in either a single or double quotes strings are represented by the string class or Str for”
Operators: Arithmetic operators (+, -, *, /, **, %, //) and logical operators (and, or, not) are explained with examples.
“…for example, 7 multiplied by 4. Okay, now let's explore logical operators. Logical operators are used in Python in conditional statements to determine a True or False outcome. Let's explore some of these now. The first logical operator is named and. This operator checks for all conditions to be true, for example, a is greater than 5 and a is less than 10. The second logical operator is named or. This operator checks for at least one of the conditions to be true, for example, a is greater than 5 or b is greater than 10. The final operator is named not. This…”
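A short illustration of each operator listed above. The operand values 7 and 4 echo the "7 multiplied by 4" example; the other values are illustrative.

```python
# Arithmetic operators on two sample integers
x, y = 7, 4
results = {
    "x + y": x + y,    # addition -> 11
    "x - y": x - y,    # subtraction -> 3
    "x * y": x * y,    # multiplication -> 28
    "x / y": x / y,    # true division, always a float -> 1.75
    "x ** y": x ** y,  # exponentiation -> 2401
    "x % y": x % y,    # modulus (remainder) -> 3
    "x // y": x // y,  # floor division -> 1
}
for expr, value in results.items():
    print(expr, "=", value)

# Logical operators combine boolean conditions
a, b = 7, 12
both = a > 5 and a < 10    # True: every condition holds
either = a > 5 or b > 10   # True: at least one condition holds
negated = not a > 5        # False: not flips True to False
print(both, either, negated)
```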
Conditional Statements: if, elif (else if), and else statements are introduced for controlling the flow of execution based on conditions.
“The logical operators are and, or, and not. Let's cover the different combinations of each. In this example, I declare two variables: a equals True and b also equals True. From these variables, I use an if statement. I type if a and b, colon, and on the next line I type print and, in parentheses, in double quotes…”
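A minimal sketch of if, elif, and else driven by the logical operators above. The function name `describe` and its messages are hypothetical.

```python
def describe(a: bool, b: bool) -> str:
    """Report which combination of a and b is True."""
    if a and b:
        # Runs only when both conditions are True
        return "a and b are both True"
    elif a or b:
        # Checked only if the first condition failed
        return "only one of a or b is True"
    else:
        # Runs when every earlier condition was False
        return "neither a nor b is True"


print(describe(True, True))
print(describe(True, False))
print(describe(False, False))
```

Only the first branch whose condition is True runs; the remaining branches are skipped.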
Loops: for loops (for iterating over sequences) and while loops are introduced with examples, including nested loops.
“Now let's break apart the for loop and discover how it works. The variable item is a placeholder that will store the current letter in the sequence. You may also recall that you can access any character in the sequence by its index; the for loop is accessing it in the same way and assigning the current value to the item variable. This allows us to access the current character to print it for output. When the code is run, the output will be the letters of the word looping, each letter on its own line. Now that you know about looping constructs in Python, let me demonstrate how these work further, using some code examples to output an array of tasty desserts. Python offers us multiple ways to do loops, or looping. You'll now cover the for loop as well as the while loop. Let's start with the basics of a simple for loop. To declare a for loop, I use the for keyword. I now need a variable to put the value into; in this case I am using i. I also use the in keyword to specify what I want to loop over. I add a function called range to specify the number of items in a range; in this case, I'm using 10 as an example. Next, I do a simple print statement: pressing the Enter key to move to a new line, I select the print function, and within the brackets I enter the name looping and the value of i. Then I click on the Run button. The output indicates the iteration loops through the range of 0 to 9.”
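The loops described above can be sketched as follows. The dessert names and the loop bounds for the while and nested loops are illustrative assumptions.

```python
# for loop over a string: item holds the current character each pass
for item in "looping":
    print(item)

# for loop over range(10): i takes the values 0 through 9
for i in range(10):
    print("looping", i)

# for loop over a list (dessert names are made up for this sketch)
desserts = ["ice cream", "chocolate", "apple crumble", "cookies"]
for dessert in desserts:
    print(dessert)

# while loop: repeats as long as its condition stays True
count = 0
while count < 3:
    print("count is", count)
    count += 1  # without this the loop would never end

# nested loops: the inner loop runs completely for each outer pass
pairs = []
for row in range(2):
    for col in range(3):
        pairs.append((row, col))
print(pairs)
```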
Functions: Defining and calling functions using the def keyword. Functions can take arguments and return values. Examples of using *args (for variable positional arguments) and **kwargs (for variable keyword arguments) are provided.
“I now write a function to produce a string out of this information. I type def contents and then self in parentheses. On the next line, I write a print statement for the string: the plus self.dish plus has plus self.items plus and takes plus self.time plus min to prepare. Here we'll use the backslash character to force a new line and continue the string on the following line. For this to print correctly, I need to convert the self.items and self.time… Let's say, for example, you wanted to calculate a total bill for a restaurant. A user got a cup of coffee that was 2.99, then they also got a cake that was 4.55, and also a juice for 2.99. The first thing I could do is change the for loop. Let's change the argument to kwargs by…”
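A minimal sketch of the restaurant-bill example using `*args` and `**kwargs`, assuming the cake price is 4.55. The function names `sum_of` and `total_bill` are hypothetical. The totals are rounded because binary floating point cannot represent prices like 2.99 exactly.

```python
def sum_of(*args):
    """Add any number of positional arguments (args is a tuple)."""
    total = 0
    for price in args:
        total += price
    return round(total, 2)


def total_bill(**kwargs):
    """Add prices passed as keyword arguments (kwargs is a dict of item -> price)."""
    total = 0
    for item, price in kwargs.items():
        print(f"{item}: {price}")
        total += price
    return round(total, 2)


print(sum_of(2.99, 4.55, 2.99))
print(total_bill(coffee=2.99, cake=4.55, juice=2.99))
```

`*args` suits an unnamed list of values, while `**kwargs` keeps each value's name, which is handy when the bill should show what was ordered.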
File Handling: Opening, reading (using read, readline, readlines), and writing to files. The importance of closing files is mentioned.
“The third method to read files in Python is readlines. Let me demonstrate this method. The readlines method reads the entire contents of the file and then returns it as an ordered list. This allows you to iterate over the list or pick out specific lines based on a condition. If, for example, you have a file with four lines of text and pass a length condition, the readlines function will return all the lines in your file in the correct order. Files are stored in directories, and they have…”
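The three read methods can be compared on a small file. This sketch writes a hypothetical four-line sample file to a temporary directory. Each `with` block closes the file automatically, which covers the advice about closing files; calling `file.close()` explicitly is the manual alternative.

```python
import os
import tempfile

# Write a small sample file (contents are made up for this sketch)
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as file:
    file.write("first line\nsecond line\nthird line\nfourth line\n")

# read(): the whole file as one string
with open(path) as file:
    everything = file.read()

# readline(): one line per call, newline included
with open(path) as file:
    first = file.readline()
    second = file.readline()

# readlines(): a list of every line, in file order
with open(path) as file:
    lines = file.readlines()

# Pick out specific lines with a condition, here length > 10 characters
long_lines = [line.strip() for line in lines if len(line.strip()) > 10]

print(first.strip(), "|", second.strip())
print(lines)
print(long_lines)
```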
Recursion: The concept of a function calling itself is briefly illustrated.
“…The else statement will recursively call the function, but with a modified string every time. On the next line, I add else and a colon. Then, on the next line, I type return string_reverse(str, but before I close the parentheses, I add a slice by typing an open square bracket, the number 1, and a colon, followed by…”
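A minimal sketch of the recursive string reversal described above. The name `string_reverse` is an assumption based on the transcript. Each call slices off the first character with `text[1:]` and appends that character after the reversed remainder.

```python
def string_reverse(text: str) -> str:
    """Reverse a string by having the function call itself."""
    if len(text) == 0:
        # Base case: an empty string is its own reverse, so recursion stops
        return text
    else:
        # Recursive case: reverse everything after the first character,
        # then put the first character at the end
        return string_reverse(text[1:]) + text[0]


print(string_reverse("reversal"))
```

Every recursive function needs a base case; without it the calls would continue until Python raises a RecursionError.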
Object-Oriented Programming (OOP): Basic concepts of classes (using the class keyword), objects (instances of classes), attributes (data associated with an object), and methods (functions associated with an object, with self as the first parameter) are introduced. Inheritance (creating new classes based on existing ones) is also mentioned.
“…method inside this class. I want this one to contain a new function called leave request, so I type def leave_request and then self and days as the variables in parentheses. The purpose of the leave request function is to return a line that specifies the number of days requested. To write this, I type return, the string ‘May I take a leave for’, plus str(days), plus another string, ‘days’. Now that I have all the classes in place, I'll create a few instances from these classes: one for a supervisor and two others for…”
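A sketch of classes, attributes, methods, and inheritance along the lines of the excerpt. The class names, attribute names, and instance data are hypothetical; only the `leave_request` behavior follows the transcript.

```python
class Employees:
    """Base class: attributes every employee has."""

    def __init__(self, name: str, last: str):
        self.name = name
        self.last = last


class Supervisors(Employees):
    """Inherits name and last from Employees and adds a password."""

    def __init__(self, name: str, last: str, password: str):
        super().__init__(name, last)  # reuse the parent's setup
        self.password = password


class Chefs(Employees):
    """Inherits from Employees and adds a method."""

    def leave_request(self, days: int) -> str:
        # self is the instance the method is called on
        return "May I take a leave for " + str(days) + " days"


# One supervisor and two chefs (hypothetical people)
adrian = Supervisors("Adrian", "A", "apple")
emily = Chefs("Emily", "E")
juno = Chefs("Juno", "J")

print(emily.leave_request(3))
print(adrian.name, adrian.password)
```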
Modules: The concept of modules (reusable blocks of code in separate files) and how to import them using the import statement (e.g., import math, from math import sqrt, import math as m). The benefits of modular programming (scope, reusability, simplicity) are highlighted. The search path for modules (sys.path) is mentioned.
“So a file like sample.py can be a module named sample and can be imported. Modules in Python can contain both executable statements and functions, but before you explore how they are used, it's important to understand their value, purpose, and advantages. Modules come from modular programming. This means that the functionality of code is broken down into parts or blocks of code. These parts or blocks have great advantages, which are scope, reusability, and simplicity. Let's delve deeper into these. Everything in… To import and execute modules in Python, the first important thing to know is that modules are imported only once during execution. If, for example, you import a module that contains print statements, print(), you can verify it only executes the first time you import the module, even if the module is imported multiple times, since modules are built to help you stand alone… I will now import the built-in math module by typing import math. Just to make sure that this code works, I'll use a print statement. I do this by typing print importing the math module. After this, I'll run the code. The print statement has executed. Most of the modules that you will come across, especially the built-in modules, will not have any print statements, and they will simply be loaded by the interpreter. Now that I've imported the math module, I want to use a function inside of it. Let's choose the square root function, sqrt. To do this, I type the words math.sqrt. When I type the word math followed by the dot, a list of functions appears in a drop-down menu, and you can select sqrt from this list. I pass 9 as the argument to the math.sqrt function, assign this to a variable called root, and then I print it. The number three, the square root of nine, has been printed to the terminal, which is the correct answer. Instead of importing the entire math module as we did above, there is a better way to handle this: by directly importing the square root function inside the scope of the project. This will prevent overloading the interpreter by importing the entire math module. To do this, I type from math import sqrt. When I run this, it displays an error. Now I remove the word math from the variable declaration and I run the code again; this time it works. Next, let's discuss something called an alias, which is an excellent way of importing different modules. Here I assign an alias called m to the math module. I do this by typing import math as m. Then I type cosine equals m.…”
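The three import styles can be sketched side by side. Note that `from math import sqrt` still loads the whole math module; it only changes which names are bound in your namespace, which is why `math.sqrt` fails unless `math` itself was also imported. The call `m.cos(0)` is an assumption about where the truncated alias example was heading. Imports are placed mid-file here only to mirror the order of the walkthrough.

```python
import math

root = math.sqrt(9)       # qualified name: module.function
print(root)               # 3.0 (sqrt always returns a float)

from math import sqrt     # binds only the name sqrt

root_direct = sqrt(9)     # no "math." prefix needed
print(root_direct)

import math as m          # alias: m now refers to the math module

cosine = m.cos(0)
print(cosine)

import sys

# sys.path lists the directories Python searches, in order, for modules
for directory in sys.path[:3]:
    print(directory)
```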
Scope: The concepts of local, enclosed, global, and built-in scopes in Python (LEGB rule) and how variable names are resolved. Keywords global and nonlocal for modifying variable scope are mentioned.
“…names of different attributes defined inside it. In this way, modules are a type of namespace. Namespaces and scopes can become very confusing very quickly, and so it is important to get as much practice with scopes as possible to ensure a standard of quality. There are four main types of scopes that can be defined in Python: local, enclosed, global, and built-in. The practice of trying to determine in which scope a certain variable belongs is known as scope resolution. Scope resolution follows what is commonly known as the LEGB rule. Let's explore these. Local: this is where the first search for a variable happens. Enclosed: this is defined inside enclosing or nested functions. Global: this is defined at the uppermost level, or simply outside functions. Built-in: the keywords present in the built-in module. In simpler terms, a variable declared inside a function is local, and the ones outside the scope of any function generally are global. Here is an example: the output for the code on screen shows the same variable name, greek, in different scopes… keywords that can be used to change the scope of variables: global and nonlocal. The global keyword helps us access the global variables from within the function. Nonlocal is a special type of scope defined in Python that is used within nested functions only, on the condition that the variable has been defined earlier in the enclosing functions. Now you can write a piece of code that will better help you understand the idea of scope for attributes. You have already created a file called animalfarm.py. You will be defining a function called d, inside which you will be creating another nested function, e. Let's write the rest of the code. You can start by defining a couple of variables, both of which will be called animal: the first one inside the d function and the second one inside the e function. Note how you had to first declare the variable inside the e function as nonlocal. You will now add a few more print statements for clarification for when you see the outputs. Finally, you have called the e function here, and you can add one more variable animal outside the d function. This…”
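A sketch in the spirit of the animalfarm.py exercise: the same name `animal` lives at global and enclosing levels, and `nonlocal` lets the nested function `e` rebind the enclosing variable. The animal names are hypothetical. A second short example shows `global`.

```python
animal = "elephant"            # global scope (module level)


def d():
    animal = "giraffe"         # enclosing scope for e (local to d)

    def e():
        nonlocal animal        # refer to d's animal instead of creating a local
        print("inside e before:", animal)
        animal = "lion"        # rebinds d's variable, not the global one

    print("inside d before e:", animal)
    e()
    print("inside d after e:", animal)
    return animal


result = d()
print("global:", animal)       # the global name is untouched

counter = 0


def increment():
    global counter             # rebind the module-level name
    counter += 1


increment()
increment()
print(counter)                 # print itself is resolved in the built-in scope
```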
Reloading Modules: The reload() function (importlib.reload() in Python 3) re-imports and re-executes a module that has already been loaded.
“…statement is only loaded once by the Python interpreter, but the reload function lets you import and reload it multiple times. I'll demonstrate that. First, I create a new file, sample.py, and I add a simple print statement named hello world. Remember that any file in Python can be used as a module. I'm going to use this file inside another new file, and the new file is named using reloads.py. Now I import the sample.py module. I can add the import statement multiple times, but the interpreter only loads it once. If it had been reloaded, we…”
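A self-contained sketch of the demonstration: it writes a throwaway sample.py into a temporary directory, imports it twice (only the first import runs the file), and then calls `importlib.reload()`, which executes the file again. Using a temporary directory is an assumption made so the example does not touch your project files.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway module file, sample.py, in a temporary directory
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "sample.py"), "w") as f:
    f.write('print("hello world")\n')
sys.path.insert(0, workdir)  # make the directory importable

import sample   # first import executes sample.py: prints "hello world"
import sample   # already cached in sys.modules: prints nothing

importlib.reload(sample)  # re-executes sample.py: prints "hello world" again
```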
Testing: Introduction to writing test cases using the assert keyword and the pytest framework. The convention of naming test functions with the test_ prefix is mentioned. Test-Driven Development (TDD) is briefly introduced.
“…another file called test_addition.py, in which I'm going to write my test cases. Now I import the file that consists of the functions that need to be tested. Next, I'll also import the pytest module. After that, I define a couple of test cases with the addition and subtraction functions. Each test case should be named test_ followed by the name of the function to be tested; in our case, we'll have test_add and test_sub. I'll use the assert keyword inside these functions because tests primarily rely on this keyword. It… Contrary to the conventional approach of writing code, I first write test_find_string.py, and then I add the test function named test_is_present. In accordance with the test, I create another file named find_string.py, in which I'll write the is_present function. I define the function named is_present and I pass an argument called person into it. Then I make a list of names written as values. After that, I create a simple if-else condition to check if the passed argument…”
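A condensed, single-file sketch of both testing examples. In a real project, the functions under test and the tests would live in separate files (for example test_addition.py), and running `pytest` would discover every function whose name starts with `test_`. The function bodies and the list of names are hypothetical.

```python
# --- functions under test (would normally live in their own module) ---
def add(a, b):
    return a + b


def sub(a, b):
    return a - b


# --- tests: pytest collects functions named test_* ---
def test_add():
    assert add(4, 5) == 9
    assert add(-1, 1) == 0


def test_sub():
    assert sub(10, 5) == 5


# --- TDD: the test below is written first, then is_present is written to pass it ---
def is_present(person):
    names = ["Al", "Bo", "Chi"]  # hypothetical list of names
    if person in names:
        return True
    else:
        return False


def test_is_present():
    assert is_present("Al")
    assert not is_present("Zed")


if __name__ == "__main__":
    # Without pytest, the tests can simply be called; a failing assert raises AssertionError
    test_add()
    test_sub()
    test_is_present()
    print("all tests passed")
```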
V. Software Development Tools and Concepts
The document mentions several tools and concepts relevant to software development:
Python Installation and Version: Checking the installed Python version using python --version.
“…prompt, type python --version to identify which version of Python is running on your machine. If Python is correctly installed, then Python 3 should appear in your console. This means that you are running Python 3. There should also be several numbers after the 3 to indicate which version of Python 3 you are running; make sure these numbers match the most recent version on the python.org website. If you see a message that states python not found, then review your Python installation or the relevant document on…”
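The same check can also be done from inside a running interpreter, which is useful in notebooks or scripts. This is a small sketch using the standard `platform` and `sys` modules; the error message is illustrative.

```python
import platform
import sys

# Same information as `python --version`, e.g. "Python 3.12.1"
print("Python", platform.python_version())

# Stop early if the interpreter is not Python 3
if sys.version_info.major < 3:
    raise SystemExit("Python 3 is required")
```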
Jupyter Notebook: An interactive development environment (IDE) for Python. Installation using python -m pip install jupyter and running using jupyter notebook are mentioned.
“…course, you'll use the Jupyter IDE to demonstrate Python. To install Jupyter, type python -m pip install jupyter within your Python environment, then follow the Jupyter installation process. Once you've installed Jupyter, type jupyter notebook to open a new instance of the Jupyter Notebook to use within your default browser.”
MySQL Connector: A Python library used to connect Python applications to MySQL databases.
“The next task is to connect Python to your MySQL database. You can create the installation using a purpose-built Python library called MySQL Connector. This library is an API that provides useful…”
Datetime Library: Python’s built-in module for working with dates and times. Functions like datetime.now(), datetime.date(), datetime.time(), and timedelta are introduced.
“…Python, so you can import it without requiring pip. Let's review the functions that Python's datetime library offers. The datetime.now function is used to retrieve today's date. You can also use datetime.date to retrieve just the date, or datetime.time to call the current time, and the timedelta function calculates the difference between two values. Now let's look at the syntax for implementing datetime. To import the datetime Python class, use the import code followed by the library name, then use the as keyword to create an alias of… Let's look at a slightly more complex function, timedelta. When making plans, it can be useful to project into the future, for example, what date is this same day next week? You can answer questions like this using the timedelta function to calculate the difference between two values and return the result in a Python-friendly format. So to find the date in seven days' time, you can create a new variable called week, type the DT module, and access the timedelta function as an object instance, then pass through seven days as an argument. Finally…”
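A sketch of the datetime functions described above, using the `dt` alias as in the walkthrough. Note that `.date()` and `.time()` are called on the datetime object returned by `now()`. The fixed dates at the end are arbitrary, chosen so the arithmetic is easy to check (2024 is a leap year).

```python
import datetime as dt

now = dt.datetime.now()   # current local date and time
print(now)
print(now.date())         # just the date part
print(now.time())         # just the time part

# timedelta: the same day next week
week = dt.timedelta(days=7)
print(now + week)

# Fixed dates make the arithmetic easy to verify
start = dt.date(2024, 1, 1)
print(start + week)                   # 2024-01-08
print(dt.date(2024, 3, 1) - start)    # 60 days, 0:00:00
```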
MySQL Workbench: A graphical tool for working with MySQL databases, including creating schemas.
“…MySQL server instance and select the schema menu. To create a new schema, select the create schema option from the menu pane in the schema toolbar. This action opens a new window. Within this new window, enter mg_schema in the database name text field. Select Apply. This generates a SQL script called create schema mg_schema. You are then asked to review the SQL script to be applied to your new database. Click on the Apply button within the review window if you're satisfied with the script. A new window…”
Data Warehousing: Briefly introduces the concept of a centralized data repository for integrating and processing large amounts of data from multiple sources for analysis. Dimensional data modeling is mentioned.
“In the next module, you'll explore the topic of data warehousing. In this module, you'll learn about the architecture of a data warehouse and build a dimensional data model. You'll begin with an overview of the concept of data warehousing. You'll learn that a data warehouse is a centralized data repository that loads, integrates, stores, and processes large amounts of data from multiple sources. Users can then query this data to perform data analysis. You'll then…”
Binary Numbers: A basic explanation of the binary number system (base-2) is provided, highlighting its use in computing.
“Binary has many uses in computing. It is a very convenient way of… Consider that you have a lock with four different digits, and each digit can be a 0 or a 1. How many potential passcodes can you have for the lock? The answer is 2 to the power of 4, or two times two times two times two, which equals sixteen. You are working with a binary lock, therefore each digit can only be either 0 or 1, so for four digits you multiply by two four times, and the total is 16. Each time you add a potential digit, you increase the…”
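The lock example can be checked by enumerating every combination. This sketch uses `itertools.product` to list all 4-digit binary codes and confirms there are 2 ** 4 of them; the conversions at the end use the built-ins `bin()` and `int(text, 2)`.

```python
from itertools import product

digits = 4
# Each digit has 2 options, so the count doubles with every added digit
combinations = list(product("01", repeat=digits))
print(len(combinations), 2 ** digits)   # 16 16
print("".join(combinations[5]))         # 0101, the sixth code in order

# Converting between decimal and binary
print(bin(13))         # 0b1101
print(int("1101", 2))  # 13
```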
Knapsack Problem: A brief overview of this optimization problem is given as a computational concept.
“…three kilograms. Additionally, each item has a value: the torch equals 1, water equals 2, and the tent equals 3. In short, the knapsack problem outlines a list of items that weigh different amounts and have different values. You can only carry so many items in your knapsack. The problem requires calculating the optimum combination of items you can carry if your backpack can carry a certain weight. The goal is to find the best return for the weight capacity of the knapsack. To compute a solution for this problem, you must select all items…”
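A minimal sketch of the 0/1 knapsack problem. It uses dynamic programming rather than checking every combination of items, which is a swapped-in technique and not necessarily the course's approach. The item values come from the example above; the weights and capacity are hypothetical because the excerpt cuts off before listing them.

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack: best[w] is the highest value reachable with total weight <= w."""
    best = [0] * (capacity + 1)
    for weight, value in zip(weights, values):
        # Walk capacities downwards so each item is packed at most once
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]


items = ["torch", "water", "tent"]
values = [1, 2, 3]    # from the example above
weights = [1, 2, 3]   # hypothetical weights in kg
capacity = 4          # hypothetical capacity in kg

print(knapsack(weights, values, capacity))  # 4: torch + tent
```

The table holds one entry per kilogram of capacity, so the work grows with items times capacity instead of doubling with every added item as a brute-force search would.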
This document provides a foundational overview of databases and SQL, command-line basics, version control with Git and GitHub, and introductory Python programming concepts, along with essential development tools. The content suggests a curriculum aimed at individuals learning about software development, data management, and related technologies.
The provided text is a SQL tutorial. It covers fundamental SQL commands such as CREATE TABLE, INSERT, UPDATE, DELETE, TRUNCATE, and DROP, explains data types (CHAR vs. VARCHAR), and demonstrates various query techniques, including joins, subqueries, window functions, and the use of CASE statements. The tutorial also introduces stored procedures, triggers, error handling, and pivot/unpivot operations, using practical examples and exercises to illustrate these concepts. Finally, it shows how to create and use user-defined functions.
SQL Study Guide
Quiz
Instructions: Answer each question in 2-3 sentences.
What is the primary role of SQL in data-related fields, and name three specific roles that require it?
Beyond SQL, what other skills are essential for a data analyst, and why are they crucial for success in the job?
Explain the client-server model in the context of SQL, including how requests and responses are exchanged and the role of SQL.
Describe the hierarchical structure of a SQL server, from the server level down to individual data elements.
What is the difference between DDL and DML? Give examples of commands for each.
How do GRANT and REVOKE statements contribute to data security, and who are the typical users for each?
Explain the function of ROLLBACK, COMMIT, and SAVEPOINT in transaction control, and give an example of their purpose.
What is the main difference between the TRUNCATE and DELETE commands, and which is generally faster and why?
Describe the difference between the data types CHAR and VARCHAR, and give examples of their use cases.
What is a primary key constraint, and what two rules must be followed for a value to meet this constraint?
Quiz Answer Key
SQL is a fundamental language for interacting with databases and is essential for data engineers, data analysts, and data scientists. All three roles require a strong understanding of SQL to manage, query, and analyze data.
A data analyst needs business fundamentals alongside SQL, Power BI, and Tableau. Because every business operates differently, understanding the industry is essential for drawing meaningful conclusions from data.
In the client-server model, a client (like a management tool) sends SQL requests to a server, which then responds with the requested data. SQL is the language used for this communication between the client and the server.
A SQL server is organized with a server at the top, followed by multiple databases, each containing tables, which are made up of rows and columns. Additionally, schemas define the relationships between these tables.
DDL (Data Definition Language) is used to define the structure of the database, such as CREATE, ALTER, and DROP table commands. DML (Data Manipulation Language) is used to manage the actual data, including INSERT, UPDATE, and DELETE commands.
GRANT and REVOKE statements manage user permissions, granting access to specific database operations while ensuring the correct level of access for different users. Typically, developers get INSERT, UPDATE, and DELETE access, while end-users might only get SELECT.
ROLLBACK undoes recent changes, COMMIT finalizes them, and SAVEPOINT creates intermediate markers to return to. For example, after doing several inserts, ROLLBACK would revert the table to its state before the changes.
TRUNCATE removes all records from a table and recreates it (a DDL operation) and is faster since it does not log each row removal. DELETE removes rows one by one (a DML operation) and is slower.
CHAR is a fixed-size data type that always allocates its declared length, regardless of how much data is stored (suited to values like gender ‘M’ or ‘F’). VARCHAR is a variable-size data type that uses space based only on the size of the stored data (suited to values like a name).
A primary key constraint ensures that values are unique and not null. No duplicate values or null values are allowed within a primary key column.
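The two primary key rules can be demonstrated with Python's built-in sqlite3 module, used here as a stand-in for the SQL Server setup in the course. One SQLite quirk matters: unlike SQL Server, SQLite allows NULL in most non-integer PRIMARY KEY columns unless NOT NULL is declared, so the sketch spells it out. SQLite also accepts VARCHAR(2) but does not enforce the length. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is explicit because SQLite would otherwise accept NULL keys here
conn.execute(
    "CREATE TABLE countries ("
    "country_code VARCHAR(2) NOT NULL PRIMARY KEY, "
    "country_name VARCHAR(20))"
)
conn.execute("INSERT INTO countries VALUES ('IN', 'India')")


def try_insert(code, name):
    """Attempt an insert and report whether the constraint allowed it."""
    try:
        conn.execute("INSERT INTO countries VALUES (?, ?)", (code, name))
        return "inserted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


print(try_insert("IN", "India again"))     # duplicate value -> rejected
print(try_insert(None, "Nowhere"))         # NULL value -> rejected
print(try_insert("US", "United States"))   # unique and non-null -> inserted
```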
Essay Questions
Instructions: Answer each question in an essay format.
Discuss the different types of SQL commands (DDL, DML, DCL, TCL), explain their purposes, and describe how they are used in a real-world database management context.
Compare and contrast the use of DELETE, TRUNCATE, and DROP commands. Explain when each command should be used and discuss their implications for data and database structure.
Explain how you would use SQL functions to manipulate and extract data, providing examples of string, numerical, and date-related functions, along with real-world use cases.
Describe what window functions are, explain their purpose, and describe the differences between RANK, DENSE_RANK, and ROW_NUMBER, and provide a scenario where using PARTITION BY would be beneficial.
Explain what subqueries are, their purpose, and how they can be used within SQL queries, giving examples of scenarios where they might be used and when they are more useful than a JOIN operation.
Glossary
SQL (Structured Query Language): A standard language for accessing and manipulating databases.
DBMS (Database Management System): Software that manages databases and allows for storage, retrieval, and modification of data.
Server: A computer program or system that provides services to other computer programs (clients).
Client: A computer program that requests services from a server.
SSMS (SQL Server Management Studio): A Microsoft tool used to manage and interact with SQL Server databases.
DDL (Data Definition Language): SQL commands used to define the structure of a database (e.g., CREATE, ALTER, DROP).
DML (Data Manipulation Language): SQL commands used to manipulate data within a database (e.g., INSERT, UPDATE, DELETE).
DCL (Data Control Language): SQL commands used to control access to data and database objects (e.g., GRANT, REVOKE).
TCL (Transaction Control Language): SQL commands used to manage transactions within a database (e.g., COMMIT, ROLLBACK, SAVEPOINT).
Schema: A blueprint or structure of a database, including tables, relationships, and constraints.
Table: A data structure used to store data in rows and columns within a database.
Row: A horizontal set of data in a table, also known as a record.
Column: A vertical set of data in a table, representing a specific attribute or type of data.
Primary Key: A unique identifier for each record in a table and that cannot contain null values.
Foreign Key: A field in a table that refers to the primary key of another table to establish relationships.
Constraint: A rule that enforces the integrity of data in a database (e.g., primary key, foreign key).
CHAR: A fixed-length string data type.
VARCHAR: A variable-length string data type.
Transaction: A sequence of operations performed as a single logical unit of work on a database.
Rollback: The process of undoing changes made during a transaction.
Commit: The process of saving changes made during a transaction.
Savepoint: A point within a transaction to which you can roll back changes.
Truncate: A command that removes all rows from a table while keeping its structure; it is generally faster than DELETE.
Drop: A command that removes an entire database object, such as a table.
Index: A data structure that improves the speed of data retrieval operations on a database table.
Clustered Index: A special type of index that physically sorts and stores the data rows of a table based on the indexed columns. A table can have only one clustered index.
Non-Clustered Index: An index stored separately from the table's data rows; it holds the indexed column values with pointers back to the rows, speeding up lookups without changing the physical order of the data. A table can have many non-clustered indexes.
View: A virtual table based on the result-set of a SQL statement, not physically stored like a regular table.
Function: A block of code that performs a specific task and may return a value.
Stored Procedure: A set of SQL statements stored in a database for reusable operations.
Trigger: A SQL procedure that is automatically executed in response to certain events on a particular table.
Subquery: A query embedded inside another query, often in the WHERE or FROM clause.
CTE (Common Table Expression): A temporary named result set used within a single query, that is not stored in the database.
Pivot: A process of converting rows to columns to summarize data.
Unpivot: A process of converting columns to rows, often to normalize or standardize data.
Window Functions: Functions that operate on a set of rows (a window) related to the current row, which includes functions like RANK, DENSE_RANK, and ROW_NUMBER.
Moving Average: The average of a fixed number of consecutive data points, recalculated as the window slides forward; used to smooth data.
Epoch Time: A system for tracking points in time as a count of seconds since 00:00:00 UTC on 1 January 1970.
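To make the window-function entries concrete, here is a sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or newer, which recent Python builds include). The sales table and its rows are hypothetical. PARTITION BY restarts the ranking for each region, which is the typical scenario: ranking sales reps within their own region rather than company-wide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Ann", 500), ("East", "Ben", 300), ("East", "Cal", 300),
     ("East", "Dee", 100), ("West", "Eve", 400), ("West", "Fay", 200)],
)

query = """
SELECT region, rep, amount,
       -- always 1, 2, 3, ...; rep breaks ties so the numbering is stable
       ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC, rep) AS row_num,
       -- ties share a rank, and the next rank is skipped
       RANK()       OVER (PARTITION BY region ORDER BY amount DESC) AS rnk,
       -- ties share a rank, with no gap afterwards
       DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS dense_rnk
FROM sales
ORDER BY region, row_num
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Ben and Cal tie at 300, so both get RANK 2 and DENSE_RANK 2; Dee then gets RANK 4 (the gap) but DENSE_RANK 3.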
SQL Fundamentals for Data Professionals
Briefing Document: SQL Fundamentals, Data Roles, and Advanced SQL Concepts
Introduction
This document summarizes key concepts and practical applications of SQL as presented in the provided source material. The focus is on SQL as a foundational skill for various data-related roles, core SQL concepts, and advanced techniques such as window functions, subqueries, views, stored procedures, security, indexing, and data transformation (pivot/unpivot). The training materials highlight the importance of hands-on practice and deep understanding of error messages.
I. SQL as a Core Skill for Data Professionals
SQL is foundational for various data roles: The source emphasizes that SQL is an essential skill for data analysts, data engineers, and data scientists.
“whether you are a data engineer or a data analyst, you need SQL, okay”
Specific Tech Stack: Different roles require different tools along with SQL:
Data Analyst: “learn SQL along with Power BI and Tableau”
Data Engineer: “learn SQL and learn Informatica, learn Talend, and learn Python”
Data Scientist: “learn SQL, learn Python, learn machine learning”
Importance of Business Knowledge: SQL skills must be complemented by business acumen:
“a data analyst's job is not only learning SQL: what query to write, what table to fetch the data from, how to build a chart. He can do this only if he knows the business, correct? If you don't know the business, you can't do it”
Purpose-Driven Learning: Learning SQL should be intentional, so that you understand why and how tools like Power BI or Python are needed.
“now whenever someone teaches you Power BI, you know why you're learning Power BI; whenever they teach you Python, you'll know why you're learning Python. You'll know that beforehand, and in that case you can ask them the right questions”
SSMS as the primary tool: The course uses SQL Server Management Studio (SSMS) as the primary tool.
II. Core SQL Concepts
SQL Server Architecture: SQL Server follows a client-server model: a client (such as SSMS) sends requests, and the server sends back responses. The communication between them uses the SQL language.
“so how does it work? You send a request to the server, and the server responds back to you… when the server and client are talking, they need a language, and that language is called SQL, Structured Query Language”
Database Hierarchy: A SQL server contains multiple databases, each with multiple tables, and tables contain rows and columns. Related tables form a schema.
“a database server will have multiple components inside it. See, it will have multiple databases: database DB1, it could be DB2, it could be DB3… and the set of tables which are connected to each other with relationships is called what… schema”
SQL Language Subsets: SQL is broken down into:
DDL (Data Definition Language): For defining the structure (skeleton) of the database (e.g., CREATE, ALTER, DROP, TRUNCATE)
“anything which deals with the skeleton of your database, like create the table, alter the table (alter means remove a column from the table or add a column to the table), drop (remove the table), truncate the table”
DML (Data Manipulation Language): For working with actual data (e.g., INSERT, UPDATE, DELETE)
“once you have the skeleton, next you have to populate the data, right… insert means add some data, delete means remove some data, update means change some data”
DCL (Data Control Language): For managing security permissions (e.g., GRANT, REVOKE)
“proper access should be given to the right people, so GRANT and REVOKE statements will take care of it”
TCL (Transaction Control Language): For managing transactions (e.g., COMMIT, ROLLBACK, SAVEPOINT).
“Then TCL is undo, redo, all those things. What do you mean by undo and redo? Suppose you are executing some commands, 1, 2, 3, 4, 5, 6, 7, eight commands, and later you realize something went wrong… At that time I will hit a ROLLBACK command.”
Importance of Error Messages: Reading error messages is critical for learning.
“Whenever you hit an error, always read the error. 99% of new developers ignore this suggestion, and that's the reason they struggle in the initial days.”
Data Types: Understanding INT, VARCHAR, CHAR is important:
CHAR is fixed-size storage, and VARCHAR is variable size, where space is only allocated when used.
III. Practical SQL Examples and Hands-On Learning
Table Creation & Manipulation: The source covers how to create tables (using CREATE TABLE) with different data types and how to insert data.
Example: CREATE TABLE countries (countryID INT, countryCode VARCHAR(2), countryName VARCHAR(20));
Data Insertion: Insert data into tables using the INSERT INTO command.
Data Updates: Use UPDATE to change data based on conditions using a WHERE clause.
“UPDATE countries. Which column value do you want to set? Country code. Before you type country code, you have to write SET; SET is the keyword.”
Data Deletion: Use DELETE to remove rows based on conditions using a WHERE clause.
“DELETE FROM countries WHERE this… What do you expect after I execute this command? Only two IDs will get deleted.”
Table Truncation & Dropping: TRUNCATE removes all data but keeps the table structure (DDL operation).
“Truncate means, yes, truncate means drop the table and recreate the table. Two things are happening inside truncate: first the table is dropped, and then the table is recreated.”
DROP removes both the table structure and data (DDL operation).
“DROP TABLE countries: what will this do? It will delete the data as well as the structure, both.”
Altering Tables: Modify table structures, add or change columns, using the ALTER TABLE statement.
“ALTER TABLE countries ALTER COLUMN… Which column do I want to alter? Country code. What should be my new data type? CHAR(3).”
Constraints: Primary keys (PRIMARY KEY) are used to ensure uniqueness and non-null values and help prevent duplicate data.
“PRIMARY KEY: what is that keyword doing? That was not there earlier when I wrote my first table. It's a constraint… NOT NULL plus UNIQUE.”
Importance of Practice: Regular practice is essential for mastering SQL.
“Writing it looks very simple, but when you try to write it yourself, that's where you'll feel the difficulty. When I'm doing it, it looks very easy, but when you are doing it on your own, you won't be able to write even one line. To overcome that, you have to practice regularly. There is no substitute for learning SQL other than practice.”
IV. Advanced SQL Topics
String & Number Functions: SQL offers functions for string manipulation (e.g., LEFT, RIGHT, SUBSTRING, LEN, UPPER), for numeric calculations, and for converting between data types (e.g., CONVERT).
Date Functions: SQL has functions for working with dates (e.g., GETDATE, YEAR, MONTH, DAY).
“If I give you a date, can you show me which year it is? SELECT YEAR(GETDATE()): what do you think this will output?”
Window Functions: Used for calculations across sets of rows within a result set (e.g., RANK, DENSE_RANK, ROW_NUMBER), optionally split into groups with PARTITION BY.
“Now if I execute this, you'll see the numbers continue. Now you'll say: if this is descending order, can I make it ascending order? Just change this to ASC, ascending order.”
“What additional thing did I write along with this? Only this part, correct. Now I'll execute and see. Can you see this? 1 2 3, department number 10, 1 1 2 3 4 5 6, department number 20, 1 1 2 2 3 4, 1 and 1. Don't you see that the ranks are repeating after every department? That's the beauty of PARTITION BY.”
Subqueries: Queries nested inside another query:
“Placing queries inside another query: I can write this query like this: SELECT * FROM orders WHERE snum = (SELECT snum FROM salespeople WHERE sname = 'MOA')”
“In order to evaluate the outer query, SQL first has to evaluate the inner query; that's what people were asking.”
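To try the nested query from the quote end to end, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for SQL Server. The salespeople and orders tables, their columns (snum, sname, onum, amt) and the rows are hypothetical sample data. The inner SELECT runs first, and its single snum value feeds the outer WHERE clause.

```python
import sqlite3

def orders_for_salesperson(name):
    """Return the order numbers booked by one salesperson, using a nested query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE salespeople (snum INTEGER PRIMARY KEY, sname TEXT);
        CREATE TABLE orders (onum INTEGER PRIMARY KEY, amt REAL, snum INTEGER);
        INSERT INTO salespeople VALUES (1001, 'Ana'), (1002, 'Ben');
        INSERT INTO orders VALUES (3001, 18.69, 1001), (3002, 1900.10, 1002),
                                  (3003, 767.19, 1001);
    """)
    # The inner query resolves the name to an snum; the outer query filters on it
    rows = conn.execute(
        """
        SELECT onum FROM orders
        WHERE snum = (SELECT snum FROM salespeople WHERE sname = ?)
        ORDER BY onum
        """,
        (name,),
    ).fetchall()
    conn.close()
    return [onum for (onum,) in rows]

print(orders_for_salesperson("Ana"))  # [3001, 3003]
```

One difference worth knowing: if a subquery compared with = returns more than one row, SQL Server raises an error, whereas SQLite quietly uses the first row.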
Joins: Different types of joins are discussed:
INNER JOIN, LEFT JOIN, and RIGHT JOIN for combining data from multiple tables.
“Irrespective of which table it comes from, here I will write e dot department number as well. This is an inner join, this is a left join, and this is a right join: just by changing one word, I'm getting three different outputs.”
Table Aliasing: Using short aliases for table names to make queries shorter and easier to read.
“I will alias the table name only: EMP AS e LEFT JOIN Department AS d. See, I have aliased the table directly, and then rather than using EMP I'll write e, and rather than using Department I'll write d.”
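The following sketch, again using Python's sqlite3 module as a stand-in for SQL Server, runs one query twice and changes only the join keyword, using the e and d aliases from the quote. The emp and department tables and their rows are hypothetical; Mei's department 40 deliberately has no match, so the difference between INNER and LEFT shows up.

```python
import sqlite3

def join_rows(join_type):
    """Run one employee/department query with an INNER or LEFT join."""
    # The keyword is pasted into the SQL text, so only allow known values
    if join_type not in ("INNER", "LEFT"):
        raise ValueError("join_type must be 'INNER' or 'LEFT'")
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (department_number INTEGER, department_name TEXT);
        CREATE TABLE emp (emp_name TEXT, department_number INTEGER);
        INSERT INTO department VALUES (10, 'Accounting'), (20, 'Research'), (30, 'Sales');
        INSERT INTO emp VALUES ('Asha', 10), ('Ravi', 20), ('Mei', 40);
    """)
    # e and d are table aliases, so columns are written e.col and d.col
    rows = conn.execute(f"""
        SELECT e.emp_name, d.department_name
        FROM emp AS e
        {join_type} JOIN department AS d
            ON e.department_number = d.department_number
        ORDER BY e.emp_name
    """).fetchall()
    conn.close()
    return rows

print(join_rows("INNER"))  # [('Asha', 'Accounting'), ('Ravi', 'Research')]
print(join_rows("LEFT"))   # [('Asha', 'Accounting'), ('Mei', None), ('Ravi', 'Research')]
```

The sketch sticks to INNER and LEFT because SQLite only added RIGHT JOIN in version 3.39.0. A RIGHT JOIN here would instead keep every department, including Sales, which has no employees. Never build SQL text from user input the way the join keyword is inserted here.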
Stored Procedures: A reusable block of SQL code that can simplify complex queries and logic.
“A stored procedure is nothing but creating your query, storing it, and giving it a name. To do that: CREATE PROCEDURE procedure_name. You can give it any name, just as in the examples I gave you for tables, correct? CREATE PROCEDURE.”
User-Defined Functions: Functions to encapsulate complex logic and create reusable code:
“A function is a piece of code which takes some inputs and generates some output, so that it can be used in many places, not only in one single place. Why repeat the code again and again?”
Functions can be scalar functions, returning a single value, or table-valued functions, returning a table.
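SQLite has no CREATE FUNCTION statement, but Python's sqlite3 module can register a Python callable as a scalar SQL function with Connection.create_function(name, narg, func). The sketch below uses that mechanism to mimic a scalar user-defined function. net_price and the products table are hypothetical examples, not taken from the course; in SQL Server the same idea would be written with CREATE FUNCTION.

```python
import sqlite3

def net_price(price, discount_pct):
    """Hypothetical scalar function: price after a percentage discount."""
    if price is None or discount_pct is None:
        return None  # follow SQL's convention: NULL in, NULL out
    return round(price * (1 - discount_pct / 100), 2)

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it by name with 2 arguments
conn.create_function("net_price", 2, net_price)
conn.execute("CREATE TABLE products (name TEXT, price REAL, discount_pct REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("pen", 10.0, 10), ("book", 25.0, 0), ("bag", 40.0, None)],
)
# The function is reused inside an ordinary query, once per row
rows = conn.execute(
    "SELECT name, net_price(price, discount_pct) FROM products ORDER BY name"
).fetchall()
print(rows)  # [('bag', None), ('book', 25.0), ('pen', 9.0)]
```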
Security: Managing database access with GRANT and REVOKE permissions and user logins.
“GRANT SELECT ON this TO whom? To arif_user. Done.”
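Temporary Tables: Tables prefixed with a single hash (#) are local temporary tables, visible only to the session (connection) that created them. Tables prefixed with a double hash (##) are global temporary tables, visible to all sessions and dropped once the creating session ends and no other session is still using them.
“A table created with a double hash is accessible in both sessions, but why is the table with the single hash not accessible in both sessions?”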
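SQLite's closest equivalent is CREATE TEMP TABLE. Like a single-hash local temporary table, it is visible only to the connection that created it; SQLite has no counterpart to global ## tables. This minimal sketch opens two connections to one throwaway database file, treats each as a separate session, and checks whether the second can see the first one's temp table. The file name demo.db and the table name scratch are hypothetical.

```python
import os
import sqlite3
import tempfile

def temp_table_visibility():
    """Create a TEMP table in one session and test whether another session sees it."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")
        session1 = sqlite3.connect(path)
        session2 = sqlite3.connect(path)
        session1.execute("CREATE TEMP TABLE scratch (id INTEGER)")
        session1.execute("INSERT INTO scratch VALUES (1)")
        seen_by_1 = session1.execute("SELECT COUNT(*) FROM scratch").fetchone()[0]
        try:
            session2.execute("SELECT COUNT(*) FROM scratch")
            seen_by_2 = True
        except sqlite3.OperationalError:  # "no such table: scratch"
            seen_by_2 = False
        # Close both connections before the temporary folder is deleted
        session1.close()
        session2.close()
    return seen_by_1, seen_by_2

print(temp_table_visibility())  # (1, False)
```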
Views: Virtual tables that represent stored queries; they can enhance security and simplify queries.
“Views are virtual tables; they don't occupy any space, unlike temporary tables… The views will not occupy any space, and you will have a view. Now see, it's a view.”
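A view stores only its query, so it always reflects the current contents of the base table. This sketch uses Python's sqlite3 module as a stand-in for SQL Server; emp, dept10_staff and the rows are hypothetical. It also shows the security angle: the view exposes employee names but not salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_name TEXT, salary INTEGER, department_number INTEGER);
    INSERT INTO emp VALUES ('Asha', 5000, 10), ('Ravi', 3000, 20), ('Mei', 4000, 10);

    -- Only this SELECT is stored; no rows are copied into the view
    CREATE VIEW dept10_staff AS
        SELECT emp_name FROM emp WHERE department_number = 10;
""")

def dept10_names():
    """Query the view exactly as if it were a table."""
    return [name for (name,) in conn.execute(
        "SELECT emp_name FROM dept10_staff ORDER BY emp_name")]

before = dept10_names()
conn.execute("INSERT INTO emp VALUES ('Zoe', 3500, 10)")
after = dept10_names()   # the view re-runs its query, so Zoe appears
print(before)  # ['Asha', 'Mei']
print(after)   # ['Asha', 'Mei', 'Zoe']
```

Strictly speaking, a view stores its definition but no rows. SQL Server's indexed views are the exception, since they do persist data.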
Indexing: Improving query performance by creating indexes on specific table columns.
“It just improves the performance of my query. What does it do, how does it do it, why should I care about it? Basically, it creates a key-value pair.”
Indexes can be either CLUSTERED (the table's rows are stored in the order of the index key) or NON-CLUSTERED (a separate lookup structure that points back to the rows).
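To see an index change how a query runs, this sketch asks SQLite for its query plan before and after CREATE INDEX. SQLite's CREATE INDEX builds a separate B-tree, so it is closest to a SQL Server NON-CLUSTERED index. The emp table, its 1,000 generated rows and the index name idx_emp_dept are hypothetical. The exact wording of the plan text varies between SQLite versions, so the sketch only looks for the word SCAN and the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_id INTEGER, emp_name TEXT, department_number INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(i, f"emp{i}", i % 5) for i in range(1000)],
)

def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT emp_name FROM emp WHERE department_number = 3"
before = plan(query)   # no index yet: SQLite must scan every row
conn.execute("CREATE INDEX idx_emp_dept ON emp (department_number)")
after = plan(query)    # now SQLite can search through idx_emp_dept
print(before)
print(after)
```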
Pivot/Unpivot: Reshaping data from long to wide format and vice versa.
“So what name am I giving to the column which will have the numbers? Sales. And what is the column name I'm giving for Jan, Feb, March, April, May? Month name. And how many columns am I taking there? I'm taking all 12.”
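SQLite has no PIVOT or UNPIVOT operators, so this sketch uses the portable equivalents: conditional aggregation (SUM over a CASE expression) to go from long to wide, and UNION ALL to go back. The monthly_sales and sales_wide tables, the three months and the figures are hypothetical; a real report would list all 12 months the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (product TEXT, month_name TEXT, sales INTEGER);
    INSERT INTO monthly_sales VALUES
        ('pens', 'Jan', 100), ('pens', 'Feb', 120), ('pens', 'Mar', 90),
        ('ink',  'Jan',  40), ('ink',  'Mar',  60);

    -- Long -> wide: one output column per month via conditional aggregation
    CREATE TABLE sales_wide AS
    SELECT product,
           SUM(CASE WHEN month_name = 'Jan' THEN sales END) AS jan,
           SUM(CASE WHEN month_name = 'Feb' THEN sales END) AS feb,
           SUM(CASE WHEN month_name = 'Mar' THEN sales END) AS mar
    FROM monthly_sales
    GROUP BY product;
""")
wide = conn.execute("SELECT * FROM sales_wide ORDER BY product").fetchall()
print(wide)  # [('ink', 40, None, 60), ('pens', 100, 120, 90)]

# Wide -> long: one SELECT per month column, stacked with UNION ALL.
# Empty (NULL) cells are skipped, as SQL Server's UNPIVOT also does.
long_rows = conn.execute("""
    SELECT product, 'Jan' AS month_name, jan AS sales FROM sales_wide WHERE jan IS NOT NULL
    UNION ALL
    SELECT product, 'Feb', feb FROM sales_wide WHERE feb IS NOT NULL
    UNION ALL
    SELECT product, 'Mar', mar FROM sales_wide WHERE mar IS NOT NULL
""").fetchall()
print(sorted(long_rows))
```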
V. Practical Data Analysis Workflow and Business Application
Data Shaping: The role of the data professional is to shape data for analysis rather than worrying about the collection. Data can be shaped using views, stored procedures, functions, and triggers.
“My job as a data analyst is not to worry about data collection. Once the data is there in the system, shaping the data is in my work scope.”
Real World Examples: The training provides practical scenarios, such as data conversion (epoch timestamp to human-readable date) or customer age categorization, to demonstrate how SQL is used in real business settings.
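Both scenarios can be sketched in a few lines. This example uses SQLite's datetime(value, 'unixepoch') to turn seconds since 1970-01-01 UTC into a readable timestamp (in SQL Server, DATEADD(SECOND, epoch_column, '1970-01-01') is a common equivalent) and a CASE expression for age bands. The customers table, the epoch values and the age cut-offs (18 and 60) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, signup_epoch INTEGER, age INTEGER);
    INSERT INTO customers VALUES (1, 1700000000, 17), (2, 0, 34), (3, 946684800, 67);
""")

rows = conn.execute("""
    SELECT customer_id,
           -- seconds since 1970-01-01 00:00:00 UTC -> 'YYYY-MM-DD HH:MM:SS'
           datetime(signup_epoch, 'unixepoch') AS signup_utc,
           CASE
               WHEN age < 18 THEN 'Minor'
               WHEN age < 60 THEN 'Adult'
               ELSE 'Senior'
           END AS age_group
    FROM customers
    ORDER BY customer_id
""").fetchall()
for row in rows:
    print(row)
# (1, '2023-11-14 22:13:20', 'Minor')
# (2, '1970-01-01 00:00:00', 'Adult')
# (3, '2000-01-01 00:00:00', 'Senior')
```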
Triggers for Automation: In the course example, a trigger automatically sets the customer's date of first purchase in the customer table after a new order is inserted.
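A minimal sketch of that trigger in SQLite syntax follows. The customers and orders tables, their column names and the trigger name trg_first_purchase are hypothetical. The trigger also avoids overwriting an existing first-purchase date unless the new order is earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT,
                            first_purchase_date TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT);

    -- After each new order, record the customer's first purchase date
    -- if it is still empty, or if this order is earlier than the stored one
    CREATE TRIGGER trg_first_purchase
    AFTER INSERT ON orders
    BEGIN
        UPDATE customers
        SET first_purchase_date = NEW.order_date
        WHERE customer_id = NEW.customer_id
          AND (first_purchase_date IS NULL OR NEW.order_date < first_purchase_date);
    END;

    INSERT INTO customers (customer_id, name) VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders (customer_id, order_date) VALUES (1, '2024-03-05');
    INSERT INTO orders (customer_id, order_date) VALUES (1, '2024-04-10');
""")

rows = conn.execute(
    "SELECT customer_id, first_purchase_date FROM customers ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, '2024-03-05'), (2, None)]
```

SQLite triggers run once per inserted row and read it through NEW. SQL Server triggers run once per statement and read the inserted pseudo-table instead, so a T-SQL version would join customers to inserted.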
VI. Other Key Takeaways
Different Styles of Coding: There are various coding styles and there is no absolute right or wrong way as long as you fulfill the requirement.
Importance of syntax: Small things such as missing commas, parentheses, or spaces can lead to errors.
CTE (Common Table Expression): A common table expression, introduced with WITH, makes a query shorter and more readable by defining a named, temporary result set that exists only for the duration of that one statement.
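This sketch shows a CTE naming an intermediate result (each department's average salary) that the main query then reads like a table, to list employees paid above their department average. It uses SQLite through Python's sqlite3 module; the emp table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_name TEXT, salary INTEGER, department_number INTEGER);
    INSERT INTO emp VALUES ('Asha', 5000, 10), ('Mei', 4000, 10),
                           ('Ravi', 3000, 20), ('Tom', 2000, 20);
""")

# dept_avg exists only while this one statement runs
above_average = conn.execute("""
    WITH dept_avg AS (
        SELECT department_number, AVG(salary) AS avg_salary
        FROM emp
        GROUP BY department_number
    )
    SELECT e.emp_name
    FROM emp AS e
    JOIN dept_avg AS a ON a.department_number = e.department_number
    WHERE e.salary > a.avg_salary
    ORDER BY e.emp_name
""").fetchall()
print(above_average)  # [('Asha',), ('Ravi',)]
```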
Conclusion
The provided materials offer a comprehensive introduction to SQL, from basic syntax to advanced techniques. They underscore the importance of SQL across different data-focused roles, emphasize hands-on practice, and encourage purposeful learning. The content is structured to enable participants to not only write SQL queries but also understand the business context and design solutions to real-world data challenges.
SQL Fundamentals and Data Roles
Frequently Asked Questions About SQL and Data Roles
What is the role of SQL in data-related professions? SQL is a fundamental skill for data analysts, data engineers, and data scientists. Regardless of the specific role, proficiency in SQL is essential for retrieving, manipulating, and managing data. Data analysts use SQL along with business intelligence tools, data engineers use it alongside data integration tools, and data scientists use it with machine learning libraries. In essence, SQL serves as the common language for all roles to interact with data.
What are the core components of SQL Server? SQL Server has two primary components: the server and the client (or management tool). The server stores and manages databases while the client (like SQL Server Management Studio – SSMS) is a tool that allows users to interact with the server. Communication between the client and server happens using SQL, a structured query language. A database server contains multiple databases, and each database is made up of tables. These tables have rows and columns, and relationships between tables make up a schema.
What are the main types of SQL commands and what are their functions? SQL commands can be categorized into Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), and Transaction Control Language (TCL). DDL commands (e.g., CREATE, ALTER, DROP, TRUNCATE) are used to define and modify the structure of the database, such as creating tables, adding columns, or removing tables. DML commands (e.g., INSERT, UPDATE, DELETE) are used to manage the actual data within the tables. DCL commands (e.g., GRANT, REVOKE) handle the security aspects by managing access levels for users. TCL commands (e.g., COMMIT, ROLLBACK, SAVEPOINT) control transactions: COMMIT makes a transaction's changes permanent, ROLLBACK undoes them, and a savepoint lets you roll back only part of a transaction (in SQL Server, savepoints are created with SAVE TRANSACTION).
What is the difference between DELETE, TRUNCATE, and DROP commands? While all three commands are used for removing data, they differ in how they work. The DELETE command is a DML command that removes records row by row, optionally based on a WHERE condition. The TRUNCATE command is a DDL command that removes all records from a table while keeping its structure. The course describes it as dropping and recreating the table; in SQL Server, it works by deallocating the table's data pages instead of logging each deleted row, which makes it faster than DELETE for removing all records. The DROP command, also a DDL command, removes the entire table, including both the data and its structure.
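SQLite has no TRUNCATE statement, but a DELETE without a WHERE clause on a table without triggers is optimized to clear the table in one step, which makes it a reasonable stand-in. The sketch below contrasts the three cases on a hypothetical countries table: a filtered DELETE, a DELETE of every row (the table survives), and DROP (the table is gone). The table_exists helper is also hypothetical.

```python
import sqlite3

def table_exists(conn, name):
    """Check SQLite's catalog for a table with this name."""
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row[0] == 1

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE countries (country_id INTEGER, country_code TEXT)")
conn.executemany("INSERT INTO countries VALUES (?, ?)",
                 [(1, "US"), (2, "CA"), (3, "IN"), (4, "UK")])

# DELETE with WHERE: only the matching rows are removed
conn.execute("DELETE FROM countries WHERE country_id IN (1, 2)")
after_delete = conn.execute("SELECT COUNT(*) FROM countries").fetchone()[0]

# DELETE without WHERE: every row goes, the structure stays (TRUNCATE-like)
conn.execute("DELETE FROM countries")
after_delete_all = conn.execute("SELECT COUNT(*) FROM countries").fetchone()[0]
still_there = table_exists(conn, "countries")

# DROP: data and structure both go
conn.execute("DROP TABLE countries")
gone = not table_exists(conn, "countries")

print(after_delete, after_delete_all, still_there, gone)  # 2 0 True True
```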
What are data types CHAR and VARCHAR, and how are they different? CHAR and VARCHAR are data types used for storing character strings. CHAR is a fixed-size data type: it always occupies the declared number of characters, padding shorter values with spaces, which can waste space. VARCHAR is a variable-size data type: it stores only the characters actually present (in SQL Server, plus 2 bytes that record the length), and the declared size, such as 20 in VARCHAR(20), is the maximum it can hold. For instance, a phone number that is always 10 digits fits CHAR(10), while a name that varies in length suits VARCHAR(20).
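The storage arithmetic can be worked through with a small calculator. This sketch assumes single-byte characters (plain ASCII text in a non-Unicode column) and SQL Server's rule that VARCHAR uses the actual length plus 2 bytes; it ignores row and page overhead, and the sample values are hypothetical.

```python
def char_bytes(declared_length, value):
    """CHAR(n): always n bytes; shorter values are padded with spaces."""
    if len(value) > declared_length:
        raise ValueError("value is longer than the declared length")
    return declared_length

def varchar_bytes(declared_length, value):
    """VARCHAR(n): the characters actually stored plus 2 bytes of length overhead."""
    if len(value) > declared_length:
        raise ValueError("value is longer than the declared length")
    return len(value) + 2

names = ["Ana", "Christopher", "Li"]
char_total = sum(char_bytes(20, n) for n in names)        # 3 x 20 = 60
varchar_total = sum(varchar_bytes(20, n) for n in names)  # 5 + 13 + 4 = 22
phone = "9876543210"
print(char_total, varchar_total)                          # 60 22
print(char_bytes(10, phone), varchar_bytes(10, phone))    # 10 12
```

The phone-number case shows why fixed-length data suits CHAR: when every value fills the column, CHAR(10) uses 10 bytes while VARCHAR(10) uses 12.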
How can SQL ALTER command be used? The ALTER command is used to modify the structure of an existing table. It can change the data type of columns (e.g. changing from CHAR(2) to CHAR(3)), add new columns, or remove existing ones. It’s important to note that when altering a column to a smaller size, SQL will restrict this if the column has data that exceeds the new smaller size. ALTER operations allow changes while preserving existing data in the table, where possible.
What is the purpose of constraints like primary keys in SQL? Constraints define rules for the data in a table. A primary key constraint ensures two things: that all values in the primary key column are unique and not null. This allows for efficient identification of unique records and prevents duplicate or missing records. Primary keys help in making tables and schemas more reliable.
What are SQL Window Functions and how do they differ from other functions?
SQL window functions, such as RANK, DENSE_RANK, and ROW_NUMBER, perform calculations across a set of rows that are related to the current row. Scalar functions work on one row at a time, and aggregate functions used with GROUP BY collapse each group into a single output row; window functions instead keep every row and add a value computed over a “window” of related rows. These functions support rankings, running totals, and more complex analysis. For example, RANK gives tied rows the same rank and then skips the following rank(s) (1, 1, 3); DENSE_RANK also gives tied rows the same rank but does not skip (1, 1, 2); and ROW_NUMBER simply assigns a unique incremental number to each row regardless of ties. PARTITION BY defines the window by a column such as department, so the function is applied separately to each partitioned subset of rows.
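Here is that ranking behavior on hypothetical data, computed by SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later, which current Python releases bundle). Mei and Ravi tie within department 10, so RANK jumps to 4 for Tom while DENSE_RANK continues with 3. ROW_NUMBER is ordered by salary and then name so that its numbering of tied rows is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (emp_name TEXT, salary INTEGER, department_number INTEGER);
    INSERT INTO emp VALUES
        ('Asha', 5000, 10), ('Mei', 3000, 10), ('Ravi', 3000, 10), ('Tom', 2000, 10),
        ('Lena', 4000, 20), ('Omar', 2500, 20);
""")

rows = conn.execute("""
    SELECT department_number, emp_name, salary,
           RANK()       OVER (PARTITION BY department_number ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY department_number ORDER BY salary DESC) AS dense_rnk,
           -- emp_name breaks ties so ROW_NUMBER is repeatable
           ROW_NUMBER() OVER (PARTITION BY department_number
                              ORDER BY salary DESC, emp_name) AS row_num
    FROM emp
    ORDER BY department_number, row_num
""").fetchall()
for row in rows:
    print(row)
# Department 10: Asha 1/1/1, Mei 2/2/2, Ravi 2/2/3, Tom 4/3/4
# Department 20: Lena 1/1/1, Omar 2/2/2 (numbering restarts for each partition)
```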
SQL Command Types and Functions
SQL commands are used to interact with databases, and they can be categorized into four main types: Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), and Transaction Control Language (TCL) [1, 2].
DDL (Data Definition Language) commands are used to define the structure or schema of a database [2]. These commands deal with the skeleton of the database, not the actual data [2].
CREATE is used to create database objects such as tables, views, or indexes [1, 2]. For example, CREATE TABLE countries (country_id INT, country_code VARCHAR(2), country_name VARCHAR(20)); creates a table named “countries” with three columns [3].
ALTER is used to modify existing database objects [2]. This can involve adding, removing, or modifying columns in a table, for instance, ALTER TABLE countries ALTER COLUMN country_code VARCHAR(3); changes the size of the country_code column [3, 4].
DROP is used to remove database objects [2]. For instance, DROP TABLE countries; deletes the “countries” table and its data [3, 4].
TRUNCATE is used to remove all data from a table while keeping the table structure [2, 3]. For example, TRUNCATE TABLE countries; deletes all rows from the “countries” table [3]. The course describes it as dropping the table and recreating it [3]; in SQL Server it actually deallocates the table's data pages rather than deleting and logging rows one by one, which is why it is faster than DELETE.
DML (Data Manipulation Language) commands are used to manage the actual data within a database [2].
INSERT is used to add new data into a table [2]. For example, INSERT INTO countries (country_id, country_code, country_name) VALUES (1, 'US', 'United States'); adds a new row to the “countries” table [4].
UPDATE is used to modify existing data in a table [2]. For instance, UPDATE countries SET country_code = 'USA' WHERE country_id = 1; changes the country_code for the row where country_id is 1 [4].
DELETE is used to remove data from a table based on a specific condition [2]. For example, DELETE FROM countries WHERE country_id = 1; deletes the row where country_id is 1 [4].
DCL (Data Control Language) commands are used to control access to the data [2]. They deal with security, ensuring that the right people have the right permissions to interact with the database [2].
GRANT is used to give specific permissions to users. For instance, GRANT SELECT ON products_new TO arif_user; allows the user “arif_user” to read data from the table “products_new” [5].
REVOKE is used to take away permissions from users. For example, REVOKE SELECT ON products_new FROM arif_user; removes the “select” permission from the user “arif_user” [5].
TCL (Transaction Control Language) commands are used to manage transactions within a database, allowing for the grouping of several operations into a single unit of work [2].
BEGIN TRANSACTION marks the start of a new transaction [5, 6]. This is needed before using commit or rollback [6].
COMMIT saves all changes made during the transaction [6].
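ROLLBACK cancels all changes made during the transaction to return to the initial state [2, 5, 6]. The source adds that a transaction that is neither committed nor rolled back is committed automatically when the session is closed or when a DDL command is executed [6]. That describes Oracle, where DDL statements implicitly commit and tools such as SQL*Plus commit on a normal exit. In SQL Server, an open transaction is rolled back if the connection closes without a COMMIT, and DDL statements run inside the transaction, so they can be rolled back too.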
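This sketch runs the transaction commands against SQLite through Python's sqlite3 module. Opening the connection with isolation_level=None turns off the module's automatic transaction handling, so the explicit BEGIN TRANSACTION, SAVEPOINT, ROLLBACK and COMMIT statements are fully in control. The accounts table and the amounts are hypothetical; SQL Server spells the savepoint commands SAVE TRANSACTION name and ROLLBACK TRANSACTION name.

```python
import sqlite3

# isolation_level=None: the module never opens transactions on its own
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

def balances():
    return [b for (b,) in conn.execute("SELECT balance FROM accounts ORDER BY account_id")]

# 1. A rolled-back transaction leaves no trace
conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE account_id = 1")
conn.execute("ROLLBACK")
after_rollback = balances()

# 2. A savepoint undoes part of a transaction; the rest is committed
conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE account_id = 1")
conn.execute("UPDATE accounts SET balance = balance + 30 WHERE account_id = 2")
conn.execute("SAVEPOINT before_fee")
conn.execute("UPDATE accounts SET balance = balance - 5 WHERE account_id = 2")
conn.execute("ROLLBACK TO before_fee")   # undo only the fee
conn.execute("COMMIT")
after_commit = balances()

print(after_rollback)  # [100, 50]
print(after_commit)    # [70, 80]
```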
These SQL commands can be combined with clauses such as WHERE, GROUP BY, HAVING, and ORDER BY to filter, group, and sort data [7, 8]. Functions can also be used within SQL queries for string manipulation, numeric calculations, and date and time handling [7].
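The order in which these clauses act is easiest to see in one query. In this sketch (SQLite via Python's sqlite3 module; the orders table and its rows are hypothetical), WHERE removes cancelled orders before grouping, GROUP BY builds one row per customer, HAVING keeps only customers whose paid total is at least 100, and ORDER BY sorts what remains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL, status TEXT);
    INSERT INTO orders VALUES
        (1, 'Asha', 120.0, 'paid'), (2, 'Asha', 90.0, 'paid'),
        (3, 'Ravi', 200.0, 'paid'), (4, 'Ravi', 50.0, 'cancelled'),
        (5, 'Mei',   40.0, 'paid');
""")

rows = conn.execute("""
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'        -- 1. filter rows before grouping
    GROUP BY customer            -- 2. one row per customer
    HAVING SUM(amount) >= 100    -- 3. filter groups after aggregation
    ORDER BY total DESC          -- 4. sort the final result
""").fetchall()
print(rows)  # [('Asha', 2, 210.0), ('Ravi', 1, 200.0)]
```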
SQL Data Types: A Comprehensive Guide
Data types in SQL specify the type of data that can be stored in a column of a table [1]. Choosing the correct data type is important to ensure data integrity, optimize storage, and improve performance [1].
Here’s a breakdown of common SQL data types, drawing on the information from the sources:
Integer (INT): Used for storing whole numbers [1]. For instance, country_id in the countries table is defined as INT [1].
Character (CHAR): Used to store character strings of a fixed length [1]. For example, CHAR(2) allocates space for two characters, whether or not that space is used [1].
If a CHAR(2) column stores only one character, the value is padded with a trailing space, so the full two characters of storage are still used [1].
Variable Character (VARCHAR): Used for character strings of a variable length [1].
For example, VARCHAR(20) can store up to 20 characters but uses only the space it needs. If a VARCHAR(20) column stores only four characters, only those four characters are stored [1] (SQL Server also adds 2 bytes to record the length).
VARCHAR is more efficient than CHAR when the length of the strings varies because it does not allocate unnecessary space [1].
VARCHAR is often preferred for storing names and addresses [1].
Date and Time (DATE, DATETIME): Used to store date and time values [2].
DATE stores only the date portion, while DATETIME stores both date and time [2].
GETDATE() is a function that returns the current date and time [2].
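Float (FLOAT): Used to store approximate floating-point numbers, that is, numbers with a fractional part [3]. Because FLOAT values are approximate, exact decimal amounts such as prices are usually stored as DECIMAL instead.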
Additional Considerations:
Case Sensitivity: In Microsoft SQL Server, case sensitivity is set by the collation, and the default collations are case-insensitive, so 'abc' and 'ABC' compare as equal; Oracle compares string values case-sensitively by default [1, 4]. It is good practice to keep case consistent to avoid issues when moving code between different database systems [4].
Fixed vs. Variable Size: Fixed-size data types like CHAR allocate a specific amount of storage regardless of the actual data length, which can lead to wasted space [1].
Variable-size data types such as VARCHAR use only the memory needed to store the actual data, which is more efficient [1].
Data Type Conversion:
The CAST function can be used to convert one data type to another [3, 5].
The CONVERT function can also be used to convert one data type to another [2].
For instance, CONVERT(DATE, GETDATE()) converts the DATETIME output of GETDATE() to just the DATE [2].
Choosing the Right Data Type: When defining data types, it’s important to consider the nature of the data you’re storing [1].
For example, a phone number, which is always ten digits, should use a fixed-size data type, such as CHAR(10) [1].
For gender, a CHAR(1) is sufficient since the values are usually “M” or “F” [1].
Understanding and selecting the appropriate data types is fundamental to efficient database design and management [1].
SQL Table Creation
Table creation in SQL involves using Data Definition Language (DDL) commands to define the structure of a table, which includes specifying column names, data types, and constraints [1]. Here’s a breakdown of how to create tables effectively, drawing from the sources:
Basic Table Creation:
The CREATE TABLE command is the foundation for building a new table [1]. The basic syntax includes specifying the table name and defining its columns within parentheses [1, 2].
Each column definition includes the column name, the data type, and any optional constraints [2].
For example, to create a table called “countries” with columns for country ID, country code, and country name, the following SQL statement is used:
CREATE TABLE countries (
country_id INT,
country_code VARCHAR(2),
country_name VARCHAR(20)
);
This command creates a table named countries with three columns: country_id of type integer (INT), country_code of type variable character string with a maximum length of 2 (VARCHAR(2)), and country_name of type variable character string with a maximum length of 20 (VARCHAR(20)) [3].
Column Definition:
When defining columns, it is necessary to choose appropriate data types [2]. Common data types include INT for integers, VARCHAR for variable-length strings, and CHAR for fixed-length strings [2, 3].
INT is used for numerical data, such as identifiers [3].
VARCHAR is suitable for text that has a varying length, such as names or descriptions [3].
CHAR is more suitable for fixed-length data such as gender which can be represented by “M” or “F” with CHAR(1) [3].
Indentation matters for readability: when a table has multiple columns, placing each column definition on its own indented line makes the code easier to read and follow [3].
Constraints:
Constraints are used to enforce rules on the data within a table [3]. They are important for maintaining data integrity.
Primary Key: The PRIMARY KEY constraint is used to ensure that the values in a column are unique and not null, and this is used to uniquely identify each row in a table [4].
A table can have only one primary key, although that key can span more than one column (a composite key).
For example, in a Department table, department_number could be defined as the primary key, preventing duplicate or null values:
CREATE TABLE Department (
department_number INT PRIMARY KEY,
department_name VARCHAR(20),
location VARCHAR(10)
);
Not Null: The NOT NULL constraint is used to ensure that a column cannot contain null values, ensuring that there is always data present for the column [4].
Other constraints are not discussed in the sources.
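This sketch runs the Department definition above in SQLite and tries a duplicate key and a NULL key. One deliberate change: NOT NULL is written out, because SQLite, unlike SQL Server, historically allows NULL in a primary key column that is not an INTEGER PRIMARY KEY. SQLite also does not enforce VARCHAR lengths. The rows and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written out because SQLite, unlike SQL Server, lets a
# non-INTEGER primary key column hold NULL unless told otherwise
conn.execute("""
    CREATE TABLE Department (
        department_number INT NOT NULL PRIMARY KEY,
        department_name   VARCHAR(20),
        location          VARCHAR(10)
    )
""")
conn.execute("INSERT INTO Department VALUES (10, 'Accounting', 'New York')")

def try_insert(values):
    """Return None if the row is accepted, else the constraint error text."""
    try:
        conn.execute("INSERT INTO Department VALUES (?, ?, ?)", values)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = [
    try_insert((10, "Research", "Dallas")),  # duplicate key: rejected
    try_insert((None, "Sales", "Chicago")),  # NULL key: rejected
    try_insert((20, "Research", "Dallas")),  # new key: accepted
]
for r in results:
    print(r)
```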
Executing the CREATE TABLE command:
After writing a CREATE TABLE command, highlight it in SSMS and click Execute (or press F5); if nothing is highlighted, everything in the query window runs. The Messages pane then reports whether the command succeeded [3].
If the new table does not appear in the Tables list, right-click the Tables folder in Object Explorer and choose Refresh [3].
If a table with the same name already exists, then the SQL system will throw an error [3]. This is an important error to read and understand to troubleshoot SQL [3].
If there is a syntax error, the system will also give a message, and these messages should be read and understood to correct the SQL code [3].
Additional Considerations:
Data Types: It is important to choose appropriate data types for the columns based on the nature of the data that the column will store [3].
Naming conventions: Avoid spaces in column names; use an underscore instead (for example, country_name rather than country name) [5].
Case Sensitivity: SQL Server is not case-sensitive by default, but it is good practice to use consistent casing in code, because other database systems such as Oracle treat case differently [3, 6].
By understanding and using these SQL commands, data types, and constraints, it is possible to effectively create and manage tables in SQL databases [1-3].
SQL Data Insertion Techniques
Data insertion in SQL involves using Data Manipulation Language (DML) commands to add new rows of data into a table. The primary command for inserting data is INSERT INTO, and it can be used in several ways. Here’s a breakdown of how to insert data effectively, drawing from the sources:
Basic Data Insertion:
The INSERT INTO command is used to add new records (rows) to a table.
The basic syntax of the INSERT INTO command is as follows:
INSERT INTO table_name (column1, column2, column3, …)
VALUES (value1, value2, value3, …);
table_name is the name of the table into which data needs to be inserted.
(column1, column2, column3, …) specifies the columns where data is being inserted, and the order of the columns is important.
VALUES (value1, value2, value3, …) specifies the values to be inserted into the corresponding columns.
For example, to insert a new country into the “countries” table, you might use:
INSERT INTO countries (country_id, country_code, country_name)
VALUES (2, 'CA', 'Canada');
This command will add a new row to the countries table with country_id as 2, country_code as ‘CA’, and country_name as ‘Canada’.
String values should be enclosed in single quotes, while numeric values do not require single quotes.
Specifying Columns:
It’s good practice to explicitly specify the column names in the INSERT INTO statement. This ensures that the data is inserted into the correct columns, regardless of the order of the columns in the table definition.
If the column names are not specified, the values must be listed in the same order that the columns are defined in the table.
For example, both of the following statements are valid if the columns of the countries table are ordered as country_id, country_code, country_name:
INSERT INTO countries (country_id, country_code, country_name)
VALUES (2, 'CA', 'Canada');
and
INSERT INTO countries
VALUES (2, 'CA', 'Canada');
Inserting Data with Different Column Order:
It is possible to insert data in a different column order than the order that the columns appear in the table provided you specify the columns explicitly in the INSERT INTO statement.
For instance:
INSERT INTO countries (country_code, country_name, country_id)
VALUES ('IN', 'India', 3);
This will correctly insert ‘IN’ into country_code, ‘India’ into country_name, and 3 into country_id.
The sequence of columns in the INSERT INTO statement must match the sequence of values provided.
Inserting Null Values:
If a value for a specific column is not available, you can insert NULL into the column if the column allows null values.
If you omit a column from the INSERT INTO statement and the column allows null values (and has no DEFAULT defined), SQL Server automatically inserts NULL.
For example, if you don’t have a country code, you can either omit the country_code column in the insert statement, or insert NULL:
INSERT INTO countries (country_id, country_name)
VALUES (4, 'United Kingdom');
or
INSERT INTO countries (country_id, country_code, country_name)
VALUES (4, NULL, 'United Kingdom');
Both statements will insert a row where the country_code is NULL.
If a column has a NOT NULL constraint, then you must insert a non-null value, or the insert statement will cause an error.
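The column-order and NULL rules above can be checked quickly in SQLite. In this sketch the countries table mirrors the one in this section but adds NOT NULL to country_id (an assumption, to show the constraint error), and the rows are hypothetical. cursor.rowcount plays the role of the “rows affected” message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE countries (
        country_id   INTEGER NOT NULL,
        country_code VARCHAR(2),
        country_name VARCHAR(20)
    )
""")

# Columns listed in a different order than the table definition
cur = conn.execute(
    "INSERT INTO countries (country_code, country_name, country_id) VALUES ('IN', 'India', 3)"
)
print(cur.rowcount)  # 1 (one row affected)

# Omitting country_code and writing NULL explicitly give the same result
conn.execute("INSERT INTO countries (country_id, country_name) VALUES (4, 'United Kingdom')")
conn.execute(
    "INSERT INTO countries (country_id, country_code, country_name) VALUES (5, NULL, 'Ireland')"
)

# Leaving out the NOT NULL column is rejected
try:
    conn.execute("INSERT INTO countries (country_code, country_name) VALUES ('FR', 'France')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT * FROM countries ORDER BY country_id").fetchall()
print(rows)      # [(3, 'IN', 'India'), (4, None, 'United Kingdom'), (5, None, 'Ireland')]
print(rejected)  # True
```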
Inserting Data Based on Conditions:
INSERT … VALUES has no WHERE clause of its own. To insert only rows that meet certain criteria, use the INSERT INTO … SELECT statement and put the WHERE clause on the SELECT; for example, you can copy rows from another table into a new table based on a condition.
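A minimal INSERT INTO … SELECT sketch in SQLite follows; the customers and india_customers tables and their rows are hypothetical. Only rows that pass the SELECT's WHERE clause are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT, country TEXT);
    CREATE TABLE india_customers (customer_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Asha', 'IN'), (2, 'John', 'US'), (3, 'Ravi', 'IN');
""")

# The WHERE clause belongs to the SELECT; its result rows are what gets inserted
cur = conn.execute("""
    INSERT INTO india_customers (customer_id, name)
    SELECT customer_id, name FROM customers WHERE country = 'IN'
""")
print(cur.rowcount)  # 2
copied = conn.execute("SELECT * FROM india_customers ORDER BY customer_id").fetchall()
print(copied)  # [(1, 'Asha'), (3, 'Ravi')]
```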
Executing the INSERT INTO Command:
After writing an INSERT INTO command, it is necessary to select the command and then click execute.
The system reports the number of rows affected, for example “(1 row affected)” for an INSERT that adds a single row.
Important Considerations:
Data Type Compatibility: It is important to ensure that the data type of the values being inserted is compatible with the data type of the corresponding columns. Otherwise, errors may occur, and the data may not be inserted correctly.
Constraints: If a table has constraints such as primary keys or unique constraints, then inserting data may lead to an error if it violates those constraints.
For example, if you try to insert a row with a duplicate primary key value, the SQL server will throw an error.
By understanding and using these techniques for data insertion, it is possible to populate tables with new data accurately and efficiently.
SQL Error Handling and Exception Management
Error handling in SQL involves managing issues that arise during the execution of SQL code, ensuring that the system responds gracefully to both system-level and business-level problems [1]. It is implemented using TRY…CATCH blocks and other techniques. Here’s a detailed look at error handling as discussed in the sources:
Types of Errors:
System Errors: These are errors that arise due to violations of SQL system rules, such as trying to insert duplicate primary key values, which violate the PRIMARY KEY constraint [1].
Business Errors/Exceptions: These are errors that are not system errors but violate business rules, such as restricting code execution to specific times [1].
Both types of errors can be managed using TRY…CATCH blocks [1].
TRY…CATCH Blocks:
Every SQL code block that requires exception or error handling has two main parts: a TRY block and a CATCH block [1].
The TRY block contains the code that might generate an error [1].
The SQL server will attempt to run all the code inside of the TRY block.
If an error occurs during the execution of the TRY block, the control is immediately transferred to the CATCH block [1].
Any code after the error within the TRY block will not be executed [1].
The CATCH block contains code to handle the error, such as logging, displaying a message, or attempting to correct the error [1].
For example:
BEGIN TRY
-- Code that might cause an error
INSERT INTO employees (employee_id, name) VALUES (1, 'John Doe');
PRINT 'Inside TRY block, after insert'; -- This will not execute if there's an error on the line above
END TRY
BEGIN CATCH
-- Code to handle the error
PRINT 'Inside CATCH block';
PRINT ERROR_MESSAGE(); -- Prints a system-generated error message
PRINT ERROR_NUMBER(); -- Prints the error number
PRINT ERROR_STATE(); -- Prints the error state
-- More error-handling logic can be added here
END CATCH
In the example above, if employee_id is the primary key of employees and a row with employee_id 1 already exists, the INSERT fails: control passes to the CATCH block, and the PRINT statement after the INSERT in the TRY block does not run.
Error Information in the CATCH Block:
Inside the CATCH block, you can access error information using the following functions:
ERROR_MESSAGE(): Returns the text of the error message [1].
ERROR_NUMBER(): Returns the error number [1].
ERROR_STATE(): Returns the error state [1].
User-Defined Errors:
In addition to handling system errors, you can also raise your own errors to manage business-specific rules or exceptions, which are not necessarily system errors [1].
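This is done using the RAISERROR statement [1]. When an error is raised with a severity from 11 to 19 inside a TRY block, control jumps to the CATCH block, just as with a system error; severities of 10 or lower are informational and do not transfer control. SQL Server 2012 and later also provide THROW for raising errors.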
For example, you can raise an error if a procedure is run outside working hours:
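BEGIN TRY
DECLARE @currentTime TIME = CAST(GETDATE() AS TIME);
-- The non-working window wraps past midnight, so test its two halves with OR
IF @currentTime >= '18:00' OR @currentTime < '06:00'
RAISERROR('You cannot run this code during non-working hours.', 16, 1);
-- Code that should run only during working hours goes here
END TRY
BEGIN CATCH
PRINT ERROR_MESSAGE();
END CATCH
Because the check runs inside a TRY block, a current time from 6 PM up to 6 AM raises the error, and control jumps to the CATCH block, which prints the message. The condition uses OR because BETWEEN '18:00' AND '06:00' is never true: BETWEEN expects the lower bound first, and this range wraps past midnight.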
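The same business rule can be sketched outside the database to test the time logic on its own. This Python version is an analogy, not T-SQL: the hypothetical NonWorkingHoursError stands in for RAISERROR, the except clause plays the role of the CATCH block, and run_report is a hypothetical placeholder for the protected work. The naive_matches line shows why a single BETWEEN-style range with reversed bounds never matches.

```python
from datetime import time

WORK_START = time(6, 0)
WORK_END = time(18, 0)

class NonWorkingHoursError(Exception):
    """Hypothetical business-rule error, playing the role of RAISERROR."""

def is_non_working(t):
    # The window wraps past midnight, so it is two ranges joined with OR
    return t >= WORK_END or t < WORK_START

def run_report(now):
    try:                                   # the TRY block
        if is_non_working(now):
            raise NonWorkingHoursError("You cannot run this code during non-working hours.")
        return "report generated"
    except NonWorkingHoursError as exc:    # the CATCH block
        return f"blocked: {exc}"

# A BETWEEN-style test with the bounds reversed never matches any hour
naive_matches = [h for h in range(24) if WORK_END <= time(h) <= WORK_START]
print(naive_matches)             # []
print(run_report(time(9, 30)))   # report generated
print(run_report(time(23, 15)))  # blocked: You cannot run this code during non-working hours.
```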
Importance of Exception Handling:
Robustness: Exception handling makes your code robust, meaning it can handle unexpected situations without crashing or producing incorrect output [1].
User Experience: It can improve the user experience by providing meaningful error messages when issues occur and allowing the code to respond gracefully to those errors [1].
Debugging: Using TRY…CATCH blocks and error information, it is possible to debug SQL code more efficiently by understanding the errors that occurred [1].
Additional Considerations:
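Logical Mistakes: Even if code runs without syntax errors, it may still contain logical mistakes [2]. Such mistakes do not raise errors on their own, so TRY…CATCH will not catch them; they have to be found by checking the output, or turned into explicit errors with RAISERROR when a business rule is broken.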
For example, code could be written to return incorrect outputs even though the syntax is correct.
Time Restrictions: With exception handling, SQL code can be restricted to certain times [1].
You can also implement business rules, such as preventing code from executing if the time is outside the desired range.
By understanding and using TRY…CATCH blocks, the RAISERROR statement, and error functions, developers can create SQL code that is more resilient, user-friendly, and easier to debug.
A database holds data organized systematically, often in a layout resembling a spreadsheet or a table. Because of this organization, each piece of data has attributes by which it can be identified. For example, a person can be identified by attributes like name and age.
Data stored in a database cannot exist in isolation; it must have a relationship with other data to be processed into meaningful information. Databases establish relationships between pieces of data, for example, by retrieving a customer’s details from one table and their order recorded against another table. This is often achieved through keys. A primary key uniquely identifies each record in a table, while a foreign key is a primary key from one table that is used in another table to establish a link or relationship between the two. For instance, the customer ID in a customer table can be the primary key and then become a foreign key in an order table, thus relating the two tables.
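To make the primary key/foreign key idea concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The customers and orders tables, their columns, and the sample rows are hypothetical, and other systems such as MySQL use slightly different column types. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,  -- uniquely identifies each customer
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
        description TEXT
    )""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(100, 1, "Ring"), (101, 1, "Bracelet"), (102, 2, "Necklace")])

# The shared customer_id links each order back to its customer
rows = conn.execute("""
    SELECT c.name, o.order_id, o.description
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id""").fetchall()
for row in rows:
    print(row)

# An order pointing at a customer that does not exist is rejected
try:
    conn.execute("INSERT INTO orders VALUES (103, 99, 'Pendant')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print("order for unknown customer rejected:", fk_rejected)
```

The join returns each order alongside its customer's name, and the last insert fails because customer 99 has no matching primary key.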
While relational databases, which organize data into tables with relationships, are common, there are other types of databases. An object-oriented database stores data in the form of objects instead of tables or relations. An example could be an online bookstore where authors, customers, books, and publishers are rendered as classes, and the individual entries are objects or instances of these classes.
To work with data in databases, database engineers use Structured Query Language (SQL). SQL is a standard language that can be used with all relational databases like MySQL, PostgreSQL, Oracle, and Microsoft SQL Server. Database engineers establish interactions with databases to create, read, update, and delete (CRUD) data.
SQL can be divided into several sub-languages:
Data Definition Language (DDL) helps define data in the database and includes commands like CREATE (to create databases and tables), ALTER (to modify database objects), and DROP (to remove objects).
Data Manipulation Language (DML) is used to manipulate data and includes operations like INSERT (to add data), UPDATE (to modify data), and DELETE (to remove data).
Data Query Language (DQL) is used to read or retrieve data, primarily using the SELECT command.
Data Control Language (DCL) is used to control access to the database, with commands like GRANT and REVOKE to manage user privileges.
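The sketch below walks through the sub-languages in order, again using Python's sqlite3 module so it runs without a database server. The books table and its data are hypothetical. DCL is left out because SQLite has no user accounts and therefore no GRANT or REVOKE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure, then modify it
conn.execute("CREATE TABLE books (book_id INTEGER PRIMARY KEY, title TEXT NOT NULL, price REAL)")
conn.execute("ALTER TABLE books ADD COLUMN author TEXT")

# DML: add, change, and remove rows
conn.execute("INSERT INTO books (title, price, author) VALUES ('SQL Basics', 25.0, 'Lee')")
conn.execute("INSERT INTO books (title, price, author) VALUES ('Python 101', 30.0, 'Kim')")
conn.execute("UPDATE books SET price = 27.5 WHERE title = 'SQL Basics'")
conn.execute("DELETE FROM books WHERE title = 'Python 101'")
conn.commit()

# DQL: read the data back
remaining = conn.execute("SELECT title, price, author FROM books").fetchall()
print(remaining)  # [('SQL Basics', 27.5, 'Lee')]

# DDL again: remove the table entirely
conn.execute("DROP TABLE books")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```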
SQL offers several advantages:
It requires little coding skill to use, since it consists mainly of keywords.
Its interactivity allows developers to write complex queries quickly.
It is a standard language usable with all relational databases, leading to extensive support and information availability.
It is portable across operating systems.
Before developing a database, planning the organization of data is crucial, and this plan is called a schema. A schema is an organization or grouping of information and the relationships among them. In MySQL, schema and database are often interchangeable terms, referring to how data is organized. However, the definition of schema can vary across different database systems. A database schema typically comprises tables, columns, relationships, data types, and keys. Schemas provide logical groupings for database objects, simplify access and manipulation, and enhance database security by allowing permission management based on user access rights.
Database normalization is the process of structuring tables to reduce data duplication and avoid data inconsistencies (anomalies). It typically involves splitting a large table into multiple smaller, related tables. Successive normal forms (1NF, 2NF, 3NF) define rules for table structure that lead to a better database design.
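As a rough illustration (not a formal walk through each normal form), the sketch below starts with a hypothetical orders_flat table that repeats customer details on every order, then splits it into customers and orders tables linked by a key. After the split, a customer's email is stored once, so updating it cannot leave some orders with a stale value. Table names and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Before: customer details repeat on every order row. Changing Ana's email
# means touching every one of her rows; missing one is an update anomaly.
conn.execute("""CREATE TABLE orders_flat (
    order_id INTEGER PRIMARY KEY, customer_name TEXT, customer_email TEXT, item TEXT)""")
conn.executemany("INSERT INTO orders_flat VALUES (?, ?, ?, ?)", [
    (1, "Ana", "ana@old.example", "Ring"),
    (2, "Ana", "ana@old.example", "Bracelet"),
    (3, "Ben", "ben@example.com", "Necklace"),
])

# After: each customer is stored once; orders refer to customers by key.
conn.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE, email TEXT)""")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    item TEXT)""")
conn.execute("""INSERT INTO customers (name, email)
    SELECT DISTINCT customer_name, customer_email FROM orders_flat""")
conn.execute("""INSERT INTO orders (order_id, customer_id, item)
    SELECT f.order_id, c.customer_id, f.item
    FROM orders_flat AS f JOIN customers AS c ON c.name = f.customer_name""")

# The email now changes in exactly one row, and every order reflects it.
conn.execute("UPDATE customers SET email = 'ana@new.example' WHERE name = 'Ana'")
order_emails = conn.execute("""
    SELECT o.order_id, c.email
    FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id""").fetchall()
customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(customer_count, order_emails)
```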
As databases have evolved, they now must be able to store ever-increasing amounts of unstructured data, which poses difficulties. This growth has also led to concepts like big data and cloud databases.
Furthermore, databases play a crucial role in data warehousing, which involves a centralized data repository that loads, integrates, stores, and processes large amounts of data from multiple sources for data analysis. Dimensional data modeling, based on dimensions and facts, is often used to build databases in a data warehouse for data analytics. Databases also support data analytics, where collected data is converted into useful information to inform future decisions.
Tools like MySQL Workbench provide a unified visual environment for database modeling and management, supporting the creation of data models, forward and reverse engineering of databases, and SQL development.
Finally, interacting with databases can also be done through programming languages like Python using connectors or APIs (Application Programming Interfaces). This allows developers to build applications that interact with databases for various operations.
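A minimal sketch of the connector pattern, using Python's standard-library sqlite3 module, which follows the DB-API 2.0 interface (PEP 249). Connectors for MySQL or PostgreSQL follow a similar connect, cursor, execute, fetch flow, though their connection arguments and placeholder style differ (for example, %s rather than ?). The function add_and_fetch_customers and its table are hypothetical.

```python
import sqlite3
from contextlib import closing


def add_and_fetch_customers(names):
    """Hypothetical helper: insert names, then read back those starting with 'A'."""
    # sqlite3.Connection's own context manager commits but does not close,
    # so closing() is used to make sure the connection is released.
    with closing(sqlite3.connect(":memory:")) as conn:
        cur = conn.cursor()
        cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        # ? placeholders: the driver binds the values, so user input never
        # becomes part of the SQL text itself
        cur.executemany("INSERT INTO customers (name) VALUES (?)",
                        [(n,) for n in names])
        conn.commit()
        cur.execute("SELECT id, name FROM customers WHERE name LIKE ? ORDER BY id",
                    ("A%",))
        return cur.fetchall()


print(add_and_fetch_customers(["Ana", "Ben", "Alex"]))  # [(1, 'Ana'), (3, 'Alex')]
```

Passing values through placeholders, rather than formatting them into the SQL string, lets the driver handle quoting and guards against SQL injection.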
Understanding SQL: Language for Database Interaction
SQL (Structured Query Language) is a standard language used to interact with databases. It is commonly pronounced either letter by letter (“S-Q-L”) or as “sequel”. Database engineers use SQL to establish interactions with databases.
Here’s a breakdown of SQL:
Role of SQL: SQL acts as the interface or bridge between a relational database and its users. It allows database engineers to create, read, update, and delete (CRUD) data. These operations are fundamental when working with a database.
Interaction with Databases: As a web developer or data engineer, you execute SQL instructions on a database using a Database Management System (DBMS). The DBMS is responsible for transforming SQL instructions into a form that the underlying database understands.
Applicability: SQL is particularly useful when working with relational databases, which require a language that can interact with structured data. Examples of relational databases that SQL can interact with include MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
SQL Sub-languages: SQL is divided into several sub-languages:
Data Definition Language (DDL): Helps you define data in your database. DDL commands include:
CREATE: Used to create databases and related objects like tables. For example, you can use the CREATE DATABASE command followed by the database name to create a new database. Similarly, CREATE TABLE followed by the table name and column definitions is used to create tables.
ALTER: Used to modify already created database objects, such as modifying the structure of a table by adding or removing columns (ALTER TABLE).
DROP: Used to remove objects like tables or entire databases. The DROP DATABASE command followed by the database name removes a database, and an ALTER TABLE statement with a DROP COLUMN clause removes a specific column from a table.
Data Manipulation Language (DML): DML commands are used to manipulate data in the database, and most CRUD operations fall under DML. DML commands include:
INSERT: Used to add or insert data into a table. The INSERT INTO syntax is used to add rows of data to a specified table.
UPDATE: Used to edit or modify existing data in a table. The UPDATE command allows you to specify data to be changed.
DELETE: Used to remove data from a table. The DELETE FROM syntax followed by the table name and an optional WHERE clause is used to remove data.
Data Query Language (DQL): Used to read or retrieve data from the database. The primary DQL command is:
SELECT: Used to select and retrieve data from one or multiple tables, allowing you to specify the columns you want and apply filter criteria using the WHERE clause. You can select all columns using SELECT *.
Data Control Language (DCL): Used to control access to the database. DCL commands include:
GRANT: Used to give users access privileges to data.
REVOKE: Used to revert access privileges already given to users.
Advantages of SQL: SQL is a popular language choice for databases due to several advantages:
Low coding skills required: It uses a set of keywords and requires very little coding.
Interactivity: Allows developers to write complex queries quickly.
Standard language: Can be used with all relational databases like MySQL, leading to extensive support and information availability.
Portability: Once written, SQL code can be used on any hardware and any operating system or platform where the database software is installed.
Comprehensive: Covers all areas of database management and administration, including creating databases, manipulating data, retrieving data, and managing security.
Efficiency: Allows database users to process large amounts of data quickly and efficiently.
Basic SQL Operations: SQL enables various operations on data, including:
Creating databases and tables using DDL.
Populating and modifying data using DML (INSERT, UPDATE, DELETE).
Reading and querying data using DQL (SELECT) with options to specify columns and filter data using the WHERE clause.
Sorting data using the ORDER BY clause with ASC (ascending) or DESC (descending) keywords.
Filtering data using the WHERE clause with various comparison operators (=, <, >, <=, >=, !=) and logical operators (AND, OR). Other filtering operators include BETWEEN, LIKE, and IN.
Removing duplicate rows using the SELECT DISTINCT clause.
Performing arithmetic operations using operators like +, -, *, /, and % (modulus) within SELECT statements.
Using comparison operators to compare values in WHERE clauses.
Summarizing data with aggregate functions such as COUNT(), SUM(), and AVG(), often combined with the GROUP BY clause.
Joining data from multiple tables when related data is stored in separate entities, using clauses such as INNER JOIN, LEFT JOIN, and RIGHT JOIN.
Creating aliases for tables and columns to make queries simpler and more readable.
Using subqueries (a query within another query) for more complex data retrieval.
Creating views (virtual tables based on the result of a SQL statement) to simplify data access and combine data from multiple tables.
Using stored procedures (pre-prepared SQL code that can be saved and executed).
Working with functions (numeric, string, date, comparison, control flow) to process and manipulate data.
Implementing triggers (stored programs that automatically execute in response to certain events).
Managing database transactions to ensure data integrity.
Optimizing queries for better performance.
Performing data analysis using SQL queries.
Interacting with databases using programming languages like Python through connectors and APIs.
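Several of the operations listed above can be tried together in one short script. This sketch uses Python's sqlite3 module and a hypothetical products table; the SQL is standard enough that it should carry over to MySQL with little or no change, though that is an assumption worth checking against your own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL, qty INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    ("Gold ring", "rings", 120.0, 3),
    ("Silver ring", "rings", 45.0, 10),
    ("Pearl necklace", "necklaces", 200.0, 2),
    ("Charm bracelet", "bracelets", 60.0, 5),
])


def query(sql):
    return conn.execute(sql).fetchall()


# WHERE with BETWEEN and LIKE, arithmetic with a column alias, ORDER BY ... DESC
ring_values = query("""
    SELECT name, price * qty AS stock_value
    FROM products
    WHERE (price BETWEEN 40 AND 150) AND name LIKE '%ring%'
    ORDER BY stock_value DESC""")

# SELECT DISTINCT removes duplicate categories
categories = query("SELECT DISTINCT category FROM products ORDER BY category")

# IN matches against a list of values
selected = query("SELECT name FROM products WHERE category IN ('rings', 'bracelets') ORDER BY name")

# Aggregate functions with GROUP BY
totals = query("""
    SELECT category, COUNT(*) AS items, SUM(qty) AS units
    FROM products GROUP BY category ORDER BY category""")

# Subquery: products priced above the average price
above_avg = query("""
    SELECT name FROM products
    WHERE price > (SELECT AVG(price) FROM products) ORDER BY name""")

# View: a saved query that behaves like a virtual table
conn.execute("CREATE VIEW low_stock AS SELECT name, qty FROM products WHERE qty < 5")
low_stock = query("SELECT name, qty FROM low_stock ORDER BY qty")

for label, result in [("ring_values", ring_values), ("categories", categories),
                      ("selected", selected), ("totals", totals),
                      ("above_avg", above_avg), ("low_stock", low_stock)]:
    print(label, result)
```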
In essence, SQL is a powerful and versatile language that is fundamental for anyone working with relational databases, enabling them to define, manage, query, and manipulate data effectively. The knowledge of SQL is a valuable skill for database engineers and is crucial for various tasks, from building and maintaining databases to extracting insights through data analysis.
Data Modeling Principles: Schema, Types, and Design
Data modeling principles revolve around creating a blueprint of how data will be organized and structured within a database system. This plan, often referred to as a schema, is essential for efficient data storage, access, updates, and querying. A well-designed data model ensures data consistency and quality.
Here are some key data modeling principles:
Understanding Data Requirements: Before creating a database, it’s crucial to have a clear idea of its purpose and the data it needs to store. For example, a database for an online bookshop needs to record book titles, authors, customers, and sales. Mangata and Gallo (mng), a jewelry store, needed to store data on customers, products, and orders.
Visual Representation: A data model provides a visual representation of data elements (entities) and their relationships. This is often achieved using an Entity Relationship Diagram (ERD), which helps in planning entity-relational databases.
Different Levels of Abstraction: Data modeling occurs at different levels:
Conceptual Data Model: Provides a high-level, abstract view of the entities and their relationships in the database system. It focuses on “what” data needs to be stored (e.g., customers, products, orders as entities for mng) and how these relate.
Logical Data Model: Builds upon the conceptual model by providing a more detailed overview of the entities, their attributes, primary keys, and foreign keys. For mng, this would involve defining attributes for customers (like client ID as primary key), products, and orders, and specifying foreign keys to establish relationships (e.g., client ID in the orders table referencing the clients table).
Physical Data Model: Represents the internal schema of the database and is specific to the chosen Database Management System (DBMS). It outlines details like data types for each attribute (e.g., varchar for full name, integer for contact number), constraints (e.g., not null), and other database-specific features. SQL is often used to create the physical schema.
Choosing the Right Data Model Type: Several types of data models exist, each with its own advantages and disadvantages:
Relational Data Model: Represents data as a collection of tables (relations) with rows and columns, known for its simplicity.
Entity-Relationship Model: Similar to the relational model but presents each table as a separate entity with attributes and explicitly defines different types of relationships between entities (one-to-one, one-to-many, many-to-many).
Hierarchical Data Model: Organizes data in a tree-like structure with parent and child nodes, primarily supporting one-to-many relationships.
Object-Oriented Model: Translates objects into classes with characteristics and behaviors, supporting complex associations like aggregation and inheritance, suitable for complex projects.
Dimensional Data Model: Based on dimensions (context of measurements) and facts (quantifiable data), optimized for faster data retrieval and efficient data analytics, often using star and snowflake schemas in data warehouses.
Database Normalization: This is a crucial process for structuring tables to minimize data redundancy, avoid data modification implications (insertion, update, deletion anomalies), and simplify data queries. Normalization involves applying a series of normal forms (First Normal Form – 1NF, Second Normal Form – 2NF, Third Normal Form – 3NF) to ensure data atomicity, eliminate repeating groups, address functional and partial dependencies, and resolve transitive dependencies.
Establishing Relationships: Data in a database should be related to provide meaningful information. Relationships between tables are established using keys:
Primary Key: A value that uniquely identifies each record in a table and prevents duplicates.
Foreign Key: One or more columns in one table that reference the primary key in another table, used to connect tables and create cross-referencing.
Defining Domains: A domain is the set of legal values that can be assigned to an attribute, ensuring data in a field is well-defined (e.g., only numbers in a numerical domain). This involves specifying data types, length values, and other relevant rules.
Using Constraints: Database constraints limit the type of data that can be stored in a table, ensuring data accuracy and reliability. Common constraints include NOT NULL (ensuring fields are always completed), UNIQUE (preventing duplicate values), CHECK (enforcing specific conditions), and FOREIGN KEY (maintaining referential integrity).
Importance of Planning: Designing a data model before building the database system allows for planning how data is stored and accessed efficiently. A poorly designed database can make it hard to produce accurate information.
Considerations at Scale: For large-scale applications like those at Meta, data modeling must prioritize user privacy, user safety, and scalability. It requires careful consideration of data access, encryption, and the ability to handle billions of users and evolving product needs. Thoughtfulness about future changes and the impact of modifications on existing data models is crucial.
Data Integrity and Quality: Well-designed data models, including the use of data types and constraints, are fundamental steps in ensuring the integrity and quality of a database.
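The constraints described above can be seen rejecting bad data in the sketch below, which uses Python's sqlite3 module. The clients and orders tables and the helper rejected are hypothetical. Two SQLite-specific caveats: it enforces FOREIGN KEY only after PRAGMA foreign_keys = ON, and it accepts VARCHAR(n) as a type name without enforcing the declared length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEY only when enabled
conn.execute("""CREATE TABLE clients (
    client_id INTEGER PRIMARY KEY,
    full_name VARCHAR(100) NOT NULL,       -- must always be supplied
    email     VARCHAR(255) UNIQUE,         -- no two clients may share an email
    age       INTEGER CHECK (age >= 18)    -- business rule enforced by CHECK
)""")
conn.execute("""CREATE TABLE orders (
    order_id  INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES clients(client_id)  -- FOREIGN KEY
)""")
conn.execute("INSERT INTO clients VALUES (1, 'Ana Silva', 'ana@example.com', 30)")


def rejected(sql):
    """Hypothetical helper: return the error text if a constraint blocks `sql`, else None."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as err:
        return str(err)
    return None


violations = {
    "NOT NULL": rejected("INSERT INTO clients VALUES (2, NULL, 'ben@example.com', 40)"),
    "UNIQUE": rejected("INSERT INTO clients VALUES (3, 'Ben', 'ana@example.com', 40)"),
    "CHECK": rejected("INSERT INTO clients VALUES (4, 'Cy', 'cy@example.com', 12)"),
    "FOREIGN KEY": rejected("INSERT INTO orders VALUES (1, 99)"),
}
for constraint, message in violations.items():
    print(f"{constraint}: {message}")
```

Each of the four inserts raises sqlite3.IntegrityError; the exact message wording varies between SQLite versions.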
Data modeling is an iterative process that requires a deep understanding of the data, the business requirements, and the capabilities of the chosen database system. It is a crucial skill for database engineers and a fundamental aspect of database design. Tools like MySQL Workbench can aid in creating, visualizing, and implementing data models.
Understanding Version Control: Git and Collaborative Development
Version Control Systems (VCS), also known as Source Control or Source Code Management, are systems that record all changes and modifications to files for tracking purposes. The primary goal of any VCS is to keep track of changes by giving developers access to the entire change history, with the ability to revert or roll back to a previous state or point in time. These systems track different types of changes, such as adding new files, modifying or updating files, and deleting files. The version control system serves as the team’s single source of truth for all code assets.
There are many benefits associated with Version Control, especially for developers working in a team. These include:
Revision history: Provides a record of all changes in a project and the ability for developers to revert to a stable point in time if code edits cause issues or bugs.
Identity: All changes made are recorded with the identity of the user who made them, allowing teams to see not only when changes occurred but also who made them.
Collaboration: A VCS allows teams to submit their code and keep track of any changes that need to be made when working towards a common goal. It also facilitates peer review where developers inspect code and provide feedback.
Automation and efficiency: Version Control helps keep track of all changes and plays an integral role in DevOps, increasing an organization’s ability to deliver applications or services with high quality and velocity. It aids in software quality, release, and deployments. By having Version Control in place, teams following agile methodologies can manage their tasks more efficiently.
Managing conflicts: Version Control helps developers fix any conflicts that may occur when multiple developers work on the same code base. The history of revisions can aid in seeing the full life cycle of changes and is essential for merging conflicts.
There are two main types or categories of Version Control Systems: centralized Version Control Systems (CVCS) and distributed Version Control Systems (DVCS).
Centralized Version Control Systems (CVCS) contain a server that houses the full history of the code base and clients that pull down the code. Developers need a connection to the server to perform any operations. Changes are pushed to the central server. An advantage of CVCS is that they are considered easier to learn and offer more access controls to users. A disadvantage is that they can be slower due to the need for a server connection.
Distributed Version Control Systems (DVCS) are similar, but every user is essentially a server and has the entire history of changes on their local system. Users don’t need to be connected to the server to add changes or view history, only to pull down the latest changes or push their own. DVCS offer better speed and performance and allow users to work offline. Git is an example of a DVCS.
Popular Version Control Technologies include git and GitHub. Git is a Version Control System designed to help users keep track of changes to files within their projects. It offers better speed and performance, reliability, free and open-source access, and an accessible syntax. Git is used predominantly via the command line. GitHub is a cloud-based hosting service that lets you manage git repositories from a user interface. It incorporates Git Version Control features and extends them with features like Access Control, pull requests, and automation. GitHub is very popular among web developers and acts like a social network for projects.
Key Git concepts include:
Repository: Used to track all changes to files in a specific folder and keep a history of all those changes. Repositories can be local (on your machine) or remote (e.g., on GitHub).
Clone: To copy a project from a remote repository to your local device.
Add: To stage changes in your local repository, preparing them for a commit.
Commit: To save a snapshot of the staged changes in the local repository’s history. Each commit is recorded with the identity of the user.
Push: To upload committed changes from your local repository to a remote repository.
Pull: To retrieve changes from a remote repository and apply them to your local repository.
Branching: Creating separate lines of development from the main codebase to work on new features or bug fixes in isolation. The main branch is often the source of truth.
Forking: Creating a copy of someone else’s repository on a platform like GitHub, allowing you to make changes without affecting the original.
Diff: A command to compare changes across files, branches, and commits.
Blame: A command to look at changes of a specific file and show the dates, times, and users who made the changes.
The typical Git workflow involves three states: modified, staged, and committed. Files are modified in the working directory, then added to the staging area, and finally committed to the local repository. These local commits are then pushed to a remote repository.
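The three states can be observed directly. This sketch drives the real git command line from Python with subprocess (the same commands can be typed straight into a terminal); it assumes git is installed and on your PATH, and the helper names, file name, and identity values are hypothetical. The -c options set a throwaway author identity and disable commit signing for these commands only. git status --porcelain prints a stable two-column code per file: a space followed by M means modified but not staged, M followed by a space means staged, and empty output means everything is committed.

```python
import os
import subprocess
import tempfile

# Throwaway identity and no signing, applied to these commands only
GIT_OPTS = ["-c", "user.name=Demo", "-c", "user.email=demo@example.com",
            "-c", "commit.gpgsign=false"]


def git(repo, *args):
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(["git", *GIT_OPTS, *args], cwd=repo,
                            capture_output=True, text=True, check=True)
    return result.stdout


def show_three_states():
    with tempfile.TemporaryDirectory() as repo:
        path = os.path.join(repo, "notes.txt")
        git(repo, "init")
        with open(path, "w") as f:
            f.write("first draft\n")
        git(repo, "add", "notes.txt")
        git(repo, "commit", "-m", "Add notes")   # baseline commit

        states = {}
        with open(path, "a") as f:               # edit the tracked file
            f.write("second line\n")
        states["modified"] = git(repo, "status", "--porcelain")   # ' M notes.txt'
        git(repo, "add", "notes.txt")
        states["staged"] = git(repo, "status", "--porcelain")     # 'M  notes.txt'
        git(repo, "commit", "-m", "Extend notes")
        states["committed"] = git(repo, "status", "--porcelain")  # '' (clean tree)
        return states


if __name__ == "__main__":
    for state, output in show_three_states().items():
        print(f"{state:>9}: {output.rstrip()!r}")
```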
Branching workflows like feature branching are commonly used. This involves creating a new branch for each feature, working on it until completion, and then merging it back into the main branch after a pull request and peer review. Pull requests allow teams to review changes before they are merged.
At Meta, Version Control is very important. They use a giant monolithic repository for all of their backend code, which means code changes are shared with every other Instagram team. While this can be risky, it allows for code reuse. Meta encourages engineers to improve any code, emphasizing that “nothing at meta is someone else’s problem”. Due to the monolithic repository, merge conflicts happen a lot, so they try to write smaller changes and add gatekeepers to easily turn off features if needed. git blame is used daily to understand who wrote specific lines of code and why, which is particularly helpful in a large organization like Meta.
Version Control is also relevant to database development. It’s easy to overcomplicate data modeling and storage, and Version Control can help track changes and potentially revert to earlier designs. Planning how data will be organized (schema) is crucial before developing a database.
Learning to use git and GitHub for Version Control is part of the preparation for coding interviews in a final course, alongside practicing interview skills and refining resumes. Effective collaboration, which is enhanced by Version Control, is a crucial skill for software developers.
Python Programming Fundamentals: An Introduction
Here’s an overview of Python programming basics:
Introduction to Python:
Python is a versatile, high-level programming language available on multiple platforms. Created by Guido van Rossum and released in 1991, it was designed to be readable, and its syntax resembles English and mathematics, which makes it intuitive for beginners while remaining powerful and adaptable enough for experienced programmers. Since its release it has gained significant popularity and a rich selection of frameworks and libraries, and it is widely used in web development, artificial intelligence, machine learning, data analytics, business forecasting, and many other programming applications. Python often requires less code than languages like C or Java, and its simplicity lets developers focus on the task at hand, which can shorten the time it takes to get a product to market.
Setting up a Python Environment:
To start using Python, it’s essential to ensure it works correctly on your operating system with your chosen Integrated Development Environment (IDE), such as Visual Studio Code (VS Code). This involves making sure the right version of Python is used as the interpreter when running your code.
Installation Verification: You can verify if Python is installed by opening the terminal (or command prompt on Windows) and typing python --version. This should display the installed Python version.
VS Code Setup: VS Code offers a walkthrough guide for setting up Python. This includes installing Python (if needed) and selecting the correct Python interpreter.
Running Python Code: Python code can be run in a few ways:
Python Shell: Useful for running and testing small scripts without creating .py files. You can access it by typing python in the terminal.
Directly from Command Line/Terminal: Any file with the .py extension can be run by typing python followed by the file name (e.g., python hello.py).
Within an IDE (like VS Code): IDEs provide features like auto-completion, debugging, and syntax highlighting, making coding a better experience. VS Code has a run button to execute Python files.
Basic Syntax and Concepts:
Print Statement: The print() function is used to display output to the console. It can print different types of data and allows for formatting.
Variables: Variables are used to store data that can change throughout the program’s lifecycle. In Python, you declare a variable by assigning a value to a name (e.g., x = 5), and Python assigns the data type automatically behind the scenes. There are conventions for naming variables: PEP 8, Python’s style guide, recommends snake case (e.g., my_name), although camel case (e.g., myName) is also seen. You can assign a single value to multiple variables (e.g., a = b = c = 10) or perform multiple assignments on one line (e.g., name, age = "Alice", 30). You can also delete a variable using the del keyword.
Data Types: A data type indicates how a computer system should interpret a piece of data. Python offers several built-in data types:
Numeric: Includes int (integers), float (decimal numbers), and complex numbers.
Sequence: Ordered collections of items, including:
Strings (str): Sequences of characters enclosed in single or double quotes (e.g., "hello", 'world'). Individual characters in a string can be accessed by their index (starting from 0) using square brackets (e.g., name[0]). The len() function returns the number of characters in a string.
Lists: Ordered and mutable sequences of items enclosed in square brackets (e.g., [1, 2, "three"]).
Tuples: Ordered and immutable sequences of items enclosed in parentheses (e.g., (1, 2, "three")).
Dictionary (dict): Collections of key-value pairs enclosed in curly braces (e.g., {"name": "Bob", "age": 25}). Values are accessed using their keys, and since Python 3.7 dictionaries preserve the order in which keys were inserted.
Boolean (bool): Represents truth values: True or False.
Set (set): Unordered collections of unique elements enclosed in curly braces (e.g., {1, 2, 3}). Sets do not support indexing.
Typecasting: The process of converting one data type to another. Python supports implicit (automatic) and explicit (using functions like int(), float(), str()) type conversion.
Input: The input() function is used to take input from the user. It displays a prompt to the user and returns their input as a string.
Operators: Symbols used to perform operations on values.
Math Operators: Used for calculations (e.g., + for addition, - for subtraction, * for multiplication, / for division).
Logical Operators: Used in conditional statements to determine true or false outcomes (and, or, not).
Control Flow: Determines the order in which instructions in a program are executed.
Conditional Statements: Used to make decisions based on conditions (if, else, elif).
Loops: Used to repeatedly execute a block of code. Python has for loops (for iterating over sequences) and while loops (repeating a block until a condition is met). Nested loops are also possible.
Functions: Modular pieces of reusable code that take input and return output. You define a function using the def keyword. You can pass data into a function as arguments and return data using the return keyword. Python has different scopes for variables: local, enclosing, global, and built-in (LEGB rule).
Data Structures: Ways to organize and store data. Python includes lists, tuples, sets, and dictionaries.
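The short script below exercises several of these basics together: string indexing and len(), multiple assignment, explicit typecasting, the built-in collections, an if/elif/else inside a function, a for loop, and a nested function that uses nonlocal to update a variable in its enclosing scope (the E in LEGB). The variable and function names are illustrative only; input() is left out because it would pause for keyboard input.

```python
# Strings: indexing starts at 0, and len() counts characters
name = "Alice"
first_letter = name[0]             # 'A'
name_length = len(name)            # 5

# Multiple assignment on one line, then explicit typecasting
city, age_text = "Lisbon", "30"
age = int(age_text)                # str -> int
price = float("9.5")               # str -> float

# Built-in collections
scores = [70, 85, 90]                   # list: ordered, mutable
point = (3, 4)                          # tuple: ordered, immutable
person = {"name": name, "age": age}     # dict: values looked up by key
tags = {"sql", "python", "sql"}         # set: the duplicate is dropped


def grade(score):
    """Conditional statements: if / elif / else."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"


# A for loop over a sequence
grades = []
for s in scores:
    grades.append(grade(s))


def running_total(values):
    """LEGB scope: `add` updates `total` from its enclosing function."""
    total = 0

    def add(v):
        nonlocal total                  # rebind the enclosing variable
        total += v

    for v in values:
        add(v)
    return total


print(f"{person['name']} from {city}, age {person['age']}: grades {grades}")
print(len(tags), point[1], price, running_total(scores))
```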
This overview provides a foundation in Python programming basics. As you continue learning, you will delve deeper into these concepts and explore more advanced topics.
Database and Python Fundamentals Study Guide
Quiz
What is a database, and what is its typical organizational structure? A database is a systematically organized collection of data. This organization commonly resembles a spreadsheet or a table, with data containing elements and attributes for identification.
Explain the role of a Database Management System (DBMS) in the context of SQL. A DBMS acts as an intermediary between SQL instructions and the underlying database. It takes responsibility for transforming SQL commands into a format that the database can understand and execute.
Name and briefly define at least three sub-languages of SQL. DDL (Data Definition Language) is used to define data structures in a database, such as creating, altering, and dropping databases and tables. DML (Data Manipulation Language) is used to change data by inserting, updating, and deleting rows. DQL (Data Query Language) is used for retrieving data from the database.
Describe the purpose of the CREATE DATABASE and CREATE TABLE DDL statements. The CREATE DATABASE statement is used to create a new, empty database within the DBMS. The CREATE TABLE statement is used within a specific database to define a new table, including specifying the names and data types of its columns.
What is the function of the INSERT INTO DML statement? The INSERT INTO statement is used to add new rows of data into an existing table in the database. It requires specifying the table name and the values to be inserted into the table’s columns.
Explain the purpose of the NOT NULL constraint when defining table columns. The NOT NULL constraint ensures that a specific column in a table cannot contain a null value. If an attempt is made to insert a new record or update an existing one with a null value in a NOT NULL column, the operation will be aborted.
List and briefly define three basic arithmetic operators in SQL. The addition operator (+) is used to add two operands. The subtraction operator (-) is used to subtract the second operand from the first. The multiplication operator (*) is used to multiply two operands.
What is the primary function of the SELECT statement in SQL, and how can the WHERE clause be used with it? The SELECT statement is used to retrieve data from one or more tables in a database. The WHERE clause is used to filter the rows returned by the SELECT statement based on specified conditions.
Explain the difference between running Python code from the Python shell and running a .py file from the command line. The Python shell provides an interactive environment where you can execute Python code snippets directly and see immediate results without saving to a file. Running a .py file from the command line executes the entire script contained within the file non-interactively.
Define a variable in Python and provide an example of assigning it a value. In Python, a variable is a name that refers to a value stored in memory. A variable is created implicitly the first time a value is assigned to it; there is no separate declaration step. For example: x = 5 creates a variable named x and assigns it the integer value 5.
Answer Key
A database is a systematically organized collection of data. This organization commonly resembles a spreadsheet or a table, with data containing elements and attributes for identification.
A DBMS acts as an intermediary between SQL instructions and the underlying database. It takes responsibility for transforming SQL commands into a format that the database can understand and execute.
DDL (Data Definition Language) helps you define data structures. DML (Data Manipulation Language) allows you to work with the data itself. DQL (Data Query Language) enables you to retrieve information from the database.
The CREATE DATABASE statement establishes a new database, while the CREATE TABLE statement defines the structure of a table within a database, including its columns and their data types.
The INSERT INTO statement adds new rows of data into a specified table. It requires indicating the table and the values to be placed into the respective columns.
The NOT NULL constraint enforces that a particular column must always have a value: it cannot contain NULL when a row is added or modified.
The + operator performs addition, the - operator performs subtraction, and the * operator performs multiplication between numerical values in SQL queries.
The SELECT statement retrieves data from database tables. The WHERE clause filters the results of a SELECT query, allowing you to specify conditions that rows must meet to be included in the output.
The Python shell is an interactive interpreter for immediate code execution, while running a .py file executes the entire script from the command line without direct interaction during the process.
A variable in Python is a name used to refer to a memory location that stores a value; for instance, name = "Alice" assigns the string value "Alice" to the variable named name.
Essay Format Questions
Discuss the significance of SQL as a standard language for database management. In your discussion, elaborate on at least three advantages of using SQL as highlighted in the provided text and provide examples of how these advantages contribute to efficient database operations.
Compare and contrast the roles of Data Definition Language (DDL) and Data Manipulation Language (DML) in SQL. Explain how these two sub-languages work together to enable the creation and management of data within a relational database system.
Explain the concept of scope in Python and discuss the LEGB rule. Provide examples to illustrate the differences between local, enclosed, global, and built-in scopes and explain how Python resolves variable names based on this rule.
Discuss the importance of modules in Python programming. Explain the advantages of using modules, such as reusability and organization, and describe different ways to import modules, including the use of import, from … import …, and aliases.
Imagine you are designing a simple database for a small online bookstore. Describe the tables you would create, the columns each table would have (including data types and any necessary constraints like NOT NULL or primary keys), and provide example SQL CREATE TABLE statements for two of your proposed tables.
Glossary of Key Terms
Database: A systematically organized collection of data that can be easily accessed, managed, and updated.
Table: A structure within a database used to organize data into rows (records) and columns (fields or attributes).
Column (Field): A vertical set of data values of a particular type within a table, representing an attribute of the entities stored in the table.
Row (Record): A horizontal set of data values within a table, representing a single instance of the entity being described.
SQL (Structured Query Language): A standard programming language used for managing and manipulating data in relational databases.
DBMS (Database Management System): Software that enables users to interact with a database, providing functionalities such as data storage, retrieval, and security.
DDL (Data Definition Language): A subset of SQL commands used to define the structure of a database, including creating, altering, and dropping databases, tables, and other database objects.
DML (Data Manipulation Language): A subset of SQL commands used to manipulate data within a database, including inserting, updating, deleting, and retrieving data.
DQL (Data Query Language): A subset of SQL commands, primarily the SELECT statement, used to query and retrieve data from a database.
Constraint: A rule or restriction applied to data in a database to ensure its accuracy, integrity, and reliability. Examples include NOT NULL.
Operator: A symbol or keyword that performs an operation on one or more operands. In SQL, this includes arithmetic operators (+, -, *, /), logical operators (AND, OR, NOT), and comparison operators (=, >, <, etc.).
Schema: The logical structure of a database, including the organization of tables, columns, relationships, and constraints.
Python Shell: An interactive command-line interpreter for Python, allowing users to execute code snippets and receive immediate feedback.
.py file: A file containing Python source code, which can be executed as a script from the command line.
Variable (Python): A named reference to a value stored in memory. Variables in Python are dynamically typed, meaning their data type is determined by the value assigned to them.
Data Type (Python): The classification of data that determines the possible values and operations that can be performed on it (e.g., integer, string, boolean).
String (Python): A sequence of characters enclosed in single or double quotes, used to represent text.
Scope (Python): The region of a program where a particular name (variable, function, etc.) is accessible. Python has four main scopes: local, enclosed, global, and built-in (LEGB).
Module (Python): A file containing Python definitions and statements. Modules provide a way to organize code into reusable units.
Import (Python): A statement used to load and make the code from another module available in the current script.
Alias (Python): An alternative name given to a module or function during import, often used for brevity or to avoid naming conflicts.
Briefing Document: Review of “01.pdf”
This briefing document summarizes the main themes and important concepts discussed in the provided excerpts from “01.pdf”. The document covers fundamental database concepts using SQL, basic command-line operations, an introduction to Python programming, and related software development tools.
I. Introduction to Databases and SQL
The document introduces the concept of databases as systematically organized data, often resembling spreadsheets or tables. It highlights the widespread use of databases in various applications, providing examples like banks storing account and transaction data, and hospitals managing patient, staff, and laboratory information.
“well a database looks like data organized systematically and this organization typically looks like a spreadsheet or a table”
The core purpose of SQL (Structured Query Language) is explained as a language used to interact with databases. Key operations that can be performed using SQL are outlined:
“operational terms create add or insert data read data update existing data and delete data”
SQL is further divided into several sub-languages:
DDL (Data Definition Language): Used to define the structure of the database and its objects like tables. Commands like CREATE (to create databases and tables) and ALTER (to modify existing objects, e.g., adding a column) are part of DDL.
“ddl as the name says helps you define data in your database but what does it mean to Define data before you can store data in the database you need to create the database and related objects like tables in which your data will be stored for this the ddl part of SQL has a command named create then you might need to modify already created database objects for example you might need to modify the structure of a table by adding a new column you can perform this task with the ddl alter command you can remove an object like a table from a”
DML (Data Manipulation Language): Used to manipulate the data within the database, including inserting (INSERT INTO), updating, and deleting data.
“now we need to populate the table of data this is where I can use the data manipulation language or DML subset of SQL to add table data I use the insert into syntax this inserts rows of data into a given table I just type insert into followed by the table name and then a list of required columns or Fields within a pair of parentheses then I add the values keyword”
DQL (Data Query Language): Primarily used for querying or retrieving data from the database (SELECT statements fall under this category).
DCL (Data Control Language): Used to control access and security within the database.
The document emphasizes that a DBMS (Database Management System) is crucial for interpreting and executing SQL instructions, acting as an intermediary between the SQL commands and the underlying database.
“a database interprets and makes sense of SQL instructions with the use of a database management system or dbms as a web developer you’ll execute all SQL instructions on a database using a dbms the dbms takes responsibility for transforming SQL instructions into a form that’s understood by the underlying database”
The advantages of using SQL are highlighted, including its simplicity, standardization, portability, comprehensiveness, and efficiency in processing large amounts of data.
“you now know that SQL is a simple standard portable comprehensive and efficient language that can be used to delete data retrieve and share data among multiple users and manage database security this is made possible through subsets of SQL like ddl or data definition language DML also known as data manipulation language dql or data query language and DCL also known as data control language and the final advantage of SQL is that it lets database users process large amounts of data quickly and efficiently”
Examples of basic SQL syntax are provided, such as creating a database (CREATE DATABASE College;) and creating a table (CREATE TABLE student ( … );). The INSERT INTO syntax for adding data to a table is also introduced.
Constraints like NOT NULL are mentioned as ways to enforce data integrity during table creation.
“the creation of a new customer record is aborted the not null default value is implemented using a SQL statement a typical not null SQL statement begins with the creation of a basic table in the database I can write a create table Clause followed by customer to define the table name followed by a pair of parentheses within the parentheses I add two columns customer ID and customer name I also define each column with relevant data types INT for customer ID as it stores”
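The CREATE TABLE, INSERT INTO and NOT NULL steps described above can be tried without a database server. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for the MySQL-style DBMS used in the course. SQLite has no CREATE DATABASE statement (each connection or file is its own database), and its dialect differs from MySQL's in places. The customer table follows the transcript; the VARCHAR length and the sample name are illustrative assumptions.

```python
import sqlite3

# ":memory:" gives a throwaway database; SQLite has no CREATE DATABASE.
conn = sqlite3.connect(":memory:")

# DDL: define the customer table; NOT NULL means both columns need a value.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id   INT NOT NULL,
        customer_name VARCHAR(100) NOT NULL
    )
    """
)

# DML: INSERT INTO, the table name, the column list, then the VALUES keyword.
conn.execute(
    "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)",
    (1, "Ada Lovelace"),  # hypothetical sample row
)

# A NULL name violates the constraint, so SQLite aborts this insert.
try:
    conn.execute(
        "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)",
        (2, None),  # None is sent to the database as NULL
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT customer_id, customer_name FROM customer").fetchall()
print(rows)      # [(1, 'Ada Lovelace')]
print(rejected)  # True
```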
SQL arithmetic operators (+, -, *, /, %) are introduced with examples. Logical operators (NOT, OR) and special operators (IN, BETWEEN) used in the WHERE clause for filtering data are also explained. JOIN clauses for combining data from tables, including the self-join (a table joined to itself), are briefly touched upon.
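A sketch of these operators in action, again assuming Python's sqlite3 module as a stand-in for MySQL. The employee table, its columns and its rows are hypothetical. The queries show arithmetic in the SELECT list, BETWEEN and NOT ... IN in a WHERE clause, OR, and a self-join in which the same table appears under two aliases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "salary INTEGER NOT NULL, dept TEXT NOT NULL, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical rows: (id, name, salary, dept, manager_id)
        (1, "Maria", 5000, "Sales", None),
        (2, "Tom", 3200, "Sales", 1),
        (3, "Aiko", 4000, "IT", 1),
        (4, "Sam", 2500, "HR", 3),
    ],
)

# Arithmetic (* and %) in the SELECT list; BETWEEN and NOT ... IN in WHERE.
mid_range = conn.execute(
    "SELECT name, salary * 12, salary % 1000 FROM employee "
    "WHERE salary BETWEEN 2500 AND 4000 AND dept NOT IN ('HR') ORDER BY id"
).fetchall()

# OR keeps a row when at least one condition is true.
either = conn.execute(
    "SELECT name FROM employee WHERE dept = 'IT' OR salary > 4500 ORDER BY id"
).fetchall()

# Self-join: employee appears twice, as e (staff) and m (their manager).
pairs = conn.execute(
    "SELECT e.name, m.name FROM employee AS e "
    "JOIN employee AS m ON e.manager_id = m.id ORDER BY e.id"
).fetchall()

print(mid_range)  # [('Tom', 38400, 200), ('Aiko', 48000, 0)]
print(either)     # [('Maria',), ('Aiko',)]
print(pairs)      # [('Tom', 'Maria'), ('Aiko', 'Maria'), ('Sam', 'Aiko')]
```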
Subqueries (inner queries within outer queries) and Views (virtual tables based on the result of a query) are presented as advanced SQL concepts. User-defined functions and triggers are also introduced as ways to extend database functionality and automate actions. Prepared statements are mentioned as a more efficient way to execute SQL queries repeatedly. Date and time functions in MySQL are briefly covered.
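Subqueries, views and repeated parameterized queries can be sketched the same way, assuming sqlite3 in place of the course's MySQL. SQLite has no PREPARE/EXECUTE statements, so the closest analogue here is one SQL string with a ? placeholder that is executed repeatedly with different values; Python's sqlite3 module keeps a cache of compiled statements per connection (the cached_statements argument of sqlite3.connect), so repeating the same SQL text avoids re-parsing it. The orders table and its data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("Ana", 20.0), ("Ben", 55.0), ("Ana", 80.0), ("Cy", 35.0)],  # hypothetical
)

# Subquery: the inner SELECT computes the average; the outer one filters on it.
above_avg = conn.execute(
    "SELECT customer, total FROM orders "
    "WHERE total > (SELECT AVG(total) FROM orders) ORDER BY id"
).fetchall()

# View: a named virtual table defined by a query; it stores no rows itself.
conn.execute(
    "CREATE VIEW customer_totals AS "
    "SELECT customer, SUM(total) AS spent FROM orders GROUP BY customer"
)
totals = conn.execute(
    "SELECT customer, spent FROM customer_totals ORDER BY customer"
).fetchall()

# One parameterized SQL text, executed repeatedly with different values.
query = "SELECT COUNT(*) FROM orders WHERE customer = ?"
counts = {name: conn.execute(query, (name,)).fetchone()[0] for name in ("Ana", "Ben", "Zoe")}

print(above_avg)  # [('Ben', 55.0), ('Ana', 80.0)]
print(totals)     # [('Ana', 100.0), ('Ben', 55.0), ('Cy', 35.0)]
print(counts)     # {'Ana': 2, 'Ben': 1, 'Zoe': 0}
```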
II. Introduction to Command Line/Bash Shell
The document provides a basic introduction to using the command line or bash shell. Fundamental commands are explained:
pwd (Print Working Directory): Shows the current directory.
“to do that I run the PWD command PWD is short for print working directory I type PWD and press the enter key the command returns a forward slash which indicates that I’m currently in the root directory”
ls (List): Displays the contents of the current directory. The -l flag prints a detailed, long-format listing.
“if I want to check the contents of the root directory I run another command called LS which is short for list I type LS and press the enter key and now notice I get a list of different names of directories within the root level in order to get more detail of what each of the different directories represents I can use something called a flag flags are used to set options to the commands you run use the list command with a flag called L which means the format should be printed out in a list format I type LS space Dash l press enter and this Returns the results in a list structure”
cd (Change Directory): Navigates between directories using relative or absolute paths. cd .. moves up one directory.
“to step back into Etc type CD Etc to confirm that I’m back there type PWD and enter if I want to use the other alternative you can do an absolute path type in CD forward slash and press enter Then I type PWD and press enter you can verify that I am back at the root again to step through multiple directories use the same process type CD Etc and press enter check the contents of the files by typing LS and pressing enter”
mkdir (Make Directory): Creates a new directory.
“now I will create a new directory called submissions I do this by typing mkdir which stands for make directory and then the word submissions this is the name of the directory I want to create and then I hit the enter key I then type in ls -l for list so that I can see the list structure and now notice that a new directory called submissions has been created I can then go into this”
touch: Creates a new, empty file if the named file does not exist; if the file already exists, touch updates its timestamps instead.
“the Parent Directory next is the touch command which makes a new file of whatever type you specify for example to build a brand new file you can run touch followed by the new file’s name for instance example.txt note that the newly created file will be empty”
history: Shows a history of recently used commands.
“to view a history of the most recently typed commands you can use the history command”
File Redirection (>, >>, <): Sends a command's output to a file, or feeds a file to a command as input. > overwrites the target file, >> appends to it, and < reads input from a file.
“if you want to control where the output goes you can use a redirection how do we do that enter the ls command enter Dash L to print it as a list instead of pressing enter add a greater than sign redirection now we have to tell it where we want the data to go in this scenario I choose an output.txt file the output dot txt file has not been created yet but it will be created based on the command I’ve set here with a redirection flag press enter type LS then press enter again to display the directory the output file displays to view the”
grep: Searches files for lines that match a pattern.
“grep stands for Global regular expression print and it’s used for searching across files and folders as well as the contents of files on my local machine I enter the command ls -l and see that there’s a file called”
cat: Displays the content of a file.
less: Views file content page by page.
“press the q key to exit the less environment the other file is the bash profile file so I can run the last command again this time with DOT profile this tends to be used more for environment variables for example I can use it for setting”
vim: A text editor used for creating and editing files.
“now I will create a simple shell script for this example I will use Vim which is an editor that I can use which accepts input so type vim and”
chmod: Changes file permissions, including making a file executable (chmod +x filename).
“but I want it to be executable which requires that I have an X being set on it in order to do that I have to use another command which is called chmod after using this them executable within the bash shell”
The document also briefly mentions shell scripts (files containing a series of commands) and environment variables (dynamic named values that can affect the way running processes will behave on a computer).
III. Introduction to Git and GitHub
Git is introduced as a free, open-source distributed version control system used to manage source code history, track changes, revert to previous versions, and collaborate with other developers. Key Git commands mentioned include:
git clone: Creates a local copy of a remote repository (e.g., from GitHub).
“to do this I type the command git clone and paste the https URL I copied earlier finally I press enter on my keyboard notice that I receive a message stating”
ls -la: Lists all files in a directory in long format, including hidden ones (such as the .git directory, which contains the Git repository metadata).
“the ls -la command another file is listed which is just named dot git you will learn more about this later when you explore how to use this for Source control”
cd .git: Changes the current directory to the .git folder.
“first open the dot git folder on your terminal type CD dot git and press enter”
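cat HEAD: Displays the contents of the HEAD file, which normally holds a reference to the currently checked-out branch (for example, ref: refs/heads/main) rather than a commit hash.
“next type cat HEAD and press enter in git we only work on a single Branch at a time this file also exists inside the dot git folder under the refs forward slash heads path”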
cat refs/heads/main: Displays the hash of the latest commit on the main branch.
“type CD dot git and press enter next type cat forward slash refs forward slash heads forward slash main press enter after you”
git pull: Fetches changes from a remote repository and integrates them into the local branch.
“I am now going to explain to you how to pull the repository to your local device”
GitHub is described as a cloud-based hosting service for Git repositories, offering a user interface for managing Git projects and facilitating collaboration.
IV. Introduction to Python Programming
The document introduces Python as a versatile programming language and outlines different ways to run Python code:
Python Shell: An interactive environment for running and testing small code snippets without creating separate files.
“the python shell is useful for running and testing small scripts for example it allows you to run code without the need for creating new .py files you start by adding Snippets of code that you can run directly in the shell”
Running Python Files: Executing Python code stored in files with the .py extension using the python filename.py command.
“running a python file directly from the command line or terminal note that any file that has the file extension of .py can be run by the following command for example type python then a space and then type the file”
Basic Python concepts covered include:
Variables: Declaring and assigning values to variables (e.g., x = 5, name = "Alice"). Python automatically infers data types. Multiple variables can be assigned the same value (e.g., a = b = c = 10).
“all I have to do is name the variable for example if I type x equals 5 I have declared a variable and assigned it a value I can also print out the value of the variable by calling the print statement and passing in the variable name which in this case is x so I type print x when I run the program I get the value of 5 which is the assignment since I gave the initial variable let me clear my screen again you have several options when it comes to declaring variables you can declare any different type of variable in terms of value for example x could equal a string called hello to do this I type x equals hello I can then print the value again run it and I find the output is the word hello behind the scenes python automatically assigns the data type for you”
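The variable examples from the transcript fit in a few lines that can be pasted into the Python shell or saved as a .py file:

```python
x = 5                    # Python infers the int type from the value
print(x)                 # 5

x = "hello"              # the same name can later refer to a str
print(x)                 # hello
print(type(x).__name__)  # str

a = b = c = 10           # chained assignment: a, b and c all refer to 10
print(a, b, c)           # 10 10 10
```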
Data Types: Basic data types like integers, floats (decimal numbers), complex numbers, strings (sequences of characters enclosed in single or double quotes), lists, and tuples (ordered, immutable sequences) are introduced.
“X could equal a string called hello to do this I type x equals hello I can then print the value again run it and I find the output is the word hello behind the scenes python automatically assigns the data type for you you’ll learn more about this in an upcoming video on data types you can declare multiple variables and assign them to a single value as well for example making a b and c all equal to 10. I do this by typing a equals b equals C equals 10. I print all three… sequence types are classed as container types that contain one or more of the same type in an ordered list they can also be accessed based on their index in the sequence python has three different sequence types namely strings lists and tuples let’s explore each of these briefly now starting with strings a string is a sequence of characters that is enclosed in either a single or double quotes strings are represented by the string class or Str for”
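A short sketch of the data types listed above; the sample values are illustrative. It also shows indexing into a sequence and the difference between a mutable list and an immutable tuple:

```python
whole = 7                    # int
price = 4.55                 # float (decimal number)
z = 2 + 3j                   # complex number
word = "looping"             # str: single or double quotes both work
items = ["cake", "coffee"]   # list: ordered and mutable
point = (3, 4)               # tuple: ordered and immutable

items.append("juice")        # a list can grow in place
first_letter = word[0]       # sequences are indexed from 0
last_item = items[-1]        # negative indices count from the end

try:
    point[0] = 10            # tuples reject item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

print(type(whole).__name__, type(price).__name__, type(z).__name__)  # int float complex
print(first_letter, last_item, tuple_changed)  # l juice False
```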
Operators: Arithmetic operators (+, -, *, /, **, %, //) and logical operators (and, or, not) are explained with examples.
“example 7 multiplied by four okay now let’s explore logical operators logical operators are used in Python on conditional statements to determine a true or false outcome let’s explore some of these now first logical operator is named and this operator checks for all conditions to be true for example a is greater than five and a is less than 10. the second logical operator is named or this operator checks for at least one of the conditions to be true for example a is greater than 5 or B is greater than 10. the final operator is named not this”
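Using the transcript's numbers 7 and 4, the arithmetic and logical operators behave as follows (the names a and b are assumptions):

```python
a, b = 7, 4

print(a + b, a - b, a * b)  # 11 3 28
print(a / b)                # 1.75 (true division always returns a float)
print(a // b)               # 1    (floor division)
print(a % b)                # 3    (remainder)
print(a ** 2)               # 49   (exponent)

# Logical operators combine conditions into True or False.
print(a > 5 and a < 10)     # True: both conditions hold
print(a > 5 or b > 10)      # True: at least one condition holds
print(not a > 5)            # False: not inverts the result
```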
Conditional Statements: if, elif (else if), and else statements are introduced for controlling the flow of execution based on conditions.
“The Logical operators are and or and not let’s cover the different combinations of each in this example I declare two variables a equals true and B also equals true from these variables I use an if statement I type if a and b colon and on the next line I type print and in parentheses in double quotes”
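A sketch of if, elif and else with two Boolean values, in the spirit of the transcript's a and b example; the function name describe and its messages are hypothetical:

```python
def describe(a, b):
    """Report how many of two Boolean values are true (hypothetical helper)."""
    if a and b:
        return "both are true"
    elif a or b:             # reached only when a and b is False
        return "exactly one is true"
    else:
        return "neither is true"


for a, b in [(True, True), (True, False), (False, False)]:
    print(a, b, "->", describe(a, b))
```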
Loops: for loops (for iterating over sequences) and while loops are introduced with examples, including nested loops.
“now let’s break apart the for Loop and discover how it works the variable item is a placeholder that will store the current letter in the sequence you may also recall that you can access any character in the sequence by its index the for Loop is accessing it in the same way and assigning the current value to the item variable this allows us to access the current character to print it for output when the code is run the outputs will be the letters of the word looping each letter on its own line now that you know about looping constructs in Python let me demonstrate how these work further using some code examples to Output an array of tasty desserts python offers us multiple ways to do loops or looping you’ll now cover the for loop as well as the while loop let’s start with the basics of a simple for Loop to declare a for loop I use the for keyword I now need a variable to put the value into in this case I am using i I also use the in keyword to specify where I want to Loop over I add a new function called range to specify the number of items in a range in this case I’m using 10 as an example next I do a simple print statement by pressing the enter key to move to a new line I select the print function and within the brackets I enter the name looping and the value of i then I click on the Run button the output indicates the iteration Loops through the range of 0 to 9.”
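The three loop forms described above, as a minimal sketch; the dessert list and the nested-loop ranges are illustrative assumptions:

```python
# for loop over a string: item holds one character per iteration.
for item in "looping":
    print(item)

# for loop over range(10): i takes the values 0 through 9.
for i in range(10):
    print("looping", i)

# while loop: repeats until its condition becomes false.
desserts = ["cake", "pie", "ice cream"]  # hypothetical list
index = 0
while index < len(desserts):
    print(desserts[index])
    index += 1

# Nested loops: the inner loop runs completely for each outer value.
pairs = []
for row in range(2):
    for col in range(3):
        pairs.append((row, col))
print(pairs)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```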
Functions: Defining and calling functions using the def keyword. Functions can take arguments and return values. Examples of using *args (for variable positional arguments) and **kwargs (for variable keyword arguments) are provided.
“I now write a function to produce a string out of this information I type def contents and then self in parentheses on the next line I write a print statement for the string the plus self dot dish plus has plus self dot items plus and takes plus self dot time plus Min to prepare here we’ll use the backslash character to force a new line and continue the string on the following line for this to print correctly I need to convert the self dot items and self dot time… let’s say for example you wanted to calculate a total bill for a restaurant a user got a cup of coffee that was 2.99 then they also got a cake that was 4.55 and also a juice for 2.99. the first thing I could do is change the for Loop let’s change the argument to kwargs by”
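The restaurant-bill example can be sketched with *args and **kwargs, reading the transcript's "455" as 4.55. The function names sum_of and total_bill are hypothetical, and rounding to two decimals is a choice made here to hide floating-point noise in the printed total:

```python
def sum_of(*args):
    """Add up any number of prices passed positionally (args is a tuple)."""
    total = 0
    for price in args:
        total += price
    return round(total, 2)


def total_bill(**kwargs):
    """Add up prices passed as item=price keyword arguments (kwargs is a dict)."""
    total = 0
    for price in kwargs.values():
        total += price
    return round(total, 2)


print(sum_of(2.99, 4.55, 2.99))                        # 10.53
print(total_bill(coffee=2.99, cake=4.55, juice=2.99))  # 10.53
```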
File Handling: Opening, reading (using read, readline, readlines), and writing to files. The importance of closing files is mentioned.
“the third method to read files in Python is readlines let me demonstrate this method the readlines method reads the entire contents of the file and then returns it in an ordered list this allows you to iterate over the list or pick out specific lines based on a condition if for example you have a file with four lines of text and pass a length condition the readlines function will return the output all the lines in your file in the correct order files are stored in directories and they have”
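A sketch of writing a file and reading it back with read, readline and readlines. The file name and contents are assumptions, and a temporary directory keeps the example from leaving files behind. Opening each file in a with block is one common way to make sure it is closed, even if an error occurs:

```python
import os
import tempfile

folder = tempfile.TemporaryDirectory()
path = os.path.join(folder.name, "sample.txt")  # hypothetical file name

with open(path, "w") as f:  # "w" creates the file, or overwrites it
    f.write("line one\nline two\nline three\nline four\n")

with open(path) as f:       # the with block closes the file automatically
    whole = f.read()        # read(): the entire contents as one string

with open(path) as f:
    first = f.readline()    # readline(): one line, including its "\n"

with open(path) as f:
    lines = f.readlines()   # readlines(): every line, in order, as a list

# Pick out lines with a length condition (more than 8 visible characters).
long_lines = [line for line in lines if len(line.strip()) > 8]

print(repr(first))  # 'line one\n'
print(len(lines))   # 4
print(long_lines)   # ['line three\n', 'line four\n']

folder.cleanup()    # delete the temporary folder and its file
```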
Recursion: The concept of a function calling itself is briefly illustrated.
“the else statement will recursively call the slice function but with a modified string every time on the next line I add else and a colon then on the next line I type return string reverse Str but before I close the parentheses I add a slice function by typing open square bracket the number 1 and a colon followed by”
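The transcript is cut off, so its exact code is not recoverable. A minimal sketch of recursive string reversal, assuming the function calls itself on a slice that drops the first character, looks like this (the name string_reverse is an assumption):

```python
def string_reverse(text):
    """Reverse a string recursively (hypothetical reconstruction)."""
    if len(text) == 0:
        return text  # base case: an empty string is its own reverse
    else:
        # Reverse everything after the first character, then append it.
        return string_reverse(text[1:]) + text[0]


print(string_reverse("reversal"))  # lasrever
```

Each call works on a string one character shorter, so the recursion always reaches the empty-string base case.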
Object-Oriented Programming (OOP): Basic concepts of classes (using the class keyword), objects (instances of classes), attributes (data associated with an object), and methods (functions associated with an object, with self as the first parameter) are introduced. Inheritance (creating new classes based on existing ones) is also mentioned.
“method inside this class I want this one to contain a new function called leave request so I type def leave_request and then self and days as the variables in parentheses the purpose of the leave request function is to return a line that specifies the number of days requested to write this I type return the string may I take a leave for plus Str open parenthesis the word days close parenthesis plus another string days now that I have all the classes in place I’ll create a few instances from these classes one for a supervisor and two others for…”
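A sketch of classes, instances, attributes, methods and inheritance. Only the leave-request method taking self and days, and the idea of one supervisor instance plus two others, come from the transcript; the class names Employee, Supervisor and Chef, their attributes and the sample instances are hypothetical:

```python
class Employee:
    """Base class: each instance stores its own attributes on self."""

    def __init__(self, name, last):
        self.name = name
        self.last = last


class Supervisor(Employee):
    """Inherits from Employee and adds one extra attribute."""

    def __init__(self, name, last, password):
        super().__init__(name, last)  # reuse the parent's setup
        self.password = password


class Chef(Employee):
    """Inherits __init__ unchanged and adds a method."""

    def leave_request(self, days):
        return "May I take a leave for " + str(days) + " days"


adrian = Supervisor("Adrian", "A", "apple")  # hypothetical instances
emily = Chef("Emily", "E")
juno = Chef("Juno", "J")

print(emily.leave_request(3))                     # May I take a leave for 3 days
print(adrian.name, isinstance(adrian, Employee))  # Adrian True
```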
Modules: The concept of modules (reusable blocks of code in separate files) and how to import them using the import statement (e.g., import math, from math import sqrt, import math as m). The benefits of modular programming (scope, reusability, simplicity) are highlighted. The search path for modules (sys.path) is mentioned.
“so a file like sample.py can be a module named Sample and can be imported modules in Python can contain both executable statements and functions but before you explore how they are used it’s important to understand their value purpose and advantages modules come from modular programming this means that the functionality of code is broken down into parts or blocks of code these parts or blocks have great advantages which are scope reusability and simplicity let’s delve deeper into these everything in… to import and execute modules in Python the first important thing to know is that modules are imported only once during execution if for example you import a module that contains print statements print Open brackets close brackets you can verify it only executes the first time you import the module even if the module is imported multiple times since modules are built to help you Standalone… I will now import the built-in math module by typing import math just to make sure that this code works I’ll use a print statement I do this by typing print importing the math module after this I’ll run the code the print statement has executed most of the modules that you will come across especially the built-in modules will not have any print statements and they will simply be loaded by the interpreter now that I’ve imported the math module I want to use a function inside of it let’s choose the square root function sqrt to do this I type the words math dot sqrt when I type the word math followed by the dot a list of functions appears in a drop down menu and you can select sqrt from this list I passed 9 as the argument to the math.sqrt function assign this to a variable called root and then I print it the number three the square root of nine has been printed to the terminal which is the correct answer instead of importing the entire math module as we did above there is a better way to handle this by directly importing the square root function inside the scope of the project this will prevent overloading the interpreter by importing the entire math module to do this I type from math import sqrt when I run this it displays an error now I remove the word math from the variable declaration and I run the code again this time it works next let’s discuss something called an alias which is an excellent way of importing different modules here I assign an alias called m to the math module I do this by typing import math as m then I type cosine equals m dot I”
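The three import styles from the transcript, side by side. Note that from math import sqrt still loads the whole math module; it only changes which names are bound in the current namespace, so its benefit is shorter, more explicit code rather than a lighter interpreter. The m.cos call is an assumption, since the transcript breaks off at "m dot":

```python
import math              # binds the name math to the whole module
import sys
from math import sqrt    # also loads math, but binds only the name sqrt here
import math as m         # alias: m is another name for the same module object

root = math.sqrt(9)      # qualified access through the module name
root2 = sqrt(9)          # direct access after from ... import
cosine = m.cos(0)        # access through the alias

print(root, root2, cosine)    # 3.0 3.0 1.0
print(m is math)              # True: the module is loaded once and reused
print("math" in sys.modules)  # True: loaded modules are cached here
print(sys.path[0])            # first entry of the module search path
```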
Scope: The concepts of local, enclosed, global, and built-in scopes in Python (LEGB rule) and how variable names are resolved. Keywords global and nonlocal for modifying variable scope are mentioned.
“names of different attributes defined inside it. In this way, modules are a type of namespace. Namespaces and scopes can become very confusing very quickly, and so it is important to get as much practice with scopes as possible to ensure a standard of quality. There are four main types of scopes that can be defined in Python: local, enclosed, global, and built-in. The practice of trying to determine which scope a certain variable belongs to is known as scope resolution. Scope resolution follows what is commonly known as the LEGB rule. Let’s explore these. Local: this is where the first search for a variable happens. Enclosed: this is defined inside an enclosing or nested function. Global: this is defined at the uppermost level, or simply outside functions. Built-in: these are the names present in the built-in module. In simpler terms, a variable declared inside a function is local, and the ones outside the scope of any function generally are global. Here is an example. The output for the code on screen shows the same variable name, greek, in different scopes… keywords that can be used to change the scope of the variables: global and nonlocal. The global keyword helps us access the global variables from within the function. nonlocal is a special type of scope defined in Python that is used within nested functions only, on the condition that the variable has been defined earlier in the enclosing function. Now you can write a piece of code that will better help you understand the idea of scope for attributes. You have already created a file called animalfarm.py. You will be defining a function called d, inside which you will be creating another nested function, e. Let’s write the rest of the code. You can start by defining a couple of variables, both of which will be called animal, the first one inside the d function and the second one inside the e function. Note how you had to first declare the variable inside the e function as nonlocal. You will now add a few more print statements for clarification for when you see the outputs. Finally, you have called the e function here, and you can add one more variable animal outside the d function. This”
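Below is a minimal sketch of what animalfarm.py might look like. It follows the transcript's structure: a function d with a nested function e, a variable named animal at each level, and nonlocal inside e. The string values and the extra counter example for global are assumptions for illustration, not the course's exact code.

```python
# Sketch of animalfarm.py; the string values are illustrative assumptions.
animal = "elephant"  # global scope (module level)


def d():
    animal = "giraffe"  # local to d, and the enclosing scope for e

    def e():
        nonlocal animal  # rebind d's variable instead of creating a new local
        animal = "camel"
        print("inside e:", animal)

    print("in d before e:", animal)
    e()
    print("in d after e:", animal)  # e changed it through nonlocal
    return animal


result = d()
print("global:", animal)  # the global variable is untouched

counter = 0


def increment():
    global counter  # rebind the module-level name from inside a function
    counter += 1


increment()
increment()
print("counter:", counter)  # 2

# len is not defined anywhere above, so lookup falls through to built-ins.
print(len("LEGB"))  # 4
```

Reading the output top to bottom traces the LEGB search order. e and d share one variable through nonlocal. The module-level animal keeps its own value.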
Reloading Modules: The importlib.reload() function for re-importing and re-executing modules that have already been loaded.
“statement is only loaded once by the Python interpreter, but the reload function lets you import and reload it multiple times. I’ll demonstrate that. First, I create a new file, sample.py, and I add a simple print statement that prints hello world. Remember that any file in Python can be used as a module. I’m going to use this file inside another new file, and the new file is named using_reloads.py. Now I import the sample.py module. I can add the import statement multiple times, but the interpreter only loads it once. If it had been reloaded, we”
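The reload behavior can be checked with importlib.reload from the standard library. To keep the sketch self-contained, it writes a throwaway sample.py into a temporary directory, standing in for the transcript's file. It then counts how many times hello world is printed.

```python
import contextlib
import importlib
import io
import pathlib
import sys
import tempfile

# Write a throwaway sample.py whose only statement is a print,
# standing in for the file created in the transcript.
module_dir = tempfile.mkdtemp()
pathlib.Path(module_dir, "sample.py").write_text('print("hello world")\n')
sys.path.insert(0, module_dir)
importlib.invalidate_caches()  # make sure the new file is visible to the importer

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import sample  # first import runs the module body: prints once
    import sample  # already in sys.modules: nothing runs, nothing printed
    importlib.reload(sample)  # re-executes the module body: prints again

print(captured.getvalue().count("hello world"))  # 2
```

The second plain import is a no-op because the module is already cached. Only the explicit reload runs the file again.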
Testing: Introduction to writing test cases using the assert keyword and the pytest framework. The convention of naming test functions with the test_ prefix is mentioned. Test-Driven Development (TDD) is briefly introduced.
“another file called test_addition.py, in which I’m going to write my test cases. Now I import the file that consists of the functions that need to be tested. Next, I’ll also import the pytest module. After that, I define a couple of test cases with the addition and subtraction functions. Each test case should be named test_ followed by the name of the function to be tested; in our case, we’ll have test_add and test_sub. I’ll use the assert keyword inside these functions because tests primarily rely on this keyword. It… contrary to the conventional approach of writing code, I first write test_find_string.py, and then I add the test function named test_is_present. In accordance with the test, I create another file named find_string.py, in which I’ll write the is_present function. I define the function named is_present and I pass an argument called person into it. Then I make a list of names written as values. After that, I create a simple if-else condition to check if the passed argument”
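The testing workflow can be sketched in a single file so it runs on its own. In the course, the functions and tests live in separate files: test_addition.py imports the module under test, and find_string.py holds is_present. The module name addition.py and the list of names inside is_present are hypothetical. By default, pytest collects functions whose names start with test_ from files named test_*.py. Running `pytest` on such a file would therefore pick up all three tests.

```python
# In the course these live in separate files; they are combined here so the
# sketch runs on its own. Module and data names are hypothetical.

# --- code under test (e.g. a hypothetical addition.py) ---
def add(a, b):
    return a + b


def sub(a, b):
    return a - b


# --- tests (e.g. test_addition.py); pytest collects names starting with test_ ---
def test_add():
    assert add(2, 3) == 5


def test_sub():
    assert sub(5, 3) == 2


# --- TDD: test_is_present was written first, then is_present to satisfy it ---
def is_present(person):
    names = ["Al", "Bo", "Chi"]  # hypothetical list of names
    if person in names:
        return True
    else:
        return False


def test_is_present():
    assert is_present("Bo")
    assert not is_present("Zed")


if __name__ == "__main__":
    # Normally you would run `pytest` instead of calling the tests by hand.
    test_add()
    test_sub()
    test_is_present()
    print("all tests passed")
```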
V. Software Development Tools and Concepts
The document mentions several tools and concepts relevant to software development:
Python Installation and Version: Checking the installed Python version using python --version.
“prompt, type python --version to identify which version of Python is running on your machine. If Python is correctly installed, then Python 3 should appear in your console. This means that you are running Python 3. There should also be several numbers after the 3 to indicate which version of Python 3 you are running. Make sure these numbers match the most recent version on the python.org website. If you see a message that states python not found, then review your Python installation or relevant document on”
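The same version check can also be done from inside a running script. This sketch reads the version from the standard-library sys and platform modules. The minimum version (3, 8) is an assumed requirement, not one stated in the course.

```python
import platform
import sys

# The same information `python --version` prints, read from inside Python.
print(platform.python_version())  # something like 3.12.1, depending on your install
print(sys.version_info.major)  # 3 on any Python 3 interpreter

# Fail fast if the script runs on an older interpreter than you expect.
MIN_VERSION = (3, 8)  # assumed requirement for illustration
if sys.version_info < MIN_VERSION:
    raise SystemExit("Python %d.%d or newer is required" % MIN_VERSION)
```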
Jupyter Notebook: An interactive development environment (IDE) for Python. Installation using python -m pip install jupyter and running using jupyter notebook are mentioned.
“course, you’ll use the Jupyter IDE to demonstrate Python. To install Jupyter, type python -m pip install jupyter within your Python environment, then follow the Jupyter installation process. Once you’ve installed Jupyter, type jupyter notebook to open a new instance of the Jupyter Notebook to use within your default browser.”
MySQL Connector: A Python library used to connect Python applications to MySQL databases.
“The next task is to connect Python to your MySQL database. You can create the connection using a purpose-built Python library called MySQL Connector. This library is an API that provides useful”
Datetime Library: Python’s built-in module for working with dates and times. datetime.now(), the .date() and .time() methods of a datetime object, and timedelta are introduced.
“Python, so you can import it without requiring pip. Let’s review the functions that Python’s datetime library offers. The datetime.now() function is used to retrieve today’s date. You can also use date to retrieve just the date, or time to get the current time, and the timedelta function calculates the difference between two values. Now let’s look at the syntax for implementing datetime. To import the datetime Python class, use the import statement followed by the library name, then use the as keyword to create an alias of… let’s look at a slightly more complex function, timedelta. When making plans, it can be useful to project into the future. For example, what date is this same day next week? You can answer questions like this using the timedelta function to calculate the difference between two values and return the result in a Python-friendly format. So to find the date in seven days’ time, you can create a new variable called week, type the dt module, and access the timedelta function as an object instance, then pass through seven days as an argument. Finally”
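The datetime calls described above can be sketched as follows, using the dt alias from the transcript. The fixed date 2024-01-25 is an arbitrary example, chosen so the result can be checked by hand.

```python
import datetime as dt

now = dt.datetime.now()  # current local date and time
today = now.date()  # just the date part
current_time = now.time()  # just the time-of-day part
print(today, current_time)

# A timedelta is a duration; adding one to a datetime moves it forward.
week = dt.timedelta(days=7)
same_day_next_week = now + week
print(same_day_next_week.date())

# Subtracting two datetimes gives a timedelta back.
gap = same_day_next_week - now
print(gap.days)  # 7

# With a fixed date the arithmetic is easy to check by hand.
start = dt.date(2024, 1, 25)
print(start + week)  # 2024-02-01
```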
MySQL Workbench: A graphical tool for working with MySQL databases, including creating schemas.
“MySQL server instance and select the schema menu. To create a new schema, select the Create Schema option from the menu pane in the schema toolbar. This action opens a new window. Within this new window, enter mg_schema in the database name text field. Select Apply. This generates a SQL script, CREATE SCHEMA mg_schema. You are then asked to review the SQL script to be applied to your new database. Click on the Apply button within the review window if you’re satisfied with the script. A new window”
Data Warehousing: Briefly introduces the concept of a centralized data repository for integrating and processing large amounts of data from multiple sources for analysis. Dimensional data modeling is mentioned.
“In the next module, you’ll explore the topic of data warehousing. In this module, you’ll learn about the architecture of a data warehouse and build a dimensional data model. You’ll begin with an overview of the concept of data warehousing. You’ll learn that a data warehouse is a centralized data repository that loads, integrates, stores, and processes large amounts of data from multiple sources. Users can then query this data to perform data analysis. You’ll then”
Binary Numbers: A basic explanation of the binary number system (base-2) is provided, highlighting its use in computing.
“Binary has many uses in computing. It is a very convenient way of… consider that you have a lock with four different digits. Each digit can be a zero or a one. How many potential pass numbers can you have for the lock? The answer is 2 to the power of 4, or two times two times two times two, which equals sixteen. You are working with a binary lock; therefore, each digit can only be either zero or one, so for each of the four digits you multiply by two, and the total is 16. Each time you add a potential digit, you increase the”
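The counting argument can be checked by brute force. itertools.product enumerates every 0/1 combination for the four positions, and the count matches 2 ** 4.

```python
from itertools import product

DIGITS = 4  # the four-digit binary lock from the example

# Every 0/1 choice for each of the four positions.
codes = ["".join(bits) for bits in product("01", repeat=DIGITS)]
print(len(codes))  # 16
print(2 ** DIGITS)  # 16, the same count without listing the codes
print(codes[:3])  # ['0000', '0001', '0010']
print(int(codes[-1], 2))  # 15: '1111' read as a base-2 number

# Each extra digit doubles the number of possible codes.
for n in range(1, 6):
    print(n, "digits:", 2 ** n, "codes")
```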
Knapsack Problem: A brief overview of this optimization problem is given as a computational concept.
“three kilograms. Additionally, each item has a value: the torch equals one, water equals two, and the tent equals three. In short, the knapsack problem outlines a list of items that weigh different amounts and have different values. You can only carry so many items in your knapsack. The problem requires calculating the optimum combination of items you can carry if your backpack can carry a certain weight. The goal is to find the best return for the weight capacity of the knapsack. To compute a solution for this problem, you must select all items”
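A brute-force sketch of the knapsack problem tries every subset of items (2 ** n of them for n items) and keeps the most valuable subset that fits. The values (torch 1, water 2, tent 3) come from the transcript. The weights and the 4 kg capacity are hypothetical, because the excerpt is cut off before they are all given.

```python
from itertools import product

# Values come from the transcript; weights and capacity are hypothetical.
items = {
    "torch": (1, 1),  # (weight_kg, value)
    "water": (2, 2),
    "tent": (3, 3),
}
CAPACITY_KG = 4


def best_load(items, capacity):
    """Brute-force 0/1 knapsack: try every subset (2**n of them)."""
    names = list(items)
    best_names, best_value = [], 0
    # Each 0/1 tuple says whether the item at that position is packed.
    for choice in product((0, 1), repeat=len(names)):
        picked = [name for name, keep in zip(names, choice) if keep]
        weight = sum(items[n][0] for n in picked)
        value = sum(items[n][1] for n in picked)
        if weight <= capacity and value > best_value:
            best_names, best_value = picked, value
    return best_names, best_value


print(best_load(items, CAPACITY_KG))  # (['torch', 'tent'], 4)
```

Brute force grows exponentially with the number of items. Larger instances are usually solved with dynamic programming instead.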
This document provides a foundational overview of databases and SQL, command-line basics, version control with Git and GitHub, and introductory Python programming concepts, along with essential development tools. The content suggests a curriculum aimed at individuals learning about software development, data management, and related technologies.
The initial source, “01.pdf,” details an investigation of a database schema, focusing on understanding table relationships and performing revenue calculations. It demonstrates how to join tables to incorporate customer and product information, filter data for recent sales, and categorize customers based on revenue using case statements. The subsequent sections introduce the concept of pivoting data using aggregation and case statements, illustrating techniques for analyzing customer counts by region and calculating net revenue across different categories and years. Further content explores statistical functions within pivoting for median sales analysis and advances into segmentation using multiple conditions within case statements for detailed revenue breakdowns. The final portion of the provided text transitions into date calculations, covering functions like date_trunc and date_part for time series analysis, extracting date components, and using intervals and the age function to analyze processing times. This culminates in an introduction to window functions, explaining their syntax and application in calculations over partitions of data without collapsing rows, including ranking and running aggregates, and finally examining frame clauses for controlling the data within a window.
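The join, revenue, and categorization steps summarized above can be sketched against a tiny in-memory database. The course uses PostgreSQL. This sketch uses Python's built-in sqlite3 so it runs without a server, and the LEFT JOIN and CASE syntax shown is standard SQL that PostgreSQL also accepts. The table and column names are simplified assumptions. The 500 threshold for a high-value sale is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customerkey INTEGER PRIMARY KEY, givenname TEXT);
CREATE TABLE sales (
    orderkey INTEGER, customerkey INTEGER, orderdate TEXT,
    quantity INTEGER, netprice REAL, exchangerate REAL
);
INSERT INTO customer VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO sales VALUES
    (100, 1, '2023-06-01', 2, 500.0, 1.0),
    (101, 2, '2023-07-15', 1, 80.0, 1.25),
    (102, 1, '2019-03-10', 5, 10.0, 1.0);
""")

query = """
SELECT
    s.orderkey,
    c.givenname,
    s.quantity * s.netprice * s.exchangerate AS net_revenue,
    CASE
        WHEN s.quantity * s.netprice * s.exchangerate > 500 THEN 'high'
        ELSE 'low'
    END AS revenue_tier
FROM sales AS s
LEFT JOIN customer AS c ON s.customerkey = c.customerkey
WHERE s.orderdate >= '2020-01-01'  -- keep only recent sales
ORDER BY s.orderkey
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (100, 'Ana', 1000.0, 'high')
# (101, 'Ben', 100.0, 'low')
```

The LEFT JOIN keeps every sale even if a customer row were missing; in that case givenname would come back as NULL. The date filter drops the 2019 order.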
Source Material Study Guide
Quiz
Explain the purpose and importance of the %sql magic command in the provided text. What happens if you try to run a SQL query without it?
Describe what magic commands are in the context of the source material. Give at least two examples of magic commands mentioned besides %sql and explain their function.
What is the Contoso database, and what are some of the key tables within it that are discussed in the excerpts? Briefly describe the purpose of the Sales, Customer, and Date tables.
Explain how net revenue is calculated in the context of the Sales table. What columns are used in this calculation, and why is net price used instead of unit price?
Describe the process of joining tables in SQL as demonstrated in the excerpts. What type of join is frequently used, and on what columns are the tables typically joined in the examples?
What is a CASE WHEN statement, and how is it used in the provided text? Give an example of how it’s used to categorize data within a SQL query.
Explain the concept of pivoting data as introduced in the “pivoting with case statements” section. How is the COUNT(DISTINCT CASE WHEN … END) syntax used to achieve this?
Describe the DATE_TRUNC and EXTRACT functions as explained in the text. What are they used for, and what are some examples of date parts that can be extracted?
Explain the purpose and basic syntax of a Common Table Expression (CTE). How are CTEs used to structure more complex SQL queries in the examples provided?
Briefly describe the functionality of window functions as introduced in the excerpts. How do they differ from aggregate functions with a GROUP BY clause?
Quiz Answer Key
The %sql magic command is crucial for indicating to the Jupyter Notebook environment that the code following it (the rest of the line for %sql, or the whole cell for %%sql) should be interpreted and executed as a SQL query. Without it, the code will be treated as regular Python code, leading to syntax errors and incorrect execution.
Magic commands are special commands in the Jupyter Notebook environment that extend its functionality. Besides %sql, one example is %timeit, which measures the execution time of a line of code. The prefix also matters. A single % (a line magic such as %sql or %timeit) applies the command to one line, while a double %% (a cell magic such as %%sql) applies it to the entire cell.
The Contoso database is the data set used throughout the course. Key tables discussed include Sales (containing transaction information like price and quantity), Customer (containing customer details like name and location), and Date (intended for date-based aggregations but later suggested to be ignored for learning date functions).
Net revenue is calculated by multiplying the quantity of a product by its net price and the exchange rate. The net price is used because it represents the actual price charged to the customer after all discounts and adjustments.
Joining tables combines rows from two or more tables based on a related column. Left joins are frequently used to keep all rows from the left table and matching rows from the right table. Tables are typically joined on key columns like ProductKey, CustomerKey, and date columns.
A CASE WHEN statement allows for conditional logic within a SQL query, enabling the assignment of different values based on specified conditions. For example, it’s used to categorize customers as “high” or “low” value based on their net revenue.
Pivoting data transforms rows into columns. The COUNT(DISTINCT CASE WHEN condition THEN column END) syntax is used to count the distinct occurrences of a specific column based on whether a certain condition is met, effectively creating new columns for each category.
DATE_TRUNC truncates a date or timestamp to a specified precision (e.g., month or year) and returns a timestamp, such as the first moment of that month. EXTRACT retrieves a single numeric component of a date. Both are used for analyzing data at different time granularities. Examples of extractable parts include year, month, and day of week (dow in PostgreSQL).
A Common Table Expression (CTE) is a temporary, named result set defined within the scope of a single query. CTEs are used to break down complex queries into smaller, more manageable, and readable parts, often used before a final SELECT statement or when joining to the same subquery multiple times.
Window functions perform calculations across a set of table rows that are related to the current row, but unlike aggregate functions with GROUP BY, they do not collapse the rows into a single output row. They allow for calculations like running totals, rankings, and averages within partitions of data.
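To make the CTE and window function answers concrete, the sketch below runs a GROUP BY query and a windowed query over the same three hypothetical orders. It uses Python's built-in sqlite3. Window functions need SQLite 3.25 or newer, which recent Python releases typically bundle. The CTE computes each order's revenue once. The window functions then add a per-customer total and an order number without collapsing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, quantity INTEGER, netprice REAL);
INSERT INTO sales VALUES
    (1, '2024-01-05', 2, 50.0),
    (1, '2024-02-10', 1, 50.0),
    (2, '2024-01-20', 3, 100.0);
""")

# GROUP BY collapses each customer's orders into a single row.
grouped = conn.execute("""
SELECT customerkey, SUM(quantity * netprice) AS total
FROM sales
GROUP BY customerkey
ORDER BY customerkey
""").fetchall()
print(grouped)  # [(1, 150.0), (2, 300.0)]

# The CTE computes revenue per order once; the window functions then add
# a per-customer total and an order sequence while keeping every row.
windowed = conn.execute("""
WITH order_revenue AS (
    SELECT customerkey, orderdate, quantity * netprice AS net_revenue
    FROM sales
)
SELECT
    customerkey,
    orderdate,
    net_revenue,
    SUM(net_revenue) OVER (PARTITION BY customerkey) AS customer_total,
    ROW_NUMBER() OVER (PARTITION BY customerkey ORDER BY orderdate) AS order_number
FROM order_revenue
ORDER BY customerkey, orderdate
""").fetchall()
for row in windowed:
    print(row)
```

The grouped result has two rows, one per customer. The windowed result keeps all three orders and repeats each customer's total on every one of that customer's rows.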
Essay Format Questions
Discuss the importance of using magic commands in the context of interactive SQL querying within a Jupyter Notebook environment. How do they facilitate the integration of SQL with other programming languages like Python, as suggested in the excerpts?
Analyze the strategy of exploring the Contoso database by examining individual tables (Sales, Customer, Date) and then combining them through joins. What are the benefits and potential challenges of this approach to understanding a new database schema?
Evaluate the use of CASE WHEN statements in SQL for data categorization and pivoting, as demonstrated in the source material. Provide examples of scenarios where these techniques would be particularly valuable for data analysis and reporting.
Compare and contrast the DATE_TRUNC and EXTRACT functions for manipulating date data in SQL. In what situations might one function be preferred over the other, and how do they contribute to more effective time-based analysis?
Explain the role and advantages of using Common Table Expressions (CTEs) in writing complex SQL queries. How do CTEs improve query readability and maintainability, and can you provide a hypothetical example (based on the source material) where a CTE would significantly simplify a query?
Glossary of Key Terms
Magic Command: Special commands in interactive environments like Jupyter Notebooks that provide extra functionality beyond the standard language syntax (e.g., %sql, %timeit).
SQL: Structured Query Language, a standard language for accessing and manipulating databases.
Jupyter Notebook: An interactive web-based environment for creating and sharing documents that contain live code, equations, visualizations, and narrative text.
Database: An organized collection of structured information, or data, typically stored electronically in a computer system.
Table: A structure within a database that organizes data into rows and columns.
Column: A vertical attribute or field in a table, containing a specific type of data for each record.
Row: A horizontal record in a table, representing a single instance or entry.
Query: A request for data or information from a database.
Syntax: The set of rules that govern the structure and format of statements in a programming or query language.
Autocomplete: A feature where the environment suggests or automatically completes code as the user is typing.
Syntax Error: An error in the structure or grammar of a statement that prevents it from being correctly interpreted.
Execution Time: The amount of time it takes for a program or query to run and complete.
Net Revenue: The actual revenue received after accounting for discounts, returns, and other adjustments.
Unit Price: The standard price of a single unit of a product before any discounts or adjustments.
Quantity: The number of units of a product.
Join: An SQL operation that combines rows from two or more tables based on a related column.
Left Join: Returns all rows from the left table and the matching rows from the right table. If there’s no match in the right table, NULLs are used for the columns of the right table.
Alias: A temporary name given to a table or column in a query to make it easier to refer to.
CASE WHEN Statement: A conditional expression in SQL that allows for different results based on specified conditions.
Pivoting: A data transformation technique that rotates rows into columns.
Aggregation: The process of summarizing data using functions like COUNT, SUM, AVG, MIN, and MAX.
DATE_TRUNC: An SQL function that truncates a timestamp or date value to a specified precision (e.g., day, month, year).
EXTRACT: An SQL function that retrieves a specific component (e.g., year, month, day) from a date or timestamp value.
CAST: An SQL operator used to change the data type of an expression.
Common Table Expression (CTE): A temporary, named result set defined within the execution of a single SQL statement.
Window Function: An SQL function that performs a calculation across a set of table rows that are related to the current row, without collapsing the rows.
Partition By: A clause used with window functions to divide the rows into partitions within which the function is applied.
Order By: A clause used to sort the rows within a result set or within a window function’s partition.
Jupyter, SQL, and Database Exploration with PostgreSQL
## Briefing Document: Analysis of Provided Sources
This briefing document summarizes the main themes and important ideas presented in the provided excerpts from “01.pdf”. The excerpts cover a range of topics related to using Jupyter Notebooks with SQL, exploring a database (Contoso), performing various SQL operations (including joins, aggregations, window functions, pivoting, date manipulation, and query optimization), and finally, setting up a local PostgreSQL environment with tools like pgAdmin and DBeaver for more robust database interaction and project management.
**Main Themes:**
1. **Introduction to Jupyter Notebooks and SQL Integration:** The initial sections focus on using Jupyter Notebooks with SQL through “magic commands” like `%sql`. This integration allows for writing and executing SQL queries directly within a Python environment.
* **Key Idea:** The `%sql` magic command is crucial for executing SQL within a Jupyter Notebook cell. Without it, SQL syntax will be highlighted as incorrect and will result in errors.
* **Quote:** “very important that you put these magic commands up at the top now so people don’t think i’m crazy magic commands are the actual official language of this”
* **Key Idea:** Jupyter Notebook also supports other magic commands like `%timeit` for measuring code execution time, demonstrating the versatility of the environment. Single `%` applies the command to one line, while `%%` can apply to an entire cell (though not explicitly shown in the excerpt).
2. **Exploring the Contoso Database Schema:** The sources introduce the Contoso database as the primary dataset for the course. The excerpts detail the exploration of key tables like `Sales`, `Customer`, and `Date`.
* **Key Idea:** Understanding the relationships between tables (e.g., `Date` table related to `Sales` via date columns, `Customer` and `Product` tables linked to `Sales` via keys) is fundamental for querying and analysis.
* **Quote:** “last table to explore is that date table this is related using that date column here to the sales table order date and delivery date.”
* **Key Idea:** The `Date` table, while useful for quick filtering in tools like Power BI, will be largely ignored in the course in favor of learning more flexible date functions in SQL.
3. **Performing Fundamental SQL Operations:** The excerpts illustrate core SQL concepts such as calculating net revenue, joining tables to combine data from different entities, and using aliases for tables and columns.
* **Key Idea:** Net revenue is calculated as `quantity * net_price * exchange_rate`. The `net_price` already accounts for discounts and promotions.
* **Quote:** “the net price is the price after all the different discounts promotions or any adjustments so basically it’s what we actually charge to the customer when they pay for the product”
* **Key Idea:** `LEFT JOIN` is used to combine tables while ensuring all rows from the left table are included. Aliases (e.g., `s` for `Sales`, `c` for `Customer`, `p` for `Product`) improve query readability.
4. **Introduction to Pivoting Data with `CASE` Statements:** The sources introduce the concept of pivoting data, transforming rows into columns. This is achieved using `CASE WHEN` statements combined with aggregation functions like `COUNT` and `SUM`.
* **Key Idea:** `CASE WHEN` allows for conditional logic within SQL queries, enabling the creation of new categories or columns based on existing data.
* **Quote:** “we’re going to be using statements like case when and aggregation in order to pivot data but what the heck is pivoting data let’s take a look at this simple example”
* **Key Idea:** Pivoting can be used to create summary tables where values from one column become headers in the output.
5. **Date Manipulation with Functions like `DATE_TRUNC` and `TO_CHAR`:** The excerpts demonstrate how to truncate dates to a given precision (e.g., month, year) with `DATE_TRUNC`, how to extract specific parts of a date with `EXTRACT`, and how to format dates into desired string representations using `TO_CHAR`. Casting data types (e.g., to `DATE`) is also shown.
* **Key Idea:** `DATE_TRUNC` allows truncating a date to a specified level of precision (e.g., month).
* **Quote:** “specifically if you just want to specify one attribute you want to extract out of it such as something like month as we did you could either do quarter year decade century or even millennium”
* **Key Idea:** `TO_CHAR` provides more flexible date formatting options using various format codes.
6. **Introduction to Window Functions:** A significant portion of the excerpts is dedicated to introducing window functions. These functions perform calculations across a set of rows that are related to the current row, without collapsing the rows like `GROUP BY`.
* **Key Idea:** Window functions use the `OVER()` clause to define the “window” of rows for the calculation. `PARTITION BY` divides the data into groups, and `ORDER BY` orders the rows within each partition.
* **Quote:** “they let you perform calculations across a set of [table rows] related to the current row…and like we showed they don’t group the results into a single output row this is very beneficial as we’re going to demonstrate some future exercises”
* **Key Idea:** Examples include calculating running totals, ranks (`ROW_NUMBER`, `RANK`, `DENSE_RANK`), and moving averages using frame clauses (`ROWS BETWEEN …`).
7. **Lag and Lead Functions:** The excerpts introduce `LAG` and `LEAD` functions, which allow accessing data from previous or subsequent rows within a window partition. This is useful for calculating differences or growth rates over time.
* **Key Idea:** `LAG(column, offset, default)` retrieves a value from a row `offset` rows before the current row. `LEAD` works similarly for subsequent rows.
* **Quote:** “these type of things in a window function allow us instead of looking at the current row to allow us to look at things like the row above it or the row below it”
8. **Frame Clauses in Window Functions:** The sources explain how frame clauses (`ROWS BETWEEN`) within the `OVER()` clause can further define the set of rows to be considered for a window function calculation (e.g., a moving average over the preceding three months). `CURRENT ROW`, `PRECEDING`, and `FOLLOWING` are key keywords.
* **Key Idea:** Frame clauses allow for flexible calculations based on a sliding window of rows.
* **Quote:** “this allows us to specify a physical offset from the current row such as maybe the three preceding rows or maybe the two following rows”
9. **Setting Up a Local PostgreSQL Environment:** The later excerpts transition to setting up a local PostgreSQL database, including installing the server and pgAdmin (a GUI administration tool).
* **Key Idea:** Having a local database environment allows for more hands-on practice and development without relying on remote systems.
* **Key Idea:** pgAdmin is used for managing the PostgreSQL server, creating databases, running queries, and exploring the database schema.
10. **Introducing DBeaver as an Alternative Database Tool:** DBeaver is introduced as a more versatile database management tool that can connect to various database systems, unlike pgAdmin which is specific to PostgreSQL.
* **Key Idea:** DBeaver offers a more comprehensive set of features for database development and administration, including project management, SQL editing enhancements (auto-completion, formatting), and data export/visualization capabilities.
* **Quote:** “dbeaver now this is a database management tool so [it] can connect to different [types of databases]”
11. **Project Management in DBeaver:** The excerpts demonstrate how to create projects in DBeaver to organize SQL scripts, bookmarks, and other related files. This helps in structuring database development work.
12. **Introduction to Views:** Views are introduced as virtual tables that represent the result of a stored query. They simplify complex queries and provide a level of abstraction over the underlying tables.
* **Key Idea:** Views are created using the `CREATE VIEW` statement and can be queried like regular tables.
* **Quote:** “it’s a virtual table that allows us to show the results of a stored query in it”
13. **Introduction to VS Code for Project Development:** VS Code is presented as a powerful code editor, not just for SQL but also for creating and managing project documentation (like README files in Markdown). Its preview capabilities for Markdown are highlighted.
* **Key Idea:** VS Code, with its extensions, provides a robust environment for both code (SQL) and documentation.
14. **Query Optimization with `EXPLAIN` and `EXPLAIN ANALYZE`:** The final excerpts touch upon basic query optimization by introducing the `EXPLAIN` and `EXPLAIN ANALYZE` commands, which provide insights into the query execution plan and performance.
* **Key Idea:** `EXPLAIN` shows the planned steps the database will take to execute a query. `EXPLAIN ANALYZE` actually executes the query and provides timing information.
* **Quote:** “explain demonstrates the execution plan without actually executing it whereas explain analyze basically means like it’s going to analyze it and it actually does execute it”
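LAG, LEAD, and a frame clause can be combined in one query. The sketch stores that query as a view, another topic from the briefing, and runs it on four months of made-up revenue. It uses Python's built-in sqlite3, which needs SQLite 3.25+ for window functions. The same `OVER (...)` syntax works in PostgreSQL. The table name `monthly_revenue` and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_revenue (month TEXT, revenue REAL);
INSERT INTO monthly_revenue VALUES
    ('2024-01', 100), ('2024-02', 200), ('2024-03', 300), ('2024-04', 400);

-- A view stores the query; it can then be selected from like a table.
CREATE VIEW monthly_trend AS
SELECT
    month,
    revenue,
    LAG(revenue, 1) OVER (ORDER BY month) AS prev_revenue,
    LEAD(revenue, 1) OVER (ORDER BY month) AS next_revenue,
    AVG(revenue) OVER (
        ORDER BY month
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3
FROM monthly_revenue;
""")

rows = conn.execute("SELECT * FROM monthly_trend ORDER BY month").fetchall()
for row in rows:
    print(row)
# First row: no earlier month, so prev_revenue is NULL (None in Python).
# Last row: no later month, so next_revenue is NULL.
```

For the first two months, the frame holds fewer than three rows. The moving average is taken over whatever rows are available (100.0, then 150.0), after which it becomes a true three-month average.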
**Most Important Ideas and Facts:**
* Jupyter Notebooks can seamlessly integrate with SQL using magic commands like `%sql`.
* The Contoso database is the central dataset for learning SQL concepts.
* Understanding table relationships is crucial for effective querying.
* `CASE WHEN` statements are essential for conditional logic and data pivoting.
* Window functions provide powerful analytical capabilities without collapsing rows, enabling calculations like running totals, rankings, and moving averages.
* `LAG` and `LEAD` functions allow for comparisons between rows.
* Frame clauses in window functions define the scope of rows for calculations.
* Setting up a local PostgreSQL environment with pgAdmin and DBeaver provides a robust platform for database learning and project development.
* DBeaver is a versatile database tool supporting multiple database systems.
* Views simplify queries and provide abstraction.
* VS Code is a valuable tool for both SQL development and project documentation (using Markdown).
* `EXPLAIN` and `EXPLAIN ANALYZE` are used to understand and optimize SQL query execution.
These excerpts lay a comprehensive foundation for learning intermediate SQL concepts, ranging from basic query structures and database exploration to advanced analytical functions and development environment setup. The progression through Jupyter Notebooks to local database tools like PostgreSQL and DBeaver indicates a move towards more practical and real-world database interaction and project management.
Exploring Data with SQL Magic and the Contoso Database
1. What is the purpose of the %sql magic command in the provided context?
The %sql magic command is essential for executing SQL queries within the environment (likely a Jupyter Notebook). When placed at the beginning of a cell or line, it signals to the interpreter that the subsequent text should be treated as a SQL query to be run against the connected database. Without this command, the SQL syntax would be misinterpreted, leading to errors. Using two percent signs (%%sql) applies the command to the entire cell, while a single percent sign (%sql) applies it only to the current line.
2. Beyond SQL, what other types of “magic commands” are mentioned and what is their general function?
The text mentions that %sql is not the only magic command available. It specifically highlights the %timeit magic command as an example. The general function of these magic commands is to provide additional functionalities and tools within the coding environment, such as measuring the execution time of code (%timeit) or facilitating interaction with external systems or specific languages (like SQL with %sql).
3. What is the Contoso database and what are some of the key tables within it that are explored in the excerpts?
The Contoso database is the primary dataset used throughout the lessons. The excerpts introduce and explore several key tables:
– Sales: This table contains transactional data, including information about orders, quantities, net prices (prices after discounts), and order dates. It’s central to calculating revenue.
– Customer: This table holds information about customers, such as their given name, surname, country, continent, and customer key.
– Product: This table contains details about the products being sold, including product key, product name, category name, and subcategory name.
– Date: This table contains various date-related attributes that can be used for aggregation and filtering based on dates. However, the course later emphasizes using date functions instead of relying solely on this table.
4. How is “net revenue” calculated within the context of the Contoso database, and why is it considered important?
Net revenue is calculated by multiplying the quantity of a product sold by its net price (the price after all discounts and adjustments) and the exchange rate. It is considered important because it represents the actual revenue received from customers after accounting for discounts and promotions, reflecting the true value of sales transactions.
5. What is “pivoting data” as described in the excerpts, and how is it achieved using SQL?
Pivoting data involves transforming rows into columns. The example provided shows how to take customer counts grouped by continent (originally in rows) and restructure the output to have each continent as a separate column displaying the total customer count for that continent. This is achieved using aggregate functions (like COUNT DISTINCT) combined with CASE WHEN statements to conditionally assign values to the new columns based on the continent.
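The pivot described above can be sketched as follows. The sketch uses Python’s sqlite3 module as a stand-in for PostgreSQL. The customer table, its columns, and the continent values are hypothetical sample data, not the actual Contoso rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customerkey INTEGER, continent TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Europe"), (2, "Europe"), (3, "North America"),
     (4, "Australia"), (5, "North America"), (6, "Europe")],
)

# Before pivoting: one row per continent
long_form = conn.execute(
    """
    SELECT continent, COUNT(DISTINCT customerkey)
    FROM customer
    GROUP BY continent
    ORDER BY continent
    """
).fetchall()

# After pivoting: one column per continent. CASE WHEN returns the key only for
# matching rows (NULL otherwise), and COUNT(DISTINCT ...) ignores the NULLs.
wide_form = conn.execute(
    """
    SELECT
        COUNT(DISTINCT CASE WHEN continent = 'Europe' THEN customerkey END)        AS europe,
        COUNT(DISTINCT CASE WHEN continent = 'North America' THEN customerkey END) AS north_america,
        COUNT(DISTINCT CASE WHEN continent = 'Australia' THEN customerkey END)     AS australia
    FROM customer
    """
).fetchone()

print(long_form)
print(wide_form)
```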
6. What is the purpose of the DATE_TRUNC and EXTRACT functions when working with dates, and how do they differ from the TO_CHAR function?
DATE_TRUNC is used to truncate a date to a specified level of precision, such as month, quarter, or year. It returns a timestamp with the less significant parts reset to the beginning of that period (e.g., midnight on the first day of the month).
EXTRACT is used to retrieve a specific component from a date or timestamp, such as the year, month, or day. It returns a numeric value representing that part.
TO_CHAR is used to format a date or timestamp as a text string according to a specified pattern. This allows for flexible output formats, such as extracting the month name or formatting the date in a particular way.
While DATE_TRUNC and EXTRACT help in manipulating and retrieving date parts for analysis or grouping, TO_CHAR focuses on presenting date information in a desired textual format.
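The contrast between the three functions can be illustrated with Python’s datetime module. This is only an analogy, not PostgreSQL itself. Each line mimics what the corresponding SQL function would return for a hypothetical order_date of 2023-05-17 14:30.

```python
from datetime import datetime

order_date = datetime(2023, 5, 17, 14, 30)

# Like DATE_TRUNC('month', order_date): still a timestamp, reset to the period start
truncated = order_date.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

# Like EXTRACT(MONTH FROM order_date): a single number
month_number = order_date.month

# Like TO_CHAR(order_date, 'MM-YYYY'): formatted text
month_year_text = order_date.strftime("%m-%Y")

print(truncated, month_number, month_year_text)
```

The truncated value can still be compared and sorted as a timestamp. The extracted value is a plain number, and the formatted value is a string.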
7. What are “window functions” and how do they differ from standard SQL aggregate functions? What are some examples of window functions discussed in the excerpts?
Window functions perform calculations across a set of rows that are related to the current row. Unlike standard aggregate functions (e.g., SUM, COUNT, AVG), they do not collapse those rows into a single output row. The “window” is defined in an OVER clause. That clause can include a PARTITION BY clause, which divides the data into groups, and an ORDER BY clause, which specifies the order within each partition; both are optional.
Examples of window functions discussed include:
– Aggregate functions used as window functions (e.g., AVG() OVER (…)).
– Ranking functions (ROW_NUMBER(), RANK(), DENSE_RANK()).
– Value functions (FIRST_VALUE(), LAST_VALUE(), NTH_VALUE(), LAG(), LEAD()).
– Percentile functions (PERCENTILE_CONT()). Strictly speaking, PostgreSQL implements PERCENTILE_CONT() as an ordered-set aggregate written with WITHIN GROUP (ORDER BY …), not as a window function, so it cannot be combined with OVER.
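The difference between a collapsing aggregate and a window function can be seen side by side in the following sketch. It assumes Python’s sqlite3 module in place of PostgreSQL, and a hypothetical sales table with made-up customerkey, orderdate, and net_revenue columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, net_revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-01-05", 50.0),
        (1, "2023-02-10", 80.0),
        (1, "2023-03-15", 80.0),
        (2, "2023-01-20", 30.0),
        (2, "2023-04-02", 90.0),
    ],
)

# Standard aggregate: GROUP BY collapses the orders into one row per customer
grouped = conn.execute(
    "SELECT customerkey, AVG(net_revenue) FROM sales GROUP BY customerkey ORDER BY customerkey"
).fetchall()

# Window functions: every order row survives, with per-customer values attached
windowed = conn.execute(
    """
    SELECT
        customerkey,
        orderdate,
        net_revenue,
        AVG(net_revenue) OVER (PARTITION BY customerkey) AS customer_avg,
        ROW_NUMBER() OVER (PARTITION BY customerkey ORDER BY net_revenue DESC) AS row_num,
        RANK()       OVER (PARTITION BY customerkey ORDER BY net_revenue DESC) AS rnk,
        DENSE_RANK() OVER (PARTITION BY customerkey ORDER BY net_revenue DESC) AS dense_rnk,
        -- Revenue of the previous order by the same customer (NULL for the first order)
        LAG(net_revenue) OVER (PARTITION BY customerkey ORDER BY orderdate) AS prev_revenue
    FROM sales
    ORDER BY customerkey, orderdate
    """
).fetchall()

print(grouped)
for row in windowed:
    print(row)
```

Customer 1’s two 80.0 orders tie. Both get RANK 1, so the 50.0 order gets RANK 3 but DENSE_RANK 2. ROW_NUMBER gives the tied rows 1 and 2 in an arbitrary order.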
8. What is “cohort analysis” as demonstrated in the excerpts, and what kind of insights can it provide about customer behavior?
Cohort analysis involves grouping users or customers based on a shared characteristic, typically the time they acquired a product or service (their “cohort year” or “first purchase date”). It then tracks their behavior over time. The excerpts demonstrate cohort analysis by examining how different cohorts of customers contribute to total revenue and customer retention in subsequent years. This can provide insights into customer lifetime value, the effectiveness of acquisition strategies over time, and customer churn patterns by showing how engagement and spending change for different initial groups of customers.
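A small cohort table can be built by tagging each order with the year of the customer’s first purchase and then aggregating by cohort year and order year. This is a sketch with Python’s sqlite3 module and hypothetical data. SQLite has no EXTRACT(), so strftime('%Y', …) stands in for EXTRACT(YEAR FROM …).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, net_revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2022-03-01", 100.0),
        (1, "2023-06-01", 50.0),
        (2, "2022-07-15", 200.0),
        (3, "2023-02-10", 80.0),
        (3, "2023-09-30", 20.0),
    ],
)

cohort_rows = conn.execute(
    """
    WITH tagged AS (
        SELECT
            customerkey,
            CAST(strftime('%Y', orderdate) AS INTEGER) AS order_year,
            net_revenue,
            -- Cohort year: the year of this customer's earliest order
            MIN(CAST(strftime('%Y', orderdate) AS INTEGER))
                OVER (PARTITION BY customerkey) AS cohort_year
        FROM sales
    )
    SELECT
        cohort_year,
        order_year,
        COUNT(DISTINCT customerkey) AS active_customers,
        SUM(net_revenue) AS revenue
    FROM tagged
    GROUP BY cohort_year, order_year
    ORDER BY cohort_year, order_year
    """
).fetchall()

for row in cohort_rows:
    print(row)
```

In this toy data, the 2022 cohort starts with two customers, and only one of them buys again in 2023. This is the kind of retention signal that cohort analysis surfaces.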
SQL for Data Exploration: A Course Overview
Based on the sources, data exploration within the context of this SQL course for data analytics appears to be a crucial initial step involving understanding the structure and content of a database using SQL queries and visualization techniques.
Here’s a breakdown of data exploration as presented in the course:
Understanding the Database Structure: The course emphasizes the importance of getting familiar with the database schema, which includes identifying the different tables and understanding how they relate to each other. The Entity Relationship Diagram (ERD) of the Contoso database is introduced as a tool to visualize these relationships. In particular, it shows how dimension tables such as store, product, and customer relate to the main fact table, sales.
Examining Tables and Columns: Data exploration involves inspecting individual tables to understand the columns they contain and the types of information stored in them. This is demonstrated by using SQL queries like SELECT * FROM sales LIMIT 10 to view the first few rows and identify the available columns such as dates, customer key, store key, product key, quantity, price, cost, and currency information.
Exploring Metadata: The course also shows how to query the information schema, a meta-database, to discover the tables within a database and the columns within each table. Specifically, it demonstrates two queries. SELECT table_name FROM information_schema.tables WHERE table_schema = 'public'; lists the tables. SELECT * FROM information_schema.columns WHERE table_name = 'customer'; shows all the column names in the customer table.
Using SQL for Initial Analysis: Simple SQL queries are used to get a first look at the data and its characteristics. For example, selecting all columns from a table and limiting the number of rows allows for a quick overview of the data’s format and values.
Leveraging Tools for Exploration: The course utilizes Google Colab in the first half, which allows for running SQL queries and provides features like converting query results to interactive tables and generating visualizations. The integration with Gemini AI is also mentioned as a way to assist in generating SQL queries for data exploration. Later in the course, pgAdmin and DBeaver are introduced as more advanced database tools that allow for visual exploration of the schema, tables, and data. DBeaver, in particular, is highlighted for its ability to view table data in a grid format and examine ER diagrams.
Understanding Data Relationships through Joins: As part of data exploration, the course demonstrates how to use JOIN clauses (specifically LEFT JOIN) to combine data from multiple related tables, such as sales, customer, and product, to understand how different entities interact and to bring together relevant attributes for analysis.
Identifying Key Fields: The initial exploration helps in identifying key fields that will be important for further analysis, such as the different keys used to relate tables and the metrics (e.g., quantity, net price) available for calculations like net revenue.
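The metadata and join steps above can be tried end to end in the following sketch. It uses Python’s sqlite3 module. SQLite has no information_schema, so sqlite_master and PRAGMA table_info serve as rough equivalents. The table layouts and rows are hypothetical, not the real Contoso schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customerkey INTEGER, givenname TEXT, country TEXT);
    CREATE TABLE product  (productkey INTEGER, productname TEXT, categoryname TEXT);
    CREATE TABLE sales    (orderkey INTEGER, customerkey INTEGER, productkey INTEGER, quantity INTEGER);

    INSERT INTO customer VALUES (1, 'Ana', 'DE'), (2, 'Ben', 'US');
    INSERT INTO product  VALUES (10, 'Laptop', 'Computers'), (20, 'Phone', 'Cell phones');
    -- Order 3 points at a product key that is missing from the product table
    INSERT INTO sales    VALUES (1, 1, 10, 1), (2, 2, 20, 2), (3, 2, 99, 1);
    """
)

# Rough equivalent of listing tables from information_schema.tables
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk)
customer_columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

# LEFT JOIN keeps every sales row, even when no matching product exists
joined = conn.execute(
    """
    SELECT s.orderkey, c.givenname, p.productname, p.categoryname, s.quantity
    FROM sales AS s
    LEFT JOIN customer AS c ON s.customerkey = c.customerkey
    LEFT JOIN product  AS p ON s.productkey  = p.productkey
    ORDER BY s.orderkey
    """
).fetchall()

print(tables)
print(customer_columns)
for row in joined:
    print(row)
```

Order 3 comes back with NULL product fields. An INNER JOIN would have silently dropped it, which is why the join type matters during exploration.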
In essence, data exploration in this course lays the foundation for more advanced data analytics by ensuring a solid understanding of the available data, its structure, and its basic characteristics through the use of SQL and database exploration tools. This initial phase is crucial for formulating meaningful analytical questions and developing effective SQL queries for deeper insights.
SQL for Time Series: Date Calculations
Based on the sources, this SQL course includes a chapter dedicated to date calculations, emphasizing their importance for time series analysis. The course covers several key date and time functions and keywords:
DATE_TRUNC() Function: This function truncates a timestamp down to a specified level of precision, such as year, quarter, month, week, day, or hour. For example, DATE_TRUNC('month', order_date) returns the first moment of the month that contains order_date, so 2023-05-17 becomes 2023-05-01 00:00:00. This makes it a convenient month-and-year key. The output of DATE_TRUNC() is a timestamp, which can be cast to a DATE data type if needed using the ::date operator.
TO_CHAR() Function: This function formats date and time values as strings based on a format pattern. You provide a timestamp or date and a format string that specifies the desired output. For instance, TO_CHAR(order_date, 'YYYY') returns the year as text, and TO_CHAR(order_date, 'MM-YYYY') returns the month and year in that format. TO_CHAR() offers more customization than DATE_TRUNC() because you can combine different components in whatever order you need.
DATE_PART() Function: This function extracts a specific component from a date or timestamp, such as year, month, day, hour, minute, or second. You specify the part you want (e.g., 'year', 'month') as a string, followed by the source date or timestamp. For example, DATE_PART('year', order_date) extracts the year. The source notes that the output may include decimals, which is not always desirable.
EXTRACT() Function: Like DATE_PART(), EXTRACT() retrieves a specific component from a date or timestamp, but the syntax is slightly different. You write the field name (e.g., YEAR, MONTH, DAY; conventionally in uppercase), followed by the keyword FROM and the date or timestamp. For example, EXTRACT(YEAR FROM order_date) extracts the year. The course prefers EXTRACT() over DATE_PART() for components like year, month, and day because it typically returns whole-number values without unnecessary precision.
CURRENT_DATE: This keyword returns the current date when the query is executed. The date is based on the session’s time zone setting, which defaults to the server’s configuration, so in most cases no time zone needs to be specified.
NOW(): This function returns the current date and time as a timestamp with time zone. In PostgreSQL, this is the time at which the current transaction started, so repeated calls within one transaction return the same value.
INTERVAL Keyword: The INTERVAL keyword represents a span of time, which can be expressed in units such as days, months, years, or hours. You create an interval by writing INTERVAL followed by a quoted value and unit (e.g., INTERVAL '5 year', INTERVAL '6 month'). Intervals can be added to or subtracted from dates and timestamps for date arithmetic.
AGE() Function: This function calculates the difference between two timestamps or dates and returns the result as an interval. The argument order matters: AGE(end_date, start_date) yields a positive interval when end_date is the later date. You can then extract specific components from the resulting interval with EXTRACT(), such as the days field. Note that EXTRACT(DAY FROM …) returns only the days portion of the interval. For example, the age between 2023-01-01 and 2023-03-15 is 2 months 14 days, so the result is 14. To get a total day count between two DATE values, subtract them directly (end_date - start_date), which returns an integer number of days.
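As an analogy for measuring a processing time like AGE(delivery_date, order_date), the following sketch uses Python’s sqlite3 module. SQLite has no AGE() or INTERVAL type, so the julianday() difference is used instead. It yields a plain day count rather than a PostgreSQL interval. The orderdate and deliverydate columns and their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, orderdate TEXT, deliverydate TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2023-01-30", "2023-02-02"),  # crosses a month boundary
        (2, "2023-03-01", "2023-03-01"),  # same-day delivery
        (3, "2023-12-28", "2024-01-04"),  # crosses a year boundary
    ],
)

rows = conn.execute(
    """
    SELECT
        orderkey,
        -- julianday() maps an ISO date to a day number; as with AGE(end, start),
        -- the later date comes first so the difference is positive
        CAST(julianday(deliverydate) - julianday(orderdate) AS INTEGER) AS processing_days
    FROM sales
    ORDER BY orderkey
    """
).fetchall()

print(rows)
```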
The course also demonstrates how these date functions are used in conjunction with other SQL clauses for analysis:
Filtering Dates with WHERE Clause: Date functions are commonly used in the WHERE clause to filter data based on specific date ranges or conditions. Examples include filtering orders within a specific year using EXTRACT(YEAR FROM order_date) = 2023, or finding orders within the last 5 years using order_date >= CURRENT_DATE - INTERVAL '5 year'.
Grouping by Date Components with GROUP BY Clause: Functions like DATE_TRUNC() or TO_CHAR() are useful for grouping data by specific time periods. For example, monthly sales can be computed by grouping on DATE_TRUNC('month', order_date) or TO_CHAR(order_date, 'MM-YYYY').
Ordering by Date with ORDER BY Clause: Dates can be used in the ORDER BY clause to sort results chronologically.
Calculating Time Differences: The AGE() function is used to calculate the duration between events, like the processing time of an order by finding the age between the order date and the delivery date.
The importance of dynamic filtering using functions like CURRENT_DATE and INTERVAL is highlighted, as it allows for creating queries that automatically adjust based on the current time, such as always retrieving data for the last 5 years.
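A rolling "last 5 years" filter combined with monthly grouping might look like the following sketch. It uses Python’s sqlite3 module. Here date(?, '-5 years') plays the role of CURRENT_DATE - INTERVAL '5 year', and strftime('%Y-%m', …) plays the role of TO_CHAR() or DATE_TRUNC(). The reference date is fixed so the output is reproducible; passing 'now' instead would make the window move with the calendar. The sales rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderdate TEXT, net_revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2017-06-01", 10.0), ("2021-01-15", 20.0), ("2021-01-20", 5.0), ("2023-07-04", 40.0)],
)

# Fixed "today" for reproducibility; use "now" for a live rolling window
reference_date = "2024-01-01"

# The WHERE clause keeps orders from the last 5 years relative to reference_date
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', orderdate) AS order_month,
           SUM(net_revenue) AS revenue
    FROM sales
    WHERE orderdate >= date(?, '-5 years')
    GROUP BY order_month
    ORDER BY order_month
    """,
    (reference_date,),
).fetchall()

print(monthly)
```

The 2017 order falls outside the five-year window and is dropped. The two January 2021 orders are combined into one monthly total.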
SQL Course: Calculating Net Revenue
Based on the sources, net revenue in this SQL course is consistently calculated by taking into account the quantity of items sold, the net price of each item, and the exchange rate if currency conversion is needed.
Here’s a breakdown of the net revenue calculation process as described in the sources:
Net Price Defined: The net price is the price that the customer actually pays for a product after all applicable discounts, promotions, or adjustments have been applied. It is explicitly stated that the net price is less than the unit price due to these reductions.
Basic Calculation: The fundamental way to calculate the net revenue for a particular transaction is by multiplying the quantity of the product purchased by its net price. This can be represented as:
Net Revenue = Quantity * Net Price
Incorporating Exchange Rates: In the Contoso database used in the course, transactions may involve different currencies (e.g., pounds and US dollars). To standardize revenue in a common currency (US dollars, which the instructor prefers), an exchange rate is applied. The complete formula for net revenue used throughout the course is:
Net Revenue = Quantity * Net Price * Exchange Rate
This formula is used in various lessons when calculating total revenue, revenue by category, customer lifetime value, and for cohort analysis.
Example of Currency Conversion: One of the sources provides an example in which revenue figures are initially in pounds and are then converted to US dollars by multiplying by an appropriate exchange rate.
Application in SQL Queries: The course demonstrates the use of this net revenue calculation within SQL SELECT statements, often with the result being aliased as net_revenue or total_net_revenue. This calculation is then used in aggregations with the SUM() function to find total revenues for different groupings of data (e.g., by order date, category, customer cohort).
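The formula Quantity * Net Price * Exchange Rate, summed per category, can be sketched with Python’s sqlite3 module as follows. The table and column names (quantity, netprice, exchangerate, categoryname) are hypothetical stand-ins, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (productkey INTEGER, categoryname TEXT);
    CREATE TABLE sales (productkey INTEGER, quantity INTEGER, netprice REAL, exchangerate REAL);

    INSERT INTO product VALUES (1, 'Computers'), (2, 'Audio');
    -- exchangerate converts each order's currency into USD
    INSERT INTO sales VALUES
        (1, 2, 500.0, 1.0),
        (1, 1, 400.0, 1.25),
        (2, 4, 25.0, 1.0);
    """
)

totals = conn.execute(
    """
    SELECT p.categoryname,
           SUM(s.quantity * s.netprice * s.exchangerate) AS total_net_revenue
    FROM sales AS s
    LEFT JOIN product AS p ON s.productkey = p.productkey
    GROUP BY p.categoryname
    ORDER BY total_net_revenue DESC
    """
).fetchall()

print(totals)
```

Computers comes out at 2 × 500 × 1.0 + 1 × 400 × 1.25 = 1,500 USD, and Audio at 4 × 25 × 1.0 = 100 USD.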
Therefore, to calculate net revenue in this course and the Contoso database, you generally multiply the quantity of products sold by their net prices and then adjust for currency differences by multiplying by the relevant exchange rate. The course emphasizes that the net price already reflects any discounts or adjustments, so it represents the actual amount charged to the customer.
Customer Segmentation: A Data-Driven Approach
Based on the sources, customer segmentation is a key data analytics concept that involves dividing customers into distinct groups based on shared characteristics or behaviors. The goal of customer segmentation is to enable more targeted analysis and tailored strategies for different customer groups.
Here are the main aspects of customer segmentation discussed in the sources:
Definition: Customer segmentation involves taking large datasets and breaking them down into smaller, more manageable pieces to analyze different behaviors within those groups. This allows for a deeper understanding of customer behavior and facilitates more effective decision-making.
Methods using CASE WHEN Statements: The course emphasizes using the CASE WHEN statement as a fundamental tool for customer segmentation. This allows for the creation of new columns that categorize customers based on specified conditions.
Simple Binary Segmentation: Customers can be segmented into two groups, such as “high value” and “low value,” based on a single criterion like net revenue threshold (e.g., above or below $1,000).
Segmentation by Multiple Conditions: More advanced segmentation can involve multiple conditions using the AND operator within a CASE WHEN statement. For example, segmenting customers based on a combination of the year of purchase and whether their net revenue is above or below the median.
Segmentation into Multiple Tiers: Customers can be divided into more than two segments (e.g., low, medium, high value) using multiple WHEN clauses within a single CASE block. This allows for a more granular understanding of customer value.
Segmentation based on Net Revenue and Spending: A primary method for customer segmentation in the course involves analyzing customer spending, often using net revenue as the key metric.
Segmentation using Percentiles: The course demonstrates segmenting customers into tiers (low value, mid value, high value) based on their total lifetime value (LTV) using percentiles (25th and 75th). Customers falling below the 25th percentile are categorized as “low value,” those between the 25th and 75th percentiles as “mid value,” and those above the 75th percentile as “high value”.
Purpose of Segmentation: The primary goals of customer segmentation highlighted in the sources include:
Identifying Valuable Customers: Understanding who the most valuable customers are based on their spending or LTV.
Targeted Marketing: Enabling businesses to target specific customer groups with marketing campaigns tailored to their needs and value.
Analyzing Group Behavior: Facilitating the analysis of different customer groups to understand their spending habits, retention rates, and other key behaviors.
Developing Business Strategies: Providing insights that can inform business decisions and strategies for customer engagement, retention, and growth.
Implementation in SQL: The process of customer segmentation typically involves:
Calculating relevant metrics like total net revenue or lifetime value.
Using CASE WHEN statements to create a new column that assigns customers to different segments based on defined criteria.
Aggregating data by these segments to analyze their characteristics, such as total revenue contribution, customer count, and average value.
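The three implementation steps, combined with the percentile-based tiers described earlier, could be sketched as follows. SQLite has no PERCENTILE_CONT(), so this sketch computes the 25th and 75th percentile thresholds in Python. It uses statistics.quantiles(method="inclusive"), which applies the same linear interpolation as PERCENTILE_CONT. The thresholds are then passed into a CASE WHEN query. The sales data, the segment labels, and the boundary handling (values equal to a threshold count as mid) are assumptions for illustration.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerkey INTEGER, net_revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 50.0), (2, 100.0), (2, 20.0), (3, 300.0),
     (4, 310.0), (5, 900.0), (6, 1000.0), (6, 500.0)],
)

# Step 1: lifetime value (LTV) per customer
ltv = [v for (v,) in conn.execute(
    "SELECT SUM(net_revenue) FROM sales GROUP BY customerkey ORDER BY 1"
)]

# 25th and 75th percentile thresholds (linear interpolation, like PERCENTILE_CONT)
q1, _, q3 = statistics.quantiles(ltv, n=4, method="inclusive")

# Steps 2 and 3: assign a segment with CASE WHEN, then aggregate per segment
segments = conn.execute(
    """
    WITH customer_ltv AS (
        SELECT customerkey, SUM(net_revenue) AS ltv
        FROM sales
        GROUP BY customerkey
    )
    SELECT
        CASE
            WHEN ltv < :q1 THEN 'low'
            WHEN ltv <= :q3 THEN 'mid'
            ELSE 'high'
        END AS segment,
        COUNT(*) AS customers,
        SUM(ltv) AS total_ltv
    FROM customer_ltv
    GROUP BY segment
    ORDER BY MIN(ltv)
    """,
    {"q1": q1, "q3": q3},
).fetchall()

print(q1, q3)
print(segments)
```

With this toy data, the thresholds are 165.0 and 752.5. The two high-value customers account for most of the total LTV, which is the kind of contrast segmentation is meant to expose.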
In summary, customer segmentation as taught in this course is a crucial analytical technique leveraging SQL’s conditional logic and aggregate functions to categorize customers based on their value and behavior. This process allows for a more nuanced understanding of the customer base and enables businesses to implement more effective and targeted strategies.
SQL Query Optimization: Techniques and Analysis
Based on the sources, query optimization is an important topic covered in the second half of this SQL course. The goal of query optimization is to improve the performance and efficiency of SQL queries, making them run faster and consume fewer resources.
Here’s a breakdown of the key aspects of query optimization discussed in the sources:
Understanding the Execution Plan with EXPLAIN: The course emphasizes the use of the EXPLAIN keyword to understand how the database plans to execute a query.
EXPLAIN shows the execution plan without actually running the query. This plan details the steps the database will take, such as table scans and joins.
EXPLAIN ANALYZE goes a step further by actually executing the query. It reports actual execution time, planning time, estimated costs, the number of rows processed, and other statistics. Because the query really runs, it should be used with care on statements that modify data, such as UPDATE or DELETE. Its output gives a more precise understanding of query performance and can help identify bottlenecks.
For each step in the execution plan, the output of EXPLAIN and EXPLAIN ANALYZE includes three values: the estimated cost (in arbitrary units used by PostgreSQL’s planner), the estimated number of rows, and the width (estimated average row size in bytes).
DBeaver provides a feature to visualize the execution plan, offering another way to understand the query execution flow.
Query Optimization Techniques: The course covers various techniques to optimize SQL queries, categorized as beginner, intermediate, and advanced.
Beginner Techniques:
Using LIMIT: Employing the LIMIT clause to restrict the number of rows returned can significantly reduce query execution time, especially when dealing with large tables and only a subset of data is needed. The sources demonstrate a substantial decrease in execution time when LIMIT is used.
Being Selective with Columns (SELECT Specific Columns vs. SELECT *): While PostgreSQL might sometimes efficiently retrieve data using SELECT *, it’s generally recommended to select only the specific columns required for the analysis. This practice can be more efficient, particularly in large databases, by reducing the amount of data that needs to be processed and transferred.
Using WHERE Instead of HAVING: Filtering data with the WHERE clause before aggregation (with GROUP BY) is generally more efficient than filtering the aggregated results with the HAVING clause. The reason is that WHERE reduces the number of rows that reach the aggregation step. This applies to conditions on ordinary columns; conditions on aggregate values, such as SUM(quantity) > 10, can only be expressed in HAVING.
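The following sketch uses Python’s sqlite3 module and a hypothetical sales table. It shows that a filter on a grouping column gives the same result in WHERE or HAVING. The WHERE version simply discards rows before they are aggregated. Some query planners can move such a HAVING condition into WHERE automatically, so the measured gain may be small. Checking with EXPLAIN ANALYZE on the real data is the reliable test. A condition on an aggregate, by contrast, only works in HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (categoryname TEXT, net_revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Computers", 500.0), ("Computers", 700.0), ("Audio", 50.0), ("TV", 900.0)],
)

# Filter rows first, then aggregate only what is left
where_version = conn.execute(
    """
    SELECT categoryname, SUM(net_revenue)
    FROM sales
    WHERE categoryname IN ('Computers', 'TV')
    GROUP BY categoryname
    ORDER BY categoryname
    """
).fetchall()

# Aggregate every group, then discard some of the groups
having_version = conn.execute(
    """
    SELECT categoryname, SUM(net_revenue)
    FROM sales
    GROUP BY categoryname
    HAVING categoryname IN ('Computers', 'TV')
    ORDER BY categoryname
    """
).fetchall()

# A condition on an aggregate value can only be written in HAVING
big_categories = conn.execute(
    """
    SELECT categoryname
    FROM sales
    GROUP BY categoryname
    HAVING SUM(net_revenue) > 800
    ORDER BY categoryname
    """
).fetchall()

print(where_version, having_version, big_categories)
```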
Intermediate Techniques:
Minimizing GROUP BY Operations: Reducing the number of columns in the GROUP BY clause can improve performance. This is especially true when some of those columns add no new groups because their values are fixed once another grouping column is known, for example a customer’s name when the data is already grouped by customer key. The course demonstrates that removing an unnecessary column from the GROUP BY clause can decrease execution time. Where such redundant columns are still needed in the output, wrapping them in an aggregate function like MAX() lets you drop them from the GROUP BY while preserving the desired information.
Reducing JOIN Operations: Minimizing the number of JOIN operations and choosing the appropriate type of JOIN can affect query performance. Using functions to derive the necessary information instead of joining another table can sometimes be more efficient, for example applying EXTRACT() to the order date rather than joining the Date table. Additionally, when every row is expected to have a match in both tables, an INNER JOIN can be slightly more performant than a less restrictive LEFT JOIN.
Optimizing ORDER BY Clauses: Optimizing the ORDER BY clause involves limiting the number of columns being sorted, avoiding sorting on computed columns or function calls if possible, and sorting by the most selective columns first. Utilizing database indexes for sorting can also be beneficial, although index management is typically handled by database administrators.
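The MAX() technique for trimming a GROUP BY can be checked for correctness with the following sketch. It uses Python’s sqlite3 module and hypothetical customer and sales tables. Both queries return identical rows, because each customer key has exactly one name. Whether the narrower GROUP BY is actually faster on a given PostgreSQL table is something to confirm with EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customerkey INTEGER, givenname TEXT, surname TEXT);
    CREATE TABLE sales (customerkey INTEGER, net_revenue REAL);
    INSERT INTO customer VALUES (1, 'Ana', 'Lopez'), (2, 'Ben', 'Ng');
    INSERT INTO sales VALUES (1, 10.0), (1, 15.0), (2, 40.0);
    """
)

# Grouping by the key plus every descriptive column
wide_group = conn.execute(
    """
    SELECT c.customerkey, c.givenname, c.surname, SUM(s.net_revenue)
    FROM sales AS s JOIN customer AS c ON s.customerkey = c.customerkey
    GROUP BY c.customerkey, c.givenname, c.surname
    ORDER BY c.customerkey
    """
).fetchall()

# Grouping by the key only; the names are constant per key, so MAX() returns them
narrow_group = conn.execute(
    """
    SELECT c.customerkey, MAX(c.givenname), MAX(c.surname), SUM(s.net_revenue)
    FROM sales AS s JOIN customer AS c ON s.customerkey = c.customerkey
    GROUP BY c.customerkey
    ORDER BY c.customerkey
    """
).fetchall()

print(wide_group)
print(narrow_group)
```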
Advanced Techniques:
These include using proper data types, implementing indexing on frequently queried columns to speed up data retrieval, and employing table partitioning for very large tables to improve query performance. These techniques are often managed at the database administration level.
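The effect of an index on the execution plan can be observed in miniature with SQLite’s EXPLAIN QUERY PLAN, a much simpler relative of PostgreSQL’s EXPLAIN whose output format differs. In this sketch, the sales table, the generated rows, and the index name idx_sales_customerkey are all hypothetical. In PostgreSQL, the same change would typically show up as a Seq Scan node being replaced by an index-based scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER, net_revenue REAL)")
# 1,000 hypothetical orders spread across 100 customers
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, i % 100, float(i)) for i in range(1000)],
)

query = "SELECT SUM(net_revenue) FROM sales WHERE customerkey = 42"


def plan(sql):
    """Return the description text of each EXPLAIN QUERY PLAN row."""
    # Each row is (id, parent, notused, detail); only the detail text is kept
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(query)  # no index yet: the whole table must be scanned
conn.execute("CREATE INDEX idx_sales_customerkey ON sales (customerkey)")
after = plan(query)   # the index lets SQLite search only matching rows

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions (for example, "SCAN sales" versus "SCAN TABLE sales"). The shift from a scan to a search using the index is the part to look for.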
By understanding and applying these query optimization techniques, along with analyzing the execution plan provided by EXPLAIN and EXPLAIN ANALYZE, users can write more efficient SQL queries that perform better, especially when working with large datasets.
SQL for Data Analytics – Full Intermediate Course
The Original Text
Data nerds, welcome to this full course tutorial on intermediate SQL for data analytics. This is the course for those that understand the basics of SQL but want to take it to the next level, perfect for those that took my first course on this. Now, to master this tool, we’ll break down more advanced SQL concepts in short 10-minute lessons. During this, you’re going to be working right alongside me completing real-world exercises. Following each lesson, you’ll have the option to do interview-level practice problems to not only prep you for the job but also reinforce your learnings. By the end of the course, we’ll have used SQL to build a fully customizable portfolio project that you can share to demonstrate your experience. Now, SQL is by far the most popular tool of data analysts. For those in the United States, it’s the top skill, in almost half of all job postings for data analysts, and the demand only increases for senior data analyst roles, coming in at two out of every three job postings. Now, in related data jobs like data engineers, it’s almost the same, appearing in two of three job postings, and for data scientists it’s in almost half. Now, SQL, or “sequel,” is the language used to communicate between you and a database. It’s my most-used tool as a data analyst, starting with my first job working for a global Fortune 500 company and even to my most recent role working with MrBeast. Yeah, even Jimmy uses it. This tool is so imperative; I use it all the time with Python, Excel, Power BI, and Tableau to connect to my databases. So over the years I’ve been cataloging everything I found helpful with using this tool, and I put it all into this course. Now, you’re probably wondering: who is this course for? Well, if you were unfortunate enough to miss my first course, here are some items you should definitely know: keywords used for data retrieval, functions and keywords used for aggregations and grouping, the different types of joins and also unions, keywords used for logic and conditions, along with date
manipulation, data schema control, and finally subqueries and CTEs. Now, as far as the math required to take the course, if you have a secondary education such as high school in the United States, you have the requisite knowledge to take this. We’re going to be doing at most just some basic algebra and statistics. Now let’s get into the course structure. We’re going to be breaking this down into two halves. In the first half, we’ll have an intro that will get you set up and comfortable with the database we’ll be using throughout the entire course. Next, we’ll jump right into pivoting data using CASE statements, and we’ll be transforming and analyzing data using aggregation and also statistical methods. Then we’re going to get into intermediate date and time functions, because frankly you can’t get away from date and time data in databases. We’ll then wrap up the first half covering window functions, the most requested topic I’ve gotten by far, covering basic and also complex aggregations. Now, for the second half of the course, we’re going to shift gears. We’re going to not only install Postgres on your machine, but we’re also going to be working in these lessons toward building our portfolio project. We’ll start by setting up the database locally and installing a top editor for running SQL queries. With this environment set up, we’ll build our first view, and this will actually help us solve our first portfolio problem. After this, we’ll transition into learning the most popular functions to transform messy data to solve our second portfolio project problem. Then we’ll wrap all this up with query optimization, understanding how to use keywords like EXPLAIN to optimize queries. So by the end of this, you’ll have a real-world project to showcase your newfound skills and demonstrate your experience. Now, I’m a firm believer in open-sourcing education and making it accessible to everyone, so this course is completely free. I’ve linked all the resources you need below, including the SQL environment
and all the different files you need to run the queries remotely and locally. Oh, and I also include my final project that you can model yours after. Now, unfortunately, YouTube isn’t paying the bills like it used to, so I have an option for those that want to contribute and thus help support and fund more future tutorials like this. For those that use the link below to contribute, you’re going to get some additional resources. Specifically, after each lesson you’re going to have access to interview-level SQL problems, which will not only reinforce your learnings but also prep you for job interviews. In here, you’re going to get community access to be able to ask any questions to fellow students, along with access to the queries and notes behind each lesson so you can follow right along as I go through it. Finally, at the end you’ll receive a certificate of completion that you can share to LinkedIn. Now, for those that have bought those supporter resources, you’re going to continue watching the course here on YouTube, but then you can go to my site to actually work through all these problems, access the notes, and access the community. All right, we’re about to get into the first lesson. Before we do that, I want to cover some common questions and answers. Specifically, we’re going to start with this one first: what database are we even using? Well, every year Stack Overflow interviews a bunch of nerds to find out what top technologies they’re using, and 50,000 chose Postgres as their top option to use and to use over the coming year. And according to this visual, it’s not only the most admired, it’s the most desired database to learn. So for this course, we’re going to be using and learning with Postgres. Now that we know the database, how the heck are we going to be running these SQL commands? Well, as I mentioned previously, this course is broken into two halves, and for the first half we’re going to be using an option that gets you up and running quick. Specifically, we’re going to be
using Google Colab, which is a free option, and it gives us an environment that we can not only load the database in but also query. I’ve linked this notebook below, and it includes all the code necessary to install this database and get into querying it. Now, don’t worry if you haven’t used Colab before; I’m going to break it all down in the next lesson. For those that bought the course perks, you’re going to get access to these lessons as Jupyter notebooks. For the second half of the course, we’re going to shift gears: we’re going to install Postgres locally on your computer and run all the queries from there. We’re going to get you set up with pgAdmin, which is Postgres’s custom GUI for interacting with databases. From there, we’re going to get you set up with the most popular database editor, DBeaver, which is used by over 8 million users, and this is where we’re going to be running our queries. I like this editor because it’s not only free but also connects to a host of different databases, so whatever you use and learn in this course with this editor, you can apply to other databases. Now that we know the database and the editor, what dataset are we going to be using for this? Well, I present to you Contoso, a dataset created by Microsoft to imitate real business data. Jumping back into DBeaver, we can see the ERD, or entity relationship diagram. This shows how the dataset revolves around sales data, which is the fact table, and then we also have four dimension tables that relate to it. This is going to be great for analyzing business transactions in a real-world scenario. We’re going to go over everything you need to know for this after our Google Colab lesson. Now that we got that out of the way, let’s get into some resources you have available, starting with those that have decided to support the course. First, I’m going to walk you through how to get access to the course notes, which detail all the different topics and
code that I use within each lesson. Next, you’re going to have access inside of the course platform to interview-level SQL practice problems. After each lesson, I’m going to provide you with a bunch of different practice problems that range in difficulty for you to go through and test your skills. If you get stuck, feel free to jump in the comment section below and talk with other students in the course. Speaking of help, how the heck do you get help in this course? Well, you could jump into the YouTube comment section and hope somebody comes and actually answers your question, or you can get a really quick answer by going to a reputable chatbot like ChatGPT. I use this bad boy all the time with my coding issues, and it gets you an answer quick. All right, next question. Well, it isn’t really a question; it’s more of a statement. People tell me all the time, “Luke, this video is too long; I can’t navigate it.” Well, unfortunately, I think you don’t know how. First of all, I include chapter markers for all the different lessons throughout this. Next are keyboard shortcuts: I like to use J and L to jump backward or forward 10 seconds. And finally, if you need more precise navigation, you can just click and drag up on the navigation bar of the video itself, and then you can do precise seeking. Pretty cool. All right, last question: who helped build this course? I’d be remiss if I didn’t give a shout-out to Kelly Adams. She was the brains behind putting together the lessons and also a lot of the practice problems for this. This course wouldn’t have been possible without her help. All right, let’s get into the first lesson. All right, in this lesson we’re going to be going over how we’re going to be running SQL queries in this first half of the course using Google Colab, which is a type of Jupyter notebook. Linked below is a blank notebook. Opening it up, it’s not fully blank, but it’s blank enough to actually get started with writing SQL queries. Let’s do a quick demo of how we’re going to
use this for SQL queries. First, I need to run this cell up top. It gives me a warning that this notebook was not authored by Google; that's fine, click "Run anyway," since it's from me and you can trust me. This cell should take about 40 to 50 seconds to run, and we'll go through it in more detail later in this video; basically, it loads the database and sets it up so we can run SQL commands. Inside this code cell, we begin by writing our command underneath the %%sql magic command at the top of the cell, and I'll provide a query that looks at the sales table and returns the top 10 rows. I can run it by pressing the play button or Shift+Enter, and in less than a second I have the results pictured below. If I want to run another cell, I click Code underneath it, make sure I add %%sql to the top of the cell (it won't work otherwise), and then write my next command right underneath. With that out of the way, we're going to dive deeper into what Google Colab is, what Jupyter notebooks are, how to use them to run SQL queries, and what's going on with all that code in those cells. If you're already familiar with Google Colab or confident with Jupyter notebooks and none of this feels relevant to you, go ahead and skip to the next lesson; this part is for those without any background using Jupyter notebooks. So let's start with Jupyter notebooks. Here I have a Jupyter notebook of this actual lesson inside VS Code. Don't worry, you don't need to open it in VS Code; I'm only showing this for demonstration purposes. Personally, I love Jupyter notebooks for analysis because not only can I have text cells like the ones pictured here, but scrolling down further I can see
that at the bottom I have a SQL cell along with its output. I love these because I can use SQL to extract and analyze the data I need and then, if needed, use something like Python to visualize it. Moving into Google Colab, which you can see here in my web browser, this is the same file I had in VS Code, and it has the same functionality: I can write Python code in cells, use SQL, and see the outputs below. I really like Google Colab because it makes it super easy to share and collaborate with others. This isn't a static document; I can run all the cells in this notebook, and somebody else could come in and modify a query further. Right now this one only looks at years; if we wanted to look at total revenue, I could add that line to the command, run it, and get the results right below. You may be wondering why we're using Colab to run these SQL commands at all. The code in this cell (which we'll cover, I promise) loads our database and gives you immediate access to it without having to install it locally on your own computer, so we can get up and running with SQL commands really quickly. Let's start with a blank notebook to walk through how notebooks work. Navigate to colab.research.google.com, which is where we'll start a new notebook; it will have prompted you to log into Google at this point anyway. Click New notebook, which creates one titled Untitled0. You can change the title up here; I'll change it to something like Colab 101. A quick overview before diving into the center portion: we
have a typical menu up at the top with a bunch of different options, plus a sidebar on the left-hand side with more options. In the center is the notebook itself, where I can add either a code cell or a text cell. In a text cell I can type something like "This is a text cell," and I can change its formatting by highlighting it and toggling it to a heading; there are multiple other formatting options as well. When I'm done, I press Shift+Enter, which runs the cell and starts a new code cell below it. In Colab, these are exclusively Python cells; we have to do some magic, if you will, to get them to run SQL. You may not realize it, but you already know some Python: I can type 2 + 2, press Shift+Enter, and the cell runs and gives us the result, 4. If I don't need a cell, like this one up at the top, I click into it and click the trash can, and I can do the same with this one down below. Now let's go over the menus. For this, I'm going to demo using the actual lesson-plan notebook, because it shows the capabilities more interactively. If I click this icon on the left-hand side, I get a table of contents based on how I formatted the lesson notes, and I can scroll through all the relevant topics. If I want to find something, I can use Find, search for "markdown," and, as expected, it takes me to every match. There are also panels for Variables, Secrets, and Files; those matter more if you're using Python, and you won't really need them, or the three icons at the bottom, in this SQL course. Now, up at the top in the
menu bar: File, Edit, View, Insert, all of that is standard. Runtime is the menu I use most and find most important. Anytime I open a notebook, I use Run all (shortcut Command+F9 on a Mac, Ctrl+F9 elsewhere), which runs every cell. Down at the bottom, a status bar shows what's running and how long it's taken so far. Scrolling through, I can see all the cells executed properly, but sometimes we run into bugs and cells don't run properly. In that case, go to Runtime and choose Restart session and run all. It will ask you to confirm; go ahead, and it clears everything out and runs it all again. You should only need this if you're having problems, and you shouldn't, but if you do, now you know. In this last section of the lesson, let's understand how we're running SQL queries inside this notebook. Open the blank SQL notebook in your browser so you can follow along, and if you haven't already, go to Runtime and click Run all. As I mentioned earlier, all of this Python code installs and sets up your database. We'll walk through it quickly; the important thing isn't the code itself, or being able to write it yourself, but understanding what's going on behind the scenes. First, it imports some libraries we need. Next, if it's running in Colab, which we are, it installs Postgres, so Postgres is actually running inside this environment. It sets up a user and a password and then installs the database itself, which you can get at this link. From there we
import a SQL library that lets us run SQL commands, called JupySQL. With JupySQL, we load the extension, connect to the database we loaded above, and do a few other fancy things that get formatting and everything else set up properly. As before, below the %%sql magic command, I can write a SQL query. As you write SQL commands, autocomplete should come up; here I see SELECT, and if I want it, I just press Tab. Once I have everything I need, I press Shift+Enter again. That magic command is really important. If I copy this query and paste it below without it, first I get highlighting claiming things are misspelled and have syntax errors, and second, when I try to run it, I get actual syntax errors. So it's very important to put the magic command at the top. And so people don't think I'm crazy, "magic commands" is the official name for these, and we're not limited to SQL; there are a host of others. Say I want %%timeit, which measures the execution time of the code in the cell. When I type %%timeit, help pops up describing the command, which is pretty useful. Underneath it I'll put something simple in Python, like 2 + 2. Pressing Shift+Enter, we can see it reports that this took 9.93 nanoseconds. You can also use a magic command with just one percent sign, which means it applies only to the line it's on. In this case I could write %timeit 2 + 2 on one line and press Shift+Enter, and it still times it (a little faster this run). But if I use only one percent sign and put 2 + 2 on the next line, pressing Shift
Enter just outputs 4 without timing it. It's not until I use two percent signs and run it again that the cell is timed, and now we're back up to 9.85 nanoseconds. I'm reinforcing this because it's very important that you remember to put %%sql before any SQL command. If you're a nerd like me and want to dive deeper into the documentation of JupySQL, the brains behind that SQL magic, there's a link below. For those who purchased the supporting resources, you now have some practice problems to get more familiar with using Jupyter notebooks and SQL queries together. In the next lesson, we'll dive deeper into the database to understand all its tables and what comes with them. With that, I'll see you in the next one. In this lesson, we're getting an intro to the database we'll use for the entirety of this course: the Contoso database. We'll explore not only why we're using this dataset but also its components, going through all the tables with tools like the ERD, or entity relationship diagram. This lesson is a warm-up for intermediate SQL, so along the way I'll review past topics you should know, to get you up to speed as fast as possible if you haven't used SQL in a while. By the end, we'll build a query from scratch that brings together the most important tables, using Google Colab and some of its AI features to speed up your workflow. The Contoso database is based on a dataset from Microsoft, which they've used for years whenever they launch products, specifically SQL products, so that you can explore their functionality. This database is really robust because
it contains a lot of different information, such as sales transactions, product information, store details, and even date and time data. It's great because it lets us explore all the intermediate SQL topics in this course while being based on a realistic business dataset, so what you learn here you can apply in the real world. You may be asking, "Luke, how do I get this database installed?" If you remember from the last video, the Python code at the top of the notebook installs it. The SQL file for loading the database is located at this link, and this script loads it into the Colab notebook; I've conveniently linked a blank notebook below that you can use to follow along with any of the lessons. This diagram shows, via the lines between the tables, how they're related. There are a lot of columns in these tables, so we put ellipses at the bottom to signify the remaining columns. Let's break it down. We have a total of six tables in the Contoso database. Our main fact table is the sales table, which contains the quantitative business metrics we'll analyze and inspect throughout the course, so it's probably the most important table to know. Then we have four related tables, commonly known as dimension tables, which hold descriptive attributes we can use in our analysis. For example, the store table is related to the sales table through the store key, and it has information on, well, the stores. The same goes for the product and customer tables. The date table is slightly different in that it relates to the different dates,
specifically our order date and delivery date. The last table is the currency exchange table, which isn't related to our fact table at all; we'll see why in a little bit. You may be wondering how you can see what this database looks like and which tables are in it. We'll explore tools for this later in the course; this is pgAdmin, where I can visualize the ERD showing how our fact table, the sales table, relates to all the dimension tables. It's also nice because I can open the contoso_100k database, go into Schemas and then Tables, and explore each table further, even looking at things like the columns of the sales table. But we're getting ahead of ourselves; we'll learn how to do that later, and for now I'll teach you some shortcuts to do something similar in Colab. So let's run some queries. First, run all the cells in your notebook to get the database loaded into the environment. Next, we want to see which tables are in the database we just loaded. I'm going to use Gemini for this. If it's your first time using this AI model from Google, it will show you a privacy notice; click Continue. I'll prompt it with "What SQL query shows the tables in a database?" It answers that we can get all the table names from information_schema, a metadata schema, specifically by querying its tables view. You can either click Copy cell or Add code cell. Remember, we'll get syntax highlighting issues because we don't have the magic command we need at the top, %%sql, so I'll just copy that
from here, paste it up here, and run it. This confirms we have six tables in our database. If I want, I can convert this DataFrame to an interactive table like this, and there's also an option to visualize it, which we'll use later down the road. Let's first explore the sales table, since it's the most important piece of this whole puzzle. I want to see all its columns, so I'll use SELECT * FROM sales. You may notice some autocompletion happening: I typed sales, and a suggestion of _fact appears after it. This is AI autocompletion, and especially when I'm learning SQL, I don't find it very helpful; it's actually quite distracting. We can turn it off quickly: go into Open settings, select AI assistance, and uncheck Show AI-powered inline completions. After closing this, it no longer pops up. The query is fine as is, but anytime I do a SELECT *, it can be very resource-intensive, especially with a lot of columns and rows, so I'm going to limit it to the first 10 rows and press Ctrl+Enter. I also don't need this panel off to the side, so I'll close it. In the sales table, we can see the keys relating it to the other tables: the dates, customer key, store key, and product key. Then we have information on what was actually sold in each sale, specifically the quantity, the price, the cost, the currency used, and its exchange rate. In our example at the end of this lesson, we'll calculate net revenue and see how these columns combine to get it. Now let's explore the other tables, starting with the easiest one: the currency exchange table. If you recall, our currency
exchange table isn't related to the sales fact table in any way, so what's in it? Exploring it, we see a date column, a from-currency, a to-currency, and an exchange rate. Basically, for a specific point in time, it tells you the rate needed to convert from one currency to another. Conveniently, our sales table already includes the exchange rate calculated from this table, so you only need this table if you want to dig into exchange rates and how they trend over time. We have four tables left, and they're all dimension tables related to our sales table. Let's start with store. It's related to the sales table on the store key and has information on where each store is located, such as the country, the store's name, and even its size. Next is the product table, related to the sales table on the product key. It has information on each product: its name, its manufacturer, how much it weighs, and which category and subcategory it falls into. Next is the customer table, related to the sales table on the customer key. It has a bunch of information about each customer, like where they're located, their name, their birthday, and so on. Notice that in the middle of the output we have ellipses, because there were so many columns that they weren't all shown. Earlier, when looking for the tables in the database, we queried the metadata in information_schema. So I'll copy that query with Command+C and paste it here, but this time we want columns instead of tables. Running it with Shift+Enter only gives us table name information, so I'll change it to SELECT * and run it with Shift
Enter so we can see everything available in this view. In the columns view, I can see a table_name and a column_name, so now I can filter it for the customer table. I'll add WHERE table_name = 'customers' and press Ctrl+Enter. Got a typo: it's customer, singular. Running Ctrl+Enter again, I now have a way to view all the column names without anything being cut off. Nothing here is especially useful for now, but some of it we'll use in the future. The last table to explore is the date table. Its date column relates to both the order date and the delivery date in the sales table. This table offers many ways to aggregate date data, like by day of week, month, or year. That's great, especially in a tool like Power BI when you want to grab something quickly to filter, say, for January 2015. But in this course we'll dive deeper into date functions, so we won't rely on this table at all, because you won't always have a date table available. Basically, just ignore it. Let's wrap up this lesson by investigating how to use all these tables together in a common example. Say my boss, who's not so good at SQL, asks me for revenue data that includes information about the customers, the products they're purchasing, and whether those are high-value or low-value items. We'll walk through calculating net revenue and putting it all together using the different tables. First, we need to calculate net revenue, so let's look back at the sales table. We'll use the same query as before and we get this
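The two metadata lookups in this walkthrough run against Postgres's information_schema, roughly SELECT table_name FROM information_schema.tables WHERE table_schema = 'public' and SELECT column_name FROM information_schema.columns WHERE table_name = 'customer' (the 'public' schema is an assumption about how the course database is set up). SQLite has no information_schema, so the sketch below is only a runnable stand-in that uses SQLite's own catalog (sqlite_master and PRAGMA table_info) against two hypothetical, trimmed-down tables; the column names are assumed and are not the full Contoso schema.

```python
import sqlite3

# Hypothetical, trimmed-down stand-ins for two Contoso tables (column names assumed).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (
        orderdate TEXT, customerkey INTEGER, storekey INTEGER, productkey INTEGER,
        quantity INTEGER, netprice REAL, exchangerate REAL
    );
    CREATE TABLE customer (
        customerkey INTEGER PRIMARY KEY, givenname TEXT, surname TEXT,
        countryfull TEXT, continent TEXT
    );
    """
)


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """SQLite analogue of querying information_schema.tables in Postgres."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def list_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """SQLite analogue of filtering information_schema.columns by table_name."""
    # PRAGMA can't take a bound parameter for the table name, so validate it first.
    if table not in list_tables(conn):
        raise ValueError(f"no such table: {table!r}")
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


print(list_tables(conn))               # ['customer', 'sales']
print(list_columns(conn, "customer"))  # ['customerkey', 'givenname', 'surname', 'countryfull', 'continent']
```

Passing "customers" instead of "customer" raises ValueError here, the same singular/plural slip the walkthrough runs into.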
table we saw previously. How do we calculate net revenue? We need the net price. You'll notice the net price is lower than the unit price; that's because the net price is the price after discounts, promotions, and any other adjustments, so it's what the customer actually pays for the product. We multiply the net price by the quantity, so I'll put a comma here, go to the next line, multiply quantity by net price, and label it net_revenue. When I name new columns, I put an underscore between words; I just find that easier to read than the naming convention the Contoso database uses. It looks like I have a typo, which is actually good to hit right now, because it shows how I troubleshoot. First, it tells me there's a RuntimeError; you'll see that all the time when running queries, so you can ignore that part. But it points a caret at where the issue is, specifically this line: quantity isn't spelled correctly. After fixing it and pressing Ctrl+Enter again, it works, and over to the side we have net_revenue. Double-checking, the numbers are calculated correctly. There's one last step: converting to a common currency. Right now you can see pounds here and US dollars below, and we want everything in the same currency. I'm in America, so we'll use US dollars, and all we have to do is multiply by the exchange rate. With that added, the values are now adjusted to US dollars. Next we'll add customer and product
information using the customer key and product key, but this table is already getting large, so I want to condense it down to the columns I'm sure to use, and the only other one I care about is order date. So we'll simplify the table down to these columns. Next, we move into the second of five steps: filtering for recent sales, specifically from 2020 onward. For this I'll use a WHERE clause requiring the order date from the sales table to be greater than or equal to January 1st, 2020. Running this, it works. To be safe, if you're ever working with date data and aren't sure it was stored as the date type, in Postgres you can use the double-colon cast operator (::) followed by the type you want, in this case date, so the order date gets cast as a date. The query works fine without it here, but it's a useful tip. Next, my boss wants information about the customer who placed each order. To get it we need a join. There are four major types of joins: LEFT JOIN, RIGHT JOIN, INNER JOIN, and FULL OUTER JOIN. In our case we want a LEFT JOIN, because table A is our sales table, and we want every sales row returned along with any related data from table B, the customer table. So let's add the LEFT JOIN between FROM and WHERE, join the customer table, and give it the alias c to keep things easy. Similarly, I'll give sales an alias: I'll move it down, indent it, and give it the alias s. For this LEFT JOIN, we match the customer key from the sales table with the customer key from the customer table. So to follow good naming
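Here is a runnable sketch of the net revenue calculation and the 2020-onward filter, using Python's built-in sqlite3 as a stand-in for Postgres. The table is a hypothetical miniature of sales (column names assumed), and the rows are invented. SQLite stores these dates as ISO-8601 text, which compares correctly as strings; in Postgres you'd keep the orderdate::date cast described in the walkthrough.

```python
import sqlite3

# Hypothetical miniature of the sales table; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sales (
        orderdate TEXT, customerkey INTEGER, productkey INTEGER,
        quantity INTEGER, unitprice REAL, netprice REAL, exchangerate REAL
    )"""
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2019-12-31", 1, 10, 2, 50.0, 45.0, 1.00),    # before 2020: filtered out
        ("2020-01-01", 1, 10, 2, 50.0, 45.0, 1.25),    # non-USD sale, rate assumed
        ("2023-06-15", 2, 11, 3, 200.0, 180.0, 1.00),  # already in USD
    ],
)

NET_REVENUE_SQL = """
SELECT
    orderdate,
    customerkey,
    productkey,
    -- net price is after discounts; the exchange rate puts everything in one currency
    quantity * netprice * exchangerate AS net_revenue
FROM sales
WHERE orderdate >= '2020-01-01'  -- Postgres: WHERE orderdate::date >= '2020-01-01'
ORDER BY orderdate
"""

rows = conn.execute(NET_REVENUE_SQL).fetchall()
for row in rows:
    print(row)
# ('2020-01-01', 1, 10, 112.5)
# ('2023-06-15', 2, 11, 540.0)
```

Note that the unit price is carried along but never used: only the net price reflects what the customer actually paid.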
conventions, I'll add the s. prefix to order date, quantity, net price, and exchange rate. Running this, it works, but we don't have anything from the customer table yet, so I'll add all its columns using c.* notation. From this list, there are a few columns my boss asked for: given name (first name), surname, full country name, and continent. Second-to-last step: adding the product information. Similarly, we'll perform a LEFT JOIN, give the product table the alias p, and connect it on the product key of the sales and product tables. Once again, I want to see everything from the product table first, so I'll use p.*. Running this with Ctrl+Enter, I can see it connected properly with all the product information. Again, I don't want every column, only the four my boss asked for: product key, product name, category name, and subcategory name. This is looking pretty good, and there's only one step left before we have all the information we need: classifying whether each purchase is high value or low value. Looking at net revenue, we want to bin these into less than $1,000 versus greater than $1,000. To accomplish this, we use a CASE WHEN statement, added as the last column. We write CASE WHEN and want to compare net revenue, but we can't reference the net_revenue alias inside the same SELECT statement, because it isn't defined yet. So we copy the whole net revenue expression, paste it in here, add > 1000, and in that case
we say it's high, else we say it's low. Then we END it and give it the alias high_low; real original, I know. Running this with Ctrl+Enter and inspecting it, the formula works: values greater than 1,000 are marked high. This has everything my boss needs. Remember, right now we have LIMIT 10, but we need all the data, so I'll remove it and press play. It looks like we have 124,000 rows. To export this for my boss, I can click here to convert it into an interactive table, and what's really convenient is that I can then copy the entire table as CSV, JSON, or even Markdown. CSV is the most common, so I'll use that. That's our initial dive into the Contoso dataset. There are practice problems for you to work through to get even more familiar with it, and in the next chapter we'll use the CASE statement to pivot data. Super exciting. With that, see you in the next chapter. Welcome to this chapter on pivoting with CASE statements, where we'll use CASE WHEN together with aggregation to pivot data. But what is pivoting data? Let's look at this simple example, focusing on the first table. Typically our data comes in a long format; here we have columns for date, category, and sales, with categories A and B. It's very common to pivot on a column like this category so we get a wider format, as shown below, with one column per category. This is not only easier to read, understand, and analyze but also easier to visualize, which we'll do in this chapter. So what will we cover in this chapter's lessons? In this lesson we're going to
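Putting the five steps of the boss's report together, a sketch of the final query might look like this. It again uses sqlite3 as a stand-in for Postgres, with hypothetical miniature sales, customer, and product tables (column names assumed from the walkthrough, rows invented, and the 'high'/'low' labels illustrative). The CASE repeats the full net revenue expression instead of referencing the net_revenue alias, and the LEFT JOINs keep a sale even when its customer row is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (orderdate TEXT, customerkey INTEGER, productkey INTEGER,
                        quantity INTEGER, netprice REAL, exchangerate REAL);
    CREATE TABLE customer (customerkey INTEGER PRIMARY KEY, givenname TEXT,
                           surname TEXT, countryfull TEXT, continent TEXT);
    CREATE TABLE product (productkey INTEGER PRIMARY KEY, productname TEXT,
                          categoryname TEXT, subcategoryname TEXT);

    -- Invented rows for illustration only.
    INSERT INTO customer VALUES
        (1, 'Ana', 'Lopez', 'United States', 'North America'),
        (2, 'Ben', 'Smith', 'United Kingdom', 'Europe');
    INSERT INTO product VALUES
        (10, 'Desk Lamp', 'Home Appliances', 'Lamps'),
        (11, 'Laptop', 'Computers', 'Laptops');
    INSERT INTO sales VALUES
        ('2019-05-01', 1, 10, 1, 30.0, 1.00),    -- excluded by the date filter
        ('2020-03-02', 1, 10, 2, 30.0, 1.00),    -- 60.0 -> low
        ('2021-07-09', 2, 11, 1, 900.0, 1.25),   -- 1125.0 -> high
        ('2022-01-15', 3, 11, 1, 1200.0, 1.00);  -- customer 3 missing, kept by LEFT JOIN
    """
)

REPORT_SQL = """
SELECT
    s.orderdate,
    s.quantity * s.netprice * s.exchangerate AS net_revenue,
    c.givenname,
    c.surname,
    c.countryfull,
    c.continent,
    p.productkey,
    p.productname,
    p.categoryname,
    p.subcategoryname,
    -- Postgres won't let us reference net_revenue in the same SELECT list,
    -- so the expression is repeated here.
    CASE
        WHEN s.quantity * s.netprice * s.exchangerate > 1000 THEN 'high'
        ELSE 'low'
    END AS high_low
FROM sales AS s
LEFT JOIN customer AS c ON s.customerkey = c.customerkey
LEFT JOIN product AS p ON s.productkey = p.productkey
WHERE s.orderdate >= '2020-01-01'
ORDER BY s.orderdate
"""

report = conn.execute(REPORT_SQL).fetchall()
for row in report:
    print(row)
```

The third row comes back with None for every customer column, which is exactly what a LEFT JOIN is for: an INNER JOIN would have silently dropped that sale from the revenue report.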
focus on the basics of using aggregation functions such as COUNT and SUM to pivot data. We'll use COUNT to analyze the number of customers per region, and SUM to calculate net revenue by category across different years. In lesson two, we'll build on this with statistical functions such as MIN, MAX, median, and average, including an example that calculates median sales across categories. Finally, in lesson three, we'll jump into advanced uses of CASE statements, specifically segmentation. We'll learn how to combine multiple AND conditions for things like bucketing by year based on revenue, and similarly use multiple WHEN conditions to bucket revenue into tiers and see how they apply across categories. I just showed a bunch of visuals; the goal of this course isn't to teach you how to build visuals, although I'll show some, but I want to show that you can take the insights we gain a step further and visualize them. With that, let's get into it. In this first example, we'll review COUNT and COUNT DISTINCT to calculate the total number of customers per day in 2023; this is the final table we'll end up with. As always, if you want to follow along, open the blank SQL notebook and run all its cells. Remember, we want the total number of customers per order date, and we can uniquely identify customers by the customer key. So I'll write a SELECT with order date followed by customer key, from the sales table, and start with that. As we can see, for January 1st, 2015, we have duplicate customer keys but also a bunch of
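To make the long-versus-wide idea from the chapter intro concrete, here is a small sketch of the date/category/sales example pivoted with SUM and CASE WHEN, run through sqlite3 as a stand-in for Postgres. The table name long_sales, its columns, and the numbers are hypothetical. Each CASE returns the sales value only for its own category and NULL otherwise, and SUM ignores NULLs, so every category lands in its own column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE long_sales (orderdate TEXT, category TEXT, sales INTEGER)")
# Long format: one row per (date, category) pair. Values are invented.
conn.executemany(
    "INSERT INTO long_sales VALUES (?, ?, ?)",
    [
        ("2024-01-01", "A", 100),
        ("2024-01-01", "B", 50),
        ("2024-01-02", "A", 80),
        ("2024-01-02", "B", 70),
    ],
)

# Wide format: one row per date, one column per category.
PIVOT_SQL = """
SELECT
    orderdate,
    SUM(CASE WHEN category = 'A' THEN sales END) AS a_sales,
    SUM(CASE WHEN category = 'B' THEN sales END) AS b_sales
FROM long_sales
GROUP BY orderdate
ORDER BY orderdate
"""

wide = conn.execute(PIVOT_SQL).fetchall()
print(wide)  # [('2024-01-01', 100, 50), ('2024-01-02', 80, 70)]
```

The trade-off of this pattern is that each category must be written out by hand, which is fine for a handful of known values like A and B or a few continents.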
different ones. We'll start simple and count all the customer keys, so I'll wrap customer key in COUNT and give it the alias total_customers. Running this fails, because we need a GROUP BY, so I'll add GROUP BY order date and run it again. Now we have total customers by order date. The dates aren't in order, so I'll add ORDER BY order date. Not too bad, but remember that when we looked at the raw data, customer keys were duplicated, and we want unique customers, so we need DISTINCT. Going back to the query, I'll add DISTINCT inside COUNT and press Ctrl+Enter. The numbers drop, because the duplicates are removed. The last step is a WHERE condition filtering for dates in 2023. I'll filter on order date, and I recommend the BETWEEN keyword so we avoid the greater-than/less-than mess, specifying between January 1st, 2023 and December 31st, 2023. Running this, we can check the contents: yep, January 1st to December 31st. One quick note on visualizing this: you can use this button to draft different visualizations that help you understand what's going on in the data. It gives you several previews; in our case this is time series data, so I know that's the best choice. When I select it, it autogenerates all the Python code needed to visualize the data, and all you have to do is click Add cell and run it to see the chart in more detail right below. That's why I really like
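The difference between COUNT and COUNT DISTINCT, plus the inclusive BETWEEN filter, can be sketched on a few invented rows, again with sqlite3 standing in for Postgres. The sales table here is a hypothetical two-column miniature, and the order_lines alias is illustrative; dates are ISO text, which is an assumption about how the data would be stored in this stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderdate TEXT, customerkey INTEGER)")
# Invented rows: customer 1 shows up twice on 2023-01-01.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2022-12-31", 1),
        ("2023-01-01", 1),
        ("2023-01-01", 1),
        ("2023-01-01", 2),
        ("2023-01-02", 3),
        ("2023-12-31", 2),
        ("2024-01-01", 4),
    ],
)

DAILY_CUSTOMERS_SQL = """
SELECT
    orderdate,
    COUNT(customerkey)          AS order_lines,      -- counts duplicates
    COUNT(DISTINCT customerkey) AS total_customers   -- unique customers only
FROM sales
WHERE orderdate BETWEEN '2023-01-01' AND '2023-12-31'  -- BETWEEN is inclusive
GROUP BY orderdate
ORDER BY orderdate
"""

daily = conn.execute(DAILY_CUSTOMERS_SQL).fetchall()
print(daily)  # [('2023-01-01', 3, 2), ('2023-01-02', 1, 1), ('2023-12-31', 1, 1)]
```

Both boundary days (2023-01-01 and 2023-12-31) are kept while 2022-12-31 and 2024-01-01 are dropped, which is the inclusive behavior that makes BETWEEN convenient here.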
Colab for this: it has Gemini built in, which makes it super simple to go ahead and visualize this.

All right, now let’s get into pivoting with COUNT as the aggregation. We’ll look at something similar to the last example, how many daily customers we have, but broken down by region, specifically the three continents Europe, North America, and Australia. The final table has order date in the leftmost column and the customers for each region in their own individual columns. First things first, though: what continents do we actually have in our database? That’s in the customer table, and when we run this, we can see that, as previously reported, we’ve got Europe, North America, and Australia. So let’s add this table to the query from the last example. To do that, we’ll perform a very common LEFT JOIN with the customer table, with an alias of c, on the customer key. Because we now have two tables, we also need to assign an alias to the sales table and prefix all the columns that come from it. I’ll run this to make sure there are no errors. I accidentally messed up order date, so I’ll fix that and run it again. Okay, everything’s working fine. Now we need to create individual columns of total customers for each continent. How are we going to do this? Let’s focus on this syntax right here. We’ll use COUNT DISTINCT as we did previously, and inside it we’ll put a CASE WHEN expression: CASE WHEN a condition THEN the output we want, in this case the column, then END, and finally an alias. So I’m going to copy this and I’m going
to insert it on the next line, but we need to fill it out. The condition checks whether the row matches a certain continent: for the continent column from the customer table, we check whether it equals Europe, and the column we want to return is the customer key. I’ll put that in and give it the alias eu customers. Let’s try this out, and bam, now we have our European customers. Let’s add the other two, North America and Australia, as well. All right, I’ve got those in too, North American and Australian customers. Running this and scrolling down, we can see that the Europe, North America, and Australia counts add up to the total customers value in this row. So the total customers field is now somewhat redundant, and I’ll remove it. This is our final query.

Now for visualizing this one. It’s similar to the last one, but because this table has multiple columns, I find the visualizations it provides aren’t that good. It does offer these time series, but they’re split up so that this one’s Europe, this one’s North America, and this one’s Australia; they’re not all on the same graph. So unfortunately, Gemini isn’t that strong at producing graphs in this case. If you really want to visualize it, here’s my method. Click the table and, remember, you can copy it, which copies the table to your clipboard. I want these contents as a CSV, so I’ll copy it. Because it’s pretty long, we’ll need to put it in some sort of document and save it as a CSV file. On Mac I’ll use something like TextEdit; on Windows you’d use something like Notepad. I’ll paste the contents in using Command+V and then from there
just save it and upload it to your favorite chatbot. In my case, I really like ChatGPT, but you could use Gemini, Claude, whatever. I’ll attach the document with the simple prompt “visualize this as a line chart.” Once it’s visualized, I like using the interactive mode in ChatGPT, where you can explore all three regions together. If you want to download the graph, you can just click it there.

Now the last example for this lesson. We’ll use the SUM function with CASE WHEN to find the total revenue by category, using the CASE WHEN to compare 2022 versus 2023. This is the final table we’ll create: category in the leftmost column, followed by total net revenue for 2022 and then for 2023 right next to it. I don’t want to start from scratch, so I’ll take the last query we wrote, paste it into a new cell, and make sure it runs properly. I want to start simple by first looking at total revenue by order date. Because we’ll be looking at 2022 and 2023 instead, I’ll remove this WHERE clause. We also don’t need the customer table, so I’ll remove it, along with the COUNT DISTINCT expressions we used for the customers. With just this, I’ll run it and make sure everything appears. Yep, it’s got all the different order dates. Now let’s get the total revenue, which we calculate with SUM. If you remember from a couple of lessons ago, you need three things for this: quantity, net price, and exchange rate. I’ll multiply all three together and assign the alias net revenue. Remember, any time we do an aggregation, we need a GROUP BY. Let’s run this. Not bad, but right now we’re
aggregating by order date, and we actually want to break this down by category. Just as a refresher (you don’t need to run this query), what we need from the product table is this category name column. So, first, we need to join the product table to our sales table, and second, pull out the category name. In our original query, I’ll add a LEFT JOIN to the product table with the alias p, on the product key. I’ll run this to make sure it still executes properly. Okay, good. We haven’t brought anything in from the product table yet, so let’s do that now: we want to replace order date with category name, and it appears in three places. Let me show you a quick shortcut. I’ll highlight order date; right now you can see only the first occurrence is selected. On Mac I press Command+Shift+L (on Windows, Ctrl+Shift+L), and now every occurrence is selected. When I press Backspace, all of them are removed, and as I type, the new name is typed into all of them. Super convenient, and it saves a lot of time. I’ll run this again with Ctrl+Enter, and bam, now we have category name and net revenue. This net revenue covers the entire dataset, so we still need to filter it down, but this is pretty good so far.

To break this down by 2022 and 2023, we need syntax similar to what we used before. Inside the SUM function, we use CASE WHEN: when a row meets the condition for the year, we return its net revenue; ELSE, if it’s not that year, we return 0. That way, when we sum everything up, only that year’s revenue gets added. Finally, we END it and add an alias. I’ll copy this template and paste it below. Our first condition checks whether the date is in 2022, so for
this we’ll use the order date column and check whether it’s BETWEEN January 1, 2022 AND December 31, 2022. (I had a brain fart there for a second.) For the value, we’ll use the net revenue expression from above, all three columns multiplied together, and finally we’ll give it the alias total net revenue 2022. Let’s run this, and bam, we have 2022. This is looking good. I’ll put a comma at the end, copy this expression, and paste it right below. Then I just need to update it to use 2023 instead of 2022, and make sure to change the alias too. Running this with Ctrl+Enter, we get almost our final results. Once again, we don’t need the overall net revenue column, because it doesn’t tell us what we need, so I’ll remove it. Now we have our final table, and we can see that, for some strange reason, revenue went down from 2022 to 2023 across all of these categories. That’s not really good; I’ll leave that to my boss to figure out.

All right, now it’s your turn. We have some practice problems lined up to get you more familiar with using CASE WHEN expressions to pivot data with these different aggregate functions. In the next lesson, we’ll build on this, focusing on statistical functions such as MIN, MAX, average, and median, and diving further into revenue. With that, see you in the next one.

Now that we’re warmed up using basic functions to analyze pivoted data, we’re going to shift our focus to statistical functions. Specifically, we’ll cover these functions. We’ll warm up with the easy ones first, AVG, MIN, and MAX, to pivot the data and pull out some insights, and then from there use the PERCENTILE_CONT (percentile continuous) function
to analyze median sales revenue. We’ll continue the same approach of analyzing across all these categories to see which category performs best.

So where can we find out which statistical functions are available to us? We go to the source documentation, here in the PostgreSQL docs. They have a section on all the aggregate functions, which includes the statistical functions. Scrolling down, we can see MAX and MIN, which we’ll use shortly; they find the maximum or minimum value of an expression across all non-null input values. Similarly, there’s a whole host of more in-depth statistical functions, covering correlation, R-squared, standard deviation, and even variance.

So let’s get into analyzing with MIN, MAX, and AVG. Start up a blank notebook to work in. What are we going to analyze? If you remember, in the last lesson we calculated total net revenue by category, broken down for 2022 and 2023. To keep it simple, we’ll take a very similar approach, looking at MIN, MAX, and AVG. So if you still have that query, copy it, because we’ll reuse it and modify it to apply these new functions. In my blank notebook, I’ll paste it here and run it with Ctrl+Enter. The first one we’ll try is AVG, which finds the arithmetic mean of all non-null input values. It’s pretty simple: we keep this query mostly the same, but instead of SUM we use AVG, so I need to rename the aliases appropriately, for example average net revenue 2023. One thing to watch when converting from SUM: if the CASE keeps ELSE 0, every row from the other year is counted as a zero and drags the average down, whereas without an ELSE those rows become NULL, which AVG ignores. I’ll press Ctrl+Enter to run this, and bam, now we have our average values. This is pretty neat, because if we remember, computers had the highest total revenue, but yet in
this table, home appliances have the highest average net revenue. If you want to visualize this, we could click the chart button and try visualizing it below, but since this is categorical data, I don’t find the graphs it provides that good. Because this table is so small, I sometimes just copy its entire contents and paste them right into the chat, since it doesn’t take up much space, with the prompt to visualize it. And bam, we can see the average values for the different categories across the years and compare them, noticing things like computers actually being lower in 2022 than in 2023.

Now that we understand AVG, let’s explore MIN and MAX. The syntax is very similar, so I’ll use some AI to automate this. I’ll copy this query, open Gemini, and give it the prompt “add in min and max statements similar as done with these average statements,” then paste the query below the prompt. Let’s see if it can do this. I’ll expand this out to inspect it, and it looks like it got it done. We don’t need to be repetitive: any time there’s a repetitive task, give it to AI. I’ll insert this into the notebook and close Gemini. Now I have a bunch of syntax errors because, remember, we don’t have the SQL magic command at the top of the cell. After adding it, I can run this with Ctrl+Enter, and bam, now one table has the average, the min, and the max, all formatted and typed out correctly. Pretty neat with AI.

So let’s crank this up a notch and run a similar analysis using the median. For those not familiar, if you take a list of numbers and sort them in order, the median is the middle number. So in this case, when we have these seven
numbers, 6 is the middle number, whereas when we have eight numbers, we take the average of the fourth and fifth numbers, in this case 4 and 5, which gives 4.5. The median is extremely important when you’re working with data. Here we’re looking at a salary distribution from my Python course, and you can see the salaries go up and then come down, but then there’s a long tail with high outliers way out past 350,000. If we just use the average, it gets pulled toward a higher number and isn’t representative of the actual data. The median fixes this by sorting all the numbers and taking the middle one, which gives a more representative number for what you’d expect to see, in this case the salary you’d expect. I wouldn’t want to expect a higher salary when I know I’m likely to get a lower one.

So you may be thinking, “This is pretty easy: all I have to do is change all these AVG calls to MEDIAN and run it.” Unfortunately, there’s no MEDIAN function in PostgreSQL, hence the “No function matches the given name and argument types” message, and SQL Server and MySQL don’t have one either. Instead, we’ll use PERCENTILE_CONT, short for percentile continuous. The continuous part matters, as we’ll see in a moment, and so does the fact that PERCENTILE_CONT isn’t just an aggregate function; it’s an ordered-set aggregate function. What does this all mean? Not only do we call PERCENTILE_CONT with the fraction we want, in this case 0.5 for the median, but we also have to use this additional WITHIN GROUP (ORDER BY ...) syntax. Let’s break this syntax down. PERCENTILE_CONT needs to be given an ordered list of values; unlike other aggregate functions, it can’t pick out its answer unless we tell it how to order them. Because
of this, we first have ORDER BY column in parentheses, which specifies how to order the values we’ll pick the median from. But that only sorts the values; we also need to bind them to the PERCENTILE_CONT function, which is what the WITHIN GROUP portion does: it binds that ordered set, if you will, to the function.

Let’s take a simple example first using net price. Here I have a simple SELECT of net price from the sales table, and it looks like we have almost 200,000 rows. Let’s get the median from this. I’ll start with the PERCENTILE_CONT function. Notice there’s also PERCENTILE_DISC (percentile discrete): if there are two numbers in the middle and you don’t want them averaged, but instead want an actual value from the data, you use the discrete version. I mostly stick with continuous. We’re finding the median, the value at the 50th percentile, hence 0.50. Next, we bind the ordered values with WITHIN GROUP and, in parentheses, ORDER BY net price, and then assign the alias median price. Let’s run this, and bam, we get a median price of $191. Just out of curiosity, I’ll compare it to the average net price. Similar to the salary data I showed previously, the average price is much higher, because a few high-value items that aren’t purchased as often drive the average up. So in this case, the median gives a much more representative picture of the typical net price people are seeing.

So what are we going to calculate in this last example? We want to find the median sales by category, comparing 2022 to 2023. Notice that I’m saying sales here, and I’ll use that term more often, but this is technically
net revenue; on the business side, we typically just call it sales. I don’t like starting from scratch, so I’ll work from the very last query we wrote, where we found the average, min, and max for the different categories. I’ll copy it all and paste it into a new code cell. We won’t be using the AVG, MIN, or MAX expressions, so I’ll remove them. Let’s start by getting the median net revenue, or sales, across all years, and then we’ll filter down to 2022 and 2023 afterward. I’ll specify the median with PERCENTILE_CONT, add the binding WITHIN GROUP clause, and then ORDER BY the net revenue, which is quantity times net price times exchange rate. We’ll give it the alias median sales. Let’s see if this works: Ctrl+Enter. All right, not bad, we have median sales for each category. But remember, we ultimately want separate columns for 2022 and 2023. Starting with 2022, I’ll label this one 2022 so we know what we’re working with. We want to pass only the 2022 values into the ORDER BY inside the parentheses, and to do that we need a CASE WHEN expression. I’ll add CASE, press Enter, and indent. Then we fill in WHEN column name equals condition THEN the value we want, the net revenue, and after that we END it. Now we fill in the condition: we want to verify that the order date is in 2022. So I’ll remove the placeholder and use order date BETWEEN January 1, 2022 AND December 31, 2022, THEN these net revenue values. Just for good measure, to make it a little bit
more readable, I’ll wrap it in parentheses. So, just to be clear: inside this ORDER BY, within the pink parentheses here, a CASE expression checks whether each order falls within the date range; if it does, the value is the net revenue, and otherwise, since there’s no ELSE, the result is NULL, which PERCENTILE_CONT ignores. Let’s run this. All right, not too bad. Now I want to add 2023, and I don’t feel like retyping everything, so I’ll use Gemini. I’ll paste in the code with the prompt “add in 2023 also under this,” and it looks like it did it correctly. I’ll insert it and double-check it. Yeah, looking good. Running this final query, we have the median sales for 2022 and 2023.

Taking it a step further and analyzing this, we can compare the median sales to total net revenue, which is total sales, and see some interesting insights. Specifically, for computers, the median sale went down from 2022 to 2023, and correspondingly, total sales for computers went down too. Maybe that’s something to bring up with the SVP of computers.

All right, you now have some practice problems to get even more familiar with applying these statistical functions while pivoting with CASE WHEN. In the next lesson, we’ll get into advanced segmentation, learning how to use keywords like AND and multiple WHEN clauses to break down analysis even further. With that, I’ll see you in the next one.

Welcome to this last lesson on using CASE statements to pivot data. In this lesson, we’re going into advanced segmentation. So what is segmentation? It’s a really important data analytics concept: you take a large dataset and break it into smaller pieces to analyze different behaviors. As a data analyst, I apply this concept all the time when working with large datasets, so I can dive deeper into the details and
understand different behaviors. So how are we going to do this? We’ll start off easy. In the first example, we’ll learn how to use AND within a CASE WHEN expression to analyze net revenue for all eight categories under multiple conditions. We’ll segment by year, 2022 or 2023, and by whether an order’s value is low or high relative to the median: the net revenue for orders below the median value and the net revenue for orders above it. In the second and final example, we’ll look at how to use multiple WHEN clauses within a single CASE block. This is particularly important whenever you need different outcomes for different conditions. In our example, we’ll break revenue down into multiple tiers: instead of just looking at orders below and above the median, we’ll go a step further and look at orders based on which percentile range they fall into.

Before we get into those examples, I want to quickly demonstrate both concepts. The first concept is that we can use AND to combine multiple conditions within a CASE WHEN expression, simply by writing WHEN condition 1 AND condition 2. For this one, we’ll look at two things, quantity and net price. Running this query, we can see the order date along with the quantity and the net price. We want to create a new column that classifies whether each order is a high-value order, meaning it has a quantity greater than or equal to two and a net price greater than or equal to, say, $50. If it’s not, we’ll just call it a standard order. So I’m going to insert a new line
underneath the SELECT columns, starting with CASE, and indent it to add the WHEN. We’re checking two things: that the quantity is greater than or equal to two, AND that the net price is greater than or equal to 50. When both conditions are met, we categorize the row as a high-value order; ELSE we categorize it as a standard order. We END the CASE expression and give it the alias order type. Running this with Ctrl+Enter, we can see that this lets us check multiple conditions: orders with a quantity of at least two and a net price of at least $50 are categorized as high-value orders.

The second concept to understand is that we can use multiple WHEN clauses within a single CASE block. There’s no limit to the number of WHENs you can add, but every WHEN needs a THEN specifying the result, and you usually finish with some sort of ELSE. I find myself using multiple WHEN clauses whenever I need to break data into different categories. For example, we’ll follow the same approach with quantity and net price, but categorize more finely. Previously, we only flagged high-value orders, those with a quantity of at least two and a net price of at least $50, and everything else was a standard order. Instead, we’ll classify multiple high-value items, with a quantity of at least two and, changing the threshold, a net price of at least $100. We’ll also check for single high-value items, with a net price of at least $100 but a quantity of less than two, so just one. Then we’ll categorize multiple standard items, with a quantity of at least two but a net price under $100, and everything else will be a single
standard item. We can build on the query we just used. One thing to note: as I said, the threshold is no longer 50, so I’ll change it to 100 and call this one a multiple high-value order. Then we add a new line with another WHEN: if the net price is greater than or equal to 100, we call it a single high-value item. Because CASE checks the WHEN clauses in order and stops at the first one that matches, this clause only sees orders that failed the first check, which is why it doesn’t need its own quantity condition. Next, WHEN the quantity is greater than or equal to two, THEN it’s multiple standard items, and finally the ELSE is a single standard item. Let’s run this, and scrolling down, we can see that it correctly classified each order based on these multiple conditions. Pretty neat.

All right, if there’s anything you get out of this video, it’s these two concepts, because we’re now about to get into more technical examples that demonstrate how you’d use this in the real world. As always, I like to start with what we’re aiming to achieve. We’ll continue our analysis of categories, but this time we want to break revenue down not only by year, 2022 versus 2023, but also, and this is why we need the AND condition, segment it further into low revenue and high revenue. What do I mean by that? In the last lesson, we looked at the median order value. Basically, we want the total net revenue for orders below the median, which we’ll call low, and for orders above the median, which we’ll call high. Technically, as we showed in the last lesson, each category has its own median value, but to understand how this AND condition works, we’ll keep it really simple at first: we’ll calculate a single median value and apply it to all the categories. So let’s start by calculating that median value. We’ll start with SELECT and then use the percentile
continuous function, then use WITHIN GROUP as our bridge to the ORDER BY, which calculates net revenue from quantity, net price, and exchange rate. We’ll name this median for now. We need this from the sales table, and since we’re working with 2022 and 2023, we’ll add a WHERE clause specifying an order date BETWEEN January 1, 2022 AND December 31, 2023. I get an error because the columns reference s as the alias for sales; we’ll use that alias later, so I’ll keep it and add s after the table name. Running this again, we get a median value of 398. So remember what we want to do: calculate the revenue from orders below the median value and the revenue from orders above it. Just as a reminder, using that median of 398, we want the total revenue for orders below 398, the low revenue, and for orders above it, the high revenue, for both 2022 and 2023. So I’ll go back to the query we used previously to calculate median sales at the end of lesson two, because we’ll reuse it. I’ll copy it and, back in our blank notebook, paste it underneath our median calculation. We won’t be using the median sales calculations from that query, so I’ll take them out and run it to see what we have right now: it should just show all eight of our categories. Now we can add our calculations. For now, I don’t care about 2022 and 2023; I just want to calculate the low net revenue and the high net revenue. We’ll start with low. We know we’re adding everything up, so we’ll use the SUM function, and inside it a CASE WHEN with the syntax CASE WHEN condition THEN value END, followed by the alias low net revenue. The condition is that we want the order’s value to
be less than 398. So we’ll add quantity times net price times exchange rate and check that it’s less than 398. This line’s getting a little long, so I’ll bump it down and indent it. Now, what’s the value? The value is the revenue itself, so I’ll copy the same formula from above with Command+C and paste it in place of value 1. Let’s see if this works as written. It looks like it does, and do these numbers make sense? Yeah, they do. Now let’s add an expression for the high net revenue. I’ll copy this whole thing, add a comma, paste it, change the comparison to greater than or equal to 398, rename the alias to high, and run it to make sure there are no errors. Looks like we have everything here. So now we’ve got the low net revenue and the high net revenue. Next, we need to segment these by 2022 and 2023, and this is finally where we get into the concept we’re trying to teach: using AND in a CASE WHEN condition. Inside our CASE WHEN, we can combine two conditions with the AND keyword, placing the extra condition right here, before the THEN. I’ll press Enter, indent, and start that second AND condition. We’ve already checked the first condition, whether the order is below the median value; the next thing to check is whether the order date is BETWEEN January 1, 2022 AND December 31, 2022. I can copy that and place it in the high condition as well, then update both aliases to show they’re for 2022. Let’s run this to make sure it’s working properly. I had a little typo: I added two BETWEENs in here, no idea what I was thinking. Trying again, okay, now it’s working. All right, this is looking pretty good. Now all we have to do is add in
2023. I’ll use Gemini for this because I don’t really want to retype all that code. I’ll paste in the query with the prompt “add in two columns for 2023.” It generated the code, so I’ll add that code cell, add the SQL magic command at the top, and run it with Ctrl+Enter. Bam, now we have net revenue for 2022 and 2023, split into low and high. So that’s how you use the AND condition.

But to be honest, I don’t like hard-coding values into a query like this, so I’ll quickly show you a slightly more advanced technique that avoids it. If we scroll back up to our original query that calculates the median value, we can use a CTE to feed that value into the main query. As a refresher, a CTE starts with the WITH keyword, followed by the name of the CTE. I’ll name this one median value, followed by AS. Then I’ll add opening and closing parentheses and paste the median query inside; I like to indent it to make it easier to read. We’re not using it yet, but I’ll press Ctrl+Enter to make sure the query still runs correctly. It does. Now all I need to do is add it to the FROM clause, so I’ve added median value with the alias mv. If we remember, the column name is median, so I’m going to replace every 398; remember, we can press Command+Shift+L to select all of them, and then type mv.median. Let’s run this and hope it works. It does, and we get our final result.

This is pretty good. Now that we can see these breakdowns by year based on high and low net revenue, if we visualize this, we can better understand what happened with the computer category specifically. Remember, computer revenue dropped. Well, it’s not from the low-revenue orders, those below the median value; what really happened is they saw a drop in the orders for
So that’s a pretty interesting insight, and we’ll break it down even further in the next example.
In this example, we’re going to build on the last one by using multiple WHEN clauses within a CASE block. Previously we used only one WHEN clause, but we’re going to step it up a notch and use two. Why is this important? It lets us segment a column into, in our case, different revenue tiers, so every order is categorized as high, low, or medium, along with the associated total revenue. So what do high, low, and medium revenue even mean? We’re going to segment based on where an order falls in the percentiles: if an order’s revenue is below the 25th percentile, we’ll categorize it as low; between the 25th and 75th, medium; and above the 75th, high. Why the 25th and 75th percentiles? It’s actually pretty common in statistics to use these values to bucket things into quartiles. Statisticians call the range between the 25th and 75th percentiles the interquartile range; we’re just going to call it medium, with everything below it low and everything above it high. That’s where these numbers come from. Now you may be like, "Luke, how the heck do I calculate the 25th and 75th percentiles?" Remember that the median is the 50th percentile, which is why we used 0.5 in that case. So let’s copy the query we used before, put in 0.25 instead, and call this the 25th percentile. Running this, we get an error because, silly me, you can’t start an alias with a number in SQL; it needs to start with a letter. So I’ll call it revenue_25th_percentile and run it again. Okay, not bad. Let’s do the same for the 75th percentile, and pressing Ctrl+Enter, we have the 25th
percentile and the 75th. The 75th should be much greater than the 25th, and as expected, it is. Looking good. We’re going to use these values in our final query, and we’re not going to hardcode them, so I’m going to create another CTE: I’ll tab this query over, create a CTE named percentiles, and wrap the query in parentheses. Running it now to make sure it works won’t work yet, because I still need to add a query below it. I don’t want to write that bottom query from scratch, so I’ll scroll up to the previous one we created, copy it, and modify it. After pasting it in, the first thing I’ll do is get rid of all the conditions we created, along with the median value, because we’re no longer using them, and add in the percentiles CTE with an alias like pctl. Okay, let’s try to run this. I can see what the error is: I have a comma after the END, right before the FROM. Let’s try again. Okay, we got all the categories as expected, and we have our CTE in there. The easiest first step, since we already have the GROUP BY set up, is to add an aggregation for the total revenue. We use the SUM function for this, multiplying quantity times net price times exchange rate, and give it the alias total_revenue. Let’s run this. All right, we have our categories and total revenues. Now we need one more step, which is what we’re actually here to learn: breaking all these categories down into revenue tiers. For this, we’re going to use multiple WHEN clauses within our CASE expression, so I’ll copy this CASE template and paste it in here. The first thing that we want
to check for is the low-tier condition: whether the order’s revenue is at or below the 25th percentile. So for the condition, we check whether this value (I’ll copy the revenue calculation and paste it in here) is less than or equal to revenue_25th_percentile, which we calculate up above and bring in through the pctl alias, and if so we assign it the value low. For the next condition, I’ll copy this, paste it into condition two, and check whether the value is greater than or equal to the 75th percentile; in that case it’s high. Everything else is classified as medium, and we’ll give this the alias revenue_tier. Looks like everything’s in order, so let’s run this bad boy with Ctrl+Enter, and we get an error. It’s pointing at the WHEN clause: we’re aggregating, summing the total revenue across these different tiers, so we also need to add the tier to the GROUP BY. Underneath here I’ll add revenue_tier, run it again, and bam, we’ve got it. Not too bad. Now, I’m a little nitpicky, and I’d like the order to be low, medium, high, so I’m going to put numbers in front of these labels so they sort correctly, and add revenue_tier to the ORDER BY underneath the category name. Run with Ctrl+Enter; I’m missing a comma; run again. Okay, now we have it in a better order: 1 low, 2 medium, 3 high. This is pretty neat, because we’re able to segment on multiple levels to analyze these revenue tiers, and when we visualize it in something like a 100% stacked column chart, high is the light blue, whereas medium and low get progressively darker.
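Here’s a sketch of the two-WHEN CASE expression in the same Python + sqlite3 setup, again with hypothetical data. Because SQLite lacks a percentile aggregate, the quartile cut points are computed in Python with statistics.quantiles(..., method="inclusive"), which uses the same linear interpolation between neighboring values as PostgreSQL’s PERCENTILE_CONT, and they are passed in as query parameters in place of the lesson’s percentiles CTE. The numeric prefixes on the tier labels do the same sorting job as in the lesson.

```python
import sqlite3
import statistics

# Hypothetical sample data: (category, net_revenue) per order.
SAMPLE_ORDERS = [
    ("Computers", 10.0), ("Computers", 90.0),
    ("Games", 20.0), ("Games", 50.0), ("Games", 80.0),
]

QUERY = """
SELECT category,
       CASE
           WHEN net_revenue <= :p25 THEN '1-LOW'
           WHEN net_revenue >= :p75 THEN '3-HIGH'
           ELSE '2-MEDIUM'
       END AS revenue_tier,
       SUM(net_revenue) AS total_revenue
FROM orders
GROUP BY category, revenue_tier
ORDER BY category, revenue_tier
"""


def revenue_tiers(rows):
    """Total revenue per (category, tier), with tiers cut at the 25th/75th percentiles."""
    revenues = [revenue for _, revenue in rows]
    # Cut points across all orders (not per category), as in the lesson.
    p25, _median, p75 = statistics.quantiles(revenues, n=4, method="inclusive")
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (category TEXT, net_revenue REAL)")
        con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return con.execute(QUERY, {"p25": p25, "p75": p75}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in revenue_tiers(SAMPLE_ORDERS):
        print(row)
```

The WHEN clauses are checked top to bottom, so each order lands in the first tier whose condition it meets and only the leftovers fall through to ELSE.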
Something like that computer sector we keep talking about is very reliant on revenue from high-ticket items, those above the 75th percentile, whereas categories like games and toys rely heavily on low- and medium-value items. So that’s a pretty interesting insight. One last technical note: for this last problem we used the 25th and 75th percentiles across the entire range of categories, and similarly, for the first problem we used the median across all categories. Technically, that isn’t necessarily best practice. If we went back to that first problem, you’d actually want to calculate the median for each category and then segment from there. But the query gets a lot more complex that way, and that isn’t the focus of this lesson, which is adding multiple WHEN conditions and using the AND condition. If you’d like to see the nitty-gritty technical details, they’re in the notes. All right, you now have some practice problems to go through to get more familiar with using multiple conditions for segmentation. With that, we’ll be moving into the next chapter on dates, where we’ll use a lot of different functions. See you in the next one.
Welcome to this chapter on date calculations. In it, we’re going to learn how to use different date and time functions and keywords to analyze data. In the first lesson, we’ll get an intro to how this is useful for time series analysis; specifically, we’re going to use functions like DATE_TRUNC and TO_CHAR to calculate things like the number of unique customers or the net revenue per month. In the second lesson, we’ll fine-tune how we extract certain components of a date and also use things like CURRENT_DATE or NOW() in
order to investigate certain time periods relative to when we’re running the analysis. In the final lesson, we’ll cap it off with keywords like INTERVAL and functions like AGE to calculate things like average processing time and compare it to the number of orders we have. This is all very important, and Kelly and I included it in these early chapters because, as you’ll see throughout the rest of the course, a lot of the date-manipulation concepts we’re learning here get used in future chapters. Date and time data is everywhere; you can’t get away from it. For any of these functions or operators, if you’re curious to learn more, I highly encourage you to go into the source documentation; I provide the link over here, or maybe over here. So we’re going to be using the DATE_TRUNC function. What does it do? With DATE_TRUNC, you provide the field, meaning the unit you want to truncate to, whether seconds, minutes, hours, days, weeks, or whatnot, and you provide the source, which is usually a date or timestamp. Let’s go over a simple example first. I’ve opened up a blank notebook with a simple query that selects the order date from sales, because that’s what we’ll be manipulating with DATE_TRUNC, and it’s limited to 10 rows. I’ll press Ctrl+Enter. Okay, this is good, but all these dates are the same, and I want to see different dates. A little trick you can do is ORDER BY RANDOM(). Now whenever I run this with Ctrl+Enter, I get random dates, so we can better check that the function is being applied across a lot of different data. Anyway, let’s get into DATE_TRUNC. Typing the function in, we can see the editor shows the arguments it expects, a date part and a date expression, and I just clicked to open up the documentation. We can also use this documentation to further
investigate what’s going on, which is pretty convenient. Okay, first we input the date part, which is a string, so it needs to be in single quotes; I’m going to put in 'month'. Next comes the date expression, so I’ll put in order_date. Running this with Ctrl+Enter, if we look at these order dates, each one is now truncated to the start of its month. Notice the data type, though: it’s been converted to a timestamp, which is a little inconvenient and too verbose for me. We can clean this up: if you remember from a few lessons ago, we can use the double colon, which is the cast operator, to cast this as a date instead of a timestamp. Running Ctrl+Enter, it’s now output as a date. Taking it one step further, I’ll rename this order_month, and we’re good to go. So what are we actually trying to calculate in this exercise? This is the final table we’re aiming for: it has the order month, which I just showed you how to get, and we’re going to use GROUP BY to analyze the net revenue and the total unique customers per month. Let’s start with the query we just built and add to it, first calculating the net revenue. Remember, we use the SUM function for this, multiplying quantity times net price times exchange rate. I want to make sure this is working correctly, and, silly me, we’re doing a SUM aggregation, so we need a GROUP BY. Instead of the ORDER BY, I’ll put in a GROUP BY on order_month, and I’ll also remove the raw order date so we don’t have to add it to the GROUP BY. Ctrl+Enter: okay, not bad, we’re getting net revenue per order month. We only have 10 results here because I have LIMIT 10; I mainly do that while building queries so they run more quickly, and I’ll take it off at the end.
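As a rough, runnable analogue of the finished monthly query (DATE_TRUNC('month', order_date)::date with SUM and, once it’s added next, COUNT(DISTINCT customer_key)), here’s a Python + sqlite3 sketch on a hypothetical sales table. SQLite has no DATE_TRUNC, so date(order_date, 'start of month') stands in for it; both map every date to the first day of its month.

```python
import sqlite3

# Hypothetical rows: (order_date, customer_key, quantity, net_price, exchange_rate)
SAMPLE_SALES = [
    ("2023-01-05", 1, 2, 10.0, 1.0),
    ("2023-01-20", 1, 1, 5.0, 2.0),
    ("2023-01-31", 2, 1, 7.0, 1.0),
    ("2023-02-01", 3, 3, 1.0, 1.0),
]

QUERY = """
SELECT date(order_date, 'start of month') AS order_month,  -- ~ DATE_TRUNC('month', ...)::date
       SUM(quantity * net_price * exchange_rate) AS net_revenue,
       COUNT(DISTINCT customer_key) AS total_unique_customers
FROM sales
GROUP BY order_month
ORDER BY order_month
"""


def monthly_summary(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE sales (order_date TEXT, customer_key INTEGER, "
            "quantity INTEGER, net_price REAL, exchange_rate REAL)"
        )
        con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in monthly_summary(SAMPLE_SALES):
        print(row)
```

Customer 1 orders twice in January but is counted once, which is the point of DISTINCT.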
The next thing to get is total unique customers, so I’ll add that in. We need a distinct count of the customer keys: COUNT, then DISTINCT, specifying customer_key, with the alias total_unique_customers. Running this, okay, we’ve got total unique customers, net revenue, and order month, exactly what we needed. Now we can remove that LIMIT 10 and press Ctrl+Enter. Bam. DATE_TRUNC is really great when you just want to truncate to a single unit, like month as we did here; you could also use quarter, year, decade, century, or even millennium. If I want to customize the output more, I like to use the TO_CHAR function: you provide it something like a timestamp and then a text format pattern, and it outputs the value as text in that format. Scrolling down the documentation, we can see it has a host of different pattern options: you can specify things like hour of the day, year, and month, and the good thing is you can combine these in whatever format or order you want. Let’s show this simply by selecting order_date and formatting it as year and month. Pressing Ctrl+Enter, we have our random dates here. Now let’s add the new function on its own line: type TO_CHAR, then the field itself, then what we want to output. If we want just the year, I’ll put 'YYYY' in single quotes and run it, and we can see that, unlike the last one, where we had to cast to a date to get rid of the time portion, it outputs just what we need. If I wanted not only the year but also the month, I can just add it to the pattern, using MM for the month. Running Ctrl+Enter, now we have the year and month. This table is super helpful for understanding the different formatting options you can use.
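TO_CHAR’s pattern letters compose freely, so 'YYYY' gives just the year and 'YYYY-MM' gives year and month, both as text. For a runnable sketch, SQLite’s strftime('%Y', ...) and strftime('%Y-%m', ...) stand in for those two PostgreSQL patterns below (an assumption of this example; the format codes differ between the two systems, but the idea of getting a ready-to-read text label back with no cast needed is the same). The dates are hypothetical.

```python
import sqlite3

SAMPLE_DATES = ["2019-07-04", "2023-01-31", "2023-11-02"]  # hypothetical order dates

QUERY = """
SELECT order_date,
       strftime('%Y', order_date)    AS order_year,   -- ~ TO_CHAR(order_date, 'YYYY')
       strftime('%Y-%m', order_date) AS order_month   -- ~ TO_CHAR(order_date, 'YYYY-MM')
FROM sales
ORDER BY order_date
"""


def formatted_dates(dates):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (order_date TEXT)")
        con.executemany("INSERT INTO sales VALUES (?)", [(d,) for d in dates])
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in formatted_dates(SAMPLE_DATES):
        print(row)
```

Because the result is text, a label like '2023-01' can be grouped on directly, which is how the monthly query uses it.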
So, back to the original example we were working with: we can replace this entire line with the TO_CHAR function, specifying order_date and how we want it formatted, and of course we’ll give it the same alias, order_month, so the GROUP BY still works. I’ll press Ctrl+Enter... and I forgot a comma after this. Now I feel this output is a lot more readable, because the order month no longer shows a day, and we can see the revenue and total customers for each month. Because we aggregated on a monthly basis rather than the daily basis we were using previously, we removed a lot of noise. From this we can see that in 2020 there was obviously some sort of worldwide event that hit both the number of unique customers and the net revenue, but it looks like by 2022 these numbers had returned to normal, except for a slight dip in 2023, which may be something we’ll have to investigate later. All right, it’s your turn now to test these out; we have a few short practice problems to help you get more familiar with them. In the next lesson, we’re going to jump into even more functions, such as CURRENT_DATE and NOW(). With that, see you there.
In this lesson, we’re going to build on what we learned in the last lesson, specifically by understanding how to filter dates, and even do it dynamically. First, we’re going to learn two more functions for pulling apart dates, and with them we’ll analyze things like the net revenue for each year for every category. Then we’ll use things like CURRENT_DATE and NOW() to filter data to a certain time frame relative to the present. Pretty neat. All right, the first of the two functions is DATE_PART, and this
one extracts a specific component from a date or timestamp. As we can see, DATE_PART takes a unit and then the source, such as a column name, and there’s a host of different unit options. With this sample query, we can look at things like the year, month, or day: we call DATE_PART, specifying each component and the applicable column, and give each an appropriate alias. One thing to note, which is not my favorite part, is that the results come back with decimal precision, and depending on the unit I’m working with, I don’t necessarily want that. Because of that, I prefer to use EXTRACT. It has a very similar format to DATE_PART (the two are closely related), and going through the documentation, we can use things like DAY, DECADE, DOW, and whatnot; basically all the units we can use in DATE_PART we can use in EXTRACT. The syntax is slightly different, though: instead of passing the unit as a string, we write it as a bare keyword, in uppercase, and then say FROM the source, in our case the column name. This bottom query uses EXTRACT to do exactly what the DATE_PART example above did: we specify YEAR, MONTH, and DAY FROM order_date and give each an appropriate alias. Let’s run this bad boy. Like I mentioned, I like this one a lot better, especially for things like years, months, or days, where I want whole-number values. So let’s use EXTRACT to analyze the net revenue per order month. Previously, we used TO_CHAR to get the net revenue per order month; let’s instead create separate columns for months and years. I’ll remove this portion and put in EXTRACT. It gives us a hint up here that we put in the part first, and we want
the year. Next, we use the FROM keyword and then the date expression, specifically order_date, and we’ll give it the alias order_year. Next, we’ll add the month: we write the EXTRACT expression for MONTH FROM order_date and give it the alias order_month. Since we’re now grouping by two columns, the GROUP BY needs both order_year and order_month. This looks good, so let’s run it. Not bad, but it’s all over the place, so I’m going to add an ORDER BY at the end to get some semblance of order out of this data. Bam, this looks a lot better, and now that the year and month are in separate columns, whoever I give this data to can slice and dice it even more easily. All right, let’s get into some new concepts and implement dynamic filtering using things like the current date or the current time. Let’s talk about CURRENT_DATE first. Typing a simple SELECT statement with CURRENT_DATE, the editor pulls up the documentation, which basically says it returns the current date as of the specified or default time zone, and that parentheses are optional when it’s called with no arguments; basically, you can provide a time zone if you want. Anyway, running this, we get the current date, and apparently I’m filming this on Valentine’s Day. That reminds me, I need to call my fiancée and wish her a happy Valentine’s Day, so I’m glad I saw that. Anyway, that’s CURRENT_DATE. The next one is the NOW() function. Similarly, I can run a SELECT statement that just calls the function; make sure you include the opening and closing parentheses. Running this, we can see it’s Valentine’s Day at 2:30 in the morning. Okay, I’m not actually filming this at 2:30 in the morning; this is Greenwich Mean Time, which is over in England, so that’s what time it is over there.
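The Greenwich Mean Time detail matters because which date it "is" depends on the time zone: in PostgreSQL, NOW() returns a timestamp with time zone, and CURRENT_DATE is evaluated in the session’s time zone. A small Python sketch with the standard datetime module illustrates the effect. The fixed UTC offsets and the year of the example instant are assumptions made for illustration; real code would usually use zoneinfo for named zones with daylight saving time.

```python
from datetime import date, datetime, timedelta, timezone


def local_date(now_utc: datetime, utc_offset_hours: float) -> date:
    """Calendar date at a fixed UTC offset for the instant now_utc (must be timezone-aware)."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return now_utc.astimezone(zone).date()


if __name__ == "__main__":
    # Live values, roughly what NOW() and CURRENT_DATE give in a UTC session.
    now = datetime.now(timezone.utc)
    print(now.isoformat(), now.date())

    # A fixed instant like the one in the lesson: 2:30 a.m. GMT on Valentine's Day (year assumed).
    instant = datetime(2025, 2, 14, 2, 30, tzinfo=timezone.utc)
    print(local_date(instant, 0))   # still February 14 in GMT
    print(local_date(instant, -8))  # but February 13 at UTC-8
```

For a dynamic filter, this means the same query run at the same moment can select a different "today" depending on the session’s time zone setting.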
That’s why the current-date documentation gives you the option of passing in a time zone, to adjust it appropriately. So what are we going to calculate? The short answer is the net revenue per category for orders from the last 5 years, counting back from today; we’re basically building this table. The important thing to understand is that this is a dynamic filter. These are very important to understand, because sometimes you’ll have workflows set up that run queries automatically at midnight, and you don’t want to pull in all the data every time; maybe you only want the last 5 years, and filters like this are great for that. Let’s start with a base query we’ve seen time and time again: we select the order date and the category name and calculate the net revenue, from the sales table left-joined with the product table on the product key. Because we’re aggregating the net revenue, we need to group by the order date and the category name. Pressing Ctrl+Enter, we get results, but the dates are unordered, so I’ll throw in an ORDER BY and run it again, and now the dates are in order. We only need to do one more step, but I’m going to break it down, because we want to include only orders within the last 5 years; basically, we shouldn’t see anything from 2015, since I’m in 2025 as of filming this. We’ll create a WHERE filter to do this, but I want to go slowly and show what’s going on step by step, so first I’ll add the components we’re going to use in the WHERE clause as columns. We’ll start simple with CURRENT_DATE to see what it outputs, and as expected, it’s Valentine’s Day. Now, to extract these last five years, we need to
get the year of the current date and the year of the order date. For this we’ll use the EXTRACT function: we want the YEAR part, then FROM, and we’ll keep it simple at first with just order_date, giving it the alias order_year. Running this, it’s working just fine; we’re getting the order year for all the different order dates. Next, let’s extract the year from the current date: we put in the CURRENT_DATE keyword and give it the alias current_year. Run this. Okay, not bad. Now we want the year we’re going to filter by, which is basically 5 years ago. We’ll get rid of most of these columns later, but I’ll copy this current-year expression, subtract 5, and give it the alias minus_5. Clever, I know. Running Ctrl+Enter, we can see the minus_5 value is five years behind the current year. So in our WHERE clause, we need to combine these so it filters on that. The WHERE clause goes underneath the FROM and the LEFT JOIN, and we want the order year (I’ll copy that expression from above) to be greater than or equal to the minus-5 value we built here, so I’ll paste that in as well. Let’s run this with Ctrl+Enter, and bam. It’s a little hard to read, but just looking at the order date column, these are the orders for the last five years. I’ll remove these now-unnecessary columns, since they were just for building this, and now the query is built for the last 5 years. Now you may be like, "Luke, it’s Valentine’s Day right now, but this goes back to January 1st, 2020. What if we want to be very precise about that?" And I’ll say, "Aha, we’ll answer that in the next lesson." All right, you now have some practice problems to go through to get more familiar with these functions and with creating your own dynamic filters.
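Here’s the year-based filter from this lesson, EXTRACT(YEAR FROM order_date) >= EXTRACT(YEAR FROM CURRENT_DATE) - 5, sketched in Python with sqlite3 on hypothetical data. CAST(strftime('%Y', ...) AS INTEGER) stands in for EXTRACT(YEAR FROM ...), and "today" is passed as a parameter instead of calling CURRENT_DATE, an assumption that keeps the example reproducible. Note how it keeps January 1, 2020 when today is February 14, 2025, which is exactly the imprecision called out above.

```python
import sqlite3

# Hypothetical (order_date, category_name, net_revenue) rows.
SAMPLE_SALES = [
    ("2015-06-01", "Audio", 50.0),
    ("2019-12-31", "Audio", 20.0),
    ("2020-01-01", "Audio", 30.0),
    ("2020-01-01", "Games", 15.0),
    ("2024-08-10", "Games", 40.0),
]

QUERY = """
SELECT order_date, category_name, SUM(net_revenue) AS net_revenue
FROM sales
-- ~ EXTRACT(YEAR FROM order_date) >= EXTRACT(YEAR FROM CURRENT_DATE) - 5
WHERE CAST(strftime('%Y', order_date) AS INTEGER)
      >= CAST(strftime('%Y', :today) AS INTEGER) - 5
GROUP BY order_date, category_name
ORDER BY order_date, category_name
"""


def last_five_calendar_years(rows, today):
    """today: a 'YYYY-MM-DD' string standing in for CURRENT_DATE."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (order_date TEXT, category_name TEXT, net_revenue REAL)")
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return con.execute(QUERY, {"today": today}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in last_five_calendar_years(SAMPLE_SALES, "2025-02-14"):
        print(row)
```

In production you would let the database supply the current date; the parameter only exists here so the output doesn’t change from day to day.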
In the next lesson, we’re going to get into date differences, using functions like AGE to measure the time between dates. With that, see you in the next one.
In this third and final lesson of the chapter on date and time functions, we’re going to learn how to work with intervals. In the first half, we’ll continue the problem from the last lesson, and instead of that verbose way of calculating the last 5 years, we’ll use the INTERVAL keyword to write much more succinct and readable queries that make clear what span of time we want. In the second half, we’ll take on a pretty interesting business problem: exploring average processing time. To calculate the interval between the order date and the delivery date, we’re going to use functions like AGE, and we’ll also show what can be done with the EXTRACT function from earlier. First, let’s explore the INTERVAL keyword. An interval can represent a span of time such as days, months, hours, minutes, decades, or even weeks, and we write one with the INTERVAL keyword followed by a value and a unit. Let’s test this bad boy out. I’m going to run a simple SELECT statement with the INTERVAL keyword and something like five centuries. We can see the column is titled interval, and it comes out as 182,500 days. In this notebook, the output is shown in days whether you use centuries or something like months; running Ctrl+Enter with months, it comes out in days too. So how can we use this in the query from the last lesson that filters for orders within the last 5 years? I’ve simplified the query: we’re pulling the current date and the order date from the sales table, and we filter with this formula that pulls the year out of the order date and the year out of
the current date and subtracting five. Running this, we can see the current date is Valentine’s Day and the order dates are within the last 5 years. But notice, as I called out last time, this goes all the way back to January 1st, 2020, so technically it’s slightly more than 5 years. Let’s write this condition a lot more succinctly. I’ll remove this portion, and instead we want the order date to be greater than or equal to the date 5 years ago, so once again we use CURRENT_DATE and subtract an interval of 5 years from it. Running this bad boy, we can see it now covers exactly the last 5 years, as shown by the order dates, and it’s very specific: it filters all the way down to February 14th, Valentine’s Day, 2020. Now let’s clean up the full query from last time; this is it right here. If we run it again, it returns the current date, order date, category name, and net revenue, breaking the net revenue down by category. Remember, this one technically wasn’t 5 years, so we go back, replace this portion with the new condition we just came up with, and when we run it, we now have exactly the last 5 years. All right, before we get into AGE and review the EXTRACT function, let’s look at what we’re actually trying to solve in this portion of the lesson. If you recall, we have two columns, an order date and a delivery date. (I forgot to put a comma here, so it wasn’t appearing; now it is.) With this kind of information, we can calculate the average processing time, meaning how long it takes for a customer to receive an order, which is a very important metric in business analytics. By the end of this, we’ll show on a yearly basis not only the net revenue, shown as the blue bars, but also the average processing time for those years, and we’ll find that it’s going up.
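Going back to the first half of this lesson for a moment, here’s a sketch of the tighter order_date >= CURRENT_DATE - INTERVAL '5 years' condition in plain Python with datetime, using the same fixed "today" assumption as before. datetime has no year arithmetic, so the hypothetical helper subtract_years clamps February 29 to February 28 when the target year isn’t a leap year, which matches how PostgreSQL handles subtracting a years interval from a month-end date.

```python
from datetime import date


def subtract_years(d: date, years: int) -> date:
    """Same month/day `years` earlier; Feb 29 falls back to Feb 28 in non-leap years."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:  # only Feb 29 can fail here
        return d.replace(year=d.year - years, day=28)


def within_last_n_years(order_dates, today: date, n: int = 5):
    """Keep dates on or after today minus n years (~ CURRENT_DATE - INTERVAL 'n years')."""
    cutoff = subtract_years(today, n)
    return [d for d in order_dates if d >= cutoff]


if __name__ == "__main__":
    today = date(2025, 2, 14)  # assumed filming date
    orders = [date(2020, 1, 1), date(2020, 2, 13), date(2020, 2, 14), date(2024, 6, 1)]
    print(within_last_n_years(orders, today))
```

Unlike the year-only comparison, the cutoff here keeps the month and day, so January 1, 2020 and February 13, 2020 drop out.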
So what are we using for this? We’re using the AGE function: you provide it two timestamps, in this case dates, and it outputs an interval. Let’s do a simple example first, right inside a SELECT statement, calling AGE with two dates. I’ve left a couple of errors in this, so let me run it first with Ctrl+Enter: we get an error saying no function matches the given name and argument types, specifically age(integer, integer). I don’t want it evaluated as an integer; I want it evaluated as a date. The problem is that without quotes, a date like that is read as arithmetic (year minus month minus day), which produces an integer, so we have to provide each date as a string in single quotes. Running Ctrl+Enter, now it works. But notice that I went from the 8th of January to the 14th of January, and it says -6 days. To get a positive value from AGE, you need to provide the end date first and then the start date, so I’ll put these in the correct order. Running this, we get 6 days. Now let’s say we want to do some math with this: we have 6 days, and say we want to subtract, I don’t know, 5 from it. If I put minus five after this and try to run it, I get an error saying the operator does not exist: interval minus integer. Right now this is an interval, and we’re trying to subtract an integer, so we need to convert this portion to an interval, sorry, I mean an integer; need more coffee. We can use the EXTRACT function for that. I’ll cut all of this out, and we’ll use EXTRACT, which needs the part, then the FROM keyword, and then the date expression. For the part we’ll specify DAY, which doesn’t need to be in single quotes; it understands that
keyword of DAY. Then I specify FROM, and for the date expression I’ll paste in that AGE call. Running this with Ctrl+Enter, we get 6, and now I can subtract five from it; pressing Ctrl+Enter, we get 1. So let’s get into calculating the average processing time by year, and we’ll do this for the last 5 years, similar to this table. We also need to calculate the net revenue, but we won’t do that until the very end, because it makes the query a lot longer. Let’s start simple and just look at the order date and delivery date from the sales table. Now let’s add a new column for the processing time using the AGE function, and remember, the end date goes first, so that’s the delivery date, followed by the start date, the order date; we’ll name this processing_time. This query is getting quite long, so I’ll throw in a LIMIT 10 just to start with. Running this, we’re getting basically zero processing time; everything is delivered the same day it’s ordered. I want to see a wider variety of orders, so I’ll throw in ORDER BY RANDOM() and run it, and now we get some rows with a few days in there, which shows this is actually working. Now let’s aggregate the processing time by year. For that we need the year, so I’m going to use the DATE_PART function: as the first argument we specify what we want, the year, and we take it from the order date, when the order actually started, giving it the alias order_year. The next thing is aggregating this age, but remember, processing_time has the data type interval, with values like 3 days or 0 days, and averaging that directly would give us back an interval rather than a plain number we can round.
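A quick Python illustration of the argument-order point and of turning the span into a plain number. Subtracting two datetime.date objects gives a timedelta whose .days is the total day count, so delivery minus order is positive and order minus delivery is negative, just like AGE(end, start) versus AGE(start, end). One caveat about the SQL version: AGE expresses longer spans in years and months (for example, AGE('2024-02-04', '2024-01-01') is 1 mon 3 days), so EXTRACT(DAY FROM AGE(...)) returns only the days field, 3, not 34. For processing times of a few days that doesn’t matter, but for longer spans, subtracting two DATE values in PostgreSQL, which yields an integer number of days, may be what you want. The helper name below is hypothetical.

```python
from datetime import date


def processing_days(order_date: date, delivery_date: date) -> int:
    """Total days from order to delivery (end date first, like AGE(delivery_date, order_date))."""
    return (delivery_date - order_date).days


if __name__ == "__main__":
    start, end = date(2024, 1, 8), date(2024, 1, 14)
    print(processing_days(start, end))      # 6
    print(processing_days(end, start))      # -6: arguments swapped
    print(processing_days(start, end) - 5)  # 1: plain integers support arithmetic
    print(processing_days(date(2024, 1, 1), date(2024, 2, 4)))  # 34 total days
```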
So we need to use the EXTRACT function first: I specify the part, DAY, then the FROM keyword, and then AGE of the delivery date and order date. I’ll put a closing parenthesis on this and rename the alias to avg_processing_time. Since we’re doing an aggregation, we need a GROUP BY, and we need to drop the raw order date and delivery date columns, because otherwise we’d have to group by them too. So I’ll remove the ORDER BY RANDOM() and put in a GROUP BY on order_year, and since this won’t return many rows, I’ll remove the LIMIT as well. Running this, and, silly me, it says the delivery date must appear in the GROUP BY clause or be used in an aggregate function. Basically, I forgot to use the actual aggregate function here. Oopsies. Throwing in AVG and running it again, we get the average processing times. We still need to clean this up: the years are out of order, and all these digits are unreadable. I’ll add an ORDER BY clause underneath here on order_year, and running it, they’re in order. Now let’s clean up the average processing time; I only really want two decimal places. If you remember from the basics course, we went over the ROUND function: you provide the value or column x, and then n, the number of decimal places you want. If you don’t specify n, it rounds to zero decimal places, but we want two, so we’ll put 2 in there. Ctrl+Enter, and bam, we’ve got the average processing time. But we want this for the last 5 years, and since I don’t like typing code if I don’t have to, I’ll copy the WHERE clause from our previous example above and put it underneath the FROM statement. Running this, we now have the average processing time for the last 5 years.
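Putting the pieces together, here’s a Python + sqlite3 sketch of average processing time by year over the last 5 years, on a hypothetical sales table. The SQLite substitutions are assumptions, not the lesson’s PostgreSQL: julianday(delivery_date) - julianday(order_date) stands in for EXTRACT(DAY FROM AGE(...)), CAST(strftime('%Y', ...) AS INTEGER) for DATE_PART('year', ...), and date(:today, '-5 years') for CURRENT_DATE - INTERVAL '5 years'. ROUND(..., 2) keeps two decimal places, as in the lesson.

```python
import sqlite3

# Hypothetical (order_date, delivery_date) rows.
SAMPLE_SALES = [
    ("2019-06-01", "2019-06-03"),  # older than 5 years: filtered out
    ("2020-03-01", "2020-03-01"),
    ("2020-03-02", "2020-03-03"),
    ("2023-05-01", "2023-05-03"),
    ("2023-05-04", "2023-05-05"),
    ("2023-05-06", "2023-05-07"),
]

QUERY = """
SELECT CAST(strftime('%Y', order_date) AS INTEGER) AS order_year,
       ROUND(AVG(julianday(delivery_date) - julianday(order_date)), 2)
           AS avg_processing_time
FROM sales
WHERE order_date >= date(:today, '-5 years')  -- ISO date strings compare correctly as text
GROUP BY order_year
ORDER BY order_year
"""


def avg_processing_time_by_year(rows, today):
    """today: a 'YYYY-MM-DD' string standing in for CURRENT_DATE."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (order_date TEXT, delivery_date TEXT)")
        con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        return con.execute(QUERY, {"today": today}).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in avg_processing_time_by_year(SAMPLE_SALES, "2025-02-14"):
        print(row)
```

The aggregate wraps the per-row day difference, which is why the raw order and delivery dates can’t stay in the SELECT list without also being grouped on.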
we have the average processing time for the last five years. Now there's just one last thing to do: add in the revenue for each of those years. For this we're going to use the SUM function to add it all up, multiplying quantity times net price times exchange rate, and we'll give it the alias net revenue. Let's go ahead and run this. Not too bad, but with numbers this large I don't care about the two decimal places, so I'm going to use the ROUND function again, and in this case I'm not going to specify an argument at all. Running this, it looks like it still shows two decimal places, just .00. What if I just specify 0? It looks like I can't specify zero here either. Instead, what I can do is use CAST and cast it as an INTEGER. Pressing Ctrl+Enter, bam, now we have what I want. If we graph this, we can see that the average processing time has gone up over time, from a little less than one day up to 1.6 days. Even with the dip in revenue (which I think is probably loosely correlated with the number of orders), we still saw the average processing time go up, so this is a good little data point to keep track of and pass on. All right, it's your turn: you have some practice problems to get more familiar with the EXTRACT and AGE functions and the INTERVAL keyword, and after that we're going into the next chapter, on window functions. It's a pretty complex topic, and I'm looking forward to getting into it. See you there.
All right, welcome to this chapter on window functions. This is probably the most requested topic for me to cover in this intermediate course, so I was super excited to get into it. We're going to break this chapter up into five lessons, starting with this one, which focuses on the syntax of window functions with some simple examples to explain it. Then we'll start picking up the pace, looking at things
like aggregation, ranking, and LAG and LEAD, and then finally we'll close it off by looking at how we can use things like frame clauses. None of this matters unless you really understand what window functions are, so let's look at a simple example: a query that breaks down net revenue by order. In it I'm listing the customer key, the order key, and the associated line number for that order, calculating the net revenue, and getting it all from sales. We're then ordering it by the customer key. So what's going on here? Customer key 15 made only one purchase, of over $2,000, but customer 180 made three separate purchases, where two of them were the same order, just with different line items. Now let's say that, based on all these individual orders, we want to do some deeper analysis and find the average order, specifically the average net revenue for an order. I could run an aggregate function, removing all those other columns, and see that the average order value is about 1,032. But I want it in this table, and I can't get that with this kind of aggregation. This is where window functions come in. Instead, all I have to do is insert a window function (don't worry, we'll go over it in a second), which in this case uses OVER. Running it, I get an error because I left a comma in there; once that's fixed, we can see it's the same value as before, 1,032, but now it's in our original table, just like we want, so we can do even more calculations with it. So why use window functions? They let you perform calculations across a set of table rows related to the current row, like we just showed, and unlike a regular aggregation, they don't group the results into a single output row. This is very beneficial, as we're going to demonstrate in future exercises. So this is useful for things
like running totals, ranks, or even averages. Anyway, let's get into the syntax. We start by defining the window function, or what we want to do: if it's an aggregation, something like SUM or COUNT; if it's ranking, something like RANK or DENSE_RANK. Next is OVER, which defines the window for the function. Inside of it we have the keyword PARTITION BY; we'll get to that in a second. First, let's go back and walk through the window function we just saw, without PARTITION BY. I'm going to create a new line, and the window function we'll use is AVG, because we're calculating the average net revenue. I'll put our variables in there, quantity, net price, and exchange rate, and then OVER. It's very important that after OVER we include the open and closing parentheses, even if we're not going to put anything in them. Then I'm going to give it a very verbose alias to make sure we understand what it means: the average net revenue of all orders. Running this, we can see that, similar to before, it's 1,032. Now let's say we want to break this window function down further, maybe by customer key. This is where PARTITION BY comes into effect: it divides the results into partitions, or, better said, into separate groups, without having to use a GROUP BY clause. Going back to our previous example, I'm going to copy it all, add a comma and a new line, and paste it in. In this one we'll define PARTITION BY customer key, and we'll call it the average net revenue of this customer. Running it and scrolling down, we can see that customer 15 only had one order, so that is their average net revenue, whereas customer 180 had multiple orders, and their average was 836. Let's briefly explore the power of window functions with a simple example. I'm not going to ask you to follow or
understand the syntax, because we're going to get to it later on, but this demonstrates the features we'll be covering in this chapter. Here we're looking at customer key, order date, and net revenue; I'm going to run this. Let's say that, for each of these customers, we want to rank their orders from highest to lowest based on net revenue. I could use a window function like this one, which uses ROW_NUMBER and some other things we'll get to, and we can see that it calculates the rank: for customer 180, the highest order is rank one and the lowest is rank three. Now let's take it to another level and calculate the customer's running total. Here we can see that customer 180, who has multiple orders, starts at 525 and then jumps up to 2,500; those orders were on the same day, so the running total is the same for both. We could also get the customer's total net revenue. This one isn't that impressive on its own, because, yes, we can see it gets the total, but I like it because I can use it to calculate what percent of a customer's total net revenue each order represents. In that case, I just add net revenue divided by the window function, and now I can see the percent of the customer's revenue for each order. This shows even more clearly why window functions are so powerful. So let's get into applying these concepts so you can write queries like the ones I just ran through. What we're going to find is the percent of daily revenue for each order line item. We already know how to compute net revenue; we'll need a window function to calculate daily net revenue, and then we'll calculate a percentage from that. Let's start building this query: we'll list the order date, the order key, and the line number, because an order, or an
order key, can have multiple shipments, so it has different line numbers. Finally, we'll start with the net revenue, get this from the sales table, and limit it to the first 10 rows. As mentioned, the line number just says 0, 1, 2, 3; the highest number in this is 6. I don't really like that these are two separate columns, so I'm going to combine them: I'll take the order key and multiply it by 10, which basically adds a zero to the end, and then add in the line number, giving it the alias order line number. We can see that this accomplishes what we want, so I can go ahead and remove line number and order key. The first thing we want to calculate for our final table is the daily net revenue. To do that, we use the SUM function on our net revenue, then OVER, and inside of OVER, PARTITION BY. Remember, we want the daily net revenue, so we'll partition by order date, and that gives us the daily net revenue. Finally, we want to calculate what percent of the daily net revenue each order line item represents. This one's going to be pretty simple, since it's mostly copy and paste. First, I'll drop in the net revenue and multiply it by 100 so the result reads as a percentage rather than a fraction, then divide by this entire window function, which I'll copy and paste in here, and give it the alias percent daily revenue. Not too bad, but it's all over the place. I want to see it ordered within each day to see which is highest, so I'm going to add an ORDER BY specifying the order date and then the percent daily revenue, in descending order. Bam, we get this final table showing the percent of daily revenue for each order line item. I went ahead and graphed it
so we could compare the different order line items for that day, and we can see that some of these orders take up anywhere from 10% to 20% of the entire daily net revenue. Conveniently, the total daily net revenue is shown at the top of the chart. So it's pretty convenient that with window functions we can get all of this data into a single table, which makes it a lot easier later on when we dive deeper into it and maybe visualize it. One minor note on this query: it's getting a little verbose, in that a lot of it is repeated code we could reuse, and that could be done with something like a CTE or a subquery. What I can do is put the core items we need (the order date, order line number, net revenue, and daily net revenue) into their own subquery. I'm going to tab this over so we can do a SELECT * FROM, put it all within parentheses, and give it the alias revenue by day. It still has the same output below, but now, instead of repeating all that code like I did to calculate the percentage, I just insert a new row up here specifying 100 times the net revenue divided by the daily net revenue, with the alias percent daily revenue. Running this, boom, it has everything we want out of it, and in my mind it's slightly easier to read and to get through when sharing with others. Now let's get into performing a cohort analysis in this last example. Cohort analysis is going to come up a lot throughout this project, because it's pretty popular in business analytics. So what the heck is it? A cohort is a group of people or items sharing a common characteristic, and cohort analysis examines the behavior of that group over time. So what does this even look like? Well, for
this example right here, this is what we're going to be doing: putting people into cohorts based on the year of their first purchase, and then calculating what portion of the net revenue each cohort contributes. Along the bottom is the purchase year, and on the y-axis is the net revenue. In 2015, where the data starts, everybody contributing is from the 2015 cohort, so that's all of the net revenue. Then, in 2016, we have the 2016 cohort along with a small contribution from the 2015 cohort. We work our way all the way to 2019, and we can see once again that the cohort of that year is the largest contributor, whereas those from previous years contribute less. It's important to understand for this example that the cohort is based on the first year a customer made a purchase. So what are we aiming to get out of this? We want a final table with the cohort year, or the year of the first purchase, then the purchase year, or the year the net revenue was earned, along with the total revenue for that cohort. We're going to start simple, with a first query that uses a window function to extract the cohort year for each customer. We'll start with a SELECT statement specifying the customer key; we'll also include order date so we can see what's going on, and we'll get this from the sales table, limited to only 10 rows. I also have a comma right here that we don't want. Pressing Ctrl+Enter, bam, we're getting customer keys and order dates. For good measure, I'm also going to order by customer key so we can make sure we're looking at this appropriately; for example, all of customer 180's rows now appear together. Now let's use a window function to get our cohort year. We're going to start by just getting what
is the minimum order date for a customer; for customer 180, I would expect it to be this order in July. We'll start the window function with MIN of order date, then OVER, and inside the parentheses, PARTITION BY customer key. I know it's a date right now, but we're going to name this alias cohort year, because we know we're going to change it. Pressing Ctrl+Enter and looking at 180, the cohort year is in fact their lowest order date. Now we want to extract the year from it, so I'm going to wrap it in the EXTRACT function: we specify the part first, which is YEAR, then FROM, and everything after that is the date expression, and then I'll put a closing parenthesis all the way at the end. Running this, I already showed that 180 was 2018, but now look at customer key 387: their first purchase was in 2018, and later on, in 2023, they still have a cohort year of 2018, so we know our formula is working. Let's clean this query up, because I don't need order date at all for what we're doing, so I'll remove it and press Ctrl+Enter. Now I'm noticing we have duplicates, because customer key appears more than once, so I can add DISTINCT right after SELECT. Now when I run this, boom, I'm only getting distinct combinations of customer key and cohort year, so this table's a lot more concise. We also don't need the LIMIT 10 anymore, and technically we don't need an ORDER BY either. This is a good start: we've figured out the cohort year for each customer. Here, once again, is the final table we want to get to. In order to add the two additional columns, purchase year and net revenue, we need to calculate them separately and join in the cohort year using something like a CTE.
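Before moving on to the CTE, here's a minimal runnable sketch of the cohort-year step built above. It uses Python's built-in sqlite3 module rather than the course's PostgreSQL database, so a few things are assumptions: the sales table and its rows are a hypothetical toy stand-in, the column names customerkey and orderdate are modeled on the lesson's spoken names, and because SQLite has no EXTRACT function, strftime('%Y', ...) cast to an integer stands in for EXTRACT(YEAR FROM ...). It needs an SQLite build with window function support (3.25 or newer).

```python
import sqlite3

# Toy stand-in for the course's sales table; rows and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customerkey INTEGER, orderdate TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        (15, "2017-03-02"),
        (180, "2018-07-11"),
        (180, "2023-01-05"),
        (180, "2023-01-05"),  # same customer, same day: DISTINCT collapses these
        (387, "2018-02-20"),
        (387, "2023-06-14"),
    ],
)

# MIN(...) OVER (PARTITION BY customerkey) puts each customer's first order date on
# every one of their rows; strftime('%Y', ...) plays the role of EXTRACT(YEAR FROM ...),
# and DISTINCT removes the repeated (customer, cohort year) pairs.
cohorts = conn.execute(
    """
    SELECT DISTINCT
        customerkey,
        CAST(strftime('%Y', MIN(orderdate) OVER (PARTITION BY customerkey)) AS INTEGER)
            AS cohort_year
    FROM sales
    ORDER BY customerkey
    """
).fetchall()

print(cohorts)  # [(15, 2017), (180, 2018), (387, 2018)]
```

Customer 387 shows the point made above: their 2023 order still carries cohort year 2018.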
So let's put this all into a CTE that we can then join to the sales table, basically attaching the associated cohort year to every customer key, and then we can aggregate to find the total revenue. We'll use the WITH keyword, give it the alias yearly cohort, use AS, and then an open parenthesis; we'll tab this over and put a closing parenthesis at the end. I want to make sure this works properly, so I'm going to do a SELECT * FROM yearly cohort and run it to make sure it's outputting correctly. Yep, still the same table. Now let's join this onto our sales table, and it's very important that we get the correct join here. We'll put the sales table right here, and we're going to do a LEFT JOIN, because we want to make sure we keep all of the sales rows. We're joining to yearly cohort, which has been distilled down to remove any duplicate data, so we're not creating duplicate rows. I'm going to give them aliases, s for this one and y for this one, and we'll join on the customer key from both tables. Let's add a LIMIT 10, because there's a lot of data. Inspecting the table, we can see we have customer key 947009, joined on that customer key along with its associated cohort year. As a reminder of what we need to get to: the cohort year, the purchase year (the year of the order date), and the aggregated net revenue. When we do this aggregation, we're not going to use a window function; this time we're going to use a GROUP BY on these years. So let's clean up the columns we're actually using: we're going to use the cohort year and
then also the purchase year, which is based on order date, so we need to extract the year from order date, and we'll define this as purchase year. Let's make sure this is correct before doing our aggregation. Okay, good: we have all the cohort years and purchase years. Now we want to get a sum of all the revenue, so we'll use the SUM function, specifying quantity times net price times exchange rate, and give it the alias net revenue. Because of this, I'm going to add a GROUP BY on cohort year and purchase year, and for the purchase year we need to put the EXTRACT expression in here. Running this, we can see our results. Any time you're getting to your final results, you need to make sure the numbers make sense. Specifically, I know that the net revenue for 2015 was around $7 million, and I can see from this that 2015 comes out to around $7 million, so this checks out. Also, silly me, I could just use the purchase year alias in the GROUP BY; it doesn't have to be the function itself. We want all the values, because we're almost there, so I'm going to remove the LIMIT 10 and run this. Bam, we have our final values, showing how much each cohort contributed to the revenue for each purchase year. This is the visualization we ultimately get to, and, interestingly enough, the cohorts from previous years don't really contribute that much to the overall net revenue, so we have a bit of a retention issue. All right, it's your turn now to get more familiar with window functions by doing the practice problems. In the next lesson, we'll be diving deeper into aggregate functions and fine-tuning our knowledge of how to use window functions. With that, I'll see you there.
Welcome to this lesson on aggregation functions using window functions.
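As a recap of where the cohort revenue query ended up (a CTE with the cohort year, a LEFT JOIN back to sales, then a plain GROUP BY), here is a rough runnable sketch in Python's sqlite3 module. It is not the course's exact query: the rows are hypothetical, the column names (customerkey, orderdate, quantity, netprice, exchangerate) are modeled on the lesson's spoken names, and SQLite has no EXTRACT, so strftime('%Y', ...) cast to an integer stands in for EXTRACT(YEAR FROM ...).

```python
import sqlite3

# Hypothetical sales rows; column names are assumptions modeled on the lesson.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, "
    "quantity INTEGER, netprice REAL, exchangerate REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2015-05-01", 2, 100.0, 1.0),  # customer 1: first purchase in 2015
        (1, "2016-02-10", 1, 50.0, 1.0),   # ...and buys again in 2016
        (2, "2016-03-03", 1, 200.0, 1.0),  # customer 2: first purchase in 2016
        (2, "2016-08-20", 3, 10.0, 1.0),
    ],
)

query = """
WITH yearly_cohort AS (
    -- one row per customer with the year of their first purchase
    SELECT DISTINCT
        customerkey,
        CAST(strftime('%Y', MIN(orderdate) OVER (PARTITION BY customerkey)) AS INTEGER)
            AS cohort_year
    FROM sales
)
SELECT
    y.cohort_year,
    CAST(strftime('%Y', s.orderdate) AS INTEGER) AS purchase_year,
    SUM(s.quantity * s.netprice * s.exchangerate) AS net_revenue
FROM sales AS s
LEFT JOIN yearly_cohort AS y ON s.customerkey = y.customerkey
GROUP BY y.cohort_year, purchase_year  -- plain GROUP BY, no window function here
ORDER BY y.cohort_year, purchase_year
"""
revenue_by_cohort = conn.execute(query).fetchall()
print(revenue_by_cohort)
# [(2015, 2015, 200.0), (2015, 2016, 50.0), (2016, 2016, 230.0)]
```

The GROUP BY refers to the purchase_year alias rather than repeating the expression; SQLite, like PostgreSQL, accepts output-column aliases there.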
In the last lesson, we started to apply aggregation techniques to window functions, and in this one we're going to build on that further, covering three key concepts. Last lesson, we used the MIN aggregate function to analyze the impact of yearly cohorts on total revenue over the years. In this one, we'll start with a simple analysis that reinforces what we learned about cohorts, but this time using the COUNT function to analyze the total number of unique customers and how each cohort contributes in future years. Next, we're going to use the AVG aggregate function, focusing on the long-term, or lifetime, value of a customer: using window functions to calculate the total revenue a customer has contributed. We'll be able to break this down not only customer by customer but also from a cohort-year perspective. Finally, we'll wrap up with some simple examples of how to filter window functions, basically where you should apply your WHERE clause in order to filter a window function properly. All right, let's get into it. For this first example, we're going to use COUNT as our window function's aggregate. The syntax and the major concepts remain the same. Here's what we're solving for: we're trying to find the number of unique customers (which we'll identify by customer key) and how, based on their cohort, or the first year they placed an order, they contribute to future years. This graph shows the total number of unique customers every single year, broken down by cohort; for example, in 2019 we had customers from the 2015, 2016, and 2017 cohorts, and so on. For the final output, we're going to do something very similar to last time, where we want a table of the cohort year
and the purchase year, and then the number of unique customers for each cohort year and purchase year. You could start with the query from last lesson and modify it to fit your needs, but I find this question is actually simpler than the last one, so we're going to start from scratch; it's also good practice. For this, we want not only the customer key but also the customer's cohort year, or the first year they made an order, and the purchase year, or the year of each purchase. Ultimately, we'll use the customer key to get the unique count. Anyway, let's start programming. We'll start with a SELECT statement, adding in our customer key. Next, we want the cohort year, so, similar to last time, we first get the minimum order date with a window function, OVER PARTITION BY customer key. I'm going to name this alias cohort year now, because we'll eventually get it into a year format, even though right now it's obviously just a date. We'll get this from our sales table. Let's run this just to see: scrolling down to customer key 180, we can see that it's all from June, so this is looking good. The next thing we want to do is extract the year out of this, so we'll use EXTRACT: we need the part, which is YEAR, then FROM, and then the date expression, which we'll wrap all in parentheses, and then another closing parenthesis for the EXTRACT. Pressing Ctrl+Enter, we have the years. Next, let's get the purchase year. We don't need a window function for this; we just need to extract the year from our order date. So I'll use EXTRACT with the part YEAR, then FROM, and the date expression order date, and give this the name purchase year. Running it, overall everything looks like it's calculating correctly. Now what we need to do is
move on to actually getting the count of unique customers using the customer key. You may think you could use a GROUP BY to do this (and this is going to be wrong code right here): specifically, we'd do something like COUNT(DISTINCT customer key) and then add a GROUP BY on cohort year and purchase year. However, when we go to run this, we get the following error message: window functions are not allowed in GROUP BY. Because cohort year is itself a window function, we can't group by it, so trying to use GROUP BY here to save some code is not going to work; I just wanted to point that out. What we can do instead is make this into a CTE and then run a window function on that CTE to count the customer keys. To create our CTE, I'll give it the alias yearly cohort, open a parenthesis, and put all the syntax in there. Next, just to make sure everything is working correctly, I'm going to run a simple SELECT * FROM yearly cohort. It's still running properly, so we're good to go. Let's refine this now. Remember, this is the final output we want: cohort year, purchase year, and number of customers. So let's build this table: I'll remove the star and put in cohort year first, then purchase year. Next, let's build the window function to count the unique customer keys. But we don't want it on just one column; we want it based on both cohort year and purchase year. In this case, customer key 180 was a unique customer in 2018, which is also their cohort year, but, as part of the 2018 cohort, they were also a unique customer in 2023. We'll start by doing a COUNT, but one thing I notice is that this has some duplicate data, and we want unique customers. Technically, for customer 180 I'd only want to see one row with 2018 and 2018, and then also
one row for 180 with 2018 as the cohort year and 2023 as the purchase year; I wouldn't want to see duplicates. So first, I can add a DISTINCT to this query. Pressing Ctrl+Enter, it's not really showing, because we removed the customer key, so I'll put it back in for the time being. Running this again, we can now see that customer key 180 has just those two entries. Now we'll do a COUNT of customer key (or you can do COUNT(*)), then OVER, and this is where we get into our window function, using PARTITION BY both the purchase year and the cohort year, and we'll name this num customers. Let's run it. Not too bad, but note that this has multiple duplicate lines, because we still have all those customer keys in there. So once again, we can add a DISTINCT right after the SELECT statement to filter it down to only unique values. Then we can see that this is all out of order, so we'll add an ORDER BY on cohort year and then purchase year. Scrolling down, we have what we want now, and I know from previous calculations that in 2015 we had only 2,825 unique customers, which helps validate that this data is correct. If I visualize it, this is what we get, and it comes out very similar to our total net revenue. Overall, I didn't find a lot of unique insights from this compared to net revenue, but at least we explored it. Before moving on, I want to touch on a short example around that error message we got earlier, which said window functions are not allowed in GROUP BY. Strictly, that message means a window function can't appear inside the GROUP BY clause itself, as our cohort year did; as I'm going to show in a second, a query can still contain both a window function and a GROUP BY. Overall, though, I don't recommend using window functions and GROUP BYs
within the same query, and we won't be doing it for the remainder of the course, so I want you to learn this major concept now. Let's look at this simple example: we're selecting the customer key and then, using a window function, counting how many orders each customer has, partitioned by customer key, with the alias total orders. Running this, we can see that customer key 15 has one order and 180 has three, so everything's working fine. But now let's say we want to calculate the net revenue, and this time we want to group it all up, because I'm tired of all these separate rows, so we're going to use a GROUP BY to find the net revenue. I'll put in AVG of quantity times net price times exchange rate and give it the alias net revenue. We're doing an aggregation, so, like usual, we need a GROUP BY, and we'll group by customer key. Remember, 180 has a total of three orders. When I first run this, I get an error message, because I'm silly and didn't put a comma after this column, but running it now, the query does work, even though it has a GROUP BY along with a window function. But look at customer 180: they now have a total of only one order. In fact, if you look at all of them (I'll move into this bigger table), every single value in the total orders column is one. So what's going on? In the logical order of query processing, window functions run after GROUP BY. Everything gets grouped together first, and then the window function runs, which causes a major issue of conflicting aggregations. So even if you're not getting an error message stopping you, you're probably not going to get the right results. There are better alternatives, which we're going
to go over, and mainly that's using CTEs or subqueries to break up your queries and separate these steps. With our previous query, the GROUP BY aggregation for net revenue ran first, condensing each customer key down to one row, and then the window function ran. That's why it shows a value of one for every total orders: it only sees one row per customer after the GROUP BY. So let's fix this so it actually works. In this case, I want to run the CTE first, so I'll get rid of the AVG aggregation here and give the expression the alias order value, and I'll also remove the GROUP BY, because we're not doing the aggregation here anymore. We'll then put this all into a CTE, so there's no aggregation inside of it. Then we'll just query this CTE for the customer key and total orders to make sure it's counting properly. Pressing Ctrl+Enter, we can see that 180 does in fact have three. Now we can do our aggregation on that CTE: we get the average order value, give it the alias net revenue, and perform a GROUP BY on customer key and total orders. Running this bad boy, we can see that 180 has three orders and the correct value, so it's working. For the remainder of the course, you'll see that any time I need a new GROUP BY or a new window function, I'm just going to create a new CTE and do it there. All right, getting into the second exercise, we're going to focus on the AVG function in our window functions. The syntax is the same, so nothing changes there, but we're going to introduce a new business concept: customer lifetime value, which, as the name implies, is the total revenue generated by a customer over their lifetime with a company. We're also going to explore some other
concepts, such as average order value, or the typical amount spent per transaction, but that's less of a focus here; the focus is LTV. The main concept here is lifetime value, abbreviated LTV: the total revenue generated by a customer for a business over their entire relationship with that company. So what are we ultimately going to calculate? Here, the x-axis is the customer ID, and we can see each customer's total lifetime value and compare it, via this dashed line, to the average lifetime value for that customer's cohort. That calculation is a little more tricky, but we're going to get to it. The final table we're aiming for has, for each customer key, not only the cohort year but also the customer lifetime value and the average lifetime value of that cohort, so we can compare the two. This type of analysis is really useful because we could, for example, target high-LTV customers, since they're more likely to make purchases, and that's typically what businesses do: they go after the higher numbers, not the lower ones. Enough of that; let's actually get into it. For this first query, we're going to focus on three things. The first is the customer key. The next is the cohort year, which, as we've seen before, means extracting the year from the minimum order date; since MIN is used as a plain aggregate function here, we're going to need a GROUP BY. The third is the sum of their total purchases, so I'll use the SUM function with quantity times net price times exchange rate and give it the alias customer LTV; we're summing up all of the revenue for that customer. We'll take this all from the sales table, and, like I said, we
need to do a GROUP BY, specifically on customer key. Let's run this. All right, not too bad: we have the first three of the four columns we need. The fourth column is the average customer lifetime value for each cohort, but remember what we learned earlier: a window function in the same query as this GROUP BY only sees the grouped rows, so I'm going to put this all into a CTE and then do the window function. We'll start with the WITH keyword, give the CTE the alias yearly cohort, and put the query inside parentheses like usual. I want to double-check my work, so I'll just do a SELECT * from yearly cohort, and everything's outputting like we saw previously, so we're good so far. In our main query, first we want all the columns, so I'm going to do SELECT *; there are only three, so I feel fine doing this. Next is the window function: AVG of customer LTV, then OVER, and inside the parentheses a PARTITION BY cohort year. We'll give this the alias average cohort LTV. Let's run this. Not too bad. One thing is I'd like this in a particular order, so I'm going to add an ORDER BY on cohort year and then customer key. Bam, this is the final table we need, and it gives us some statistics we can use. Like I said, we could send this off to the sales or marketing department, and they could run targeted ads at these high-value customers to have more of an impact on revenue. I also graphed the average lifetime value for each of the cohorts, and, as expected, it decreases over time, because the later cohorts have had less time as customers than, say, somebody who joined in 2016 or 2017.
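The two-step pattern above (aggregate per customer inside a CTE, then apply a window average per cohort in the outer query) can be tried end to end with a minimal sketch. It assumes Python's built-in `sqlite3` module with SQLite (window functions need SQLite 3.25 or later) rather than the course's PostgreSQL database, and the table name, column names, and rows are hypothetical stand-ins. SQLite's `strftime('%Y', ...)` plays the role of PostgreSQL's `EXTRACT(YEAR FROM ...)`.

```python
import sqlite3

# Hypothetical sample rows: (customer_key, order_date, quantity, net_price, exchange_rate)
SALES = [
    (180, "2018-07-10", 1, 525.0, 1.0),
    (180, "2018-08-02", 2, 100.0, 1.0),
    (180, "2019-01-15", 1, 200.0, 1.0),
    (387, "2018-03-01", 1, 300.0, 1.0),
    (500, "2019-05-20", 3, 50.0, 1.0),
    (501, "2019-06-01", 1, 250.0, 1.0),
]

COHORT_LTV_QUERY = """
WITH yearly_cohort AS (
    -- Step 1: ordinary GROUP BY, one row per customer
    SELECT
        customer_key,
        CAST(strftime('%Y', MIN(order_date)) AS INTEGER) AS cohort_year,
        SUM(quantity * net_price * exchange_rate) AS customer_ltv
    FROM sales
    GROUP BY customer_key
)
-- Step 2: the window function runs over the already-aggregated CTE rows
SELECT
    *,
    AVG(customer_ltv) OVER (PARTITION BY cohort_year) AS avg_cohort_ltv
FROM yearly_cohort
ORDER BY cohort_year, customer_key
"""


def cohort_ltv(rows):
    """Return (customer_key, cohort_year, customer_ltv, avg_cohort_ltv) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (customer_key INTEGER, order_date TEXT, "
            "quantity INTEGER, net_price REAL, exchange_rate REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(COHORT_LTV_QUERY).fetchall()
    finally:
        conn.close()


for row in cohort_ltv(SALES):
    print(row)
```

With this made-up data, customer 180 lands in the 2018 cohort with an LTV of 925.0, and every 2018 row carries the same cohort average, 612.5, which is exactly the comparison the final table is built for.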
I guess the one thing to note is that 2015 and 2016 are lower than 2017, so in the first few years we didn't do as well with customer retention and extracting value from customers. Pretty neat insight.
All right, last part of this lesson: two quick examples of how to use WHERE within a query to filter properly. This is really important, especially for the practice problems, to make sure you're filtering the way you intend. Let's start easy: how can we filter before a window function? We can use a WHERE clause in the same query, and it's applied before the window function is invoked. Let's test this with a simple example. I'm querying the customer key along with a window function we've seen before: the minimum order year, partitioned by customer key, which gives us the cohort year. Running this, there's trusty customer 180 in cohort year 2018, so we know the query worked properly. Now say we only want to look at cohorts from 2020 onward; we don't want to put anyone in a cohort before 2020. This is a great example of filtering before a window function: we add a WHERE specifying that the order date is greater than or equal to January 1st, 2020. Now let's see what happens with customer key 180. Previously it was bucketed into 2018, because its first order was in 2018, but now, since we're saying, "Hey, ignore everything before this point and only focus from here onward," it gets reclassified as 2023. So that's how you filter before window functions: the WHERE clause sits right there in the same statement, and it's applied before the window function runs. Now contrast that with filtering after a window function: in this case we need a CTE or subquery.
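The two placements of WHERE described above can be compared side by side with a minimal sketch. It assumes Python's built-in `sqlite3` module (SQLite 3.25 or later for window functions) instead of PostgreSQL, uses hypothetical table and column names, and adds `SELECT DISTINCT` (not part of the walkthrough) so each customer appears only once.

```python
import sqlite3

# Hypothetical order history: (customer_key, order_date)
ORDERS = [
    (180, "2018-07-10"),
    (180, "2023-02-01"),
    (387, "2020-05-05"),
    (387, "2021-01-01"),
]

# Shared expression: cohort year = earliest order year per customer
COHORT_EXPR = (
    "MIN(CAST(strftime('%Y', order_date) AS INTEGER)) "
    "OVER (PARTITION BY customer_key) AS cohort_year"
)

# WHERE in the same query: rows are filtered BEFORE the window function runs,
# so MIN() only sees orders from 2020 onward.
FILTER_BEFORE = f"""
SELECT DISTINCT customer_key, {COHORT_EXPR}
FROM sales
WHERE order_date >= '2020-01-01'
ORDER BY customer_key
"""

# WHERE on a CTE: the window function sees every order, then its output is filtered.
FILTER_AFTER = f"""
WITH cohort AS (
    SELECT DISTINCT customer_key, {COHORT_EXPR}
    FROM sales
)
SELECT *
FROM cohort
WHERE cohort_year >= 2020
ORDER BY customer_key
"""


def run(query):
    """Load the sample orders into an in-memory database and run one query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (customer_key INTEGER, order_date TEXT)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", ORDERS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(run(FILTER_BEFORE))  # customer 180 is reclassified into the 2023 cohort
print(run(FILTER_AFTER))   # customer 180 (cohort 2018) is dropped entirely
```

Filtering first changes what the window function sees, so customer 180 moves to 2023; filtering afterwards keeps the original 2018 cohort assignment and simply removes that customer.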
We're going to use a CTE here: we compute the window function inside the CTE and then filter from it. Let's continue with this last example to show when we might want this. First, I'm going to remove this WHERE, because we're not going to use it anymore. We'll put this query into a CTE with the alias cohort, wrap it in opening and closing parentheses, and then do a SELECT * FROM cohort. Now let's add our WHERE, and in this case we want to filter after the window function. Remember that customer 180 was previously in the 2018 cohort. Let's say we only want customers whose cohort year, calculated from their full purchase history, is 2020 or later; anyone whose cohort is earlier shouldn't appear at all. So we specify WHERE cohort year is greater than or equal to 2020. What I'd expect is that customer key 180 is going to disappear. (I also had a typo in the FROM; it should be FROM cohort.) Running this, bam, 180 is removed, because we applied the filter after the window function had already assigned cohort years from the original purchases. Wrapping your mind around how WHERE clauses are applied before or after window functions does take some practice, so we've got some practice problems for you to work through to get more familiar with it. In the next lesson we're going to jump into ranking functions, and that one is pretty exciting, so we'll see you in there.
All right, we're continuing on in this chapter on window functions, now focusing on how we can rank values using a certain order. We're going to cover three main ranking functions: ROW_NUMBER, RANK, and DENSE_RANK, and I'll explain the differences between them.
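As a quick preview of how the three functions named above treat ties, here is a minimal sketch using Python's built-in `sqlite3` module (SQLite 3.25 or later, not PostgreSQL) on a hypothetical `sales` table in which two customers tie. It computes the order counts in a CTE first and then ranks them; the walkthrough later puts `COUNT(*)` straight into the window's `ORDER BY`, which PostgreSQL accepts. The extra `customer_key` tiebreaker in `ROW_NUMBER` is an addition here so that its output is repeatable.

```python
import sqlite3

# Hypothetical order counts per customer; 102 and 103 are tied on 4 orders
ORDER_COUNTS = {101: 5, 102: 4, 103: 4, 104: 3, 105: 2}

RANKING_QUERY = """
WITH order_counts AS (
    SELECT customer_key, COUNT(*) AS total_orders
    FROM sales
    GROUP BY customer_key
)
SELECT
    customer_key,
    total_orders,
    -- Always unique; customer_key breaks ties so the numbering is repeatable
    ROW_NUMBER() OVER (ORDER BY total_orders DESC, customer_key) AS row_num,
    -- Ties share a rank, and the next rank skips ahead
    RANK() OVER (ORDER BY total_orders DESC) AS rnk,
    -- Ties share a rank, and the next rank does not skip
    DENSE_RANK() OVER (ORDER BY total_orders DESC) AS dense_rnk
FROM order_counts
ORDER BY total_orders DESC, customer_key
"""


def rank_customers(counts):
    """Build one sales row per order, then return the ranking query's rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (customer_key INTEGER)")
        rows = [(key,) for key, n in counts.items() for _ in range(n)]
        conn.executemany("INSERT INTO sales VALUES (?)", rows)
        return conn.execute(RANKING_QUERY).fetchall()
    finally:
        conn.close()


for row in rank_customers(ORDER_COUNTS):
    print(row)
```

Customer 104 gets a RANK of 4 because three customers have more orders, but a DENSE_RANK of 3 because only two distinct order counts are higher.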
We'll do this by ranking customers based on how many orders they've completed: the more orders they've completed, the higher their rank. But before we get into any of those functions, we need to understand how to use ORDER BY within a window function to get the ranking we want. Previously in SQL, you've seen ORDER BY used at the end of a query, after the FROM, to sort your results. We can also use it inside a window function, where it orders the rows within each partition the window function runs over. As a reminder, ORDER BY defaults to ascending, but you can specify DESC for descending. So what are we going to build? From the sales table you've already seen the customer key, order date, and net revenue; what we can add is a running order count. For customer 180, the first order is in July, so the count is one; then in August, on the same date, there are two more orders, so it bumps up to three. Similarly, customer 387 has four orders on their first day, and the next time they order, it bumps up to five. Let's build something similar. I'll write a SELECT with customer key, order date, and net revenue, which we've seen before, from our sales table, and run it to see the output. Okay, not bad. Now let's add a count of the orders for each customer key using a window function. I'll insert COUNT(*), then OVER with a set of parentheses, and I'm indenting this to make it easier to read. Inside, we'll PARTITION BY customer key; we'll add the ORDER BY in a second, because I want to see this first. Now, we've seen this before, but I
want to call out some things. Notice that the customer keys now come out in order because of the PARTITION BY (a side effect of how the partitioning is processed, not something to rely on), but the order dates within each customer don't: for 180, the July order appears after the August ones. That's something to note for later. We do still get the correct count, though, because 180 has three orders and we can see all of them here. Now let's add our ORDER BY right after the PARTITION BY, ordering by order date; this controls the order in which rows are processed. Running this and inspecting the table, not only are the customer keys in order, the order dates are too, and the count for 180 is one, then with the next two orders in August it increases to three. So our function's working. For aggregate functions like COUNT or AVG, the ORDER BY determines how values accumulate row by row. I also realize I never gave this an alias, so I'm going to call it running order count. Let's use one more aggregate function to demonstrate this further: a running average of net revenue. After the first order, I'd expect it to be around 525, and after the next two orders I'd expect it to be the average of all three, calculated row by row. I'm going to add a new line and copy most of the previous one, because the boilerplate is approximately the same. Instead of COUNT, we want AVG, specifically of net revenue, so I'll copy that and paste it in. We're still partitioning by customer key and ordering by order date to carry out that row-by-row accumulation, and we're going to name this running
average revenue. Running this and inspecting it, each customer's first row covers only one order, so its average is the same as its net revenue; for 180, once we move to the next round of orders, it's averaged among all three. So at this point in 180's order history, the average order revenue is around $836.
With that knowledge, let's get into our ranking functions, starting with ROW_NUMBER and why it needs an ORDER BY. PostgreSQL has quite a few ranking window functions, but in this lesson we're focusing on these top three: ROW_NUMBER, RANK, and DENSE_RANK. ROW_NUMBER returns the number of the current row within its partition, counting from one. Let's label the rows in our sales table with a row number. We'll SELECT * from the sales table, and since we're selecting all the rows, I'll limit this to the top 10 and run it with Ctrl+Enter. Before we assign a row number, one quick note: the 0, 1, 2, 3, 4, 5 index shown on the left of this results grid is just a display index, not a column we can reference, so we have to generate our own numbering. After the star, I'll start a new line, because I want all the existing columns to appear too. We'll specify ROW_NUMBER, then OVER with an empty set of parentheses; we're not putting a PARTITION BY or ORDER BY in there just yet. Running this and scrolling all the way over, we now have the row number, 1 through 10, as expected, and the data set looks like it's in order. I don't really like where this row number is appearing, though.
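The running order count and running average built earlier in this lesson can be reproduced with a minimal sketch, again assuming Python's built-in `sqlite3` module (SQLite 3.25 or later) and a hypothetical `sales` table that stores net revenue directly. It also shows why customer 180's count jumps from one straight to three: with an `ORDER BY` and no explicit frame, both PostgreSQL and SQLite use the default frame `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, so orders sharing the same date are counted together.

```python
import sqlite3

# Hypothetical rows: (customer_key, order_date, net_revenue)
SALES = [
    (180, "2018-07-10", 525.0),
    (180, "2018-08-02", 1000.0),
    (180, "2018-08-02", 500.0),  # same date as the previous row
    (387, "2018-01-05", 100.0),
    (387, "2018-02-10", 300.0),
]

RUNNING_QUERY = """
SELECT
    customer_key,
    order_date,
    net_revenue,
    -- Count of orders so far; same-date orders are peers and enter together
    COUNT(*) OVER (
        PARTITION BY customer_key ORDER BY order_date
    ) AS running_order_count,
    -- Average of net revenue over the same growing frame
    AVG(net_revenue) OVER (
        PARTITION BY customer_key ORDER BY order_date
    ) AS running_avg_revenue
FROM sales
ORDER BY customer_key, order_date, net_revenue DESC
"""


def running_totals(rows):
    """Load the sample rows and return the running count/average per order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (customer_key INTEGER, order_date TEXT, net_revenue REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(RUNNING_QUERY).fetchall()
    finally:
        conn.close()


for row in running_totals(SALES):
    print(row)
```

Both August rows for customer 180 show a count of 3 and an average of 675.0 (the mean of all three orders), because the default frame includes every peer that shares the current row's order date.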
I'd rather have it at the front of the data set, and I'm also going to give it the alias row num. The important thing to note is that, yes, this provides row numbering, but without an ORDER BY the numbers are assigned in an arbitrary order; there's no guarantee they follow how the data is stored in the system. So I don't recommend doing this: anytime you're using any of these numbering functions, use an ORDER BY. What should that order be based on? Every order line is identified by its order key plus its line number. I'm also going to use the order date to make sure we maintain date order, since I'm not 100% sure the order keys always line up chronologically. I'll press Enter and add my ORDER BY, indented to make it easier to read. We'll specify order date first and run it as is, just to show that it still numbers the rows sequentially by order date. But like I said, we want to be very specific so it's done correctly, so I'll also add the order key and then the line number, run it, and bam, this has what we need for ROW_NUMBER. Now let's take this a step further by combining it with PARTITION BY. Say we want a daily order number that starts over every new day. First, let's see what the numbering looks like on January 2nd, 2015, by adding a WHERE that filters for order dates after January 1st, just to look at that next day. Running this, silly me, I already forgot what we learned in the filtering lesson: when the WHERE is in the same query as the window function, it actually applies this filter to the data
first, so in this case the numbering automatically started again at one; without that WHERE, it wouldn't. You don't need to follow along with this next bit of code; I'm just doing it for demonstration purposes. I put our original query inside a CTE called row numbering and then selected the first 10 values from it. With that same query, I now filter for the second day, where order date is greater than the first, and running this, we can see the row numbering starts somewhere around 26 or 27. What we want is a window function that assigns a new numbering each day. So let's go back to the original query and remove that CTE to make this easier to run through. We'll add a PARTITION BY right above the ORDER BY, and since we want the numbering to restart every day, we'll partition by order date. Running this, we don't see any difference in the rows displayed at the top, but further down, around 420, the numbers only go up to about 97, so it does look like it's working. Now I'll copy this into that CTE to demonstrate again and run it: previously the numbering on the second started in the mid-20s, but now it starts at one, so this is working.
So let's compare the three major ranking functions: ROW_NUMBER, RANK, and DENSE_RANK. All of them return an integer (specifically a bigint), but each ranks differently, so which one you use depends on your use case. Let's look at a simple example. Say we calculated how many orders each customer placed, and the top customer here has a total of 31 orders. We're going to go
through and rank them using ROW_NUMBER, then RANK and DENSE_RANK, and see how they differ from each other. First, let's pull in the information we need with a simple SELECT: the customer key, and COUNT(*) to count each line as an order, with the alias total orders. We'll get this from the sales table, and since we used an aggregation, we need to GROUP BY customer key. Checking this out, we can see the different customers, and the total orders are all over the place. We do need to order these, but we're going to do that in our window function, so I don't want to sort them before then. Also, this is too many values, so I'll add LIMIT 10 for the time being. All right, let's build our window function, starting with ROW_NUMBER to assign a row number based on each customer's total number of orders. I'll start a new line and insert ROW_NUMBER, then OVER, and inside the parentheses we don't need a PARTITION BY, because we've already grouped by customer key to find the totals. All we need is an ORDER BY, specifically on that COUNT(*), and we'll give this the alias total orders row num. Real original, I know. Running this, bam, this isn't really what we want: these all have total orders of one, numbered one through ten, but I want them ranked from highest to lowest. As you remember from the beginning of this lesson, we can add DESC right in that ORDER BY. Now when we run it, the total orders row number starts at the
highest value, 31, and counts all the way down. Notice that when we get to values like 26, 26, and 26, it assigns them 4, 5, and 6, but technically these are all tied, which is why we need to learn other functions like RANK. I'll insert another line and copy the ROW_NUMBER line, because a lot of it is boilerplate we can repeat. I'll change ROW_NUMBER to RANK and change the end of the alias from row num to rank. Running this, we get total orders rank, and notice that when we hit the repeated 26s, it assigns them all 4. Then when we get to 25, it assigns the number 7, skipping 5 and 6. Now you're probably like, "Well, Luke, what if I don't want to skip 5 and 6 in my ranking and just want to continue on from there?" That's where DENSE_RANK comes in. Once again, I'll start a new line, paste in the old code, change this one to DENSE_RANK, and update the alias to dense rank. Running it, I'd left a trailing comma at the end, which I don't need, so I'll remove it and run again. Now we have this new column: we get the repeated fours, and at the next value it continues consistently to five; then we have a group of 24s, and those are all sixes. So when I need to rank or number something, whether I use ROW_NUMBER, RANK, or DENSE_RANK really depends on my criteria. We've got some practice problems for you to work through to get more familiar with these functions. In the next lesson we're going to get into LAG and LEAD. So far we've been using functions that only look at the current row, but we can
actually use functions that look at rows before or after the current one, which is pretty powerful. All right, with that, I'll see you in there.
All right, welcome to this fourth of five lessons on window functions. In this one, we're going to get into functions like LAG and LEAD, which let a window function look not only at the current row but at the row above or below it. We'll explore these functions first with a very simple example: we'll look at our 2023 monthly revenue and evaluate month-over-month growth, since we can now look at the row before or after and calculate from there. Then we'll shift into our final scenario, which is slightly more complex: analyzing the growth of cohorts over the years and seeing how they change from year to year. So what functions are we going to explore? There are generally five you can use. Let's start with the easy ones: FIRST_VALUE, LAST_VALUE, and NTH_VALUE. FIRST_VALUE returns the value evaluated at the first row of the window frame, LAST_VALUE returns it at the last row, and with NTH_VALUE you specify an integer n and it returns the value at the nth row of the frame, counting from one. LAG and LEAD are similar to each other: they return the value from a row that lags behind or leads ahead of the current row. But before we get too deep into it, let's explore this with an example. First we need to calculate the monthly net revenue for 2023, and then we'll apply these functions in window functions to try out FIRST_VALUE, LAST_VALUE, LAG, LEAD, and the rest. Let's start simple by building this query. I'll start with the SELECT statement, and the first column we want is
the month, so I'm going to use the TO_CHAR function, because I really like its output; it's a lot easier to read. We'll run it on order date in this format and give it the alias month. Next, we want the net revenue, so we need the SUM function, as always of quantity times net price times exchange rate, with the alias net revenue. We're getting this all from the sales table, and since we're aggregating, we need a GROUP BY, specifically by month. Let's run this. The first thing I'm noticing is that the results are ordered all over the place, so I'm going to add an ORDER BY, also by month. We also want to analyze just 2023, so I'll add a WHERE with EXTRACT to pull out the year: with EXTRACT we specify the part, then FROM, then order date, and we want this equal to 2023. It's a number, so we don't have to put it in quotes. Bam, we have what we need, and now we can start running the LAG, LEAD, and other functions. Because we're going to run window functions on this, I'm going to put it all into a CTE: we'll use the WITH statement, give it the alias monthly revenue, wrap it in opening and closing parentheses, and then do a SELECT * from this CTE to make sure it still outputs. Yep, still good to go. So let's explore these, starting with the easy ones like I said: FIRST_VALUE, LAST_VALUE, and NTH_VALUE. I'll specify FIRST_VALUE, and inside it we need to put the value expression we want; in our case that's net revenue, which is right here in our CTE. Next is OVER, and inside we don't need a PARTITION BY, since everything's already grouped, but we do need an ORDER BY.
Specifically, what order should it use to pick the first value? In our case, we want the first value based on the month, not on the value of net revenue, so I'll put month in here and give it the alias first month revenue. Running this, we can see that we do in fact get that first month's revenue in every row. Now let's look at LAST_VALUE: change the function name and the alias, and run it. Unfortunately, when you read this one, you find out, oh heck, this is not the last month's revenue in each row. That's because with just an ORDER BY, the default window frame only runs from the first row up to the current row, so the "last" value is just the current month's value. We need to change the window definition: I'll delete the old version and put in ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Now when we look at it, it is actually the last month's net revenue. I just wanted to demonstrate that because it takes a little more to fine-tune this one; we haven't covered UNBOUNDED PRECEDING or UNBOUNDED FOLLOWING yet, but we'll cover them in the next lesson, so stay tuned. Next up is NTH_VALUE. I'll copy the FIRST_VALUE line, insert it underneath, and change the function name to NTH_VALUE. For this one, we also need to specify an integer n for which row of the frame to return; we'll use 3 and call this third month revenue. Running this, we get the third month's revenue, but the first two rows are NULL, because their default frame doesn't reach the third row yet. If you want to fix this, which we'll cover in the next lesson, we can insert the same frame clause right after the ORDER BY. Running this, we now have the third month filled in for every row. Don't worry, this will all make sense in the next lesson, but at least you've got the basics now. Let's wrap up the simple example with LAG and LEAD.
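The value functions covered so far, plus the LAG and LEAD functions the walkthrough turns to next, can be compared side by side with a minimal sketch. It assumes Python's built-in `sqlite3` module (SQLite 3.25 or later) and a hypothetical, already-aggregated `monthly_revenue` table with four made-up months, instead of the course's PostgreSQL CTE. The `last_value_default` column shows the LAST_VALUE pitfall described above, and the final column computes month-over-month growth as 100 × (current - previous) / previous.

```python
import sqlite3

# Hypothetical pre-aggregated monthly revenue: (month, net_revenue)
MONTHLY = [
    ("2023-01", 100.0),
    ("2023-02", 200.0),
    ("2023-03", 50.0),
    ("2023-04", 150.0),
]

VALUE_FUNCTIONS_QUERY = """
SELECT
    month,
    net_revenue,
    FIRST_VALUE(net_revenue) OVER (ORDER BY month) AS first_month_revenue,
    -- Default frame ends at the current row, so this just echoes net_revenue
    LAST_VALUE(net_revenue) OVER (ORDER BY month) AS last_value_default,
    -- Widening the frame to the whole partition gives the true last month
    LAST_VALUE(net_revenue) OVER (
        ORDER BY month
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_month_revenue,
    NTH_VALUE(net_revenue, 3) OVER (
        ORDER BY month
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS third_month_revenue,
    LAG(net_revenue) OVER (ORDER BY month) AS previous_month_revenue,
    LEAD(net_revenue) OVER (ORDER BY month) AS next_month_revenue,
    -- Subtract first, then divide by the previous month; NULL for the first month
    100.0 * (net_revenue - LAG(net_revenue) OVER (ORDER BY month))
        / LAG(net_revenue) OVER (ORDER BY month) AS mom_growth_pct
FROM monthly_revenue
ORDER BY month
"""


def value_functions(rows):
    """Load the monthly rows and return every value-function column per month."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE monthly_revenue (month TEXT, net_revenue REAL)")
        conn.executemany("INSERT INTO monthly_revenue VALUES (?, ?)", rows)
        return conn.execute(VALUE_FUNCTIONS_QUERY).fetchall()
    finally:
        conn.close()


for row in value_functions(MONTHLY):
    print(row)
```

The parentheses around the subtraction matter: without them, only the LAG term would be divided, which is the same order-of-operations slip that can happen when computing a percent change by hand.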
LAG returns the value from the previous row, which here is the previous month. Per the documentation, it returns the value evaluated at the row that is offset rows before the current row within the partition. The offset is optional, as the square brackets show, so we could supply an offset integer, similar to what we did for NTH_VALUE. We won't use one for these examples; we just want the previous month, and then LEAD for the next month. Going back to our query, I removed some of the other columns because it was getting too crowded. We'll specify LAG with the parameter net revenue, still order by month, and call this previous month revenue. Running this, we can see that this is in fact the previous month's revenue. Like I said, LAG takes an optional offset, so if I put 2 in here and run it with Ctrl+Enter, it's now offset by two rows. We're not using that, because we're eventually calculating month-over-month change, so we'll leave it at one and run again. Now let's use LEAD: I'll specify the function, and instead of previous month, the alias is next month. Running this, it is actually the next month's revenue for each row. So you're probably like, "Luke, why does this even matter? Yeah, you can find this, but how does it help me as an analyst?" Well, say I wanted to find my month-over-month growth, which is pretty common in finance for evaluating performance. Here I have a chart: the bars show revenue each month, and the line shows the rate of revenue growth month over month. We can see that in May we had a substantial increase in growth, and that's because we had a
pretty low one in April. It's a pretty important indicator in the business world, so let's calculate it. To do this, we only care about the previous month's revenue, so we don't need this LEAD function and I'll remove it. First, we take net revenue and subtract that LAG value to get the change each month, with the alias monthly revenue growth. We can see it's calculating that growth; an easy row to check is row two, where we went from 4 million in the previous month to 2 million, so we lost 2 million. The calculations look good. But we want a rate of change, so we need to divide that difference, net revenue minus previous month revenue, by the original value, the previous month's revenue. I'll copy that LAG expression again, paste it in as the divisor, and run it. Bam, now we're calculating the rate of change: in March we had about a 50% reduction in revenue, whereas in May we had almost 150% growth. To be clear, these are decimals; if I want percentages, I can multiply the value by 100, which makes it a little easier to read.
Now let's get into an example that shows the benefit of these functions. In the second lesson of this chapter, on aggregation, we covered the average lifetime value by cohort. If you recall, customers are assigned a cohort based on the year of their first purchase, and we noticed that after around 2016 the lifetime value drops. That makes sense, because later cohorts have a shorter total lifetime, but we did have this
unexpected rise from 2015 to 2016. So what if we want to analyze the drops between each of these LTVs? We can use our LAG function. To do this, the table we need to start with has the cohort year versus the average cohort lifetime value. We basically calculated this back in lesson two of this chapter, on aggregation: here I have that query, and what we got to in that lesson was this final table, which consisted of one CTE with a query underneath it. I took it a step further in the notes, so if you have access to the notes, you can go right to it and copy it; the other option is to pause the video and copy it from the screen. This next example isn't that long, so you can also just watch along if you don't want to type any of it. We need to run window functions on this, and having inspected it, it's a CTE inside another CTE, followed by a SELECT DISTINCT because there are multiple rows per cohort. So I'm going to put this one into a CTE too, with the name cohort final. I'll add a closing parenthesis and then do a SELECT * FROM cohort final to make sure everything's appearing correctly, and it is. Now we can create our first new column, the previous year's lifetime value, using LAG. I'll do SELECT * to show both existing columns, then LAG, and the value we want to appear is the average cohort lifetime value. Then OVER, and inside the parentheses an ORDER BY cohort year, with the alias previous cohort lifetime value. We can see that this is in fact the previous
cohort's lifetime value. All right, the final thing I want to do here is calculate the percent change: that year-over-year change, or more precisely, the change from one cohort year to the next. Remember, our ratio is final minus original, over original. So I'm going to take the final value, average cohort lifetime value, subtract our window function (Command+C, Command+V), and then divide by the original, so once again I paste in the previous cohort LTV expression. We'll give this the alias lifetime value change and run it. Okay, this isn't the value I expected; I expected more of a decimal number. I think this comes down to my order of operations: I have the parentheses in the wrong place. I want the subtraction done first and the division after, so I'll wrap the subtraction in parentheses. Okay, this gives us what we want. I want to see these as percentages, so I'll multiply by 100 at the front, and bam, now we can see our lifetime value change year to year. As expected, we had a slight increase in 2016, and from there it went down. If you visualize this, with the average cohort lifetime value as bars and the rate of change as a line, we can see the rate of decline is actually picking up. Although I'd expect the value to go down, I wouldn't expect it to go down at this high a rate, so that's something we'd want to dig into in a real-life scenario. All right, you've got some practice problems to get more familiar with these functions and window functions in general. In the next and final lesson of this chapter, we're going to go into further detail on the syntax needed to make functions like LAST_VALUE work; as we demonstrated, it didn't work properly without additional syntax. All right, with that, I'll see you in the next
one. Welcome to this fifth and final lesson on window functions, where we’re going to focus on frame clauses. Up to this point with window functions, we’ve only really focused on two parts of the window definition (the portion after OVER): PARTITION BY and ORDER BY. But there’s one more thing to cover, and that’s the frame clause. It controls which rows we actually put inside the window for the function to work on. What do I mean by all this? Well, here’s what we’re going to solve. Say we have something like monthly net revenue for 2023. As you can see, it’s highly volatile: it goes up in February, then back down, then back up. This is where frame clauses come to the rescue. We can look at rows before and after the current row and, in this case, average them, performing a three-month running average to smooth out our line. This is very common in business analytics, especially with seasonal data that has these types of ups and downs; you want to remove the noise so you can look at the trend more clearly. Postgres has documentation that goes into all of this, but it gets quite complex, so I simplified it in our notes. This lesson focuses on the ROWS frame clause, basically specifying which rows we want to include in the window function. As hinted at, this comes right after ORDER BY, and we can include either just a start frame, or use the BETWEEN keyword to specify both a start frame and an end frame. What are the start frame and end frame? Well, there are five main options we can put in there, and that’s the majority of what this lesson covers: we’ll see how to use CURRENT ROW, preceding rows, and following rows. Don’t worry, we’re going to break each one of these
down, so you’ll be more than familiar with them by the end of this. With Postgres, besides ROWS, you can also use RANGE or GROUPS. I’d classify these as more advanced SQL, so we’re not really going to cover them in this course. Additionally, I had ChatGPT make this fancy dancy table down here, and I want to point out that GROUPS isn’t supported in a lot of other popular databases, specifically MySQL and SQL Server, and RANGE support is more limited in some of them, so I don’t want to waste your time on keywords you may not have available. That’s why we’re focusing on ROWS, which you need to learn anyway before you can apply RANGE and GROUPS, if you want to learn those later on. For this entire lesson, we’re going to analyze monthly net revenue, similar to what we did in the last lesson, because, as you remember, we had some unanswered questions about how to use some of those functions without what we’re going to learn here; we’ll get to that by the end. You should remember this query or already have it on your system. It gets not only the month but also the net revenue for that month, pulling from the sales table, and it only extracts 2023. I just want to look at one year so there’s not a lot of data to mess with. Since we’re doing an aggregation, we need a GROUP BY, and finally we need an ORDER BY, because otherwise it gets all out of whack. You should see something like this for monthly net revenue. If we graph it, it looks like what we saw at the beginning: it goes up in February, has a strong dip in April, and then returns to normal. We’re going to work toward getting a running average with this, but first we need to understand CURRENT ROW before we move any further. I want to run a window function on this query, so I’m going to put it into a CTE, give it the alias monthly sales, and I’ll put it into
parentheses. We’ll select both the month and the net revenue and pull it all from monthly sales. Let’s go ahead and run this, and it’s exactly the same thing we saw before. Now we can run the window function down here instead of up in the CTE. As we said before, combining window functions with GROUP BY can be done, but it gets really complicated, so we’re not going to do it; we’re going to do the window function in the outer query. Specifically, all I want to do is get the average net revenue for this month, basically repeating that same value. We’ll start by calling AVG on net revenue, then OVER; I’ll start a new line and indent. Before we add anything else, let’s look at what it generates: the average across all 12 months. We want to order this by the month itself, so I’ll add ORDER BY month. Running this query, we’re still getting an average, but it’s slightly different. January is still the same, since it’s just the January value, but February is now the average of January and February, and March is the average of January, February, and March. We want to control this average, so let’s move forward. We’ll start by renaming this column; I’ll give it the alias net revenue current, because we’re about to use CURRENT ROW. The syntax is ROWS followed by the start frame. We’ll eventually get to the ROWS BETWEEN form, but we’re starting simple with this one first. Remember, the start frame or end frame can be any one of the options in our notes; we’re starting with just CURRENT ROW, and then we’ll move into calculating the running average. So I’m going to insert ROWS and then CURRENT ROW. This runs the window function on the current row only, so there should be no difference between these two columns, I promise you.
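To make the CURRENT ROW behavior concrete, here’s a minimal sketch that runs outside the course setup. It assumes Python’s built-in sqlite3 module (with SQLite 3.25 or newer, which supports window functions) instead of the course’s Postgres database, and a hypothetical four-month monthly_sales table with made-up revenue values, not the Contoso figures. It compares ORDER BY alone, which produces the running average described above, with an explicit frame written in the longer ROWS BETWEEN CURRENT ROW AND CURRENT ROW form, which Postgres treats the same as ROWS CURRENT ROW.

```python
import sqlite3

# Hypothetical sample data (NOT the Contoso figures): (month, net_revenue).
MONTHLY_REVENUE = [(1, 100.0), (2, 300.0), (3, 200.0), (4, 400.0)]


def frame_demo():
    """Hypothetical helper: compare the default frame with a CURRENT ROW frame."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite, standing in for Postgres
    conn.execute("CREATE TABLE monthly_sales (month INTEGER, net_revenue REAL)")
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", MONTHLY_REVENUE)
    rows = conn.execute(
        """
        SELECT
            month,
            net_revenue,
            -- ORDER BY with no frame clause: average of every row up to this one
            AVG(net_revenue) OVER (ORDER BY month) AS net_revenue_running,
            -- Frame holding only the current row, so the average is the row itself
            AVG(net_revenue) OVER (
                ORDER BY month
                ROWS BETWEEN CURRENT ROW AND CURRENT ROW
            ) AS net_revenue_current
        FROM monthly_sales
        ORDER BY month
        """
    ).fetchall()
    conn.close()
    return rows


for row in frame_demo():
    print(row)
```

With ORDER BY and no frame clause, both Postgres and SQLite default to a frame that runs from the first row up to the current row (plus any rows tied with it on the ORDER BY value), which is why the second month averages the first two.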
We’ll see more of an impact in a little bit, but I do want to demonstrate how you can also write this with ROWS BETWEEN. We need our start frame, so CURRENT ROW, and our end frame, once again CURRENT ROW. As you’d expect, it’s only looking at the current row, so it’s the same across all of these. Let’s get into our next keyword and look at how we can use N PRECEDING to include preceding rows. This is the final table we’ll end up with: we still have our month and our net revenue, and in this case we’re going to look at just one row back plus the current row. For the first row, we expect the average to be the same as the net revenue, but when we get to the second row, it’s going to look at the current row and the preceding one and average them. The average of 3.6 million and 4.4 million is around 4 million. For this, inside our start frame, we use N PRECEDING, specifying N as a number. Getting back to our query, right now we have ROWS BETWEEN CURRENT ROW AND CURRENT ROW. We want to go one row back and also look at our current row, so I’m going to replace the start frame and specify 1 PRECEDING. Let’s run this, and just as in the demo table, we’re now taking the average of the current row and the previous row; we get 4 million in this case. We can use any number of rows here. We can also check that this is working properly by putting a 0 in instead of 1, which means just the current row, and we get all the same values as before. We’ll change that back to 1. Now I’m going to take this a step further just for demonstration purposes; you don’t have to do this part. Before, we did 1 PRECEDING, and I wanted to see what it would look like for also doing two
PRECEDING and three PRECEDING, and I got this fancy dancy table, which I then plugged into ChatGPT to visualize. What we can see is that with each preceding row we include in the average, the line becomes smoother and smoother. It starts with the darker line, which is just the core net revenue, and gets lighter and lighter depending on the number of preceding rows we used. I know there’s a lot of overlap there, so I also graphed each one individually, showing how it gets smoother and smoother. Now you’re probably asking, “Luke, is this what you do in a real-world scenario?” I’d say not typically with just PRECEDING; I combine it with FOLLOWING, using values before and after the current month to get something like a three-month average, which we’ll do now. As a reminder, the syntax is N FOLLOWING, where you specify the number and it takes that many rows after the current row. Back in our original query, instead of 1 PRECEDING and the current row, I’m going to change this to 1 PRECEDING and 1 FOLLOWING. Running this query, we can see that all our values are now smoothed out, even the first month, because it’s taking not only the current month but also the following month. Something like June takes not only May but also July and averages them together with June to get this value. I feel this is more representative of what I’d see in the business world. I went ahead and visualized it, and this shows how performing this three-month average smooths out our net revenue line. You could take it up a notch and do something like a five-month or even seven-month running average, but at that point I think you’re going to be removing a lot of key insights, so I’m going to stop right there. All right.
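Here’s a minimal sketch of the two frames from this part of the lesson, again assuming Python’s built-in sqlite3 module (SQLite 3.25 or newer) as a stand-in for Postgres and a hypothetical monthly_sales table with made-up values rather than the Contoso data. The first column uses 1 PRECEDING and the current row; the second uses 1 PRECEDING and 1 FOLLOWING for a three-month centered average.

```python
import sqlite3

# Hypothetical sample data (NOT the Contoso figures): (month, net_revenue).
MONTHLY_REVENUE = [(1, 100.0), (2, 300.0), (3, 200.0), (4, 400.0)]


def moving_averages():
    """Hypothetical helper: trailing and centered moving averages with ROWS frames."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite, standing in for Postgres
    conn.execute("CREATE TABLE monthly_sales (month INTEGER, net_revenue REAL)")
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", MONTHLY_REVENUE)
    rows = conn.execute(
        """
        SELECT
            month,
            net_revenue,
            -- Previous row plus current row (two-month trailing average)
            AVG(net_revenue) OVER (
                ORDER BY month
                ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
            ) AS net_revenue_preceding,
            -- Previous, current, and next row (three-month centered average);
            -- the first and last months only have two rows in their frame
            AVG(net_revenue) OVER (
                ORDER BY month
                ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
            ) AS net_revenue_3_month_avg
        FROM monthly_sales
        ORDER BY month
        """
    ).fetchall()
    conn.close()
    return rows


for row in moving_averages():
    print(row)
```

The frame simply shrinks at the edges instead of padding with missing values, which is why the first month is smoothed using only itself and the month after it.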
Last two start frames and end frames to end with: UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING. If you look carefully, all they’re really doing is replacing the N with UNBOUNDED, which says, “We want to use all rows from the start,” or “all rows to the end.” So let’s do just that and put UNBOUNDED in for both. What do you think is going to happen here? Well, if we use UNBOUNDED on both sides, it takes the entire window frame into account when computing this average, so we’re basically getting the average of all 12 months. Typically, when I see UNBOUNDED used, I see it with something like CURRENT ROW. Running this, we can see that the first row is equal to itself, and then as it goes along, it takes into account all the rows before it along with the current row, so the line gets smoother and smoother as it goes along. So where am I typically seeing these UNBOUNDED options used? Well, if you remember from the last lesson, when we were looking at the same chart of monthly revenue, we used value functions similar to LAG and LEAD, specifically FIRST_VALUE, LAST_VALUE, and NTH_VALUE. Let’s run this. We saw that FIRST_VALUE did give us the first value, but LAST_VALUE didn’t work properly; it just gave us the current row’s value, because by default the frame only extends up to the current row. And when we used NTH_VALUE and specified 3, it gave null values for the first two rows and then the third value for everything else. This is where UNBOUNDED comes in, so let’s fix these functions. I’m going to indent this down onto a new line to make it a little more readable, and then insert our frame clause, specifying ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Let’s run this one.
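A minimal sketch of this fix follows, assuming Python’s built-in sqlite3 module (SQLite 3.25 or newer, which includes FIRST_VALUE, LAST_VALUE, and NTH_VALUE) in place of Postgres, and a hypothetical monthly_sales table with made-up values rather than the Contoso data. It shows each value function both with the default frame and with ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.

```python
import sqlite3

# Hypothetical sample data (NOT the Contoso figures): (month, net_revenue).
MONTHLY_REVENUE = [(1, 100.0), (2, 300.0), (3, 200.0), (4, 400.0)]


def value_function_demo():
    """Hypothetical helper: LAST_VALUE and NTH_VALUE before and after widening the frame."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite, standing in for Postgres
    conn.execute("CREATE TABLE monthly_sales (month INTEGER, net_revenue REAL)")
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?)", MONTHLY_REVENUE)
    rows = conn.execute(
        """
        SELECT
            month,
            net_revenue,
            -- Works with the default frame: the frame always starts at row 1
            FIRST_VALUE(net_revenue) OVER (ORDER BY month) AS first_month_revenue,
            -- Default frame stops at the current row, so this just echoes it
            LAST_VALUE(net_revenue) OVER (ORDER BY month) AS last_month_default,
            LAST_VALUE(net_revenue) OVER (
                ORDER BY month
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
            ) AS last_month_revenue,
            -- Default frame: NULL (None in Python) until the frame holds a third row
            NTH_VALUE(net_revenue, 3) OVER (ORDER BY month) AS third_month_default,
            NTH_VALUE(net_revenue, 3) OVER (
                ORDER BY month
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
            ) AS third_month_revenue
        FROM monthly_sales
        ORDER BY month
        """
    ).fetchall()
    conn.close()
    return rows


for row in value_function_demo():
    print(row)
```

Because the widened frame spans every row in this single, unpartitioned window, LAST_VALUE returns the final month and NTH_VALUE returns the third month on every row.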
Now when we look at the last-month revenue, we can see that it does equal the last month. Basically, we had to open up what the window function was going to look at by using this frame clause. Similarly, we can do the same with NTH_VALUE; I’m going to copy the frame clause and paste it right in here. Running this, we can see that the third month’s revenue now appears in every single row, regardless of whether the row comes before or after it. So, bam, we’ve now covered all of the major aspects of using window functions. You have some practice problems to go through to get more familiar with these different frame clauses inside window functions, and in the next chapter we’re going to get into how you can install this database and run it locally, so you have a workflow that actually works. With that, I’ll see you in the next one.
All right, welcome to the second half of this course. Don’t know why I had to jump in like that; I wanted a dramatic entrance for some reason. Anyway, in this chapter I’m going to take you through all the steps necessary to install Postgres locally on your computer, get you set up with its editor, pgAdmin, and then also get you set up with an even better editor, DBeaver. Let’s break this down. In this lesson, we’re going to install Postgres, the database itself, locally on your computer. We’ll download it from the internet, and the installer will install Postgres but also pgAdmin. pgAdmin (pg is short for Postgres) is the editor used to interact with Postgres databases, so any time you need to start or stop the database, or even run a query, you can do that with pgAdmin, which we’ll demonstrate during this lesson. If you already have Postgres and pgAdmin installed, you don’t need to do it again, but in this lesson, after we do the install, we’re going to go directly into actually loading
the database, specifically the Contoso database we’ve been using in those Jupyter notebooks. Once we get the database set up, we’re going to do a quick walkthrough of the entire pgAdmin UI so you become more familiar with it. Now, there’s one major flaw with pgAdmin: it only connects to Postgres databases, and after this course you may go on to learn other databases. Because of that, in the second lesson we’re going to install DBeaver. This is a database management tool that can connect to many different databases, and you can also run queries in it to see the output. We’re going to use the Community Edition, which is free and open source and has everything we need to get started. It’s the most popular database tool that I know of, so I’m super excited to use it, and everything you learn in DBeaver can also be applied to the other databases DBeaver can connect to. One last note before we begin: some of you may run into installation errors or other errors along the way. I highly encourage you to use something like ChatGPT to help you out; it’s a lot quicker than posting a comment and hoping somebody else comes in to help. Now, let’s say you can’t figure it out, or you’re on something like a Chromebook and can’t install Postgres. In that case, you can continue to run all of the queries in our SQL notebooks; they’re going to work exactly the same and have the same output. But as far as interacting with the GUI and stuff like that, you’ll have to figure that out yourself, because obviously it’s going to be different. So first, we’re going to navigate to the download page for Postgres, and from there select the operating system you’re currently on. I’m on a Mac, so I’ll select that. We’re going to use the interactive installer by EDB, so we’ll select “Download the installer,” and regardless of operating system, everybody is going to get
navigated to this page, where you can then select your operating system once again and download it for Mac or Windows. You’ll want to launch this installer; if you get a warning message, that’s okay, just click Open. We’re now going to walk through the setup wizard that’s included, and we’re going to leave all the defaults the same. The core things we want to make sure are installed, which they are by default, are the PostgreSQL server and pgAdmin, which is the GUI. Next is the password, and I’m going to set this to a really easy one: password. I’m not going to have any confidential material in my database, and the database we’re installing isn’t secret at all, so I don’t care if somebody else accesses it. It doesn’t really matter anyway, because it’s local, so other people can’t get to it unless I expose it to the internet. That’s a long story; the key thing is that if you’re only going to use this for this course, feel free to just set it to password and you should be okay. If not, set it to something else and remember it. Keep the port number the same, 5432, which is the default for Postgres databases. We’ll keep the default locale and go ahead. Once setup is complete, I don’t need to launch Stack Builder, so I’ll click Finish. I’m going to verify it’s now installed by going to my Applications folder, where under PostgreSQL 17 I have the different options that were installed. We’re going to open pgAdmin, the GUI for interacting with our Postgres database. It’ll start loading up, and with it open, we have two main panes. On the left-hand side is the Object Explorer, which shows all the different databases we’re connected to. Right now it’s asking me to connect to the server, specifically PostgreSQL 17, which is the one we installed, so I’m going to put in that password, password, and check this option to save the
password. As we can see, PostgreSQL 17 is a server. We have one server, and inside that server we have databases. Right now we only have the one standard database that comes with all Postgres servers, called postgres. We’re not going to touch this bad boy, but we can see there’s only one database in here. We also have options to adjust Login/Group Roles and Tablespaces, and we’re not going to mess with any of that either. The Dashboard over here on the right-hand side I find pretty useless; it just shows me activity in my server and when it’s getting used. Now let’s install the Contoso data set locally. For this, we need the database file, which is this contoso_100k.sql file; it’ll start downloading. You don’t need to do this, but I opened the file just to show you its contents: it walks through creating all the different tables we need inside our database, along with loading all of our data into them. It’s a pretty long file. To get this file into a database, we first need a database, so we’ll right-click Databases, select Create, and then Database. We’ll name it contoso_100k, keeping all letters lowercase. The owner will stay as the superuser, postgres, so we can use that same password to access it. We don’t need to change any of the other settings; they’re all good enough, so we’ll click Save. Now we can see we have two databases, and it automatically expanded everything underneath. If I go under Schemas, there’s a public schema, and if I expand Tables underneath it, there are no tables inside. This is where we need to load that SQL file into this database. First, you need to know where the file is. I recommend just putting it on your desktop; we don’t need it after it’s
done, so you can just delete it. Back inside pgAdmin, I’m going to right-click the contoso_100k database and go to PSQL Tool. This is effectively like using a terminal to interact with our database. I’m going to start the command with a backslash and then i (\i), which tells it to execute the script file we’re about to give it. Inside single quotes, I’m going to put the file location, in my case the Desktop inside my user folder, followed by the SQL file name itself. Make sure it’s exactly right. On a Mac, if you go to the file itself, hold Option, and right-click it, you’ll get the option to copy the SQL file as a pathname; on Windows, all you need to do is Shift+right-click the file icon and select Copy as path. Okay, we’ve got it in, so I’m going to press Enter. It asks whether pgAdmin can access my desktop; yes, I want to allow this. It’s now going through and creating all the different tables and altering them, and I can see that we’ve put in six different tables, and it tells us how many rows we inserted into each of those tables. If I come in here and go to Tables, I don’t see any tables yet, but all I need to do is right-click and select Refresh. Now it should have them, and scrolling down, we have six tables. From this menu I can dive into, in the case of sales, the different columns and everything else associated with it. I can also do a quick check by right-clicking something like the sales table and going to Count Rows, and it tells me at the bottom that there are over 199,000 rows in this sales table. But how can we actually query this contoso_100k database? First, we need to make sure it’s selected, then come up here and select Query Tool; we can also see there’s a shortcut, Option+Shift+Q. It opens up
in a new tab right here. I see these other tabs; if I don’t want any of them, I can select the X and close them out. Up here at the top, it tells me which database I’m connected to, and if I had any others, I could switch right up here. We have our query window, where I’m going to put in a simple command to look at the sales table and its top 10 results. To run this, I’m going to come up here and select the play icon for Execute Script, or I can press F5, and all the results are displayed below. I also have this Scratch Pad on the right-hand side, so if I have queries I want to keep track of, I can just put them over on the right; overall, I don’t find myself using it that much. Other key features of this area are that you can open a SQL file right inside this window, or save the file if you want to keep it. We also have options for EXPLAIN, which we’re going to go into in more detail in some upcoming lessons. Down here at the bottom, we not only have the data output but also any messages and notifications. The output area also has a few unique capabilities: you can copy any of your data out of it, for instance if you want to put it into ChatGPT. Let’s say we have a more complex query that does some analysis, such as this one that looks at total yearly net revenue. We can not only save the results to a file but also graph and visualize them right here: I select Line Chart, then the year for the x-axis and the total yearly net revenue for the y-axis, and select Generate. Not too bad for visualizing queries pretty easily. The last thing to note with pgAdmin is that I can also view the ERD, or entity relationship diagram, for the database by right-clicking it and selecting ERD For Database. This shows our sales table; I can zoom in and see the sales table along with all the different keys and
columns in it, and how this table is connected to all the other tables, so it’s a great way to visualize the database and tables you’re working with. All right, we now have some practice problems for you to go through to get more familiar with the pgAdmin GUI. Like I mentioned at the beginning, we’ll be transitioning next to DBeaver, but I do find myself from time to time having to jump into pgAdmin, so it pays to understand the basics of this tool; that’s why you’ve got those practice problems. All right, with that, I’ll see you in the next one.
Welcome to this lesson on DBeaver. In it, we’re going to walk through setting up DBeaver and connecting it to our Contoso database. First, we’re going to download the Community Edition of DBeaver, which is free. From there, we’ll walk through the steps necessary in DBeaver to connect to our Postgres database, and finally, once that’s set up, we’re going to do a walkthrough of the DBeaver UI, covering how we can run different scripts and how we can set up our project inside it. All right, with that, let’s get into it. If you navigate over to dbeaver.io, this is the homepage of DBeaver Community. It covers a few details about the tool that you can read further. Specifically, DBeaver Community, the edition we’re downloading, can connect to a variety of databases and has all these different editing and viewing options. From talking to all my data analyst friends and looking at the research, it’s by far the most popular database editor, which is why we’re using it. Now, DBeaver needs to make money like any company, so they also have a Pro edition. I’m going to click this (you don’t need to), and they have a few different editions you can get. Some get pretty pricey if you’re a business, but for the basic SQL and coding that I run, I don’t ever need the features inside the Lite, Enterprise, or Ultimate editions. I can get it
all done with the Community Edition. But if you become a power user, I highly encourage you to buy a subscription, because it supports building out DBeaver further. All right, cool story, Luke; let’s actually get into downloading DBeaver. Click Download, and from there select the operating system you’re on and install it. I’m going to go through this on a Mac; Windows is very similar, so I’m not going to cover it separately. After your installer file downloads, click it and open it up. On a Mac it’s pretty easy: all you have to do is drag the beaver over into your Applications folder, and now it’s there. I’ll go ahead and open it up. If it asks whether you’re comfortable opening this app, yes, we know where we got it from, so I’m going to open it. With DBeaver launched, you may notice first that mine is dark and yours may be white; I have dark mode enabled on my Mac, so I guess it automatically picked that up. It asks, “Do you want to create a sample database that can be used as an example to explore basic DBeaver features?” We’re not going to do that; we’re just going to connect the Contoso data set, and then I’m going to take you through this. It should have immediately popped open with a Connect to a database window, and now we’re going to get into connecting the database. If this window for selecting your database didn’t pop up, that’s okay; there are a few different ways to get it up, and we need to go through them anyway. One other thing before that: it does have a popup asking whether you want to share your data in order to improve performance. I’ll leave it up to you whether you want to do this or not. To create a new database connection, you can either go up to the Database menu and select New Database Connection, or you can just come right here to this fancy dancy icon and select New Database Connection. This is one of the reasons why I recommend DBeaver so much: it connects to a host of different
databases, so you can connect to all the different ones you’re working with as a data analyst. In our case, Contoso is a Postgres database, so we’ll select PostgreSQL. Now we need to fill out the connection details. We’re going to connect by host, specifically localhost, since it’s running locally on your computer. The database name is not postgres; it’s contoso_100k, so make sure you spell it exactly as it appears in pgAdmin. Next is the username, which we kept as postgres, and then the password; if you set it like me, the password is just password, all lowercase. I’m going to leave Save password enabled because I don’t want to have to log in every single time. From here, it’s already picking up that we’re using PostgreSQL 17, and everything else looks good, so let’s test the connection. In my case, it’s saying that the PostgreSQL driver files are not installed and we need to install them. It’s basically like attaching a printer to your computer: you have to install driver files to use it. There’s nothing wrong here, so we’re going to go ahead and download them. With that, we get our test results back, and it says we’re connected. If you’re not connected, first check all those credentials and make sure they’re correct. Second, your database may not be started, so you may need to open pgAdmin, expand it all the way down to the Contoso data set, and make sure it’s actually running on your machine. Typically, on both Mac and Windows, your Postgres database should start automatically when you restart your computer, so you shouldn’t have to do this, but you may have unintentionally disabled this feature, in which case you may have to start it manually any time you restart your computer. With all the credentials put in and the connection tested, we’re going to select Finish. Let’s now walk through DBeaver and get into
understanding the UI and running a few different files. We have this pane on the left-hand side, the Database Navigator, which also holds information on our projects; we’ll get to projects in a minute. It has all of our database information in it, specifically under Database Navigator. If for some reason this disappears, like if I accidentally close it, you can go into the Window menu and show the Database Navigator view, and it’ll pop right back up. So what’s inside here? Very similar to what we saw in pgAdmin, we can see all of our different databases. We have our contoso_100k database, and we also have folders for Administer and System Info; these are ones I use less. I’m typically staying inside contoso_100k, going into Schemas and then public, because it’s the public schema we care about, and there I can view all our different tables. If I expand something like the sales table, I can see all the different columns in it along with their data types, and a host of other information like foreign keys and whatnot. One thing you may have noticed is that there are numbers next to the tables. These aren’t row counts; if you hover over one, it tells you how much disk space that specific table takes up, so you can get a general idea of how big these tables are on disk space alone. I can see that the sales and customer tables are pretty big, relatively speaking; of course, this is actually a pretty small database. What I like about tools like DBeaver is how easy it is to dive into these tables without having to write a SQL query. If I want to see what’s in the sales table, I can right-click it and go to View Table. This side is the database editor, and it
actually has a tab view: I could, for example, also open the currency exchange table, and now there are multiple tabs I can cycle through. With this, I can view a bunch of different things. Under Properties, I can look at all the different columns, foreign keys, constraints, and whatnot. Next up is Data, where I can look at the different columns and scroll through them a lot more easily, similar to an Excel spreadsheet. The other one is the ER Diagram, or ERD, which shows how your tables are all connected. Compared to pgAdmin, I actually feel this one is more realistic in showing how they’re connected; if you remember, in pgAdmin they were all connected into a single line that went all over the place, and it was a hot mess, so DBeaver does a little better here. Anyway, the view I’m typically looking at most is this Data one, and I can look at it as a Grid or as Text. I don’t find Text very useful at all, except when I need to copy and paste; Grid is mostly where I stay. This GUI has a lot of different options you can use to interact with these tables and view them. You can enter a SQL expression to filter the data down, or put in custom filters. Down at the bottom, we can do things like add rows and remove rows, but I typically do this with SQL, so I’m not going to mess with it here in DBeaver. You can also cycle through the different pages and whatnot. One option I do find useful, however, is Export data: any time you want to get your data out of here, you can export it to a variety of formats. Typically, I’m exporting either to a CSV or to SQL, which turns it into SQL INSERT statements. All right, enough of that; let’s get into setting up our project folder. I don’t need these two tables open. It’s also asking me if I want to save these
changes to the database. I didn’t really change anything, and I don’t want to, so I’m going to click No. Next, we’re going to create a project folder so we can save our SQL files as we go along. I’ll click Projects. Right now we just have General, which has Bookmarks, Dashboards, and Diagrams, and we can also see it right below here. I don’t really care about General; I want to create a project specific to this course. So I’ll come up to the top and select Create Project, and I’m going to call it intermediate sql project (real original, I know). I’m going to leave it in the default location, which is inside the DBeaver folder; just so you’re aware, you could uncheck that and change it to wherever you want. I don’t want to add the project to a working set, so I’ll select Finish. Now I have intermediate sql project alongside General. I want intermediate sql project to be my main, or active, project, so I’m going to right-click it and select Set Active Project, and it should switch to bold. Additionally, if I go up to the Window menu and select Project Explorer, I can have that appear below; if you didn’t close it, it was probably showing General, and it should have switched over. I like this type of view because now I can switch between projects. But notice that the Database Navigator no longer shows our database. So let’s go back to Projects, which makes this pretty easy. Under General, I have the Contoso connection, but if I click Connections under intermediate sql project, there’s no database there. We loaded the database into General, so we want to move it over to this project; I’ll move it down here, and bam. Now when I open this project folder, which is the default one in
this case, the database is inside the Database Navigator as well. This is pretty neat: I can keep everything grouped together in a single project. By default, there are four folders in here, and they should all be empty: Bookmarks, Dashboards, Diagrams, and Scripts. Bookmarks are just what they sound like: if you have something you frequently go to, say the sales table, you can stick it in Bookmarks, and anytime you need it, you just click it and bam, it appears right here. Next, we have Dashboards. There’s nothing in it, but you could create a new project dashboard. If you remember from pgAdmin, it had a default dashboard showing sessions, transactions, and things like that. I’m not a database engineer and don’t really care about all that, so we’re not using it. Next are Diagrams. If I wanted to, I could create a new ERD and call it contoso erd. It does have our five core tables in it, but it also has a host of other tables that come natively with a Postgres database whenever you install it, so they’ll all be there if you make it this way. You can filter them down, but we’re not going to go into that. The last folder is Scripts. How the heck do we create a new SQL script? You can do this from the SQL Editor menu, or just come up to the top here and select Open SQL Script. Since we did this, a few different options popped up. First, it tells us the active database, and second, the active database schema. This is especially important when you’re working with multiple databases, to make sure you’re running your queries on the correct one. Notice also that when we opened the script, a new script appeared under Scripts. I’m going to minimize these panes since we don’t need them. Anyway, the script is in here, and if I
wanted to, I can right-click it, choose Rename, and name it something appropriate. This is just a test script, so I’ll name it accordingly, make sure it stays a .sql file, and press OK, and it’s renamed. Now let’s run our first SQL query. We’ll write a simple statement that selects all columns from the sales table and limits the output to 10 rows. Notice that I was typing in all caps and DBeaver made the keywords lowercase afterward; we’re going to fix that in a little bit. To run this single query, press Command+Enter on a Mac or Ctrl+Enter on Windows. If you forget, you can hover over these icons and click one; the tooltip also shows you the shortcut. Similar to what we saw when viewing tables, the output appears in a tab, and I can explore it as text or as a grid, cycle through pages, or export the data, so there are a lot of different options for diving into the results. Let’s run a slightly more complex query to demonstrate the power of this: I’m going to turn this query into a CTE and then run a query on that CTE. I’ll move it down, put in the WITH keyword, name the CTE sales_copy (because that’s what it’s going to be), add AS, and then an opening parenthesis. Now I need to clean this up, because I like indentation. I can highlight everything, right-click, and go to Format > Format SQL; on a Mac the shortcut is Ctrl+Shift+F, so I can just press that instead. It makes the query slightly more readable, although it didn’t indent the contents of the parentheses. As always with any CTE, I’ll go below it and write SELECT * FROM the sales_copy CTE above. Notice that DBeaver automatically gave me an error message saying, “Hey,
sales_copy is not located above.” And if I even try to run this by pressing Command+Enter, it tells me sales_copy doesn’t exist, even though I can clearly see it up here. That’s because DBeaver, by default, treats a blank line as a statement delimiter, basically as if there were a semicolon there. So we need to remove that blank line. I was still getting an error message for a moment, but once I ran the query, it cleared. Right now I don’t like two things: that DBeaver automatically makes everything lowercase, and that it gives me this error just because there’s a blank line in there. So let’s change some settings before we proceed. If you’re on a Mac, select DBeaver in the menu bar and then Preferences; on Windows, I think you go through the File menu instead. Once you select it, this Preferences window opens, and it lets us control things inside the editor itself. I want to change the formatting settings in the SQL Editor specifically. Right now, Keyword case is set to Default, which is lowercase; I want it to be Upper, so I’ll change that. You can also control the indent size (you could go down to 2), but I like bigger indents, so I’m going to use 4. Also, as you may have noticed, it previously wasn’t indenting the contents of parentheses. I like that, so I’ll check the option that indents them. The last thing to update is under SQL Processing: the setting “Blank line is statement delimiter” is set to Always. Since we’ll sometimes have blank lines in our queries, I don’t like this setting, so I’m going to change it to Never. You could also set it to Smart, but I can’t guarantee how that behaves. Then click Apply and Close. Now I’ll select all of this, press Ctrl+Shift+F, and it formats it
exactly how I like it: it changed all the keywords to uppercase and indented the contents of the CTE. Not too bad. Now, that was just one query. I can put a semicolon at the end of it and add another query below it in the same script. Something I didn’t call out before: anytime you’re typing, you get autocomplete, and it also tells you what each suggestion is. For example, if I type a function like COUNT, it tells me it’s a built-in function in the database. And after I type FROM, it knows I probably want a table, so I can put something like customer in here. Now I have multiple queries in this script. Let me close the results pane down below. If I just want to run one query, I put my cursor in it and press Command+Enter, and it runs only that one; this one tells me there are 104,000 rows. The other thing I can do is run all of the queries in the script with the Execute SQL Script icon; for me the shortcut is Option+X, and on Windows I believe it’s Alt+X. I’ll close the results first, then press Option+X, and it automatically opens each result in a different tab. One thing to note: how do you keep track of which tab belongs to which query? You can put a brief comment above each query using two dashes; we’ll call the first one sales copy and the one at the bottom customer count. Running this again with Option+X, it prompts me: there are three unpinned results tabs; do you want to close them before executing the new query? I want that every time; if I’m running a new query, I just want to see the new results. So I’ll check “Don’t ask me again” and say, “Yes,
I do want these closed.” And that didn’t work, because I actually told you wrong: you need to specify that the comment is a title. So after the two dashes, you write title: followed by whatever the title should be. I’ll do that for both queries, press Option+X, and now both result tabs are named. This is very convenient when you have multiple queries and, therefore, multiple result tabs. There’s also a Statistics tab, which tells you the statistics for the run: that it ran two queries, how long they took, and so on. Now, with this test script we’ve created, I can see it’s not saved because there’s an asterisk on its tab. I can go to File and select Save, or press Command+S (Ctrl+S on Windows), and the asterisk goes away. Now I can close it, and if I ever want to go back to that script, I can just pop it open here and run it as needed. Bam. Hopefully you’re following along and have installed DBeaver, because you need to, unless you plan on using Jupyter notebooks or Colab to run the future queries. We now have some practice problems for you to get even more familiar with DBeaver, all of these settings, and running SQL queries. With that, we’ll be jumping into the next chapter on building views, so I’ll see you there.
Welcome to this chapter on views. Views themselves take up only a small portion of it; the majority of this chapter is an intro to the project, using views. We’re going to go through three lessons. This lesson is an intro to views: how to create them, how to delete them, how to manage them, and why they’re so important. In the second lesson, we’ll use the view we create in this lesson to analyze further and answer the second question in our project. Our project in total
has three questions, which I’ll showcase in a little bit. You may be wondering, “Luke, what happened to question one?” Well, we actually started answering question one earlier in the course, and we’ll build on it further in some future lessons; don’t worry, I’ll get you up to speed. The third lesson in this chapter covers installing VS Code, a code editor that makes it super easy to build up our portfolio project and then share it on the internet. Before we get into views, I want to showcase what we’ll be building in this project. We’re going to share it to your GitHub profile, and it will detail everything you’ve done. If you’re not familiar with GitHub, it’s a place where you can store, share, and collaborate on files. This area shows all the files in this repo: some SQL files, a README (which I’ll discuss more in a little bit), and an images folder; if I click on it, I can see the images inside. The README of a project is displayed on its front page, right below the file list, so here I can document all the different analyses I’ve done. If an employer is interested in your analyses, they can come to your GitHub and view them all here. And you may be like, “Luke, why the heck do I need to install VS Code? You already had me install DBeaver.” Well, if you’ve taken my basic course, you know VS Code is really powerful, not only for writing SQL queries but also for other coding projects, like Python. The special use case in this project is building our README. Here I’ve typed out all the different portions of
the README, and if I preview it, I can see it all dressed up on the right-hand side, the way it will appear on GitHub. Unfortunately, DBeaver doesn’t have this capability. On top of that, I can push this to GitHub right from VS Code’s GUI, so there are loads of benefits. Those of you coming from the basic SQL course have used VS Code and are familiar with it; there won’t be a lot of new material there, and you’ll probably even be able to skip that lesson. So let’s get into views. First of all, what the heck is a view? It’s a virtual table that shows the results of a stored query. For example, in the next example we’ll create a view, which you can find in the Views folder under the public schema; we’ll call it cohort_analysis. Whenever I click on it, I see that virtual table, holding all the results of a certain query. Under Properties, in Source, I can actually see the SQL query used to generate this virtual table. With this virtual table, cohort_analysis, I can open up a script, clear everything out, and select all the rows from it; the name cohort_analysis even appears in autocomplete, telling me that it’s a view. When I run it, I get all of its results below. Views are super important and necessary to level up your SQL skills. They keep you from having to write the same query over and over again, and you know what happens when you write the same query over and over: eventually you make a mistake. A centralized view prevents that, and it also ensures that any other queries that depend on the view pick up the change if for some reason you
have to update that view. Anyway, I’m getting ahead of myself. What’s the syntax? All we need are the keywords CREATE VIEW, then a name for the view, then AS, followed by the SQL query that defines it. Let’s create our first view. I’m going to delete the view we’re about to create, since you don’t have it yet, and just open a blank script. We’ll write a simple query that gets the daily revenue: we select the order date and the sum of quantity times net price times exchange rate, with the alias net_revenue. This is from the sales table, and we need a GROUP BY on the order date since we did an aggregation. Let’s run this bad boy, and it looks like everything worked. One thing to note: I didn’t sort this or put it in any order, but you can do that right in the results grid. I can click one of these column sort controls and order it in descending order, and it shows that we start in April 2024 and go backward with the total revenue. Pretty neat. This is the query we want to turn into a view, so let’s use that syntax: CREATE VIEW, give it the name daily_revenue, and then AS; no need to put the query in parentheses. I’ll run this by pressing Command+Enter, and you should see a message at the bottom telling you the query finished. Now, if I go to Views, there’s nothing there; I need to refresh it. You can do this by right-clicking and selecting Refresh, or with the shortcut F5, which is what I’ll use. Now daily_revenue is there. I can double-click to open it, and it has a few different tabs. Under Properties, there’s Source, which gives us the query
that we used to create the view, so we don’t need to save the query separately; it’s right there. There’s also the Data tab, and finally the ER Diagram, which in this case doesn’t connect to anything; it’s just its own table. Bam, that’s it. Now, to access this view, all I have to do is SELECT * FROM daily_revenue, and since I put a semicolon after the previous statement, Command+Enter runs only this one. All the results appear below. Now let’s say I’m done with this view and don’t need it anymore. There are a couple of ways to get rid of it. I can right-click it and choose Delete. It then asks me to confirm that I want to delete the daily_revenue view, and whether I want to cascade the delete: if there are other views built on this view, those will be deleted as well, so you need to decide whether that applies before clicking Yes or No. We’re not going to delete it via that method, though; I’ll refresh to confirm it’s still there, and it is. Instead, after another semicolon, I can use the DROP VIEW keywords and then specify the view, daily_revenue. Let’s run this by pressing Command+Enter. It tells me underneath that it completed pretty quickly, and coming over here and pressing F5, we can see the view is no longer there. A very important note: deleting a view is permanent. You can’t recover it once you do that, so make sure you really want to delete it. Now that we’ve got the basics of views, let’s create the view we’ll need to answer a few of the questions in our project. As a reminder, we’re answering three questions in our project, and we’ll work on the second one in the next lesson. You haven’t created this yet, but this is what
we’ll be getting to eventually: as I said, we’ll have our view in here, SQL files that answer our three questions, and our README. The CREATE VIEW is what we’ll start working on in this lesson. We’re not necessarily going to finish it here; we’ll finish it in a later cleanup lesson, but we’ll get a start. So what does this view actually provide us? We’ll shortly be diving into a more advanced cohort analysis than what we’ve done previously, and we need a table (a view, if you will) to help us out and speed up that analysis. This view is aggregated to provide key things about each customer: when their orders were, how many orders they had, their first purchase date, which cohort they fall into, and some customer information from the customer table. This is especially helpful for something like total net revenue, which is that quantity times net price times exchange rate: it’s already there, so I don’t have to worry about the calculation. Let’s start building this view by working out its query. We’ll start with the sales table, bringing in only the information we need. First, I’ll write SELECT * for now, FROM sales. Because we’ll be using multiple tables, I’m going to add the alias s (I can also press Tab and it adds that). Running this, I can start picking out the columns I want from the results below. I know I want the customer key along with the order date, and as always, I want total net revenue, so I’ll take the sum of quantity times net price times exchange rate and give it the alias total_net_revenue.
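The query being assembled in this lesson eventually becomes the cohort_analysis view, and the per-cohort revenue query later in this section reads from it. As a rough, runnable sketch of that whole flow, the example below uses Python’s built-in sqlite3 module as a stand-in for the course’s PostgreSQL database, with a handful of made-up rows. Several things are assumptions or substitutions: the column names (customerkey, orderdate, netprice, exchangerate, countryfull, and so on) are modeled on the spoken names in the lesson rather than taken from the course files; SQLite has no EXTRACT and no CREATE OR REPLACE VIEW, so the sketch uses strftime('%Y', …) and DROP VIEW IF EXISTS followed by CREATE VIEW; and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Toy stand-in for the course's PostgreSQL database: SQLite in memory.
# Table/column names are assumed from the lesson; every row is made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customerkey INTEGER PRIMARY KEY,
    countryfull TEXT,
    age         INTEGER,
    givenname   TEXT,
    surname     TEXT
);
CREATE TABLE sales (
    orderkey     INTEGER,
    customerkey  INTEGER,
    orderdate    TEXT,      -- ISO 'YYYY-MM-DD' strings in SQLite
    quantity     INTEGER,
    netprice     REAL,
    exchangerate REAL
);
INSERT INTO customer VALUES
    (180, 'Germany', 45, 'Ana',   'Silva'),
    (200, 'Canada',  30, 'Ben',   'Lee'),
    (300, 'France',  52, 'Chloe', 'Martin');
INSERT INTO sales VALUES
    (1, 180, '2018-03-01', 2,  50.0, 1.0),
    (2, 180, '2023-06-10', 1, 100.0, 1.0),
    (3, 300, '2018-07-20', 1,  60.0, 1.0),
    (4, 200, '2023-01-05', 3,  10.0, 1.0),
    (5, 200, '2023-01-05', 1,  20.0, 1.0);
""")

# SQLite has no CREATE OR REPLACE VIEW, so drop first, then create.
conn.executescript("""
DROP VIEW IF EXISTS cohort_analysis;
CREATE VIEW cohort_analysis AS
WITH customer_revenue AS (
    -- One row per customer per order date, with revenue and order count.
    SELECT
        s.customerkey,
        s.orderdate,
        SUM(s.quantity * s.netprice * s.exchangerate) AS total_net_revenue,
        COUNT(s.orderkey) AS num_orders,
        c.countryfull, c.age, c.givenname, c.surname
    FROM sales AS s
    LEFT JOIN customer AS c ON c.customerkey = s.customerkey
    GROUP BY s.customerkey, s.orderdate,
             c.countryfull, c.age, c.givenname, c.surname
)
SELECT
    cr.*,
    -- Window function: earliest order date per customer, rows not collapsed.
    MIN(cr.orderdate) OVER (PARTITION BY cr.customerkey) AS first_purchase_date,
    -- SQLite stand-in for PostgreSQL's EXTRACT(YEAR FROM ...).
    CAST(strftime('%Y', MIN(cr.orderdate) OVER (PARTITION BY cr.customerkey))
         AS INTEGER) AS cohort_year
FROM customer_revenue AS cr;
""")

# Per-cohort totals plus average revenue per customer, read from the view.
rows = conn.execute("""
SELECT
    cohort_year,
    COUNT(DISTINCT customerkey) AS total_customers,
    SUM(total_net_revenue) AS total_revenue,
    SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS customer_revenue
FROM cohort_analysis
GROUP BY cohort_year
ORDER BY cohort_year
""").fetchall()

for row in rows:
    print(row)
# (2018, 2, 260.0, 130.0)
# (2023, 1, 50.0, 50.0)
```

In PostgreSQL, the same view would be written with CREATE VIEW and EXTRACT(YEAR FROM …) as described in the lesson. Naming the count num_orders inside the CTE from the start also sidesteps the column-rename problem that comes up later in this lesson.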
We’ll do one more thing: get a count of the number of orders, using the order key. Because we did these aggregations, we need a GROUP BY, and since I’m lazy, I’ll just copy the non-aggregated columns from up here and paste them below. Let’s run this. Looks like I have a typo over here; look at the syntax highlighting helping to point it out. Running it now, bam, we get the results we want below, and everything looks like it’s aggregating correctly. With this table, I also want things like the first purchase date and the cohort year. That’s going to take window functions, and remember, I don’t want those in the GROUP BY, so we’ll need to wrap this in a CTE first. Before that, I’m going to move over to the customer information and pull some key customer details into this source table. That comes from our customer table, so I’ll do a LEFT JOIN, which keeps all the rows from the sales table and attaches any related information from the customer table. I’ll give customer the alias c and join on the customer keys of both tables. I’ll run this to make sure there are no issues. Okay, it’s running just fine. So what information do we want to add? I’m going to select c.* so we can see all of it, and then refine it down. It tells me I need c.customerkey in the GROUP BY; we won’t have to keep it, but it clears up the error I get from selecting c.*. Let’s try that again. Scrolling over, we now have the customer information in here. I want countryfull (the full country name), the customer’s age, the customer’s given name, and their surname, and that should be it. Now we need to put all of these in the GROUP BY, because remember, we’re doing an
aggregation right here. I’ll add them underneath in the GROUP BY and clean everything up, and I don’t believe we need c.customerkey there anymore, so I’ll remove it. Let’s run this query to make sure it works. Everything looks good; we have all those columns. Now, for each of these customers, we need to extract their cohort year, the year of their first purchase. To do that, I’ll put all of this into a CTE: indent it over, space it down so we can add the WITH, give it the name customer_revenue, add AS, an opening parenthesis, and finally a closing parenthesis. To make sure it’s all correct, I’ll run SELECT * FROM customer_revenue. Running this, it returns exactly the same information we had before, so we’re on a good path. Now we need a window function that uses the order date to find each customer’s minimum order date, which is what assigns the cohort year. I want everything from our customer_revenue CTE, so I’ll give it the alias cr and select cr.*. Then I want the minimum order date, so I’ll use MIN on the order date. Since it’s a window function, we add OVER and partition by the customer key; for customer 180 here, for example, it looks across all of that customer’s rows and takes the minimum. We’ll give that the alias first_purchase_date. Let’s run this to see how it’s doing. Looking at rows two and three (I have to expand this out), that customer’s order dates are in 2018 and 2023, so the first purchase date should be in 2018, which it is. Now we can build another column for cohort year, and this one is going to be
just a copy, if you will, except that we use EXTRACT: we paste in the same MIN(orderdate) window expression from above and give it the alias cohort_year. Let’s run this. I was rushing too fast and realized I have to specify what to extract: it needs to be EXTRACT(YEAR FROM …) around the window function. Running it again, we can see that for customer 180, in rows two and three, the cohort year is in fact 2018. This is good. Now let’s turn this into a view we can reuse. I’ll push everything down a line and add the keywords CREATE VIEW at the top, name it cohort_analysis, and once again use AS; we don’t need to put the query in parentheses. Let’s run this with Command+Enter, and it ran super fast. If the view isn’t appearing, remember to press F5; now it’s appearing in the navigator. It shows all of our columns, the ER diagram, and the Data tab. With this view, analysis becomes super simple. I’ll open a new script (New Script here). Let’s say I want to analyze the total revenue per cohort; that’s super simple now. I select the cohort year and the sum of total net revenue FROM our view, cohort_analysis. Since we did an aggregation, we need a GROUP BY on the cohort year. Running this, we can see our results. I didn’t add an ORDER BY, so I’ll just use the grid to sort in descending order. So now, with a super simple query, I get this at lightning speed, because all of that other work is already captured in the view. Before we wrap up this lesson: this is future Luke (as you can tell, I’m in a different flannel), and we made a little bit of a mistake
in our view, specifically in naming the column with the number of orders: I didn’t give it an alias. What do I mean by this? Going into cohort_analysis (anytime I use it, I press F5 to make sure it’s fully up to date), if we look at the data, everything looks fine except this column. It holds the number of orders, but unfortunately we left it named count, which isn’t descriptive; we really need a more descriptive name. This problem comes up quite frequently, so it’s a good use case to walk through. Remember that under Properties, in Source, we can see all of the view’s code. Notice it uses CREATE OR REPLACE VIEW; previously we used just CREATE VIEW, and CREATE OR REPLACE replaces the view if it already exists. You might think I could edit the code here, changing count to count AS num_orders up here and the column reference down here to num_orders, and then click Save. When I do, it asks whether I want to execute this, and then it fails: it says it cannot change the view column count to num_orders, and that I should use ALTER VIEW … RENAME COLUMN to change the column name instead. ALTER VIEW is a great thing to know; it lets you change properties of an existing view, such as its name or schema, and, in our case, rename columns. We’re going to use it to rename the column, but that’s only going to be a partial solution, as we’ll see. We use the keywords ALTER VIEW, then the view name cohort_analysis, then RENAME COLUMN. The syntax highlighting flags this as wrong with “table reference expected,” but don’t worry; it’s a false warning. Which column do we want to rename? We want to rename count, and we want to rename it to num_orders.
Now let’s run this with Command+Enter. It looks like it ran fine. Back in the navigator, there’s an asterisk next to cohort_analysis, which means it changed, so I need to select it in the Database Navigator and press Fn+F5 to refresh it. I’d also recommend closing the old tab if you haven’t already, because we’ve changed some properties, and we want to see what the newest version looks like. It asks whether I want to persist these changes to the database; no, I don’t. Clicking cohort_analysis again to get the newest version, we can see that it didn’t change count up here in the CTE, but down here it did add the alias num_orders. I’m a perfectionist, and this is also good practice in general: I want the name changed in both locations. Because CREATE OR REPLACE VIEW can’t rename an existing column either, we need to drop this view, just like dropping a table, and then create it again. So I’ll copy all of this code (we don’t need the ALTER VIEW statement anymore) and paste it in here. We want num_orders as the alias right here in the CTE, and down here, since the CTE now already names the column num_orders, we can just remove the extra alias. Before we run this CREATE OR REPLACE VIEW, we need to drop the view, which is simply DROP VIEW followed by the view name, cohort_analysis. I’ll put a semicolon after it, and since I want to execute the entire script, I’ll press Option+X. It tells me it ran those two queries. Once again, I’ll close cohort_analysis, make sure it’s selected in the navigator, press Fn+F5, and open cohort_analysis again. Looking at the source, num_orders is updated in both locations. Crisis averted: the column name is up to date and our query stays concise. All right, now we have a
few examples for you to work through to get more familiar with creating views; we’ll use some of the queries from previous lessons to build views so you can reuse them. In the next lesson, we’ll build further on the view we just created to answer the second question in our project and analyze the cohorts further. All right, with that, I’ll see you there.
Welcome to the second lesson. In this one, we’re diving into a question for our project: how do customers in a particular group generate revenue? We’ve broken customers into groups before with cohort analysis, and that’s what we’ll continue from here. Spoiler alert: in this analysis, we’ll look at the different cohort years at the customer level to see how those customers spend money over time. Generally, you want customers to spend more, because that means more money, and we’d expect a company to learn over time and be able to extract more value from its customers. Unfortunately, we find just the opposite. Let’s quickly re-examine what we’ve previously done on cohort analysis. I’m not going to walk through this entire query; we did it in the window functions chapter. With its results, we plotted the impact of each cohort on future years’ total revenue. As expected, net revenue goes up, and every year there are contributions from previous years, specifically from members of previous years’ cohorts, since your cohort year is based on the year of your first purchase. Honestly, this didn’t uncover a lot for us. Does it really tell us that much? We went even further and also looked at the number of total customers, and we saw that it went up as well; once again, not a lot of insight. So what do we need to do? Well, using our
previous view that we created in the last lesson, we’re now going to take that a step further: we’re going to analyze total revenue and total customers, and then finally get each individual customer’s revenue, or at least the average per customer. So let’s jump into building this query. For this we’re going to use that cohort analysis view. I can dig into it and see that it contains all the same values from the last lesson. From it, I want to get, for each cohort year, the total number of customers (using the customer key) and the total net revenue for that cohort. So I’m going to start a new script and fill in the FROM first, specifically FROM cohort analysis. I like to do this mainly because when I go to fill in the columns, for example cohort year, which is what we want, one, I can see that the column does in fact exist, and two, I get the correct syntax highlighting as I go along. Now, we want the total customers per year, so we’re going to do a distinct count: I need to put DISTINCT inside of COUNT, specifically on the customer key, and we’ll give this the alias total customers. Next we need the total revenue, so inside a SUM function I’ll use total net revenue, and we’ll give it the alias total revenue. We used aggregate functions, so we need a GROUP BY on cohort year. Okay, let’s run this and see what we have. We get back the total customers and the total revenue. Now let’s look at this visually, because I think it’s important to understand why we’re taking this a step further and diving down to the customer level. Here I’ve plotted it so that the bars are the revenue, on the left-hand axis, and the line is the count of total customers, on the right-hand axis. As expected, you can see that the line basically correlates well, to an extent, with the
size of the bars themselves. So, simply put, more customers equals more revenue, which is nothing new, and it’s not an insight you’d go to your boss with. We actually need to dig deeper into some key characteristics of the customers to give real insight into their spending habits. So let’s get this customer revenue. All we’re going to do is take our total revenue up here, divide it by the total customers, and give this the alias customer revenue. I’ll run this query, and now we have the customer revenue. Looking at the code here, just a shout-out to that view: look how simple this query is now. Having the data in that view makes this super quick to do. Anyway, back to exploring customer revenue over time: we can see that it’s basically dropping. Let’s look at it visually. With ChatGPT plotting this, I have it showing that in the earlier cohorts customers spend quite a bit, I’ll be honest, around $3,000 per customer, but then it starts to go down. This is an exponential trend line that I had ChatGPT put on there. Anyway, this is concerning: the revenue per customer for each cohort year is dropping year after year. Like I mentioned at the beginning, I would expect it to either remain the same or go up over time, so this is not a good thing. Now, I will say this: remember that older cohorts, say in this case cohort 2016, have all these later years to contribute to their cohort’s revenue. You could be part of cohort 2016 and also buy something in 2024, so in general I would expect earlier cohorts to have a higher customer revenue. We need to adjust our query to account for this. But you may be thinking, how the heck do we do this? Do we use some sort of window function and limit the time window for each cohort? And what is that window: is it one day or one year that they’re in their
cohort that you allow to be attributed to the revenue for that cohort? Well, I actually did some further analysis on this. You don’t need to run this query, and we’re not going to walk through it step by step, because that’s not important; the main thing is what it shows. What I had it do is calculate how much contributes to the total revenue based on the days since the first purchase. In total, about $127 million was spent on day zero, i.e., the day of the first purchase, and after that it dropped significantly, to amounts like 31,000 and 51,000. I then took it a step further and converted this total revenue into a percentage, and we can see that it goes from 61% on day zero to less than 1% afterward. I also plotted it for the more visual types. What we can extract from this is that, on average, a customer spends about 60% of their total revenue on the first day and very little after that. So we’ll go back and adjust our query to account for this: for each cohort year, we’ll only count revenue from purchases completed on the customer’s first day and ignore everything else, since the majority of purchases are made on the first day. How can we do this? Conveniently, the view we created has not only the order date but also the first purchase date, so we can match up rows where those two dates are equal to get only those first-day purchases. So I’ll add a WHERE clause setting the order date equal to the first purchase date, and that’s really all we have to do. Pressing Command+Enter, we have some updated results, and it looks like our customer revenue dropped slightly. Plotting it, we can see that it drops slightly below $3,000, whereas before it was around $3,000.
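As a rough sketch of the query built in this lesson, here’s a self-contained Python script that runs the SQL through the standard-library sqlite3 module instead of PostgreSQL. The table name and the column names (customerkey, orderdate, first_purchase_date, cohort_year, total_net_revenue), as well as the five sample rows, are hypothetical stand-ins for the cohort analysis view, so the numbers are illustrative only. It computes total customers, total revenue, and revenue per customer for each cohort year, with and without the first-day filter (order date equal to first purchase date).

```python
import sqlite3

# Hypothetical stand-in for the cohort analysis view: one row per customer per order date.
# Table name, column names, and values are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cohort_analysis (
    customerkey INTEGER,
    orderdate TEXT,
    first_purchase_date TEXT,
    cohort_year INTEGER,
    total_net_revenue REAL
);
INSERT INTO cohort_analysis VALUES
    (1, '2016-03-01', '2016-03-01', 2016, 3000.0),
    (1, '2019-07-15', '2016-03-01', 2016, 1000.0),  -- a later purchase by a 2016 customer
    (2, '2016-05-10', '2016-05-10', 2016, 2000.0),
    (3, '2023-02-01', '2023-02-01', 2023, 1500.0),
    (4, '2023-06-20', '2023-06-20', 2023, 500.0);
""")


def cohort_customer_revenue(conn, first_day_only):
    """Total customers, total revenue, and revenue per customer for each cohort year."""
    # Optionally keep only purchases made on each customer's first purchase date.
    where = "WHERE orderdate = first_purchase_date" if first_day_only else ""
    query = f"""
        SELECT
            cohort_year,
            COUNT(DISTINCT customerkey) AS total_customers,
            SUM(total_net_revenue) AS total_revenue,
            SUM(total_net_revenue) / COUNT(DISTINCT customerkey) AS customer_revenue
        FROM cohort_analysis
        {where}
        GROUP BY cohort_year
        ORDER BY cohort_year
    """
    return conn.execute(query).fetchall()


all_rows = cohort_customer_revenue(conn, first_day_only=False)
first_day = cohort_customer_revenue(conn, first_day_only=True)
print(all_rows)   # [(2016, 2, 6000.0, 3000.0), (2023, 2, 2000.0, 1000.0)]
print(first_day)  # [(2016, 2, 5000.0, 2500.0), (2023, 2, 2000.0, 1000.0)]
```

In this toy data, the filter only changes the 2016 cohort, because that is the only cohort with a purchase after the first day, which is exactly the bias the first-day filter is meant to remove. The f-string only toggles a fixed WHERE clause; it isn’t meant for building queries from user input.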
Anyway, the main thing here is the exponential trend line. I thought that removing the spending from those later years might flatten the trend, but it’s actually more pronounced: the more recent cohorts, such as 2022 and 2023, spend even less. So this is a pretty big insight we’ve uncovered, and especially at the rate the trend is going right now, it could have serious implications for the business. It would be a great insight to bring to our superiors or stakeholders. That’s the end of answering this question on analyzing customer groups, as I feel we’ve found a pretty significant insight. In the practice problems, you’re going to analyze revenue and total customer count over time on a monthly basis, to find out why certain years are lower than others and to uncover other insights that I’ll detail at the beginning of the next video. After you finish those practice problems, we’ll jump into installing VS Code, which we’ll use to document the insights for our project; specifically, in the next lesson we’ll document what we learned from this question. With that, see you there.
Welcome to this last lesson of this chapter, our intro to the project. In it, we’re going to set up VS Code. As a refresher on why we’re doing this instead of just using DBeaver: DBeaver is great for writing SQL queries, analyzing them, and improving them, but when it comes to sharing and collaborating with others, using things like GitHub or documentation formats like Markdown, it gets quite hard. So in both my workflow and Kelly’s, we like to use it in cooperation with VS Code, and this code editor is going to allow us to do two major things
for this. The first is that we’ll be able to build a README, a Markdown file, that documents all of the analysis we’ve done whenever we want to share it. The second is that it makes it super easy to push this up to GitHub and share it with others, so they can see the work from our README. Now, if you took my basic SQL tutorial, you probably already have VS Code, so you can skip that portion of the lesson, but we will then move on to actually building out the README, specifically for question two, which we answered in the last lesson. Before we jump into that, let’s quickly go over the analysis you did in the practice problems, which uncovered even further insights. In the last lesson we evaluated how different customer groups generate revenue. Specifically, we broke it down by cohort year and found the average customer revenue per cohort year. At this macro level we were able to call out a general downward trend in revenue per customer, which is not good. So in the practice problems, you went further into analyzing why we have these dips: was there something deeper going on in the dataset besides customers just spending less? To catch up anyone who didn’t do the practice problems: we analyzed customer revenue and total number of customers on a monthly basis, and we got this final table with total revenue, total customers, and customer revenue. Let’s start with customer revenue, because that’s what we’ve just been talking about. As we saw in the last lesson, it’s slowly going down over time, so analyzing it on a monthly basis doesn’t really help much. Now, if we take total revenue and total customers and plot them, we get something like this, where the blue bars are the monthly revenue and the line chart is the total number of customers. So, looking at general trends overall,
if we plot a line of best fit, we would conclude that our net revenue is going up over time, with the exception of a pretty big dip in 2020, probably due to a certain pandemic that happened during that period. It rose after that, and then dipped slightly again in 2023. Anyway, the major insight I think is applicable to us from that analysis is this: as we said previously, more customers equals more revenue, and it does match up, but when we get to 2022 and 2023 we can see pretty large gaps here. There are a lot of customers, but the revenue isn’t keeping pace, which helps explain even further what’s going on in this graph: yes, we’re getting a higher number of customers, but those customers are spending less. A pretty interesting insight. Now let’s get into installing VS Code. If you navigate to the link on the screen, you’ll be directed to the download page for Visual Studio Code. More recently, Microsoft has been advertising it with GitHub Copilot, so the page says it’s been redefined with AI, and they’re really pushing that. We’re not going to go much into AI features; we’re just downloading this code editor. Download the file, click it, and get it launching. On a Mac, it unzips the file into the Visual Studio Code app, which I can drag into my Applications folder so it’s in a safer, more secure location. If you’re on a Windows machine, it walks you through an installer, so there are quite a few more steps, but it will direct you on where to put VS Code and whether you want an icon. Regardless, you may get a system message asking if you want to open an app downloaded from the internet; you’re fine with it, so open it. Upon launch, you’ll get a welcome page that guides you through a step-by-step
process. If you want to do it, you can, but we’ll be covering all the key features you need to know, so don’t feel like you have to. Let’s briefly explore Visual Studio Code before we actually set up the project folder we’re working with. Over here on the left-hand side is our activity bar, and whenever we click an icon in the activity bar, a sidebar slides in or out, depending on whether we want it there. The first one is the Explorer. We don’t have a folder open yet, so I’m just going to open a dummy folder; you don’t need to do this. Anytime you open a folder, it asks if you trust the authors. I’m opening it from my own computer, and I trust myself, I think, at least. Anyway, this shows a breakdown of the files inside the folder, and folders themselves have these carets that you can click to expand them and see what’s inside. If I want to see where these files are located, I can just right-click and, on a Mac, select Reveal in Finder; on Windows it’s Reveal in File Explorer. You can see that the structure is the same as what we see over here in the Explorer. All right, other things in the activity bar: we have search functionality, so if I search for sql, all the different occurrences pop up here, and I can click one to go right to it. Next is Source Control, which controls how we’re going to get this onto GitHub. We’ll cover this and interactions with GitHub in the last chapter, so don’t worry about it too much. Oh, and what I forgot to mention previously: whenever a file opened over on the right-hand side, if I close this sidebar, that area is the code editor itself. So if I open the Explorer back up (sorry) and open that second query, then close the sidebar, I’m going to zoom in by pressing Command+Plus, or Ctrl+Plus on Windows, and we can see that we
have that SQL query right inside there. If I need to add anything, for example I want to use an alias right here, I can just type it in, and now we notice a white dot appearing at the top. That means we have changes that aren’t saved; you can save them by pressing Command+S or Ctrl+S. Later on, we’re going to build out the README, and what I really like about VS Code is that, as you can see, we have all this fancy Markdown typed in here, and if I want to see what it actually looks like, with the README selected I can click this button right here, and it previews the README, with all the images and whatnot, right next to it as I scroll through. That’s one of the main reasons we’re using VS Code. All right, I’m going to close all this out and zoom back out so we can see everything. Okay, the last two things: there’s a Run and Debug section, which we’re not going to use, and finally Extensions. If you’re using VS Code for a particular programming language like Python, it’s really common to have the appropriate extension installed; for example, you’d install the Python extension to work with Python. Overall, there aren’t really any extensions I think you need to install for this. If you do want to install one just to see what it’s like, I recommend Code Spell Checker. When I click it, it opens right up next door, and to install it I just click Install. It asks if I trust this publisher; yes, I do, and now I have the spell checker in here. If I go back to that SQL file from before, it now flags some of the column names that don’t use underscores and calls them unknown words. You can try a quick fix on them, but those are the column names that came with the database, so we’re not going to change
them at all. Mainly, I find it useful when I create a new alias. Say I want to sum something and assign it the alias total customers: if I assign an alias with a misspelling in it, like this one butchering customers, it’s going to call it out, so I know I misspelled it. Anyway, that’s Extensions. No, I don’t want to save any of this. The other two things in the activity bar to be aware of are your account, right here, and then settings. Specifically, what I find myself gravitating toward a lot is the Command Palette, which has the shortcut Command+Shift+P or Ctrl+Shift+P. That’s the one VS Code shortcut I would highly recommend memorizing. When I press it, a search bar comes up at the top, and I can search for any setting I want to change in VS Code. Say I wanted to change the color theme: I would type in something like color, I can see that I have Preferences: Color Theme, and it lets me go through a menu and select from a host of different options. The last thing to note is the status bar at the bottom. We won’t be using it much; in our case right now, it shows that I’m zoomed in, and I could reset it from there. You’ll also see information in the lower left-hand corner if we’re using Git, and if there are any issues. Now let’s set up the project folder that we’ll eventually push to GitHub. We’re going to set it up, build out our README, and add the SQL file from the last lesson. For this, we want to open a folder that has our project in it, but first we need a folder, if you will. Back in DBeaver, if you remember, we already created a project folder, and it has bookmarks, dashboards, diagrams, and even all of our scripts in it. I’m not about reinventing the wheel; I think we should just use this project as our project folder itself. We can do this a
couple of different ways. First, I want to find its location, so I’m going to right-click it and select Show Resource in Explorer. This is the projects folder; I’m going to back out one level so we can see it. Okay, so this is the folder for this intermediate SQL project, along with the folders underneath it. I want the file path to this folder specifically, because I want to navigate to this location when we open it in VS Code. So I’m going to open a folder in VS Code. On Macs, unfortunately, the folder this is in is hidden, so I’m going to hit the shortcut Command+Shift+Period to show hidden files. I know that in my home folder it’s in the Library folder, and from there I can navigate to the DBeaver-specific folder, go into my workspace, see the project itself, and open it. It asks if you trust the authors of the files in this folder; I do, and I’m also going to enable trusting the files in all folders within here. So now, inside our Explorer, we have a few different folders. I’ll be honest, we’re not going to use any of these at all. One thing to call out: you may visually see only bookmarks, diagrams, and scripts, but there are also these other files called dotfiles, and once again, if I press Command+Shift+Period, we can see them; they’re just hidden files. I’m going to keep them hidden by pressing Command+Shift+Period again. The key thing here is that those folders and files aren’t important, and we can be selective about what we put on GitHub, which we will need to be, because we don’t really want to put these up there. Anyway, let’s make our first file. We come up here to the top and select New File. I’m going to start the name with a 2, to designate that this is the second question, and call it 2_cohort_analysis.sql. As you notice, as soon as I name that SQL file,
I got this new icon right there showing that it’s a SQL file. When I press Enter, it automatically opens in the text editor on the right-hand side. Now I can copy the query we wrote previously and paste it all into VS Code. I have the white dot saying it’s not saved, so I press Command+S or Ctrl+S, and now it’s saved. I’m going to close this all out. One thing to note: inside DBeaver, underneath the project folder, we’re not seeing the files we just created, like that SQL file, pop up. That’s because we haven’t refreshed it. If I right-click and select Refresh, the query now appears inside this project folder. If you click Refresh and it’s still not showing, I actually had to restart DBeaver to get this to work, so just a word of warning. Anyway, this SQL file is now here, so I’m going to close out this script and this one. If I wanted to, I don’t necessarily have to go back into VS Code to edit it; I could edit it right from here. Say in this case I add the alias ca and save it by pressing Command+S. When I come back into VS Code and check this SQL file, I can see that the alias got added. I don’t want it, so I’ll remove it and press Command+S. So let’s get into building our README file. As a reminder, this is basically going to be the front page of our project, detailing all the analysis we did and breaking down each of the three questions that we’ve gone through or will go through. A key thing to note for GitHub: we want this file to be called README.md, because GitHub specifically picks up on this naming convention and displays the file below the repository’s file list. So let’s create a file. I’ll call it README in all caps, and the icon changes to the README icon. For the extension, it’s a Markdown file, so I’m going to give it .md, then press Enter, and it opens up right next to it. The first thing I’m going to do is just
start by giving it a title. Remember, we can create different heading levels depending on how many hashtags we use. I’m going to give it the title Intermediate SQL Sales Analysis. But what the heck does this actually look like? We can click this icon right here to have the preview appear on the right-hand side, so as we type different things, we can see how they’re actually formatted. Now, what sections are we going to put into this? Really, it’s up to you; you don’t have to follow all, or even any, of what I’m going to put in here, but I recommend these major sections. First, we’ll have a short overview. Then we’ll get into our three business questions, giving a short description of each. Then comes the analysis approach, breaking down each one of those; we’re going to do question two, on the cohort analysis, and I’ll walk you through that shortly. Below the analysis approach for these three questions (I’ve only included one example here), we’re going to have our ending, which has things like our strategic recommendations, what we got out of this, and any technical details of what we used to build it. So let’s start filling this in, beginning with the business questions. I’m going to put 1, 2, and then 3. For the second question, we did a cohort analysis, and with it we were asking how different customer groups generate revenue. I’m not really liking how this is formatted, so I’m going to use some extra Markdown, putting double asterisks before and after Cohort Analysis, which bolds it and makes it stand out more. Now let’s fill in the analysis approach for that second question. I’m going to copy this one right here and paste it below, and I’m going to title this section Cohort Analysis. Next, we need to put in the analysis approach that we actually
used here. I put in some short bullet points: we track revenue and customer counts per cohort; a cohort is defined by grouping customers by the year of their first purchase; and we analyze customer retention at the cohort level. The next thing I like to include is the query itself. Instead of a link, which we’ll go over shortly, you could put in a code block, so I’m going to type three backticks; you can find the backtick key at the top of your keyboard. I could just copy this query and put it into our README, and it’s displayed right here. I could also format it as SQL by putting sql after those backticks, and then it gets color-coded like this. Oh, this is all smushed. I’ll be honest, I’m a fan of DRY, or don’t repeat yourself; we already have this code somewhere, so I’m actually not going to put it here. Instead, I want to put a link to this SQL file. We can do this with square brackets, which contain the link text; I’m just going to use the name of the file. Then, in parentheses, goes the actual file location. I’m going to type a forward slash, and all the different files I have access to appear right here. I’ll select the first one, the SQL file, and now over here on the right-hand side, when I click it, the file itself pops up, so I know the link is working properly. These links are also going to work on GitHub when we get there. All right, the next section I have is Visualizations, for any images you’ve generated. You don’t have to include this per se, but in my case I really like doing it. So I’m going to come over here, create a new folder, and call it images; I like to organize all my images in one location. I’m going to take the image, which is on my desktop, and drag it over into the folder
itself. It’s right here, and conveniently it’s just named image. I’m going to change that by right-clicking it, selecting Rename, and calling it 2_cohort_analysis. Okay, now back in the README itself: the text in the square brackets is just the alt text you can put with the image; what we mainly need to be precise about is the actual image path. Once again, I’m going to type a slash, go into the images folder, and select 2_cohort_analysis. Oh, it’s popping up right next to it. After this, we’re going to dive into Key Findings, and I’m going to summarize the main points: revenue per customer shows an alarming decreasing trend over time. I call out specifically that 2022 onward keeps declining, and that although net revenue is increasing, this is likely due to a larger customer base, which we found when we did the deeper analysis. This finally brings us to the last section, Business Insights. For this I have the following: the value extracted from customers is decreasing over time and needs further investigation, as we need to find the root cause. In 2023 we also saw a drop in the number of customers, and with it a drop in revenue. Because of these two facts alone, the company is facing a potential revenue decline, which we actually already started seeing in 2023. Overall, this is a good step in the right direction on what we need to recommend and where we need to go. All right, it’s your turn to go through and build out that README document, and hopefully you’ve been following along with installing VS Code. We do have a few practice problems for you to get more familiar with VS Code if you want that practice, and we’ll have the template available for you to build out question number two. All right, in the next chapter, we’re going to get into data cleaning, my favorite part of data analysis, so
I’ll see you there.
Welcome to this chapter on data cleaning. We have three lessons in it. In the first two, we’re covering core concepts you need to know about data cleaning. Specifically, this lesson goes over conditional expressions for handling NULLs, things like COALESCE and NULLIF. In the next lesson, we’re going over strings, because from time to time you’ll deal with strings that you need to clean up, combine, or even split apart. At the end of that lesson, we’ll apply all the concepts we’ve learned to further refine our cohort analysis view. Finally, in the third lesson of this chapter, we’re going to answer question one from our project, which focuses on customer segmentation. Our project consists of three questions, and in the previous chapter we focused on the second one, cohort analysis. In this chapter, we’ll use customer segmentation to find out who our most valuable customers are, using not only that cleaned-up cohort analysis view but also some statistics functions we learned earlier. So we’re focused on two functions for this lesson, and you may be thinking, Luke, how the heck did you pick those out? Well, if we go into the PostgreSQL documentation, under The SQL Language, we can see that under Functions and Operators there’s a host of different categories. If you’ve followed along since the basics course, we’ve touched on a lot of these, including conditional expressions. Navigating into that section, we can see that PostgreSQL has four main types. The main one is CASE, which we covered back in the basics course, but there are two more we need to cover: COALESCE and NULLIF. PostgreSQL also has GREATEST and LEAST, but these functions have different capabilities depending on which database
you’re working in, and Kelly and I don’t really use them much, so we’re not covering GREATEST and LEAST. Anyway, let’s get into how we can use COALESCE and NULLIF with a very simple example. You don’t have to follow along; I’m just doing this for demo purposes. The easy way to demonstrate this is with a fake table I’m creating here, called data jobs. It has three columns, technically four if you count the id. What does it contain? Let’s run this query to see. We get this table, with a job title, a column on whether it’s a real job, and a final column for salary. Notice that there are some NULL values in here; we’re going to use COALESCE and NULLIF to clean these values up, depending on what we want. Let’s say that for the is real job column, we want to fill in the NULL values. Let’s just assume the database administrator intended all NULL values to mean no, but we need them to actually say no. We can use the COALESCE function, which returns the first non-NULL value from a list of expressions. Right now we’re going to use just one expression (we’ll move on to two after this), and then we provide a default value, in this case no. Ultimately, in our case, this replaces a NULL value with a default value. So here is a query that returns our original table; let’s modify it to fill in NULLs with no. I call the COALESCE function, leave the is real job column as expression one, and for the default value, the last argument, we’ll put in no. Let’s run this bad boy. Oops, I forgot a comma. Run it again. Okay, we can now see this column, named coalesce after the function, and the NULLs are filled in. A better practice is to assign an alias so we can actually tell what it is, and bam, we have the updated column title. Now, what’s going on with that
COALESCE function where we have a second expression? Well, let’s say we wanted to fill in the NULL values for salary, but instead of a default value, we wanted to fill them with something else, like the job title, taken from whichever row the NULL is in. Let me demonstrate. We use the COALESCE function again, leaving salary in there as the first argument, and for the second argument we specify job title. I’ll leave the default value out for right now, and finally I’ll give it the alias salary. Pressing Command+Enter to run this (I’m going to make this a little bit bigger), it says: error, COALESCE types integer and character varying cannot be matched. The problem is that salary is an integer and job title is a string, so any time you use this to replace one column’s values with another’s, they have to be the same data type. In this case, we have to cast salary as TEXT or VARCHAR so it matches the job title’s type. Now when I run this, it works, and we have in fact filled in the appropriate value from job title into salary. You could also put a default value at the end, which I’ll just call default value, but in our case it never comes up when running. Let’s reset this so we can get into NULLIF. With our original table back, say we had a scenario where we knew certain values weren’t correct, or we didn’t want them in there, and we wanted to turn them into NULLs. In the is real job column, kind of isn’t really an answer, so maybe we want to make it NULL. NULLIF returns NULL if its two expressions are equal; otherwise it returns the first expression. This one is even simpler: it takes just expression one and expression two, and each can be either a column or a single value. Let’s jump into it. Say we want to replace kind of with
So let's say we want to replace "kind of" with NULL. We call our NULLIF function: is_real_job is expression one and 'kind of' is expression two, and as usual I'll give it the alias is_real_job. Running this, we did in fact replace "kind of" with NULL. The second argument doesn't have to be a single value either; like I said, it can be an expression, such as another column, so in this case I could use salary. Once again I get an error message about mismatched data types, and I can fix it by casting salary as TEXT. Running Command+Enter, and bam. The point of NULLIF here: as it compares the two columns row by row, none of the values match, so it doesn't convert any of them to NULL (this value was already NULL). Now let's jump into some real-world practice problems. Previously, whenever we've done any analysis, all of our customer keys have conveniently had some sort of purchase associated with them. What we're going to demonstrate is that not every customer in the customer table has an associated purchase, so when we merge the tables together, some customers end up with NULL values. If we then take an average to find the average net revenue per customer, those NULL values aren't counted. But say we do want to count them, because hey, they are customers, and we want them counted as zero instead. That's going to affect the overall average, and we'll see that it changes the average revenue per customer quite a bit. So let's combine our customer keys with net revenue to expose the customers who don't have any purchases. Previously, we went through our sales table, got the customer key, and calculated net revenue by multiplying quantity times net price times exchange rate; since that's an aggregation, we need to group by customer key.
Now, running this, we have net revenue for all of these customers. If I filter for any net revenues that are NULL and run it, we'll see there are no values down below, so we hadn't been seeing this before. What we can do is take all of these revenues and merge them onto our customer table, which will expose the customers that don't have a net revenue. I'm going to convert this into a CTE: use a WITH statement, call it sales_data, and wrap the query in parentheses. As always, I like to make sure it works, so I'll SELECT * FROM sales_data and run it. Yep, it's working below. Now, going into the Contoso ERD, we want to take the customer table and merge the sales data onto it, so the customer table is our A table and the sales data is our B table. Starting from the customer table, we use a LEFT JOIN to join the sales data: I move sales_data down, make it the LEFT JOIN with the alias s, and in the FROM clause use customer with the alias c. Running this to check that it works, I get the error "syntax error at end of input". Basically, I didn't say what we're joining on; specifically, we join on the customer key of both tables. Let's try running it again. All right, we have the customer keys and all the information from the customer table. We don't need all of that information per se; we just want every customer key along with its net revenue, and as you can see there are now NULL values, because some customers don't have any net revenue. So let's modify what we're bringing in: from the customer table, the customer key, and from the sales_data CTE above, net revenue.
Running this, boom: a simplified version that's much easier to view. First, just to demonstrate, let's fill in these NULL values with a zero in a new column: we call COALESCE on net revenue and replace the NULLs with 0. Running Command+Enter, bam, we've got it over here. Not bad. Now, to show the difference between the two, we'll take an average of net revenue on its own and an average of net revenue with the zeros filled in, which covers all customers. I'm going to remove the customer key so that we don't have to group by it, since we want an average over everything. So I call AVG on the first column and AVG on the second column, the one with zeros filled in for the NULL values. Running this, we can see the averages are quite different: the first one is around 4,000 and the second one is less than 2,000. These column names aren't very descriptive, so I'll name the first one spending_customers_avg_net_revenue, since it only includes customers who have spent money, and the second one all_customers_avg_net_revenue. Running it, we get more descriptive titles, and we can see that when we look at all customers, the average net revenue is actually lower. This was mainly done for demonstration purposes, because there may be situations where you do want to consider all customers; in our case, we're going to consider only the spending customers in our analysis. That covers COALESCE. We're not going to do a real-world example of NULLIF, because it would be very similar, just the opposite, if you will. But I do have practice problems for you to go through now to get familiar with both of these functions. In the next lesson, we'll be jumping into the different functions for formatting strings, so with that, see you there.
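The customer/sales comparison above can be reproduced in miniature. This sketch again runs SQLite through Python's sqlite3 module in place of PostgreSQL, with hypothetical table and column names (customer, sales, customerkey, quantity, netprice, exchangerate) and made-up rows in which two of the four customers never purchased anything. It shows the core point: AVG skips NULLs, so the plain average after the LEFT JOIN only covers spending customers, while AVG(COALESCE(..., 0)) counts every customer.

```python
import sqlite3

# Hypothetical miniature versions of the customer and sales tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerkey INTEGER PRIMARY KEY);
    CREATE TABLE sales (
        customerkey INTEGER, quantity INTEGER,
        netprice REAL, exchangerate REAL
    );
    INSERT INTO customer VALUES (1), (2), (3), (4);
    -- Customers 3 and 4 never purchased anything.
    INSERT INTO sales VALUES (1, 2, 100.0, 1.0), (1, 1, 50.0, 1.0), (2, 1, 300.0, 1.0);
""")

query = """
WITH sales_data AS (
    SELECT customerkey,
           SUM(quantity * netprice * exchangerate) AS net_revenue
    FROM sales
    GROUP BY customerkey
)
SELECT
    AVG(s.net_revenue)              AS spending_customers_avg_net_revenue,
    AVG(COALESCE(s.net_revenue, 0)) AS all_customers_avg_net_revenue
FROM customer c
LEFT JOIN sales_data s ON c.customerkey = s.customerkey
"""
spending_avg, all_avg = conn.execute(query).fetchone()
print(spending_avg, all_avg)
```

With these toy numbers the spending-customer average is 275.0 and the all-customer average is 137.5, the same kind of drop the lesson shows on the real data.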
In this lesson, we're going further into data cleanup, specifically around strings and how to format them. We'll cover four key functions that I find myself using from time to time, and then we'll modify the view we created for cohort analysis: it has separate columns for first name and last name, and we're going to combine them into one. Let's get into it. In the last lesson, in the documentation's chapter on functions and operators, we were looking at conditional expressions. In this one, we're going back up to the section on string functions and operators. There's a host of different functions and operators we can use on strings, and the first one we'll jump into is lower, which converts a string to lowercase. All of these functions take string values, so I'll call lower and pass it a string: my name in all uppercase. Running this, it outputs my name below in all lowercase. If we have lower, we probably also have upper, and running that, we can see it's all uppercase; you'd typically use it when some values are lowercase and you want to raise them all up. Next is the trim function. We'll focus on the version up here; the one down below uses non-standard syntax, so we're not going to use it. Trim removes the longest string containing only the specified characters (a space by default) from the start, the end, or both ends of a string. Let's look at this real quick to understand what's going on. If we use trim on our example right now and run it with Command+Enter, there's no real change whatsoever. Now let's say there's a space at the beginning and a space at the end.
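Here's a small sketch of these string functions, again run through SQLite from Python as a stand-in for PostgreSQL. Note the syntax difference: PostgreSQL writes TRIM(BOTH '&' FROM '&&luke&&') or TRIM(LEADING '&' FROM ...), while SQLite spells these trim(text, chars) and ltrim(text, chars). For these inputs the results are the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# lower/upper change case; one-argument trim strips spaces from both ends.
# trim(text, chars) and ltrim(text, chars) are SQLite's spellings of
# PostgreSQL's TRIM(BOTH chars FROM text) and TRIM(LEADING chars FROM text).
lowered, uppered, trimmed, untrimmed, both_trimmed, leading_trimmed = conn.execute("""
    SELECT lower('LUKE'),
           upper('luke'),
           trim('  luke  '),
           '  luke  ',
           trim('&&luke&&', '&'),
           ltrim('&&luke&&', '&')
""").fetchone()

print(repr(trimmed), repr(untrimmed), both_trimmed, leading_trimmed)
```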
Running Command+Enter, you can see there are no spaces. If we run it without the trim function (I'll just remove it) and press Command+Enter, we can see the spaces are in fact there when I check. So this function is very important, especially when you're working with databases that have very dirty data and you need to remove stray spaces. Now let's say we have dirty data with some symbols surrounding the value that we want to remove; in this case, I have two ampersands on each side. When I run it, we can see them, but we want them gone. Going back to the definition, we can specify whether to trim LEADING, TRAILING, or BOTH, meaning the front, the back, or both ends of the string, and BOTH is the default. Then we specify the characters we want to remove, followed by FROM and the string. So inside here I write BOTH, the ampersand as the character to remove, and FROM my string. Let's run this, and it removes the ampersands from both the front and the back. I also notice that my e is missing in here; running it again, boom, now that's what we want. So what are we going to clean up in our view? If we open it back up and go into cohort_analysis, we can see under the Data tab the given name and surname columns, which are the first name and last name. We want to combine these into just one column; we don't need them separated for our analysis. Because of this, we have to update our view. We'll be removing two columns and adding a new one, and since we're altering columns we can't simply run CREATE OR REPLACE VIEW. We'd need to run something like ALTER VIEW, but that would get complicated too.
I recommend we just start over with this query and drop the view. So I'll copy the view's query with Command+C and paste it in here. Remember, we want to combine the given name and surname. I'm going to run the query first to show what's going on. Pressing Command+Enter, I have an AS in front here that doesn't need to be there, so I'll move it up top and run it again. Okay, we have everything we saw before, including the given name and surname. To combine these two, go back to the documentation on string functions and operators and scroll down to Other String Functions and Operators, where there's the concat function: it concatenates the text representations of all its arguments, and NULL arguments are ignored. So for the given name and surname, I'll type concat with opening and closing parentheses around them and name the result clean_name. Let's run this, and now we can see the names are combined. Not too bad. One thing to note, though: we need a space in between, and I find that text columns in particular sometimes have extra spaces. So first, let's add that space: a single quote, a space, a single quote, and a comma. Running Command+Enter, we now have a space in between. But like I said, sometimes the names have spaces around them, so just as a good measure I'll wrap trim around both the given name and the surname. Pressing Command+Enter, we now have all the values cleaned up, with some protection built in. Now we need to actually update cohort_analysis. If we tried to run this right now as CREATE OR REPLACE VIEW cohort_analysis and pressed Command+Enter, it wouldn't work because of the column issues we addressed before.
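The drop-and-recreate step can be sketched as follows, with SQLite standing in for PostgreSQL and a hypothetical two-row customer table (the column names givenname and surname are assumptions). In PostgreSQL, CREATE OR REPLACE VIEW may only add columns to the end of the existing column list; it can't drop or rename them, which is why the view is dropped first. SQLite has no CREATE OR REPLACE VIEW at all, and older SQLite builds lack a concat() function, so this sketch uses the || operator; unlike PostgreSQL's CONCAT, || returns NULL if either name is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customerkey INTEGER, givenname TEXT, surname TEXT);
    INSERT INTO customer VALUES (1, ' Ada ', 'Lovelace '), (2, 'Alan', ' Turing');
    -- Old version of the view that still exposes the separate name columns.
    CREATE VIEW cohort_analysis AS
        SELECT customerkey, givenname, surname FROM customer;
""")

# The column list changes, so drop the old view and create it again.
# (In PostgreSQL: CONCAT(TRIM(givenname), ' ', TRIM(surname)) AS clean_name.)
conn.executescript("""
    DROP VIEW IF EXISTS cohort_analysis;
    CREATE VIEW cohort_analysis AS
        SELECT customerkey,
               trim(givenname) || ' ' || trim(surname) AS clean_name
        FROM customer;
""")

cursor = conn.execute("SELECT * FROM cohort_analysis ORDER BY customerkey")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```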
So we need to drop the view first: we call DROP VIEW on cohort_analysis and then run everything underneath it. I'm going to run it all by pressing Option+X, and it looks like both queries are done. As always, to make sure I have the most up-to-date version, I'll close this out, click inside the panel, press F5 to refresh, and open cohort_analysis. Scrolling all the way over, we can see clean_name is in there, so our view is good to go for the project. All right, you've got some practice problems; go through them to get more familiar with these text formatting functions. In the next lesson, we're going to jump into another project question, on customer segmentation. Looking forward to it; see you there.
Welcome to this third and final lesson in this chapter on data cleaning. Here we'll focus on question one for the project. It builds further on analysis we did earlier on segmenting customers, along with some discussion of customer segmentation. Specifically, we're trying to find out who our most valuable customers are. To do this, we'll break our customers into tiers using percentiles: high-value, mid-value, and low-value customers. This is a very typical business process that you'd find yourself doing in order to target certain customers and then distribute marketing that fits their needs. Shout out to Kelly for coming up with this example, because I feel it's a really good demonstration of what you'd find yourself doing as a data analyst. As always, I like to start with the final dataset we'll be getting. For each customer, based on the customer key and the clean_name we created in this chapter, we'll determine their total LTV, or lifetime value, which is their total net revenue. Then, based on these values, we're going to use percentiles to categorize customers as low-value, mid-value, or high-value. We'll also take this calculation a step further: not only will we have the names so that marketing can target these customers, but we'll also analyze the segments themselves, such as what percentage each segment makes up and how much each one contributes. So let's start a new SQL script documenting this analysis, similar to the one we made before. I'll start a new SQL script, then go up here and rename it 1_customer_segmentation.sql. Okay, this should be good. One thing to note: this is going to be inside our Scripts folder, but we don't necessarily want it there. If I right-click it and choose Show Resource in Explorer, I can see it's right here under Scripts, and I want it higher up, so I'm going to move it out. If you're on Windows, you can do something similar with File Explorer. Down here it's not showing up yet, so I press Fn+F5; the old location of the script disappears, and now I'm seeing this one, along with the README, which I guess I hadn't refreshed either. Anyway, I can now open this back up; this is the SQL file we want to work with. First, there are three columns of interest. We want the customer key, the cleaned name, and the revenue for each customer, that is, the total lifetime value, so we use a SUM of total_net_revenue and alias it total_ltv. As a reminder, going back to cohort_analysis, a customer can have multiple entries: customer 180, for example, made purchases on different days, each with its own total net revenue. That's why we name this new column total_ltv, because now customer 180 will have a single total lifetime value. We want this from our cohort_analysis view, and since we're doing an aggregation, we need a GROUP BY on the customer key and also clean_name.
Let's run this and see what we have. I get a "no active connection" message, so as a reminder, if this happens, reconnect to your database; here I just need to reselect it up here. Now everything's looking good, so let's try running it again. All right, bam: this is what we want, with clean_name, the customer key, and the total LTV for each customer. I can even add an ORDER BY, ascending in this case, and see that customer 180 is now combined into one row. Looking good. Now that we have total LTV, we can bucket these customers into high-value, mid-value, and low-value. We'll do this with percentiles, using the 25th and 75th percentiles. Because we're running a percentile on this aggregation, I'm going to put it into a CTE called customer_ltv, wrapped in parentheses. On this CTE, we run the PERCENTILE_CONT function. Remember, we're using the 25th and 75th percentiles, and everything between them is our mid-value segment. So I'll pass 0.25, then use the WITHIN GROUP syntax, because the total_ltv column has to be ordered so the function pulls out the correct value as it sorts through them, and give this the alias ltv_25th_percentile. Let's make sure this is right by adding FROM customer_ltv and pressing Command+Enter. Boom: the 25th percentile is at 843. I did this already, so I know that's correct. Now let's get the 75th percentile. I'll copy all of this, since it only needs a few changes, paste it in, change the 2 to a 7 in multiple places, and run this bad boy. Boom.
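To make PERCENTILE_CONT concrete, here's a sketch in Python. The customer_ltv aggregation runs in SQLite on made-up rows (customer 180 has two purchases, like in the lesson, and the names are placeholders). SQLite has no PERCENTILE_CONT, so the percentile comes from a small helper that performs the linear interpolation PERCENTILE_CONT uses: sort the values, find the position fraction * (n - 1), and interpolate between the two neighboring values.

```python
import math
import sqlite3

def percentile_cont(values, fraction):
    """Continuous percentile by linear interpolation between adjacent sorted values."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

# Hypothetical cohort_analysis rows; customer 180 made two purchases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cohort_analysis (customerkey INTEGER, clean_name TEXT, total_net_revenue REAL);
    INSERT INTO cohort_analysis VALUES
        (180, 'Customer A', 100.0), (180, 'Customer A', 50.0),
        (181, 'Customer B', 400.0), (182, 'Customer C', 900.0),
        (183, 'Customer D', 1600.0), (184, 'Customer E', 2500.0),
        (185, 'Customer F', 3600.0);
""")

# customer_ltv: one row per customer with every purchase summed into total_ltv.
customer_ltv = conn.execute("""
    SELECT customerkey, clean_name, SUM(total_net_revenue) AS total_ltv
    FROM cohort_analysis
    GROUP BY customerkey, clean_name
    ORDER BY customerkey
""").fetchall()

total_ltvs = [total_ltv for _key, _name, total_ltv in customer_ltv]
ltv_25th_percentile = percentile_cont(total_ltvs, 0.25)
ltv_75th_percentile = percentile_cont(total_ltvs, 0.75)
print(ltv_25th_percentile, ltv_75th_percentile)
```

With six customers, the 25th percentile falls a quarter of the way between the second and third values (525.0) and the 75th three quarters of the way between the fourth and fifth (2275.0), so neither cut-off has to be an actual customer's LTV.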
So, just so you understand what's going on here: customers who spend around $843 are at the 25th percentile, so if you spend less than that, you're below the 25th percentile. At the 75th percentile, customers spend around $5,500, and if you spend more than that, you're above the 75th percentile. Those will be our high-value customers, and those below the 25th percentile will be our low-value customers. Now let's go back and run just this query up here. Because we have these percentiles, we can use the customer_ltv CTE, convert this percentiles query into a CTE as well, and then categorize, or bucket, the customers with a CASE WHEN statement according to whether they're high-value, mid-value, or low-value. To make it a CTE, I'll tab it over, add a comma, name it customer_segments, and once again wrap it in opening and closing parentheses. From here, we select all the customers from customer_ltv, which I'll give the alias c, so I write c.* and put FROM customer_ltv c down here, which clears that little syntax error. Let's make sure everything appears down below. I have a comma after this, so I need to remove it, and now it's appearing.
Now we need to write the CASE WHEN that buckets all of these customers into their tiers. I'll move this down some so it stops cutting off. We write CASE, then WHEN, and the first thing I want to categorize is everything below the 25th percentile as low-value: WHEN total_ltv is less than ltv_25th_percentile. I'm realizing now we haven't brought that column in, but we don't need a join for it. Since we're not joining it to the data, I can just add a comma in the FROM clause, list customer_segments, and give it the alias cs. Now that it's available, we can write cs.ltv_25th_percentile, and we assign this '1 - Low-value'. We put 1, 2, and 3 at the front to make the segments easy to sort: if you use the names alone and sort alphabetically, you get high, low, mid, which isn't the tier order. Let's go to the next one; I'll copy this because a lot of it is repetitive. Here we want everything that's mid-value, so everything less than or equal to the 75th percentile. That way the mid range covers everything from the 25th percentile up to and including the 75th percentile, and I'll change the label to '2 - Mid-value'. Since we've now captured everything up to the 75th percentile, we can categorize everything else as '3 - High-value'. I'll remove this space here and run it, fingers crossed. I see my issue: I have a syntax error at or near WHEN, because I don't have a comma after this. Also, I never gave this CASE statement an alias; we want to name it customer_segment. Let's run this bad boy again. Okay, it's working. Now let's make sure the customers are categorized correctly. I'm going to select just this one part and press Command+Enter, and I realize it won't let me run it like this, unfortunately. So I'll copy it into a new script, because I want to actually show this, paste it in, run it, and silly me, I need the other CTE too. Anyway, you don't have to do this; the main purpose is just to see what these numbers are. Remember, it was 843 for the 25th percentile and 5,500 for the 75th percentile. Now, running the complete query, do these numbers make sense? Yes: the second entry is greater than 5,500, so it should be high-value.
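The CASE WHEN bucketing can be sketched like this, again in SQLite through Python. The customer rows are hypothetical, and the cut-offs reuse the lesson's rounded figures of about 843 and 5,500, passed in as query parameters because SQLite lacks PERCENTILE_CONT. The one-row customer_segments CTE is listed after a comma in the FROM clause, the same way the lesson adds it. The last lines show why the 1/2/3 prefixes help.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_ltv (customerkey INTEGER, total_ltv REAL);
    INSERT INTO customer_ltv VALUES
        (1, 23.0), (2, 843.0), (3, 2000.0), (4, 5500.0), (5, 12000.0);
""")

# customer_segments is a one-row CTE holding the two cut-offs; listing it
# after a comma cross-joins that single row onto every customer.
segmented = conn.execute("""
    WITH customer_segments AS (
        SELECT ? AS ltv_25th_percentile, ? AS ltv_75th_percentile
    )
    SELECT c.customerkey,
           c.total_ltv,
           CASE
               WHEN c.total_ltv < cs.ltv_25th_percentile THEN '1 - Low-value'
               WHEN c.total_ltv <= cs.ltv_75th_percentile THEN '2 - Mid-value'
               ELSE '3 - High-value'
           END AS customer_segment
    FROM customer_ltv c, customer_segments cs
    ORDER BY c.customerkey
""", (843.0, 5500.0)).fetchall()

for row in segmented:
    print(row)

# Without the numeric prefixes, alphabetical order no longer matches tier order.
unprefixed_order = sorted(["Low-value", "Mid-value", "High-value"])
print(unprefixed_order)
```

Customers sitting exactly on 843 or 5,500 land in the mid-value tier, because the first test is a strict < and the second is <=.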
And these entries fall in between, so overall it looks right. Let's look for a low-value one: okay, we've got one down here at $23. The data looks like it's being categorized correctly. This table would be great to export and send to our business colleagues so they can find the people we want to target. (I don't know why this window is so wide; it shouldn't be this big, so let me know if you're having the same problem on Mac.) I'm not going to export it right now, but our colleagues could use it to send targeted campaigns to these individual customers, depending on our strategy and how they're segmented. First, though, let's dive a little further and analyze these segments: how much each one contributes in revenue, and not only what the average customer in each segment spends but also each segment's total revenue. I'll convert this query into a CTE as well (CTEs on CTEs on CTEs), name it segment_values, and wrap it in opening and closing parentheses. We'll start by selecting just the customer_segment column from the new segment_values CTE. Let's make sure that works; we can see it down here. The first thing I'll do is sum all the revenues, so we run SUM on total_ltv and conveniently name it total_ltv. Because we're doing a summation, we also need a GROUP BY on customer_segment. Running this, not too bad: it tells us the total amount, and our high-value segment is at 135 million, whereas our low-value customers have contributed only 4 million. If we plot this on a pie chart to see how much each one contributed (I only recommend pie charts when there are three or fewer values), the low-value slice is, oh my gosh, tiny; we've got to do something here and target these low-value customers better. That's a great finding from this analysis. Mid-value is around 33%, about a third, and high-value is almost two-thirds, which is really high. So based on this data point alone, there's enough evidence that we need different marketing strategies, especially for our low-value customers. Now, because we put those numbers at the front of the customer segment labels, we can add an ORDER BY on customer_segment in descending order; running this, we get high-value on top, then mid-value, then low-value. That's why we put the numbers in front: so we can sort the segments easily. Let's calculate a couple of other things. Mainly, I want the number of customers in each segment, so we can find the average LTV, or lifetime value, per customer in each segment. For the count, we use the customer key and alias it customer_count. Running this, we get the counts, and as expected, the first and third segments are equal, since each holds 25% of customers, while the middle one is basically double, because it holds the 50% in between. Now that we have total_ltv and customer_count, we can divide one by the other to get the average customer value, so I'll take this and divide it by this value right here, and that's our avg_ltv. Bam. Closing this out, this is pretty interesting: our high-value customers spend around $11,000 on average, whereas our low-value customers spend only around $350.
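The segment summary (total LTV, customer count, and average LTV per segment, sorted by the numbered labels) looks roughly like this. The segment_values rows are invented for illustration and run in SQLite from Python; the share-of-revenue column printed at the end is an extra calculation, not part of the lesson's query.

```python
import sqlite3

# Invented segment_values rows: 2 low-value, 4 mid-value, 2 high-value customers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE segment_values (customerkey INTEGER, total_ltv REAL, customer_segment TEXT);
    INSERT INTO segment_values VALUES
        (1, 100.0, '1 - Low-value'), (2, 300.0, '1 - Low-value'),
        (3, 1000.0, '2 - Mid-value'), (4, 2000.0, '2 - Mid-value'),
        (5, 3000.0, '2 - Mid-value'), (6, 4000.0, '2 - Mid-value'),
        (7, 9000.0, '3 - High-value'), (8, 15000.0, '3 - High-value');
""")

summary = conn.execute("""
    SELECT customer_segment,
           SUM(total_ltv) AS total_ltv,
           COUNT(customerkey) AS customer_count,
           SUM(total_ltv) / COUNT(customerkey) AS avg_ltv
    FROM segment_values
    GROUP BY customer_segment
    ORDER BY customer_segment DESC
""").fetchall()

# Extra step beyond the lesson's query: each segment's share of all revenue.
grand_total = sum(total for _segment, total, _count, _avg in summary)
for segment, total, count, avg in summary:
    print(f"{segment}: total={total:,.0f}, customers={count}, "
          f"avg={avg:,.0f}, share={total / grand_total:.1%}")
```

One design note: in PostgreSQL, if total_ltv were an integer column, SUM(...) / COUNT(...) would be integer division and would truncate; AVG(total_ltv) returns the same average without that risk, as long as there are no NULLs.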
That's a pretty substantial gap, and probably why the low-value segment contributed only around 2 to 3% of total revenue. There are a host of different marketing strategies you could pursue here, so feel free to pause the screen and look into each of them. This isn't a masterclass on marketing strategy, it's on data analytics, so we won't spend too much time on it, but I did want you to understand some of the capabilities this powerful data gives us. There are no practice problems for this lesson, but I do expect you to update your project README. Specifically, I added a short description of what we're doing for the different segments, a link to our SQL script, and the visualization I showed you earlier breaking down the segments, because that was the key insight we got from this. I broke down the statistics for the high-value, mid-value, and low-value segments, how much each contributes to revenue, and the big disparity there, and then I wrapped it up with business insights on what we could potentially do to target high-value, mid-value, and low-value customers. All right, it's your turn to go through and update all of that. Next, we're jumping into the chapter on query optimization, where we'll not only cover query optimization but also answer our third and final project question. See you there.
Welcome to this chapter on query optimization. It has three lessons. The first two focus on understanding how to use the EXPLAIN keyword along with query optimization techniques: the first covers the basics, and the second jumps into intermediate and advanced techniques. Finally, in the third lesson, we'll wrap it all up with the final problem for our project. In this video and the next one, we'll go over query optimization techniques, and I have a list here of beginner, intermediate, and advanced techniques. You should already be familiar with the beginner ones, but we'll do a refresher in this video, and in the next lesson we'll jump into the intermediate and advanced ones, all while using EXPLAIN and EXPLAIN ANALYZE to break each one down. The third lesson is, conveniently, on the third and last question in our project: retention analysis, analyzing who hasn't purchased recently. We'll get a visualization similar to this one, where we break customers down by cohort year and see how many are active and how many have churned, meaning they haven't purchased anything recently. This is a super common business concept to understand, and coincidentally, Kelly was just telling me she was implementing it at her job today. So let's get into it. We'll use the keywords EXPLAIN and EXPLAIN ANALYZE, each of which goes at the beginning of a SQL command, whatever you're running. What's the difference between the two? EXPLAIN shows the execution plan without actually executing the query, whereas EXPLAIN ANALYZE does execute it, so we can see the execution times. Say we have the simple query SELECT * FROM sales. I can put EXPLAIN at the beginning, run it with Command+Enter, and it tells me the query plan; I'll break this down in a second. We could also use EXPLAIN ANALYZE. Remember, plain EXPLAIN returned one row; when I run EXPLAIN ANALYZE, we get two more rows, which show not only the planning time but also the execution time. So you may be thinking, "Luke, when the heck would I use EXPLAIN, and when would I use EXPLAIN ANALYZE? Why would I want EXPLAIN if it doesn't even tell me the execution time?"
Well, say you're working with an extremely large database, with millions or even billions of rows. Running the query could be extremely cumbersome and cost not only time but also money, so there are cases where you'd only want to use EXPLAIN. But since the database we're working with is so small, we're always going to run EXPLAIN ANALYZE on our queries, so we can also see the execution time. Let's break down what this output provides. The first row says it does a Seq Scan, which means a sequential scan, going row by row by row, and it specifies that it's doing this on sales. Then there are three values inside parentheses. Cost is an arbitrary unit assigned by the PostgreSQL planner (by default, reading one page sequentially counts as 1.0); if you want to be real about it, the scale is made up, but it's consistent, so costs can be compared. The one thing to remember is that if one query has a cost of 500 and another has a cost of 1,000, the second isn't necessarily going to take double the time; it's just expected to take longer. The cost has the syntax of a starting value, two dots, and then a second value: the first is the startup cost and the second is the total cost. In the demo example, the total cost is 18.5. Next is rows, the estimated number of rows, and finally width, the estimated average row size in bytes. Going back to our original query, we have a total cost of 4,500, almost 200,000 rows, and a width of 68 bytes. Ultimately, this query took 30 milliseconds to execute, and the planning time, the time spent working out how to execute the query, was less than a millisecond. All of these times are in milliseconds, so you may be thinking, "Luke, why does this even matter? We're talking about 30 milliseconds, a small fraction of a second."
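To see how the numbers in a plan node fit together, this Python sketch pulls startup cost, total cost, rows, and width out of one EXPLAIN line with a regular expression. The sample line is illustrative, loosely modeled on the figures mentioned in the lesson rather than copied from real output. It also encodes the formula PostgreSQL's documentation gives for an unfiltered sequential scan, (pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost), with the default settings of 1.0 and 0.01; a WHERE filter would add a per-row cpu_operator_cost on top, which this sketch leaves out.

```python
import re

# Matches the "(cost=startup..total rows=N width=W)" part of a plan node.
PLAN_ESTIMATE = re.compile(
    r"\(cost=(?P<startup>\d+(?:\.\d+)?)\.\.(?P<total>\d+(?:\.\d+)?)"
    r" rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)

def parse_plan_node(line):
    """Return the planner's estimates from one EXPLAIN output line."""
    match = PLAN_ESTIMATE.search(line)
    if match is None:
        raise ValueError(f"no cost estimate in {line!r}")
    return {
        "startup_cost": float(match["startup"]),
        "total_cost": float(match["total"]),
        "rows": int(match["rows"]),
        "width_bytes": int(match["width"]),
    }

def seq_scan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    """Estimated total cost of an unfiltered sequential scan (default settings)."""
    return pages * seq_page_cost + rows * cpu_tuple_cost

# Illustrative plan line; the numbers are not real output.
node = parse_plan_node("Seq Scan on sales  (cost=0.00..4500.00 rows=199000 width=68)")
print(node)

# Hypothetical table size: 358 pages and 10,000 rows.
print(seq_scan_cost(pages=358, rows=10_000))
```

Because the startup cost is 0.00, a sequential scan can return its first row immediately; nodes that must see all input first, such as a sort, show a large startup cost instead.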
Well, once again, this comes into play when you're dealing with databases of millions or billions of rows. There, those milliseconds add up, and I've had queries run as long as an hour, so query optimization is a must for you to understand. If we expand this out, there are a few other things to cover. Along with the execution time, we have the actual time, rows, and loops. Rows stays the same in this case. Loops is a more complicated topic: it's the number of times that step was executed, which matters especially with joins, where one side can be run repeatedly, so we won't worry about it too much. What I do want to focus on is the actual time, which tells us this step returned its first row at 0.017 milliseconds and finished at 14.8 milliseconds. The total execution time also covers work outside this step, such as starting up and shutting down the executor; EXPLAIN ANALYZE doesn't actually send the rows to the client. It's also important to understand that when I run plain EXPLAIN, that extra set of actual values isn't there (I didn't have it expanded at the moment), but we still get the same cost, rows, and width. So let's build on this by calculating the net revenue per customer and seeing how the query plan changes. We need the customer key, and as usual we SUM quantity times net price times exchange rate and give it the alias net_revenue. Let's run this bad boy, and I got ahead of myself: we need a GROUP BY anytime we do an aggregation, so I'll specify the customer key and run it. Okay, now there are actually two steps here, and each step is marked by this arrow right here. So this is one step, and these three rows up here are another step.
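As a rough mental model of this two-step plan, in which the scan feeds rows up to the aggregate, here's a toy Python sketch. A generator plays the Seq Scan and can apply a WHERE-style filter, the way PostgreSQL evaluates a WHERE condition inside the scan node (it appears as a Filter line in the plan), and a dictionary plays the hash table behind HashAggregate. The rows and the tuple layout are hypothetical, and this is only an analogy, not how PostgreSQL is implemented internally.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales rows: (customer_key, order_date, quantity, net_price, exchange_rate)
SALES = [
    (1, date(2023, 5, 1), 2, 10.0, 1.0),
    (1, date(2024, 2, 3), 1, 20.0, 1.0),
    (2, date(2024, 3, 9), 3, 5.0, 2.0),
    (1, date(2024, 7, 4), 4, 2.5, 1.0),
]

def seq_scan(rows, predicate=None):
    """Innermost step: visit every row, applying an optional WHERE-style filter."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row

def hash_aggregate(rows):
    """Outer step: hash rows by customer_key and SUM quantity * net_price * exchange_rate."""
    totals = defaultdict(float)
    for customer_key, _order_date, quantity, net_price, exchange_rate in rows:
        totals[customer_key] += quantity * net_price * exchange_rate
    return dict(totals)

all_time = hash_aggregate(seq_scan(SALES))
from_2024 = hash_aggregate(seq_scan(SALES, lambda row: row[1] >= date(2024, 1, 1)))
print(all_time)   # one total per customer
print(from_2024)
```

Filtering inside the scan means the aggregate only ever sees the rows that survive the filter, which is the effect the 2024 date filter has on the plan.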
Somewhat counterintuitively, the first step is the most indented one: the sequential scan on sales. It scans the sales table, getting all 199,000 rows we need, and we can see that took 9 milliseconds. The next step performs the HashAggregate; basically, it uses hashing to perform the aggregation. We’re not going to go into hashing right now; the important thing is to understand that we’re doing a SUM, which is an aggregation. It tells us this step runs from 54 to 56 milliseconds, so about 2 milliseconds, and it ends up with only about 49,000 rows, because the GROUP BY on customer key condenses them down. Underneath, it has other information, like the group key and how much memory was used. Ultimately this query took longer than the previous one; we’re now up to 57 milliseconds total. I’m going to add just one more thing: a filter. We want orders only from 2024, so the order date has to be greater than or equal to January 1st, 2024. Let’s run this, and we got an error. Why? Because I have the clauses out of order: the WHERE has to come before the GROUP BY, and that matches how the query executes, since we filter the sales table down to 2024 first and then perform the GROUP BY and aggregation. We can prove this with the execution plan: the first step, the sequential scan, applies the filter on those 2024 dates, which brings us down to about 10,000 rows, and then it sends those into the aggregate to do the GROUP BY and the SUM. That step runs from 27 to 28 milliseconds, so roughly a millisecond, and because we now have this WHERE clause, the execution time is much shorter. So not only are we learning how to read a query plan, we’re also understanding why keywords like WHERE and GROUP BY have to go in the order they do.
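The clause-order point doesn’t need Postgres to demonstrate. Below is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in, assuming a tiny hypothetical sales table whose column names (customerkey, orderdate, quantity, netprice, exchangerate) are assumed to match the course’s Contoso data. It filters with WHERE before the GROUP BY, and it shows that swapping the two clauses is rejected as a syntax error.

```python
import sqlite3

# In-memory stand-in for the sales table; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customerkey INTEGER, orderdate TEXT, "
    "quantity INTEGER, netprice REAL, exchangerate REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2023-12-30", 2, 10.0, 1.0),  # before 2024, so WHERE drops it
        (1, "2024-01-05", 1, 20.0, 1.0),
        (1, "2024-02-10", 3, 5.0, 2.0),
        (2, "2024-03-01", 1, 100.0, 0.5),
    ],
)

# WHERE comes before GROUP BY: rows are filtered first, and only the
# surviving rows are grouped and summed.
net_revenue_2024 = conn.execute(
    """
    SELECT customerkey,
           SUM(quantity * netprice * exchangerate) AS net_revenue
    FROM sales
    WHERE orderdate >= '2024-01-01'
    GROUP BY customerkey
    ORDER BY customerkey
    """
).fetchall()
print(net_revenue_2024)  # [(1, 50.0), (2, 50.0)]

# Writing the clauses out of order is rejected, as it is in Postgres.
try:
    conn.execute(
        "SELECT customerkey, COUNT(*) FROM sales "
        "GROUP BY customerkey WHERE orderdate >= '2024-01-01'"
    )
    order_error = None
except sqlite3.OperationalError as exc:
    order_error = str(exc)
print(order_error)  # a syntax error near "WHERE"
```

SQLite stores these dates as ISO-8601 text, which compares correctly as strings; in Postgres, orderdate is a real date column.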
Anyway, it’s important to note that we’ve been using EXPLAIN and EXPLAIN ANALYZE, but over here DBeaver has an Explain Execution Plan option. If I try to use it with EXPLAIN ANALYZE still at the top of the query and click OK, it gives me an error, because the query already has EXPLAIN in it. So select just the query you want and then use Explain Execution Plan, or the shortcut Command+Shift+E. A window pops up asking what you want to do; if you don’t want to run EXPLAIN ANALYZE, leave ANALYZE unchecked. I usually keep all of these checked, including ANALYZE, and then click OK. I find this view slightly less descriptive, but the information is more orderly. It’s still in the same order, with the sequential scan as the first step and the aggregate next, reading from the bottom up, but the times and costs are in a more readable format. That execution plan is available if you want to use it, but we’re going to continue using EXPLAIN ANALYZE throughout the remainder of this video and the next, because I find it more useful. One last thing to note: if you run into any keywords you don’t know, the best thing to do is copy the whole plan, paste it into your favorite chatbot, and have it explain what’s going on step by step. For the remainder of this video, we’re going over some beginner optimization techniques that we’ve touched on briefly throughout this course, this time using EXPLAIN ANALYZE to prove why you should be doing them. In the next lesson, we’ll go into more intermediate techniques and briefly cover some advanced ones to further level up your optimization skills. For these basics, we’re covering three techniques, avoiding SELECT *, using LIMIT, and using WHERE instead of HAVING: we’ll go through examples for the first two and just briefly discuss the third.
We’ll start with the one that’s easiest to prove is efficient: using LIMIT. If I run SELECT * on the entire sales table with Command+Enter, it takes around 18 milliseconds, and as I run it a few times, you’ll notice the time jumps around; on average it looks like it’s around 24 or so. I have this in the not-optimized SQL query, so I’m going to come over to the optimized query so we can compare before and after, and we’ll add a LIMIT of 10. This bad boy comes back in 0.03 milliseconds, and running it a few times, it stays pretty consistent around 0.03, compared to almost 21 before. So there’s your proof that LIMIT statements are very helpful for minimizing the amount of data returned and saving you time. Next is SELECT *. You’ve heard me say time and time again, “I don’t recommend using SELECT * to select all the columns of the table.” So let’s try to optimize this by listing just one column, the customer key. When I run this (and I need to run it a few times), we can see it’s over 30 milliseconds, whereas the not-optimized one comes in under 30 milliseconds. What the heck is going on here? Well, in some cases like this one, Postgres can retrieve whole rows very efficiently with SELECT *, so yes, in some cases SELECT * is actually faster. But I’m still sticking with my recommendation: especially when you get into bigger databases with millions or billions of rows, list only the column or columns you need for your analysis instead of using SELECT *.
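Timings on a toy table aren’t meaningful, but the difference in how much data comes back is easy to show. This sketch uses Python’s sqlite3 module with 1,000 hypothetical rows to compare SELECT * against selecting a single column with LIMIT 10.

```python
import sqlite3

# 1,000 hypothetical sales rows in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER, netprice REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, i % 50, i * 1.5) for i in range(1000)],
)

# Everything: every row and every column comes back.
cur = conn.execute("SELECT * FROM sales")
all_rows = cur.fetchall()
all_columns = [col[0] for col in cur.description]

# Trimmed: one named column, capped at 10 rows. Without an ORDER BY,
# which 10 rows you get is not guaranteed.
cur = conn.execute("SELECT customerkey FROM sales LIMIT 10")
preview_rows = cur.fetchall()
preview_columns = [col[0] for col in cur.description]

print(len(all_rows), all_columns)          # 1000 ['orderkey', 'customerkey', 'netprice']
print(len(preview_rows), preview_columns)  # 10 ['customerkey']
```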
The last one to look at is using WHERE instead of HAVING, and unfortunately you may not always be able to just switch from HAVING to WHERE. Let’s say we have this query, where we’re getting the customer keys along with their net revenue. If we wanted to filter this data based on net revenue, maybe to get customers above or below a certain value, we’d use a HAVING clause here, say HAVING net revenue greater than 100 or greater than 1,000. If we remember the order of operations, we first do the sequential scan and get all 199,000 rows, and the filter is only applied in the aggregate step, so all 199,000 rows get aggregated before anything is filtered out. Now I’m moving back to the original query to demonstrate a WHERE clause. Unfortunately, HAVING’s benefit is that it can filter on an aggregation, which WHERE can’t, but we may have a case where we can change the filter instead: say we want customer keys less than 100, so we’re filtering on the customer key itself rather than on the aggregation. The main point with this one is that the filtering happens in the sequential scan at the very beginning, and because that limits how many rows are passed along, our execution time is a lot shorter. Now you may be saying, Luke, you’re using WHERE with a customer key less than 100 and HAVING on an aggregation greater than 1,000. Yes, I know these aren’t directly comparable, but there may be situations where you want net revenues above a certain value and you know which customer keys those correspond to, so you could filter by those customer key values instead. Mainly, what I’m getting at is that if you have the choice to rewrite a filter as a WHERE, take advantage of it.
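Here’s a small sqlite3 sketch of the WHERE versus HAVING point, using hypothetical rows. Both queries keep customer keys below 100 and return the same result; the difference is that, logically, WHERE discards rows before they are grouped, while HAVING discards whole groups after the aggregation has already been done.

```python
import sqlite3

# Hypothetical sales rows for three customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customerkey INTEGER, quantity INTEGER, "
    "netprice REAL, exchangerate REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(50, 1, 10.0, 1.0), (50, 2, 5.0, 1.0), (99, 1, 40.0, 1.0), (150, 3, 10.0, 1.0)],
)

# HAVING: logically, every row is grouped and summed, then groups are dropped.
having_rows = conn.execute(
    """
    SELECT customerkey, SUM(quantity * netprice * exchangerate) AS net_revenue
    FROM sales
    GROUP BY customerkey
    HAVING customerkey < 100
    ORDER BY customerkey
    """
).fetchall()

# WHERE: rows are dropped before grouping, so the aggregate sees fewer rows.
where_rows = conn.execute(
    """
    SELECT customerkey, SUM(quantity * netprice * exchangerate) AS net_revenue
    FROM sales
    WHERE customerkey < 100
    GROUP BY customerkey
    ORDER BY customerkey
    """
).fetchall()

print(having_rows)  # [(50, 20.0), (99, 40.0)]
print(where_rows)   # [(50, 20.0), (99, 40.0)]
```

A filter on the aggregate itself, such as HAVING SUM(quantity * netprice * exchangerate) > 1000, can’t be moved into WHERE, which is the limitation described above. Also, some planners (PostgreSQL’s included) can move a HAVING condition that doesn’t involve an aggregate into WHERE on their own, so it’s worth checking the plan rather than assuming.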
All right, you now have some practice problems to go through to get more familiar with EXPLAIN, EXPLAIN ANALYZE, and the explain feature inside DBeaver, along with testing out the basic techniques we just went over. In the next lesson, we’re going over some intermediate and advanced techniques, along with a real-world practice problem. All right, with that, see you there. Welcome to this lesson on optimization techniques. We’re going to start by picking up where we left off last time and jumping into intermediate techniques. We’ll also briefly cover advanced techniques, but overall they’re outside the scope of this course, and you’ll see why. Then, at the end, we’re going to optimize the query we built in the last chapter on data cleaning, basically applying all the techniques we’ve learned to make a query run faster. So what are we covering for these intermediate techniques? We have four. The first one we’ve really been covering in the last lesson and this one: using query execution plans, meaning EXPLAIN, EXPLAIN ANALYZE, or even DBeaver’s built-in options. Besides that, we’re going over three other scenarios: minimizing GROUP BY, reducing joins when possible, and optimizing ORDER BY. So let’s say we have this query, where we’re getting the customer key, order date, order key, and line number, along with an aggregation of net revenue, which I need to give the alias net revenue. Let’s go ahead and run this to see what’s going on. We can see there are two main steps, a sequential scan and then the aggregation for our GROUP BY, but our execution time, even running this a few times, is pretty high, sometimes as much as 100 milliseconds. Once again, we’re dealing with milliseconds, but if you have databases with millions or billions of rows, this can easily turn from milliseconds into seconds, and as soon as queries take longer than something like one or two seconds, they get really annoying.
Anyway, with this query itself, I’m going to select it and run it. We may not need to see every individual line number, so let’s remove the line number. I’m going to move the query over to the optimized area, take line number out of both the SELECT and the GROUP BY, and run it. Oops, I had a little typo; let’s try again. Now we have a much lower execution time, here at 67 milliseconds, and consistently less than 100, just by removing one column from the GROUP BY. So an important question to ask is whether all of those GROUP BY columns are really necessary; they can get costly over time, and in this case we’re saving a good chunk of time just by removing one. The next concept is minimizing the number and also the types of joins in a query. In this case, let’s say we’re joining multiple tables to our sales table: the customer, product, and date tables. I’ll run just the query itself, and yeah, we get a lot of information. Running the full query with EXPLAIN ANALYZE, we can see it takes over 100 milliseconds, and running it a few times, it hovers around 80 milliseconds; pretty intensive. If we go back to the original query, we can see we’re pulling the year from the date table. What happens if we remove this INNER JOIN and just extract the year from the order date instead? I’m going to copy all of this into the optimized section (I already moved the INNER JOIN down here), and then write an EXTRACT function, EXTRACT(YEAR FROM s.orderdate), with the alias year. When we run this, we can clearly see it provides us the exact same year information.
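A minimal sqlite3 sketch of swapping a join for a date function, assuming hypothetical sales and date_dim tables (date_dim just stands in for the course’s date table, and its column names are made up). SQLite has no EXTRACT, so strftime('%Y', ...) with a CAST plays the role of Postgres’ EXTRACT(YEAR FROM s.orderdate); both versions return the same years.

```python
import sqlite3

# Hypothetical stand-ins for the sales table and the date table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (orderkey INTEGER, orderdate TEXT);
    CREATE TABLE date_dim (full_date TEXT PRIMARY KEY, year INTEGER);
    INSERT INTO sales VALUES (1, '2023-11-02'), (2, '2024-01-15'), (3, '2024-04-20');
    INSERT INTO date_dim VALUES ('2023-11-02', 2023), ('2024-01-15', 2024), ('2024-04-20', 2024);
    """
)

# Version 1: join to the date table just to look up the year.
joined_years = conn.execute(
    """
    SELECT s.orderkey, d.year
    FROM sales AS s
    INNER JOIN date_dim AS d ON s.orderdate = d.full_date
    ORDER BY s.orderkey
    """
).fetchall()

# Version 2: no join; derive the year from the order date itself.
# strftime returns text, so CAST turns '2023' into the integer 2023.
extracted_years = conn.execute(
    """
    SELECT orderkey, CAST(strftime('%Y', orderdate) AS INTEGER) AS year
    FROM sales
    ORDER BY orderkey
    """
).fetchall()

print(joined_years)     # [(1, 2023), (2, 2024), (3, 2024)]
print(extracted_years)  # [(1, 2023), (2, 2024), (3, 2024)]
```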
Running the full query with EXPLAIN ANALYZE (I need to move it up some), this one looks like it stays around 70 milliseconds, while the other one, if I remember right, was running around 80 or 90. So we’re getting around a 10% gain in performance just by removing an INNER JOIN and using a function instead. It’s also important to note that it’s not just the number of joins but also the type of joins, and we’ll get into that with the practice problem coming up in a little bit. The last major concept to cover is optimizing your ORDER BY. Sometimes you can’t just drop an ORDER BY, but if you can, it will save you some time. Let’s print out this query with Command+Enter: I’m getting the customer key, order date, order key, and net revenue, and in the ORDER BY we sort by net revenue first, followed by customer key, then order date, and then order key. Rarely do I find that ordering by all of the columns is really necessary. There are a few different ways to optimize an ORDER BY. The first and easiest is limiting the number of columns in the ORDER BY. The second is avoiding sorting on computed columns or function calls. The third, probably the most intuitive, is placing the most selective columns first in the ORDER BY. And finally, use indexed columns for sorting to leverage existing database indexes. Unfortunately, we usually don’t have control over indexes, since a database administrator typically handles indexing, but if you did, you’d want to use them here. Let’s go with one of these recommendations: removing function calls. We have EXPLAIN ANALYZE at the top, and when I run it, I’m seeing around 90 milliseconds or so.
Now I’m going to take this exact query, put it over here, and remove the net revenue aggregation, the SUM, from the ORDER BY. Running this one, I’m seeing a lot less, usually in the 70 to 80 millisecond range, so we just cut off about 10 to 20% by doing that alone. Maybe we find that we can trim the ORDER BY even more; let’s say we remove everything down to just the customer key and run it. In this case I’m not finding much of a difference, even seeing it get as high as 90, so this one isn’t as great as removing the aggregation, but overall we can see these changes have an appreciable impact compared to our not-optimized query. So let’s get into optimizing the view we’ve previously worked with: cohort analysis. We can find it under our database, then Schemas, public, and Views, where we have cohort analysis. We can look at the query itself under Source, and I’m going to copy this bad boy into this not-optimized script. I don’t actually want to create or replace any view, so I’m going to remove that and instead put EXPLAIN ANALYZE at the top. I don’t like how this is formatted, so I’ll highlight it all and go to Format, then Format SQL, and it breaks everything out more like how I like it. Now let’s run EXPLAIN ANALYZE on this to see what we’re working with for our current execution time. I have a typo, because I didn’t get rid of the AS at the front here, so I’ll run it again. Okay, boom: wow, that’s a pretty hefty query plan, and this is one of the highest execution times we’ve seen so far, around 200 milliseconds to get this bad boy done. It looks like it has a total of six steps, which looks about right with the CTEs and all the different GROUP BYs we have in here.
Now, recalling what we just covered about improving a query, we can see that the CTE has a bunch of GROUP BYs and also a join, so those may be able to be optimized. Looking at the bottom one, the main query itself, there aren’t a lot of techniques I feel I can apply, since it’s just a simple SELECT and FROM. So we’re primarily going to focus on the customer revenue CTE, and the first thing we’ll focus on is the join. Previously we discussed minimizing joins, but what’s also important is understanding which type of join you should be using. A lot of this course and the previous course used either left joins or inner joins. Left joins are used when you want to keep all the rows in table A: if some rows in A have no match in table B, they’re kept anyway, with NULLs filling in B’s columns, so none of A’s rows are removed. But if we know that everything we’re matching on exists in both table A and table B, so there won’t be any NULLs, an inner join can be slightly more efficient, because the database no longer has to hold on to unmatched rows and fill them with NULLs. So I’m going to copy this query into the optimized section and change LEFT to INNER. The first thing I want to show, though, is the actual query output, with Command+Enter. I want to see the total row count, so I click this, and it’s around 83,000 rows. If I go back to the version with the left join, select it all, and press Command+Enter, the row count is the same. So both return the same rows; one just uses a left join and the other an inner join.
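The row-count check above can be sketched with sqlite3 and two hypothetical tables. As long as every sale has a matching customer, LEFT JOIN and INNER JOIN return the same number of rows; once an unmatched sale appears, LEFT JOIN keeps it (with NULLs in the customer columns) and INNER JOIN drops it, which is why the swap is only safe when you know every row matches.

```python
import sqlite3

# Two hypothetical tables: every sale starts out with a matching customer.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customerkey INTEGER PRIMARY KEY, countryfull TEXT);
    CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER);
    INSERT INTO customer VALUES (1, 'Germany'), (2, 'Canada');
    INSERT INTO sales VALUES (10, 1), (11, 1), (12, 2);
    """
)


def row_count(join_type: str) -> int:
    """Count joined rows; join_type is only ever 'LEFT' or 'INNER' here."""
    sql = (
        "SELECT COUNT(*) FROM sales AS s "
        f"{join_type} JOIN customer AS c ON s.customerkey = c.customerkey"
    )
    return conn.execute(sql).fetchone()[0]


# Every sale matches a customer, so both joins keep the same rows.
counts_all_matched = (row_count("LEFT"), row_count("INNER"))
print(counts_all_matched)  # (3, 3)

# Add a sale with no matching customer: LEFT keeps it, INNER drops it.
conn.execute("INSERT INTO sales VALUES (13, 99)")
counts_with_orphan = (row_count("LEFT"), row_count("INNER"))
print(counts_with_orphan)  # (4, 3)
```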
Now the question is: when we run EXPLAIN ANALYZE on the not-optimized and optimized versions, is the optimized one quicker? The not-optimized version, with the left join, is around 200 milliseconds, and the optimized version is about the same, around 200 milliseconds. So although it didn’t further optimize our query in this case and basically breaks even, there are cases where using an inner join instead of a left join, when appropriate, can save you time in query execution. All right, so we’ve talked about joins. The other thing we can do to optimize this query has to do with the GROUP BY. Look at this: we’re grouping by a bunch of columns, and when we look at what we’re grouping by, there are a lot of repeating values in countryfull, age, givenname, and surname. What do we mean by this? Let’s look at the query output for, say, customer 180. Yes, the order date changes, and so do the total net revenue and number of orders, but things like their country, their age, their cleaned name, their first purchase date, or their cohort year aren’t going to change. So why are we grouping by things that don’t change? We really only care about grouping by the customer key and the order date. Like we said, we want to minimize the GROUP BY, but if we just remove those columns, the query isn’t going to work anymore. What we can do is wrap each of them in an aggregation function: we just need the MAX value of each one. You could use MIN too, whatever, but it’s very popular to just use MAX. I’m doing this for countryfull, age, givenname, and surname, and it’s also important that you give each one back its alias, so countryfull, age, givenname, and surname. First, let’s run this query and make sure it’s working properly.
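Here’s a sqlite3 sketch of the same GROUP BY trim on hypothetical rows. It borrows the course’s column names (countryfull, age, givenname, surname) but is not the actual cohort_analysis CTE. Because those attributes don’t vary within a customer, grouping by only customerkey and orderdate and wrapping the rest in MAX() gives the same result as grouping by everything.

```python
import sqlite3

# Hypothetical order-level rows (not the real cohort_analysis CTE).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        customerkey INTEGER, orderdate TEXT, countryfull TEXT,
        age INTEGER, givenname TEXT, surname TEXT, net_revenue REAL
    );
    INSERT INTO orders VALUES
        (180, '2023-05-01', 'United States', 41, 'Ana', 'Diaz', 120.0),
        (180, '2023-05-01', 'United States', 41, 'Ana', 'Diaz', 30.0),
        (180, '2024-02-11', 'United States', 41, 'Ana', 'Diaz', 75.0),
        (387, '2023-07-19', 'Germany', 29, 'Jonas', 'Berg', 60.0);
    """
)

# Before: every descriptive column is part of the GROUP BY.
by_all = conn.execute(
    """
    SELECT customerkey, orderdate, countryfull, age, givenname, surname,
           SUM(net_revenue) AS total_net_revenue
    FROM orders
    GROUP BY customerkey, orderdate, countryfull, age, givenname, surname
    ORDER BY customerkey, orderdate
    """
).fetchall()

# After: group only by what actually varies. The other attributes are
# constant within each group, so MAX() simply returns that one value.
by_keys = conn.execute(
    """
    SELECT customerkey, orderdate,
           MAX(countryfull) AS countryfull, MAX(age) AS age,
           MAX(givenname) AS givenname, MAX(surname) AS surname,
           SUM(net_revenue) AS total_net_revenue
    FROM orders
    GROUP BY customerkey, orderdate
    ORDER BY customerkey, orderdate
    """
).fetchall()

print(by_all == by_keys)  # True
print(by_keys[0])  # (180, '2023-05-01', 'United States', 41, 'Ana', 'Diaz', 150.0)
```

The trick is only safe when those columns really are constant per group; if they weren’t, MAX() would silently pick one value instead of splitting the group.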
Scrolling over here, we can see that everything has remained the same for countryfull, age, and the given name, or rather the final cleaned name we built from it. I think I called out first purchase date and cohort year when I first talked about this, but they didn’t have anything to do with the GROUP BY; they’re calculated further down. Now let’s run this query to see how long it takes, and with this one, we’ve got the execution time down to 160 milliseconds, where previously it was around 200 milliseconds. Now that we have this optimized query that takes less time, we can update our view with CREATE OR REPLACE VIEW cohort_analysis AS. We’re not changing any columns with this, so technically this should work without a DROP. I’ll run it, and it’s telling me that it’s changing a data type; CREATE OR REPLACE VIEW can’t change a column’s data type, so we do need to drop the cohort analysis view first. Now I’ll execute the entire SQL script, and it ran both queries. Coming over here, I’ll press F5 to make sure everything’s refreshed and open up cohort analysis, and I can see we have those MAX values, we’ve minimized the GROUP BY, and we’ve changed our join. It looks like it actually saved it as a simpler JOIN, since now that we’re using an inner join, plain JOIN is the default, so that makes sense. The last thing to cover is advanced optimization techniques. We’re not going to walk through any of these, but they’re ones you should be aware of. There are three major ones: first, using proper data types, basically storing integers and other numbers as numeric types rather than as something like a string; second, using indexes to speed up your queries, so the database can find and sort the values in certain columns more quickly; and third, for large tables, partitioning them to improve their performance.
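Indexing is out of scope here, but for a quick look at why it matters, this sqlite3 sketch (a hypothetical table, and SQLite’s EXPLAIN QUERY PLAN rather than PostgreSQL’s EXPLAIN) shows the plan changing from a full scan to an index lookup once an index exists on customerkey. The exact wording of the plan text varies between SQLite versions, so the comments only give examples.

```python
import sqlite3

# Hypothetical 5,000-row sales table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (orderkey INTEGER, customerkey INTEGER, netprice REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(i, i % 500, 9.99) for i in range(5000)],
)

query = "SELECT * FROM sales WHERE customerkey = 180"


def plan(sql: str) -> str:
    """Join the 'detail' column (index 3) of SQLite's EXPLAIN QUERY PLAN rows."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # a full scan, e.g. "SCAN sales"
rows_before = conn.execute(query).fetchall()

# Creating indexes is normally the database administrator's call.
conn.execute("CREATE INDEX idx_sales_customerkey ON sales (customerkey)")

plan_after = plan(query)  # an index lookup, e.g. "SEARCH sales USING INDEX ..."
rows_after = conn.execute(query).fetchall()

print(plan_before)
print(plan_after)
print(len(rows_before), len(rows_after))  # 10 10
```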
All three of these are controlled by database administrators: they control the data types, whether columns have indexes, and how data is partitioned. So we’re not going to go into them, because it depends entirely on whether your database administrator has set them up. If you ever find queries running excessively long, and you plug them into ChatGPT and can’t find any fixes except around indexing, partitioning, or data types, that’s when you need to go to your database administrator and ask them to change your table so you can get more efficient queries. Hopefully you have a good database administrator; I have in the past, and I was able to go directly to them, get what I needed, and save a lot of time in the long run. All right, you now have some practice problems to go through to get more familiar with those intermediate techniques and with optimizing your queries, along with using EXPLAIN and EXPLAIN ANALYZE again. With that, I’ll see you in the next one, where we get into our third and final problem in this project. See you there. Welcome to this last lesson in the chapter. For this one, we’re going to tackle our third and final question: performing retention analysis. Specifically, we’re going to look at who hasn’t purchased recently, using terms such as active and churned customers. We’ll look at it overall, and then break it down by cohort year to see how it’s trending over the years across the different cohorts. So we’re trying to identify which customers haven’t purchased recently; the technical business term for this is identifying active versus churned customers. For us, active customers are those who have made a purchase within the last 6 months, whereas churned customers are those who haven’t made a purchase in over 6 months.
Now, 6 months isn’t necessarily a hard-and-fast cutoff you’ll always use to delineate between active and churned customers; it really depends on your industry and maybe even other factors. As a general rule of thumb, I have these four different areas. Contoso falls into e-commerce, which typically uses a 6 to 12 month period since last purchase, whereas something like a mobile app has much quicker turnover, so it would use 7 to 30 days since the last session to identify active versus churned customers. Now you may be wondering why the heck this even matters. Well, we can send the data we calculate, whether each customer is active or churned, off to marketing so they can run targeted campaigns to get those customers to re-engage. Also, when we look at this holistically toward the end, with percentages per cohort, we can understand the effectiveness of previous campaigns at maintaining activity and preventing churn. Overall, this is about tracking customer retention and engagement, which matters because we know these customers have bought from us before and are likely to do it again, so we need to take advantage of that. So what are we working toward? We want to build this table, which has the customer key and cleaned name, along with when their last purchase date was, whether it was in the last 6 months, and a classification of either churned or active. To make this easier, we’re going to use the cohort analysis view we’ve been using, because it has all the information we need to extract this. Like our last two problems, I want to work in a script that we’ll save as our final script to upload to the project we put on GitHub. Right now we have our scripts for questions one and two.
What I’m going to do is go to VS Code and create a third file named retention analysis; remember, this is a SQL file, so I add .sql and then press Enter. We’re not going to edit it in VS Code; I just want to create it. Now, back inside DBeaver, I’ll click in here and press F5 to refresh, and we now have the SQL file we’re going to work in. So what are we going to query? Let’s go back to the cohort analysis view and open it up. Things we definitely want are the customer key and the cleaned name we saw. Additionally, we’re going to use the order date, and we can also use the first purchase date, which will be used for some filtering I’ll explain later. So let’s start defining all this: we’ll start with a SELECT statement, define the customer key, cleaned name, order date, and first purchase date, and get it all from cohort analysis. Let’s run this. It looks like I don’t have an active connection, so I’ll quickly select the right data source and run it. All right, now let’s target customer 180 specifically. They have two purchases, and we already have a column for the first purchase date, but what we really want to know is when their last purchase was made, to understand whether they bought within that six-month period. So we need a way to identify numerically which purchase is the most recent, and we can do this using ROW_NUMBER with partitioning. Right after order date, I’ll add ROW_NUMBER. We want to partition, so we use the keyword OVER, and inside parentheses we put PARTITION BY, specifically partitioning by customer key. We don’t want it assigning numbers willy-nilly; we want the numbering to follow the order date, so we’ll add an ORDER BY on the order date.
We’ll name this rn, short for row number, which is how you’ll typically see it written. Let’s run this, and not bad. Looking at customer 180 again, the most recent purchase is actually numbered two, since it counts one, two in date order. We want the opposite, where the most recent purchase is number one, so I’ll change this ORDER BY to descending and run it again with Command+Enter, and now it’s numbered that way. We can also double-check some others, like customer 387, and everything’s looking good. So we’re almost at what we need. Mainly, I don’t want any more duplicate entries; I just want the most recent purchase, and I can get that by filtering for rn equal to one. To do that, I’m going to put all of this into a CTE and then pull out what I actually need. I’ll tab it over, give it the name customer last purchase, and put it all within opening and closing parentheses. Then I’ll write a SELECT for the customer key, cleaned name, and order date from customer last purchase, and remember, we want to filter this to where rn is equal to one. Let’s run it. I have a little typo, something I didn’t need, so let’s rerun it. All right, looking good. Now, this order date is technically still an order date, but it’s really the last purchase date now, so I’m going to give it the alias last purchase date. Looking good. We now need to classify each of these customers as either active or churned.
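The last-purchase CTE can be sketched in sqlite3 on a few hypothetical rows (window functions need SQLite 3.25 or later). Here cohort_analysis is just a small stand-in table whose column names are assumed to mirror the course’s view. ORDER BY orderdate DESC inside the window makes the most recent purchase rn = 1, and filtering on rn = 1 keeps exactly one row per customer.

```python
import sqlite3

# Small stand-in for the cohort_analysis view (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cohort_analysis (customerkey INTEGER, cleaned_name TEXT, orderdate TEXT);
    INSERT INTO cohort_analysis VALUES
        (180, 'Ana Diaz', '2023-05-01'),
        (180, 'Ana Diaz', '2024-02-11'),
        (387, 'Jonas Berg', '2022-09-03'),
        (387, 'Jonas Berg', '2023-01-20'),
        (387, 'Jonas Berg', '2022-12-24');
    """
)

last_purchases = conn.execute(
    """
    WITH customer_last_purchase AS (
        SELECT customerkey,
               cleaned_name,
               orderdate,
               ROW_NUMBER() OVER (
                   PARTITION BY customerkey
                   ORDER BY orderdate DESC  -- newest purchase gets rn = 1
               ) AS rn
        FROM cohort_analysis
    )
    SELECT customerkey, cleaned_name, orderdate AS last_purchase_date
    FROM customer_last_purchase
    WHERE rn = 1
    ORDER BY customerkey
    """
).fetchall()

print(last_purchases)
# [(180, 'Ana Diaz', '2024-02-11'), (387, 'Jonas Berg', '2023-01-20')]
```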
But first, I need to show you something real quick. Underneath this, I’m going to write another query that just looks at the order date column from the sales table; I’ll run just this query and then sort it in descending order. What we can see is that the most recent date in this data is April 20th, 2024. I can also query this directly with MAX of the order date, and it’s still April 20th. So why am I telling you this? As of filming, it’s March 2025, so we’re almost a year past the end of the data. If we went back six months from today, there wouldn’t be any data within that six-month period, so we’d have a 0% active rate. The point is that we have to measure from our last data point: six months before April 20th, 2024. Let’s go back up and run the query above. We now want to categorize these customers with a CASE statement that checks whether they purchased within 6 months of April 20th, 2024, classifying them as active if so and churned otherwise. So in our main query at the bottom, I’ll add a CASE with a WHEN on the order date column: when the order date is less than April 20th, 2024 minus an interval of 6 months, we classify the customer as churned. Put another way, if their last purchase was before October 20th, 2023, they’re churned; ELSE we mark them as active. Then we END the CASE statement and give it the alias customer status. Let’s check this out: I’ll remove this extra line and run it with Command+Enter. We got an error: invalid input syntax for type interval. We’re comparing the order date against April 20th, 2024 minus 6 months, but Postgres doesn’t know that the quoted value is a date, and it ends up trying to read it as an interval, so we need to cast it as a date using the :: operator.
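A sqlite3 sketch of the CASE classification against a hypothetical table of last purchase dates. SQLite has no INTERVAL type or :: cast, so date('2024-04-20', '-6 months') stands in for '2024-04-20'::date - INTERVAL '6 months'; both work out to October 20th, 2023. Note that a purchase exactly on the cutoff counts as active, because the test is strictly less than.

```python
import sqlite3

# Hypothetical last purchase dates, chosen to sit around the cutoff.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE last_purchases (customerkey INTEGER, last_purchase_date TEXT);
    INSERT INTO last_purchases VALUES
        (1, '2023-06-30'),
        (2, '2023-10-19'),
        (3, '2023-10-20'),
        (4, '2024-03-01');
    """
)

statuses = conn.execute(
    """
    SELECT customerkey,
           CASE
               -- date('2024-04-20', '-6 months') evaluates to '2023-10-20'
               WHEN last_purchase_date < date('2024-04-20', '-6 months') THEN 'Churned'
               ELSE 'Active'
           END AS customer_status
    FROM last_purchases
    ORDER BY customerkey
    """
).fetchall()

print(statuses)
# [(1, 'Churned'), (2, 'Churned'), (3, 'Active'), (4, 'Active')]
```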
Let’s run it with Command+Enter, and now it’s working. We have this extra column, and we can double-check it: November 2023 is active, December 2023 is active, and it looks like it’s matching up pretty well. Now there’s one other minor detail we need to filter for to make sure we’re getting the right calculation, and I’m going to sort this data to show what I mean. These are the last purchase dates, but this isn’t actually showing what I want; I want to show the first purchase date right next to it. Remember, we have it in the CTE above, so I can just select it here, and when I run this, it’s now there. Let me sort by it. Okay, this is what I’m trying to show: these customers first purchased on April 20th, and they’re active. If we keep scrolling back, all of these customers for a good stretch are active, because they haven’t been customers for six months yet, so they’ve never even had the chance to become churned. I’d argue this biases our numbers upward, especially in 2024, making it look like all of our customers are active. And that’s exactly what happens in 2024: if I scroll all the way to the beginning of the year, all of these 2024 customers are still active, so when I calculate a percentage, it’s going to say 100% active in 2024, which is completely useless. We need to remove everybody whose first purchase is after October 20th, 2023, because until you scroll back to that date, everyone shows as active; it’s only after that point that we start getting churned customers, since those customers have been in the system for more than 6 months. So all I’m going to do is modify this query by adding an AND condition on the first purchase date, using the same October 20th, 2023 cutoff, and I’ll paste it right here.
of this, to limit it to customers whose first purchase came before October 20th, 2023. I'm going to paste it right here. Now let's run this, and I'm once again going to filter by first purchase date. We no longer have any first purchase dates after October 20th, which is where we actually start having both churned and active customers, so our key statistics will match up with the data much better. Now that this is cleaned up, we don't need the first purchase date column anymore; marketing doesn't necessarily need it. We now have the final table I was getting at, with customer key, cleaned name, last purchase date, and whether each customer is active or churned. One minor note on this query: I'm not a fan of hard-coding values into a query, because if we get a data dump with more recent data, this date may change. Instead, we can use a subquery. I'm going to put it all within parentheses, cut it with Command+X, and place it right here and also right here. Now when we run this, we get exactly the same results as before, and this is much cleaner, because if those dates ever change or we get new data into the system, it will update automatically. So we have the table we need for marketing, but now I want to take it a step further and actually perform an analysis on it. First we'll look holistically at the overall active and churned rates for everybody, and then we'll break it down by cohort year to see how the cohorts are trending over time. So let's get a percentage for active and for churned. I'm going to make this into another CTE (I love me some CTEs), and we'll call this one churned customers. I'll wrap it in opening and closing parentheses, and for this we're going to start simple. First I just
want the customer status and a count of active versus churned for each. So I'm just going to do a simple count of the customer keys and name it num_customers, pulling from the churned customers CTE. Because we did an aggregation, we also need a GROUP BY on customer status. Let's run this. All right, not bad: it looks like we have around 4,400 active and over 42,000 churned. I prefer percentages, so we're going to move on to calculating those. One thing real quick: we don't necessarily need DISTINCT here. If I run it with Command+Enter, it's still the same values, because earlier we filtered down to only rows where row number equals one, so there should be only one row per customer key. I think it's a little unnecessary, so we're not going to include it. Now we need another column for total customers. I basically want to add these two together, which should be around 46,800, but we're doing a GROUP BY here. We can use a window function to expand beyond the grouped rows, because window functions are evaluated after the aggregation. So we once again use this count to get the count of these keys, but then we take a SUM of it as a window function, OVER with no PARTITION BY because we want all of it, and name it total_customers. Let's run this. All right, I was a little off on the math: it's actually 46,913. Now, we need both of these. If I just did SUM and tried to run this, we'd get an error, and if I just ran COUNT, we'd also get an error. We have to take the COUNT and then a SUM over it for this to work. Now we can divide these two values to get our percentage, so I'll take this first value here and paste it down
right here, and we're going to divide by total customers, which is this (Command+C, Command+V), and we'll name it status_percentage. Let's run this. Not too bad, but there are a lot of decimal places, so I'll wrap all of this in a ROUND function, since I only really care about two decimal places. Now it's down to 9% and 91%. To compare this to the industry, I used Perplexity, a chatbot that searches the internet, and asked it what a typical churn rate is for an e-commerce company. It says a churn rate under 5% is considered good, but the average churn rate for the e-commerce industry is around 22%, so our company's retention is a lot lower than industry standards. All right, let's take this one step further and find the active-versus-churned rate for each cohort year to see how it progresses over the years. All we need is another column for cohort year, but the problem is we need to bring it in higher up; specifically, it's inside our cohort analysis, and if I look inside it, you can see cohort year is there. So after first purchase date I'm going to add in cohort year, and I'll also add it in our second subquery. Because we added these extra columns up here, we need to add cohort year to our GROUP BY to make sure it works, and I want cohort year listed first. Let's run this. These aren't the correct calculations just yet. We do have cohort year in here, along with active versus churned and our number of customers, but total customers is 46,000 the entire time. That's all of our customers, and it's driving our percentages down. So for 2015 we
have 1% and 6%, when these two rows should add up to 100%. The problem is we're dividing this 237 by 46,000; we don't want total customers overall, we want the total customers of 2015. Conveniently, all we have to do is add a PARTITION BY inside our window functions, partitioning by cohort year. Let's run it with Command+Enter, and I need to learn how to spell partition. Okay, we've got good syntax highlighting now. All right, it looks like this adds up to the correct amounts. I'm also going to put this PARTITION BY into our status percentage below so the percentage is calculated correctly, and now when I add up these two values, they equal 100% for every cohort. As we can see, the active rate goes from around 8% up to 10% in more recent years, and graphing it, we can also see that it's slowly going up over time, from 8% to 10%. All right, only one thing left to do now: update our README. We already have our third SQL file in there, and I need to make sure it's saved, so I'll press Command+S. Now when I go in here, I can see we have our entire SQL file. Next is our README. I'm going to close out of this and make it more viewable. So what did I add to the README? First, I added the link to the file, which apparently wasn't linked correctly, and it looks like I spelled analysis wrong, so always double-check your spelling. After fixing it, clicking the link takes me right to the file, so it's always good to click through any hyperlinks you attach. Next, I attached a visualization. This one I had ChatGPT generate; I just copied and pasted it in, and it shows the graph we looked at previously. From there I moved into the key findings, noting that our churn rate has stabilized around 90%
for the last two to three years. I also stated that retention rates are consistently low, at 8 to 10%, well below the industry norm. I capped it off by noting that newer cohorts are showing similar churn trajectories, so we need to take action now to start improving these churn rates. So what can we do with this data? We can work to target customers in their first year or two to move more of them from churned to active. We can also combine this with other analyses to re-engage not only our churned customers but specifically our high-value churned customers, so our targeting can be really specific. Taking this a step further, we could use this analysis to predict future churn rates and how a customer may act. That goes more into data science and machine learning, so we're not going to cover it, but it's something we could take away from this analysis. Now that our third and final question is done, we need to finalize our README, package it all up, put it on GitHub, and finally share it on LinkedIn, which we're conveniently doing in the next chapter. There are no more practice problems for the remainder of this course, so congratulations to everybody who's been doing them. With that, see you in the next one.

Welcome to this final chapter, where we're going to share our project. First of all, I want to congratulate you for making it this far and getting through this entire project; it's quite an accomplishment. This chapter has only two short lessons: this first one is about creating our GitHub repo, and the next one is about sharing that repo on a platform like LinkedIn. Dialing into this lesson, we're going to focus on two core technologies that you may or may not be familiar with. The first is Git. Git is a version control
system, similar to Track Changes in Microsoft Word: it tracks our changes. You can install it on your computer and use it to track changes within files; we'll go into more depth as we go through this video. We're going to use Git to create a repo, or repository, and push it to GitHub. GitHub is an online platform for sharing remote repositories, "remote" meaning you can access them from anywhere, and what's great about it is that it lets us share our project. So what steps are we going through in this video? First, we need to clean up the README. After that, we'll do a deeper dive on repos, Git, and GitHub in case you're unfamiliar with them. Third, we'll install Git on your computer and set this repo up to put on GitHub. Fourth and finally, we'll sync the two, and I'll show you how to manage it. If you've kept your README up to date through all of the analyses we've done since the beginning, there's not a lot left to update. Specifically, we need to fill in an overview, the business questions, and any strategic recommendations. For the overview, I just have one sentence: an analysis of customer behavior, retention, and lifetime value for an e-commerce company to improve customer retention and maximize revenue. We've gone into detail on each of the three business questions, so I'm not going to rehash them; feel free to pause this video and copy whatever you want. After that, you should have the analysis approach we've gone through after each question, updated to include everything we need. Finally, for the strategic recommendations, I bucketed them by our three questions, and I've
outlined a lot of the key tactics we can take away from each question. I'm not going to rehash them here, but I highly encourage you to brainstorm what strategic recommendations you would make and put them in this section. The final section is technical details, where I just list the technologies I used, so people know I used Postgres for this and ChatGPT for the visualizations. This is looking good, so I'm going to press Command+S or Ctrl+S to save it. Now we can move on to initializing a repo and publishing to GitHub, but first I want to cover some background knowledge. The first concept to understand is what a repository is, or as I'll call it going forward, a repo. It's a personal library for your project that lets you keep, manage, and record every change to all the files within it. As I hinted before, it's like Track Changes in Word, except we're using a version control system like Git to manage these changes, so we can revisit previous versions if we need to. To create this repo, we need a technology like Git. Git is a free and open-source distributed version control system designed to handle everything from small to large projects. I use it all the time for version control, and we'll be installing it in a second. So what exactly is going on, and how does a folder become a Git repository? Here are the files that Kelly and I worked on together to build this course, with all of our lesson plans in them; we actually used Git for version control while building this course. On the surface, it just has some folders and files in it, nothing special. But when I unhide the hidden files (on a Mac you can do that by pressing Command+Shift+Period), it shows there are other files in here, specifically this .git
folder. The dot at the front of the name means it's hidden, which is why you couldn't see it until I unhid it. This folder tracks all the changes going on inside my project. We don't need to touch it to make any adjustments; it updates automatically as we make changes and commit them. I just wanted to show it so you understand what's going on, and now I'm going to hide it again. Now that we understand that Git is the version control system used to create a repository, there are two main types of repositories: local and remote. A local repository is, as the name suggests, stored on your computer, and what I just showed you is in fact a local repository. Because it's local, you don't need an internet connection, it's super fast, and it's very common to use one for initial development. Remote repositories are stored on a server, so you need internet access, but because they're in a remote location, they let you collaborate with others. So once again, I have that local repository here, but the same repo is also on GitHub as a remote repository. GitHub lets me work with Kelly as a contributor so we can work back and forth on different files, so frankly it's more than just version control; it's also great for collaborating with others. This brings us to GitHub, one of the most popular tools for working with Git: it lets you store Git repos and share them with the world. By the end of this video, we'll publish your project to GitHub so it's publicly accessible. In case it's not clear: Git is the version control system, it maintains the local repo, and it also has command-line tools we can
actually type commands into, although we're not going to cover those in this video because they're out of scope. Git is open source and free, which is why we're using it. GitHub hosts these repositories, is accessed through a web browser interface, and can be reached remotely, which also lets us collaborate with others. Let's now set up this repo and share it with the world. We're going to go through four steps: first, install Git if you don't have it; second, create the repo within our project folder; third, set up a GitHub profile if you don't already have one; and fourth, share it to GitHub. Let's check whether we have Git before installing it. I'm going to open a terminal, which is what you use on a Mac; on Windows you should have Terminal available, or you can open a Command Prompt. All I'm going to type is git --version and press Enter. In my case it's installed, so it prints the version, and I don't need to install it. If it's not installed, you'll likely get an error message, and then you can install it. Navigate to git-scm.com and download the installer for your operating system. On Windows, you just go through the setup; most modern computers are 64-bit, so that version should be fine if you have a newer computer. It walks you through a GUI, and you can leave all the default values and click through to the end. On Mac, there are a few installation options, all through the command line or terminal. My recommended option is Homebrew. If you don't have Homebrew, it's not a big deal: go to the link up here, copy the entire install command, paste it into your terminal, and run it. I'm not
going to run it because I already have Homebrew installed. After that, all you need to do is copy the command brew install git and run it in the terminal, and you'll have Git installed. Once it's installed, verify it by running git --version, which should output the version. Now, some of you may hit an error along the way, when Git first tries to record a commit, saying something like "Please tell me who you are." It gives you the instructions: run the git config commands to provide your email and your name. Copy them, paste them into your terminal, replace the placeholder name and email address with your own, and press Enter, and you're done. Now we move on to the next steps: creating our repo, creating our profile, and sharing to GitHub. Conveniently, steps two and four can be done together. In VS Code, specifically this Source Control tab right here, there are two options inside your project: Initialize Repository or Publish to GitHub. Publish to GitHub actually initializes the repository and then publishes it to GitHub. The important thing, though, is that you need a GitHub profile before you click it. If you navigate to github.com, the homepage has a sign-up field where you enter your email and then go through the sign-up process to create a GitHub account. Once that account is set up, we can proceed. So I'll go back into VS Code and click Publish to GitHub, since it also initializes our repository at the same time. It prompts me that the GitHub extension wants to sign in using GitHub; I allow this and select the account I want it to be associated
with, and then I'm navigated back to VS Code. First it asks whether I want to create a private or public repository, and I want a public one. The next step is very important to get right on the first try: selecting which files should be included in the repo. I don't care about .DS_Store or any of these hidden files (hidden files are anything with a dot at the front of the name). I do want my SQL files in there. I don't need the bookmarks folder, the diagrams folder, or the scripts folder, but I do need my images folder, because it has the images for my analysis. You could include the scripts if you want, but they're just throwaway scripts, and I'm not maintaining them in version control. In my case I have five files selected, so I'll click OK. It goes through publishing and says it was successfully published to GitHub. Now I'll navigate back to GitHub and go to my repository; I can get to my profile by clicking my name in the upper right-hand corner. One thing to note: if you haven't already, I recommend filling out your social information along with your name and adding a picture so your profile looks legit. To look at my repos, I go to the Repositories tab, which has this project, titled intermediate SQL project. When I click on it, it includes the images folder, which has all three of my images. Next is my .gitignore, which ignores all the files we told Git not to include; that's why it's there and why you didn't see it before. Then there are our three SQL files, and finally our README, which appears down below in this section with all of our analysis right on the homepage. It's really great that it's all there. This is actually the URL I'd want to
share with other people to showcase my work. There's one last core concept I want to cover before we wrap up with GitHub: how to sync your project with GitHub as you go along. Let's say we change our README file. Under analysis tools, not only did I use Postgres, but I also used tools like DBeaver and pgAdmin. I have these new changes, so I'll press Command+S to save. Now there's an M next to the file, which means it's modified, and under version control there's a notification showing one change, telling me the README is modified. Just to show you, I'll refresh this page and scroll down: these tools aren't here yet, but we want them in the technical details. This file is listed under Changes, so we need to stage these changes in our local repository. We also need to give a message describing what we did; you can keep it super short, so I'll say updated tools and click Commit. It says there are no staged changes to commit and asks whether I'd like to stage all of my changes and commit them directly. Normally we'd have to stage all the changes and then commit them, but this combines the two steps, so I'll click Yes. These changes still aren't up on GitHub; they're only committed locally to our repo. So now we push them to our GitHub repo by clicking Sync Changes. It says this action will pull and push commits from origin/main, which is the branch we want updated in our repo. We'll click OK, and since I don't want to see this again, I'll click OK, Don't Show Again. So now we can see
we went from our first commit to our second commit, updated tools, and if I navigate over to the README and refresh the page, we can see the change is now there. That was an example of pushing our changes. We can also pull changes. When Kelly and I work together and she pushes changes to our remote repo, I want to pull those changes down. Here, we're just going to make a change on GitHub and pull it from there, since GitHub lets me edit this file directly. Let me show you what I'm going to edit: these images are set to around 50%, but this one's just too big, so I'm going to update it to 50%. I'll click Edit and go to the code. This is HTML; I'm using an image tag here. You don't necessarily need to understand that; all you need to know is that I want to update the percentage. Actually, this is the image for question three, and I want question two, which isn't in an image tag. So I'll copy this code from down here and paste it in. It specifies the image source, and I want this image right here, so I'll put its source in. We want the alt text, the name of it, to be cohort analysis, and then width 50% and height auto. By the way, you don't have to use this HTML formatting for images; it's just a fancy way for me to get this 50%. Now that we've made these changes in the README, I want to commit them. For the commit message, I'm going to change it to something more descriptive: update second image size. You can add an extended description if you want. I want to commit directly to the main branch, so I'll click Commit Changes. Scrolling down in the README, I can see that the image is now formatted correctly, so our
remote repository is updated. But if I view the README in our local repository, the number three image is pretty small while the second image is still pretty big, because locally it's still using plain Markdown for that image. So, going over to version control, we want to pull these changes down from GitHub. I'll select More Actions, then Pull, and down below we now have not only first commit and updated tools but also update second image size. Closing this out, we can see that all the image sizes are now formatted correctly, and I have the updated code here. So remember, we covered two main concepts here. The first is pushing changes, which sends our local repository's changes up to our remote repository on GitHub. The second is pulling changes: if there are any changes in the remote repository, I can pull them down into my local repository so it's up to date. That was just a brief intro to Git and GitHub. If you're new to this and want additional resources, I have an entire YouTube video on it, linked up above, where you can see even more detail on the ins and outs of Git. In the next lesson, now that we have this public repository on GitHub, we're going to share it on LinkedIn. With that, see you in the next one.

All right, welcome to this very last lesson, and once again, congratulations on finishing the project. It's now time to share that GitHub repo on LinkedIn, and we'll also cover how those who purchased the course perks can upload their course certificate, which you'll get after completing the end-of-course survey. So let's navigate over to LinkedIn. You should have a profile; if you don't, I highly recommend creating one. This is where employers are, and
this is where they're checking your work. I'm here on my profile, which has sections for About, Featured, and Activity, but what I care about is this section here, Licenses & Certifications. This is where you'll upload your course certificate. Remember, once you complete the end-of-course survey, I'll email it to you, and you'll get not only a link but also a downloadable copy of the certificate. If you're not seeing Licenses & Certifications, go all the way up to the top, click Add profile section, and under Recommended, click Add licenses & certifications. Let's add the certificate by clicking this plus icon and filling everything in. I entered Intermediate SQL for Data Analytics, put myself as the issuing organization, and set the issue date to March 2025. The certificate never expires, so you don't enter an expiration date. There's a credential ID on the certificate, so put that in along with the URL. Under skills, you can list up to five; I'd recommend Postgres, SQL, Git, GitHub, and DBeaver. Finally, I also like including an image of the certificate itself, so I select Add media, then Add media again, attach the file, give it an appropriate title, and click Apply. Once you have everything in there, click Save, and it's there. The next thing to do is add the project, because in addition to the certificate, this course includes a project. Under the Projects section, I'll click the add icon, name it Intermediate SQL Sales Analysis, add a short description I took from our README, and fill in the same five core skills from the certificate. Next is media, and I like to include a link to the GitHub repo, so I'll grab this URL right here, copy it, and paste it right into here
and then click Add. I like everything it pulled in, so I'll click Apply. Now that the media is in, we can set the start date: I started making this bad boy back in October 2024 and just finished it this month, March 2025. If you worked on it with somebody, the way I worked with Kelly, you can add them as a contributor, and you can also associate other projects; I don't have any. Now I click Save, and my project is added to my LinkedIn. The last thing I recommend is making a LinkedIn post about completing the project to let everybody know what you've done. In it, I'd call out that you completed the course and also did the project. Don't forget to tag me and Kelly; I love going through these, seeing all the different work, and commenting on it. Once again, congratulations on all the work you've put into completing this course and this project. What are the next steps? Over the next few months, we'll be releasing an Advanced SQL for Data Analytics course, which I'll link somewhere on here so you can check it out if you're interested. With that, don't forget to follow me on LinkedIn and smash that like button. See you in the next one.
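The churn analysis from earlier in this lesson combines four ideas: active versus churned status, the six-month eligibility filter, a subquery in place of a hard-coded date, and per-cohort percentages from window functions. Below they are condensed into one query as a minimal sketch. It is not the course's actual query. It runs on SQLite through Python's built-in sqlite3 module rather than PostgreSQL, and it uses a hypothetical sales table with made-up rows. It also takes each customer's last purchase with MAX() instead of the ROW_NUMBER() = 1 filter used in the video. SQLite's date(x, '-6 months') stands in for PostgreSQL interval arithmetic, and the cutoff is computed from a subquery on MAX(order_date), so nothing is hard-coded.

```python
import sqlite3

# Hypothetical sample data: (customer_key, order_date) as ISO-8601 strings.
ORDERS = [
    (1, "2015-03-01"), (1, "2024-02-10"),  # 2015 cohort, recent order -> Active
    (2, "2015-06-15"),                     # 2015 cohort, no recent order -> Churned
    (3, "2016-01-20"), (3, "2016-05-02"),  # 2016 cohort -> Churned
    (4, "2016-07-07"), (4, "2024-04-01"),  # 2016 cohort -> Active
    (5, "2016-09-09"),                     # 2016 cohort -> Churned
    (6, "2024-01-05"),                     # first order within the last 6 months -> excluded
    (7, "2024-04-20"),                     # latest order in the data -> excluded
]

CHURN_BY_COHORT = """
WITH customer_last_purchase AS (
    SELECT
        customer_key,
        CAST(strftime('%Y', MIN(order_date)) AS INTEGER) AS cohort_year,
        MIN(order_date) AS first_purchase_date,
        MAX(order_date) AS last_purchase_date
    FROM sales
    GROUP BY customer_key
),
churned_customers AS (
    SELECT
        customer_key,
        cohort_year,
        CASE
            -- Cutoff = six months before the latest order date, taken from a subquery.
            WHEN last_purchase_date >= date((SELECT MAX(order_date) FROM sales), '-6 months')
                THEN 'Active'
            ELSE 'Churned'
        END AS customer_status
    FROM customer_last_purchase
    -- Keep only customers who have been around long enough to possibly churn.
    WHERE first_purchase_date < date((SELECT MAX(order_date) FROM sales), '-6 months')
)
SELECT
    cohort_year,
    customer_status,
    COUNT(customer_key) AS num_customers,
    -- Window functions run after GROUP BY, so SUM(COUNT(...)) totals each cohort.
    SUM(COUNT(customer_key)) OVER (PARTITION BY cohort_year) AS total_customers,
    ROUND(COUNT(customer_key) * 1.0
          / SUM(COUNT(customer_key)) OVER (PARTITION BY cohort_year), 2) AS status_percentage
FROM churned_customers
GROUP BY cohort_year, customer_status
ORDER BY cohort_year, customer_status
"""


def churn_by_cohort(orders):
    """Load orders into an in-memory SQLite database and return one row per cohort and status."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer_key INTEGER, order_date TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", orders)
    rows = conn.execute(CHURN_BY_COHORT).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in churn_by_cohort(ORDERS):
        print(row)
```

With these sample rows, customers 6 and 7 drop out because they joined too recently to have churned. Each cohort's two status_percentage values add up to 1. To get the single overall active-versus-churned split, drop cohort_year from the SELECT and the GROUP BY and leave the window as OVER (). The * 1.0 matters in SQLite, where dividing two integers truncates the percentage to 0. In PostgreSQL, SUM over a COUNT returns numeric, so the division already keeps its decimals.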
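The VS Code buttons in the GitHub lesson map onto ordinary Git commands. Staging plus Commit records changes in the local repo. Sync Changes, per the prompt shown in the video, pulls and pushes commits against origin/main. Below is a minimal sketch of that push-and-pull round trip; it is an illustration, not part of the course. It drives the git command line through Python's subprocess module and uses a local bare repository in a temporary folder in place of GitHub. It plays both you and a hypothetical collaborator. It passes a placeholder user.name and user.email with -c, so it runs even where git config has not been set up. If git is not installed (the case where git --version would fail), it simply returns None.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Placeholder identity so commits work even without `git config` set up.
IDENTITY = ["-c", "user.name=Demo User", "-c", "user.email=demo@example.com",
            "-c", "commit.gpgsign=false"]


def git(*args, cwd):
    """Run one git command in `cwd`, raise if it fails, and return its stdout."""
    result = subprocess.run(["git", *IDENTITY, *args], cwd=cwd,
                            check=True, capture_output=True, text=True)
    return result.stdout


def push_pull_round_trip():
    """Commit and push from a local repo, let a collaborator push, then pull it back."""
    if shutil.which("git") is None:  # git is not installed
        return None
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        remote = root / "remote.git"  # stands in for the GitHub repository
        local = root / "local"        # your project folder
        collab = root / "collab"      # a hypothetical collaborator's clone

        git("init", "--bare", str(remote), cwd=root)
        local.mkdir()
        git("init", cwd=local)
        git("symbolic-ref", "HEAD", "refs/heads/main", cwd=local)  # name the branch main

        # Stage and commit locally, then push to the remote.
        (local / "README.md").write_text("# Intermediate SQL Project\n")
        git("add", "README.md", cwd=local)
        git("commit", "-m", "first commit", cwd=local)
        git("remote", "add", "origin", str(remote), cwd=local)
        git("push", "-u", "origin", "main", cwd=local)

        # The collaborator clones, edits, commits, and pushes a change.
        git("clone", "--branch", "main", str(remote), str(collab), cwd=root)
        with open(collab / "README.md", "a") as readme:
            readme.write("Tools: Postgres, DBeaver, pgAdmin\n")
        git("commit", "-am", "updated tools", cwd=collab)
        git("push", "origin", "main", cwd=collab)

        # Back in the local repo, pull the remote change down.
        git("pull", "--ff-only", "origin", "main", cwd=local)
        subjects = git("log", "--format=%s", cwd=local).splitlines()
        return (local / "README.md").read_text(), subjects


if __name__ == "__main__":
    print(push_pull_round_trip())
```

The --ff-only flag makes the pull refuse to create a merge commit, which keeps this sketch predictable. In real collaboration, where both sides may have committed, you would choose between merging and rebasing instead.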
The provided texts offer a comprehensive introduction to databases and SQL, covering fundamental concepts like tables, columns, and records, alongside essential SQL commands for data manipulation and querying. They further explore the role of SQL in data analysis, outlining necessary skills, qualifications, project work, portfolios, and internships for aspiring data analysts. Advanced SQL topics such as joins, subqueries, stored procedures, triggers, views, and window functions are examined in detail through explanations and practical examples using MySQL. Finally, the material transitions to PostgreSQL, demonstrating similar SQL functionalities and introducing more advanced features like case statements, aggregate functions, and user-defined functions, while also discussing the importance and top certifications in the field of data analytics.
SQL Fundamentals Study Guide
Quiz
What is the purpose of the GROUP BY clause in SQL? Provide a brief example of its syntax. The GROUP BY clause in SQL is used to group rows that have the same values in one or more columns into summary rows. It is often used with aggregate functions to calculate metrics for each group. For example: SELECT department, COUNT(*) FROM employees GROUP BY department;
Explain the difference between the WHERE clause and the HAVING clause in SQL. When would you use each? The WHERE clause filters individual rows based on a specified condition before any grouping occurs. The HAVING clause filters groups based on a specified condition after grouping has been performed by the GROUP BY clause. You use WHERE to filter individual records and HAVING to filter groups of records.
Describe the main categories of SQL data types discussed in the source material. Give one example for each category. The source material outlines several main categories of SQL data types: exact numeric (e.g., INTEGER), approximate numeric (e.g., FLOAT), date and time (e.g., DATE), string (e.g., VARCHAR), and binary (e.g., BINARY).
List three types of SQL operators and provide a brief explanation of what each type is used for. Three types of SQL operators are: arithmetic operators (used for mathematical calculations like addition: +), logical operators (used to combine or modify conditions, like AND), and comparison operators (used to compare values, like equal to: =).
What are SQL joins used for? Briefly explain the purpose of an INNER JOIN. SQL joins are used to combine rows from two or more tables based on a related column between them. An INNER JOIN returns only the rows where there is a match in both tables based on the join condition; rows with no match in either table are excluded.
What is a subquery in SQL? Provide a simple example of how a subquery might be used. A subquery is a query nested inside another SQL query, for example in the SELECT list or in the FROM or WHERE clause. It is often used to retrieve data that the main query then uses in its conditions. For example: SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
Explain the concept of a stored procedure in SQL. What are some potential benefits of using stored procedures? A stored procedure is a set of SQL statements with an assigned name, which is stored in the database. Benefits include reusability of code, improved performance (as they are pre-compiled), and enhanced security by granting access only to the procedure rather than the underlying tables.
What is a trigger in SQL? Describe a scenario where a trigger might be useful. A trigger is a stored program that automatically executes in response to certain events (e.g., INSERT, UPDATE, DELETE) on a particular table. A trigger could be useful for automatically updating a timestamp field whenever a row in a table is modified, ensuring data integrity or auditing changes.
Describe what a view is in SQL. How does it differ from a regular table? A view is a virtual table based on the result of an SQL statement. Unlike regular tables, views do not store data themselves; instead, they provide a customized perspective of data from one or more underlying tables. Changes made through a simple view might affect the base tables, but complex views are often read-only.
What is the purpose of the ORDER BY clause in SQL? Explain how to sort results in descending order. The ORDER BY clause is used to sort the result set of a SQL query based on one or more columns. To sort results in descending order, you specify the column(s) to sort by and append the DESC keyword after the column name(s). For example: SELECT * FROM products ORDER BY price DESC;
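As a concrete illustration of the view and ORDER BY questions, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL or PostgreSQL. The products table, its columns, and its rows are hypothetical. It shows that a view stores no rows of its own: a row added to the base table later appears as soon as the view is queried again.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for MySQL/PostgreSQL.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical base table.
cur.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT, price REAL)")
cur.executemany(
    "INSERT INTO products (name, category, price) VALUES (?, ?, ?)",
    [("Laptop", "Electronics", 999.0),
     ("Desk", "Furniture", 250.0),
     ("Phone", "Electronics", 599.0)],
)

# The view stores only its SELECT statement, not a copy of the rows.
cur.execute("""
    CREATE VIEW electronics AS
    SELECT name, price FROM products WHERE category = 'Electronics'
""")

# Query the view like a table, most expensive first (ORDER BY ... DESC).
before = cur.execute("SELECT name FROM electronics ORDER BY price DESC").fetchall()

# A row added to the base table is visible through the view immediately.
cur.execute("INSERT INTO products (name, category, price) VALUES ('Tablet', 'Electronics', 799.0)")
after = cur.execute("SELECT name FROM electronics ORDER BY price DESC").fetchall()

print(before)  # [('Laptop',), ('Phone',)]
print(after)   # [('Laptop',), ('Tablet',), ('Phone',)]
conn.close()
```

The CREATE VIEW statement itself is plain SQL that MySQL and PostgreSQL would also accept; only the connection code is SQLite-specific.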
Answer Key
The GROUP BY clause in SQL groups rows with the same values in specified columns, often used with aggregate functions for summarized data. Example: SELECT department, COUNT(*) FROM employees GROUP BY department;
WHERE filters rows before grouping, while HAVING filters groups after GROUP BY. Use WHERE for record-level conditions and HAVING for group-level conditions on aggregated results.
The main categories are exact numeric (e.g., INTEGER), approximate numeric (e.g., FLOAT), date and time (e.g., DATE), string (e.g., VARCHAR), and binary (e.g., BINARY).
Three types of SQL operators are arithmetic operators for calculations (e.g., +), logical operators for combining conditions (e.g., AND), and comparison operators for comparing values (e.g., =).
SQL joins combine rows from multiple tables based on related columns. INNER JOIN returns only matching rows from both tables based on the join condition.
A subquery is a query nested within another query, often used to provide values for conditions in the outer query. Example: SELECT * FROM products WHERE price > (SELECT AVG(price) FROM products WHERE category = 'Electronics');
A stored procedure is a pre-compiled set of SQL statements stored in the database, offering benefits like code reuse, improved performance, and enhanced security.
A trigger is a database object that automatically executes SQL code in response to specific events on a table. It is useful, for example, for auditing changes by logging every update to a separate history table.
A view is a virtual table based on the result of a query, providing a specific perspective on the data without storing it directly. It differs from a regular table by not holding persistent data.
The ORDER BY clause sorts the query result set. To sort in descending order, use the DESC keyword after the column name in the ORDER BY clause (e.g., ORDER BY salary DESC).
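The trigger answer mentions auditing changes by logging updates to a history table. The sketch below shows that pattern with Python's sqlite3 module standing in for MySQL. The employees and salary_history tables are hypothetical, and the trigger uses SQLite syntax, which is close to MySQL's but not identical: MySQL has no WHEN clause and no UPDATE OF column list, so in MySQL the condition would move into an IF block inside the trigger body.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
cur.execute("""
    CREATE TABLE salary_history (
        employee_id INTEGER,
        old_salary  INTEGER,
        new_salary  INTEGER
    )
""")

# AFTER UPDATE trigger: runs automatically once per modified row.
# OLD and NEW refer to the row's values before and after the update.
cur.execute("""
    CREATE TRIGGER log_salary_change
    AFTER UPDATE OF salary ON employees
    FOR EACH ROW
    WHEN OLD.salary <> NEW.salary
    BEGIN
        INSERT INTO salary_history (employee_id, old_salary, new_salary)
        VALUES (OLD.id, OLD.salary, NEW.salary);
    END
""")

cur.execute("INSERT INTO employees (name, salary) VALUES ('Asha', 50000), ('Ben', 60000)")
# Only the UPDATE fires the trigger; the INSERT above does not.
cur.execute("UPDATE employees SET salary = salary + 5000 WHERE name = 'Asha'")

history = cur.execute("SELECT employee_id, old_salary, new_salary FROM salary_history").fetchall()
print(history)  # [(1, 50000, 55000)]
conn.close()
```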
Essay Format Questions
Discuss the importance of data types in SQL. Explain how choosing the appropriate data type for a column can impact database performance and data integrity. Provide specific examples of scenarios where different data types would be most suitable.
Elaborate on the different types of SQL joins (INNER, LEFT, RIGHT, FULL). Explain the conditions under which each type of join is most useful and provide conceptual examples illustrating the results of each join type using sample tables.
Analyze the benefits and drawbacks of using stored procedures and triggers in SQL database design. Consider aspects such as performance, maintainability, security, and complexity. Provide scenarios where each would be a particularly advantageous or disadvantageous choice.
Explain the concept and benefits of using views in SQL. Discuss how views can contribute to data security, query simplification, and data abstraction. Describe different types of views and their specific use cases.
Compare and contrast the use of subqueries and joins in SQL for retrieving data from multiple tables. Discuss the scenarios where one approach might be preferred over the other, considering factors such as readability, performance, and the complexity of the relationships between tables.
Glossary of Key Terms
Clause: A component of an SQL statement that performs a specific function (e.g., SELECT, FROM, WHERE, GROUP BY, ORDER BY).
Data Type: The attribute that specifies the type of data that a column can hold (e.g., numeric, string, date).
Operator: A symbol or keyword used to perform an operation in an SQL expression (e.g., arithmetic, logical, and comparison operators).
Join: An SQL operation that combines rows from two or more tables based on a related column.
Inner Join: Returns rows only when there is a match in both tables based on the join condition.
Outer Join (Left, Right, Full): A LEFT or RIGHT outer join returns all rows from one table plus the matching rows from the other; a FULL outer join returns all rows from both tables. Where a row has no match, NULLs fill the columns of the other table.
Subquery (Nested Query): A query embedded inside another SQL query.
Stored Procedure: A pre-compiled collection of SQL statements stored in the database.
Trigger: A database object that automatically executes a block of SQL code in response to certain events on a table.
View: A virtual table based on the result of an SQL SELECT statement.
Aggregate Function: A function that performs a calculation on a set of values and returns a single summary value (e.g., COUNT, SUM, AVG, MIN, MAX).
GROUP BY Clause: Groups rows with the same values in one or more columns.
HAVING Clause: Filters the results of a GROUP BY clause based on specified conditions.
WHERE Clause: Filters rows based on specified conditions before grouping.
ORDER BY Clause: Sorts the result set of a query based on specified columns.
DESC: Keyword used with ORDER BY to sort in descending order.
ASC: Keyword used with ORDER BY to sort in ascending order (default).
Alias: A temporary name given to a table or column in a SQL query for brevity or clarity.
Briefing Document: Review of SQL Concepts and MySQL/PostgreSQL Usage
This briefing document summarizes the main themes, important ideas, and facts presented across the provided sources, which primarily focus on introducing and demonstrating various aspects of SQL using MySQL and PostgreSQL.
Main Themes:
Fundamentals of SQL: The sources cover core SQL concepts, including data manipulation language (DML) commands (SELECT, INSERT, UPDATE, DELETE), data definition language (DDL) commands (CREATE TABLE, ALTER TABLE, DROP TABLE, CREATE DATABASE, DROP DATABASE, CREATE VIEW, DROP VIEW), clauses (WHERE, GROUP BY, HAVING, ORDER BY, JOIN, LIMIT), data types, operators, and basic SQL functions.
Database Management Systems: The documents illustrate the practical application of SQL within two popular database management systems: MySQL and PostgreSQL. This includes installation (for MySQL), connecting to servers, and executing SQL commands within their respective interfaces (MySQL Workbench, command-line interface, and online compilers for PostgreSQL).
Data Filtering and Sorting: A significant portion of the content focuses on how to effectively filter data using the WHERE and HAVING clauses and how to sort results using the ORDER BY clause. The use of comparison operators, logical operators (AND, OR, BETWEEN, LIKE, NOT LIKE), and pattern matching is highlighted.
Data Aggregation: The GROUP BY and HAVING clauses are explained and demonstrated for summarizing data based on groups, along with aggregate functions like COUNT, SUM, AVG, MAX, and MIN.
Joining Tables: The concept of joining data from multiple tables is introduced, with a focus on INNER JOIN and the importance of common fields for linking tables.
Advanced SQL Concepts: The sources delve into more advanced topics such as subqueries (nested queries), views (virtual tables), stored procedures (reusable SQL code), triggers (actions performed automatically in response to database events), Common Table Expressions (CTEs/WITH expressions), and window functions (for analytical queries).
SQL Functions: Various built-in SQL functions are explained and demonstrated, including mathematical functions (ABS, GREATEST, LEAST, MOD, POWER, SQRT, SIN, COS, TAN, CEILING, FLOOR) and string functions (CHARACTER_LENGTH, CONCAT, LEFT, RIGHT, SUBSTRING/MID, REPEAT, REVERSE, LTRIM, RTRIM, TRIM, POSITION, ASCII).
Practical Application and Examples: The sources heavily rely on practical examples and demonstrations within MySQL Workbench and online PostgreSQL environments to illustrate the usage and benefits of different SQL concepts and commands.
Database Connectivity with Python: One source provides a basic introduction to connecting to a MySQL database using Python, creating databases and tables, inserting data, and executing queries.
Common Interview Questions: One section focuses on typical SQL interview questions, covering topics like INDEX, GROUP BY, ALIAS, ORDER BY, differences between WHERE and HAVING, VIEW, and STORED PROCEDURE.
Most Important Ideas and Facts (with Quotes):
SQL Clauses for Data Manipulation: “[where] condition one condition two and so on then we have the group by Clause that takes various column names so you can write Group by column 1 column 2 and so on next we have the having Clause to filter out tables based on groups finally we have the order by Clause to filter out the result in ascending or descending order” (01.pdf) – This outlines the basic structure and purpose of key SQL clauses. Note that ORDER BY sorts the result rather than filtering it.
The WHERE clause filters rows before grouping, while the HAVING clause filters groups after they are formed.
SQL Data Types: The document lists various SQL data types, categorizing them as exact numeric (integer, small int, bit, decimal), approximate numeric (float, real), date and time (date, time, timestamp), string (char, varchar, text), and binary (binary, varbinary, image).
SQL Operators: Basic arithmetic, logical (all, and, any, or, between, exists), and comparison operators (=, !=, >, <, >=, <=, plus the non-standard !< (not less than) and !> (not greater than) found in SQL Server) are fundamental for constructing SQL queries.
MySQL Workbench Installation: The source provides a step-by-step guide to installing MySQL Workbench on Windows, including downloading the installer from the official MySQL website (mysql.com, operated by Oracle), choosing a custom setup, and selecting components like MySQL Server, MySQL Shell, and MySQL Workbench. The importance of setting a password for the root user is emphasized: “now here set the password for your root user by the way root is the default user this user will have access to everything” (01.pdf).
Basic MySQL Commands: Commands like SHOW DATABASES, USE <database_name>, SHOW TABLES, SELECT * FROM <table_name>, and DESCRIBE <table_name> are introduced as essential for navigating and inspecting database structures.
Creating Tables: The CREATE TABLE command syntax is explained, including defining column names and their data types, and specifying constraints like PRIMARY KEY and NOT NULL.
Inserting Data: The INSERT INTO command is used to add new rows into a table, specifying the table name and the values for each column.
String Functions: “there’s also a function called position in MySQL the position function Returns the position of the first occurrence of a substring in a string” (01.pdf)
“the [ASCII] function Returns the [ASCII] value for a specific character” (01.pdf)
PostgreSQL’s string functions like CHARACTER_LENGTH, CONCAT, LEFT, RIGHT, REPEAT, and REVERSE provide powerful text manipulation capabilities.
GROUP BY and Aggregate Functions: The GROUP BY clause groups rows with the same values in specified columns, allowing the application of aggregate functions to each group.
HAVING Clause for Filtering Groups: “the having Clause works like the [where] Clause the difference is that [where] Clause cannot be used with aggregate functions the having Clause is used with a group by Clause to return those rows that meet a condition” (Source 17.pdf).
JOIN Operations: SQL joins (INNER JOIN is primarily discussed) are used to combine rows from two or more tables based on related columns.
Subqueries (Nested Queries): A subquery is a query embedded within another SQL query, used to retrieve data that will be used in the main query’s conditions.
Views (Virtual Tables): “views are actually virtual tables that do not store any data of their own but display data [stored] in other tables views are created by joining one or more tables” (01.pdf).
Views simplify complex queries and can enhance data security. The CREATE VIEW, RENAME TABLE (for renaming views), and DROP VIEW commands are used to manage views.
Stored Procedures: “a stored procedure is an SQL code that you can save so that the code can be reused over and over again” (01.pdf).
Stored procedures can take input parameters (IN parameters) and help in encapsulating and reusing SQL logic.
Triggers: Triggers are SQL code that automatically executes in response to certain events (e.g., BEFORE INSERT, AFTER UPDATE) on a table.
Window Functions: Introduced in MySQL 8.0, window functions perform calculations across a set of table rows that are related to the current row, allowing for analytical queries (e.g., calculating total salary per department using SUM() OVER (PARTITION BY)). The RANK(), DENSE_RANK(), and FIRST_VALUE() functions are examples of window functions.
Common Table Expressions (CTEs): CTEs, defined using the WITH keyword, are temporary, named result sets defined within the scope of a single query, improving readability and allowing for recursive queries.
Database Connectivity with Python: The mysql.connector library in Python can be used to connect to MySQL databases, execute SQL queries, and retrieve results. The basic steps involve creating a server connection, creating databases, connecting to specific databases, and executing queries using cursors.
PostgreSQL Specifics: The sources also demonstrate SQL concepts within a PostgreSQL environment using online compilers, highlighting similar SQL syntax and the availability of operators such as BETWEEN for ranges and LIKE for pattern matching (% for any sequence of characters, _ for a single character), along with various mathematical and string functions. The ALTER TABLE … RENAME COLUMN command is shown for modifying table schema. The LIMIT clause in PostgreSQL restricts the number of rows returned by a query.
SQL Interview Preparedness: The final source provides insights into common SQL interview questions, emphasizing understanding of fundamental concepts and practical application.
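The window-function and CTE points above can be combined in one query. The following is a sketch that uses Python's sqlite3 module in place of MySQL 8.0. SQLite has supported window functions since version 3.25, so a much older bundled SQLite would reject it. The employees table and salaries are hypothetical. The CTE attaches a department total and a salary rank to every row, and the outer query keeps each department's top earner(s).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 50000), ("Ben", "Sales", 65000),
     ("Chen", "IT", 70000), ("Dana", "IT", 70000), ("Eli", "IT", 55000)],
)

query = """
WITH ranked AS (                       -- CTE: a named, temporary result set
    SELECT name,
           department,
           salary,
           SUM(salary) OVER (PARTITION BY department) AS dept_total,  -- total per department, on every row
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employees
)
SELECT department, name, salary, dept_total
FROM ranked
WHERE salary_rank = 1                  -- tied salaries share rank 1
ORDER BY department, name
"""
top_earners = cur.execute(query).fetchall()
for row in top_earners:
    print(row)
# ('IT', 'Chen', 70000, 195000)
# ('IT', 'Dana', 70000, 195000)
# ('Sales', 'Ben', 65000, 115000)
conn.close()
```

The CTE is there because a window function's result cannot be referenced in the WHERE clause of the same SELECT. Naming the intermediate result lets the outer query filter on salary_rank. RANK() gives tied rows the same rank and then skips a number (Eli is ranked 3), whereas DENSE_RANK() would rank him 2.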
Overall Significance:
The provided sources offer a comprehensive introduction to fundamental and advanced SQL concepts, demonstrating their application in both MySQL and PostgreSQL. They emphasize practical learning through examples and hands-on exercises, making them valuable resources for individuals learning SQL or preparing for database-related tasks and interviews. The inclusion of database connectivity with Python further highlights the role of SQL in broader data management and application development contexts.
Understanding Fundamental SQL Concepts and Operations
1. What are the fundamental components of a SQL query?
A fundamental SQL query typically involves the SELECT statement to specify the columns you want to retrieve, the FROM clause to indicate the table(s) you are querying, and optionally, the WHERE clause to filter rows based on specific conditions. Additionally, you might use GROUP BY to group rows with the same values, HAVING to filter groups, and ORDER BY to sort the result set in ascending (ASC) or descending (DESC) order.
2. What are the common data types available in SQL?
SQL supports various data types to define the kind of data a column can hold. These include exact numeric types like INT, SMALLINT, BIT, and DECIMAL; approximate numeric types such as FLOAT and REAL; date and time types like DATE, DATETIME, and TIMESTAMP; string data types including CHAR, VARCHAR, and TEXT; and binary data types such as BINARY, VARBINARY, and IMAGE.
3. What are the different categories of operators used in SQL?
SQL uses several categories of operators. Arithmetic operators perform mathematical operations (+, -, *, /, MOD). Logical operators (ALL, ANY, OR, BETWEEN, EXISTS, etc.) are used to combine or negate conditions. Comparison operators (=, !=, >, <, >=, <=, plus SQL Server's non-standard !< and !>) are used to compare values.
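The three operator categories, together with the LIKE wildcards, can be seen in one small example. This is a minimal sketch that uses Python's sqlite3 module as a stand-in database. The players table and its figures are hypothetical. One caveat is that LIKE's case sensitivity differs between systems: PostgreSQL's LIKE is case-sensitive, while MySQL's usually is not. The sample data is chosen so the result does not depend on either behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE players (name TEXT, country TEXT, goals INTEGER, matches INTEGER)")
cur.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?)",
    [("Mia", "Spain", 12, 20), ("Max", "Germany", 3, 15),
     ("Leo", "Spain", 9, 10), ("Lea", "France", 7, 14)],
)

# Arithmetic: * and /. Multiplying by 1.0 avoids integer division, which
# SQLite and PostgreSQL apply when both operands are integers (MySQL does not).
# Comparison: > and =.  Logical: AND, OR.
rows = cur.execute("""
    SELECT name, ROUND(goals * 1.0 / matches, 2) AS goals_per_match
    FROM players
    WHERE goals > 5 AND (country = 'Spain' OR country = 'France')
    ORDER BY goals_per_match DESC
""").fetchall()

# Pattern matching: _ matches exactly one character, % matches any run of characters.
three_letter_l_a = cur.execute("SELECT name FROM players WHERE name LIKE 'L_a'").fetchall()
starts_with_m = cur.execute("SELECT name FROM players WHERE name LIKE 'M%' ORDER BY name").fetchall()

print(rows)              # Leo (0.9), then Mia (0.6), then Lea (0.5)
print(three_letter_l_a)  # [('Lea',)]
print(starts_with_m)     # [('Max',), ('Mia',)]
conn.close()
```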
4. How can you set up and connect to a MySQL database using MySQL Workbench and the command line?
To set up MySQL, you typically download the MySQL Installer from the official Oracle website. During the installation, you can choose to install MySQL Server, MySQL Shell, and MySQL Workbench. You’ll need to configure the server instance, set a password for the root user, and execute the configuration.
To connect via MySQL Workbench, you open the application, click on the local instance connection, and enter your root password.
To connect via the command line, you need to navigate to the bin directory of your MySQL installation using the cd command in the command prompt. Then, you can use the command mysql -u root -p, and upon entering your password, you’ll be connected to the MySQL server.
5. What are some basic SQL commands for database and table manipulation?
Some basic SQL commands include:
SHOW DATABASES; to list the existing databases.
USE database_name; to select a specific database to work with.
SHOW TABLES; to list the tables within the selected database.
SELECT * FROM table_name; to view all rows and columns in a table.
DESCRIBE table_name; or DESC table_name; to show the structure of a table (column names, data types, etc.).
CREATE DATABASE database_name; to create a new database.
CREATE TABLE table_name (column1 datatype, column2 datatype, …); to create a new table with specified columns and data types.
DROP TABLE table_name; to delete a table.
DROP DATABASE database_name; to delete a database.
6. How do GROUP BY and HAVING clauses work in SQL?
The GROUP BY clause in SQL is used to group rows in a table that have the same values in one or more columns into summary rows. It is often used with aggregate functions (like COUNT, MAX, MIN, AVG, SUM) to compute values for each group.
The HAVING clause is used to filter the results of a GROUP BY clause. It allows you to specify conditions that must be met by the groups. The key difference from the WHERE clause is that WHERE filters individual rows before grouping, while HAVING filters groups after they have been formed.
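Here is a minimal sketch of the WHERE-versus-HAVING distinction. It uses Python's sqlite3 module in place of MySQL and a hypothetical employees table. WHERE removes individual rows before they are grouped, and HAVING removes whole groups after COUNT and AVG have been computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 40000), ("Ben", "Sales", 65000), ("Cleo", "Sales", 70000),
     ("Dev", "IT", 80000), ("Eva", "IT", 30000), ("Gus", "IT", 50000),
     ("Finn", "HR", 45000)],
)

query = """
SELECT department,
       COUNT(*)    AS headcount,
       AVG(salary) AS avg_salary
FROM employees
WHERE salary >= 40000        -- row filter before grouping: removes Eva
GROUP BY department
HAVING COUNT(*) >= 2         -- group filter after aggregation: removes HR
ORDER BY avg_salary DESC
"""
result = cur.execute(query).fetchall()
print(result)  # IT (2 people, avg 65000.0), then Sales (3 people, avg about 58333.33)
conn.close()
```

Moving COUNT(*) >= 2 into the WHERE clause would be an error, because aggregate values do not exist until the rows have been grouped.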
7. What are SQL JOINs and what are some common types?
SQL JOINs are used to combine rows from two or more tables based on a related column between them. This allows you to retrieve data from multiple tables in a single query. Common types of JOINs include:
INNER JOIN: Returns rows only when there is a match in both tables.
LEFT JOIN (or LEFT OUTER JOIN): Returns all rows from the left table and the matching rows from the right table. If there’s no match in the right table, NULLs are used for the right table’s columns.
RIGHT JOIN (or RIGHT OUTER JOIN): Returns all rows from the right table and the matching rows from the left table. If there’s no match in the left table, NULLs are used for the left table’s columns.
FULL OUTER JOIN: Returns all rows from both tables, matching them where possible. If there is no match in one of the tables, NULLs are used for the columns of the table without a match. (Note: MySQL does not directly support FULL OUTER JOIN, but it can be simulated by combining a LEFT JOIN and a RIGHT JOIN with UNION, or with UNION ALL if the second query keeps only the rows that have no match in the left table, so that matched rows are not duplicated.)
JOIN conditions are typically specified using the ON keyword, indicating which columns should be compared for equality.
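The sketch below contrasts INNER JOIN and LEFT JOIN and shows the FULL OUTER JOIN workaround. It uses Python's sqlite3 module and two hypothetical tables, customers and orders. SQLite only gained RIGHT and FULL OUTER JOIN in version 3.39, so, as one would in MySQL, the full join is emulated. It takes the LEFT JOIN rows and adds, with UNION ALL, the orders that match no customer. Those unmatched orders are found with a LEFT JOIN in the other direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Raj"), (3, "Zoe")])
# Order 12 references customer 9, who does not exist in customers.
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 250), (11, 1, 80), (12, 9, 40)])

# INNER JOIN: only customers that have matching orders.
inner = cur.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN: every customer; order columns are NULL (None in Python) when there is no match.
left = cur.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# FULL OUTER JOIN emulation: all LEFT JOIN rows, plus orders that matched no customer.
full = cur.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    UNION ALL
    SELECT c.name, o.order_id
    FROM orders AS o
    LEFT JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

print(inner)      # [('Ana', 10), ('Ana', 11)]
print(left)       # [('Ana', 10), ('Ana', 11), ('Raj', None), ('Zoe', None)]
print(len(full))  # 5
conn.close()
```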
8. What are subqueries and stored procedures in SQL?
A subquery (or inner query) is a query nested inside another SQL query. Subqueries can be used in the SELECT, FROM, WHERE, and HAVING clauses. They are often used to retrieve data that will be used in the conditions or selections of the outer query. Subqueries can return single values, lists of values, or even entire tables.
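Here is a minimal sketch of subqueries that return a single value, a list of values, and a table. It again uses Python's sqlite3 module as a stand-in and a hypothetical employees table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "Sales", 50000), ("Ben", "Sales", 70000),
     ("Chen", "IT", 90000), ("Dana", "IT", 60000)],
)

# Single value: the inner query returns one number, the overall average (67500).
above_avg = cur.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# List of values: the inner query returns every department with a salary over 80000.
high_paying_dept_staff = cur.execute("""
    SELECT name FROM employees
    WHERE department IN (SELECT department FROM employees WHERE salary > 80000)
    ORDER BY name
""").fetchall()

# Table: a subquery in FROM (a derived table) is read like a regular table.
# MySQL requires the alias (AS d) on a derived table.
dept_avgs = cur.execute("""
    SELECT department, avg_salary
    FROM (SELECT department, AVG(salary) AS avg_salary
          FROM employees
          GROUP BY department) AS d
    WHERE avg_salary > 65000
""").fetchall()

print(above_avg)               # [('Ben',), ('Chen',)]
print(high_paying_dept_staff)  # [('Chen',), ('Dana',)]
print(dept_avgs)               # [('IT', 75000.0)]
conn.close()
```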
A stored procedure is a set of SQL statements with an assigned name, which is stored in the database. Stored procedures can be executed by calling their name. They offer several benefits, such as code reusability, improved performance (as the code is pre-compiled and stored on the server), and enhanced security by granting execute permissions without direct table access. Stored procedures can also accept input parameters and return output parameters.
Understanding Relational Database Tables
In relational databases, data is stored in the form of tables. These tables are the fundamental structure for organizing and managing data. You can think of a table as a grid composed of rows and columns.
Here’s a breakdown of the structure of a database table:
Table Name: Each table has a name that identifies the data it holds, for example, “players”, “employees”, “customers”, or “orders”.
Columns (or Fields or Attributes):
Columns are the vertical structures in a table.
Each column represents a specific attribute or category of information about the items stored in the table.
At the top of each column is a column name (also known as a field name) that describes the data in that column, such as “player ID”, “player name”, “country”, and “goals scored” in a “players” table. Other examples include “employee_ID”, “employee_name”, “age”, “gender”, “date of join”, “department”, “city”, and “salary” in an “employees” table.
Each column is associated with a specific data type that defines the kind of values it can hold. Examples of data types in SQL include integer, smallint, decimal, float, real, date, time, varchar, char, text, binary, etc. The data type ensures that all values stored in a specific column are of the same type or domain.
Columns are also sometimes referred to as fields in a database.
Rows (or Records or Tuples):
Rows are the horizontal structures in a table.
Each row represents a single instance or record (also called a tuple) of the entity that the table describes.
For example, in a “players” table, each row would contain the information for one specific player. In an “employees” table, each row would contain the details of a single employee.
Cells: The intersection of a row and a column forms a cell, which holds a single piece of data. Each cell should contain only one (atomic) value, which is a rule of first normal form (1NF) in normalization.
Primary Key: A primary key is a special column or a set of columns that uniquely identifies each row in a table. It ensures that no two rows have the same primary key value, and it cannot contain NULL values. Primary keys are crucial for linking tables together and maintaining data integrity. For instance, “employee_ID” could serve as a primary key in an “employees” table.
Index: Tables can be indexed on one or more columns to speed up the process of finding relevant information. An index creates a sorted structure that allows the database to locate specific rows more efficiently without having to scan the entire table.
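To tie the primary key and index descriptions to actual statements, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for MySQL. The employees table follows the column names mentioned above but is otherwise hypothetical. SQLite accepts type names such as VARCHAR(50) and DECIMAL(10, 2) but, unlike MySQL, enforces them only loosely. PRAGMA index_list is SQLite-specific; the MySQL equivalent is SHOW INDEX FROM employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# employee_id is the primary key: every row must have a distinct value.
cur.execute("""
    CREATE TABLE employees (
        employee_id   INTEGER NOT NULL PRIMARY KEY,
        employee_name VARCHAR(50) NOT NULL,
        department    VARCHAR(30),
        salary        DECIMAL(10, 2)
    )
""")
cur.execute("INSERT INTO employees VALUES (101, 'Priya', 'IT', 72000)")

# A second row with the same primary key value is rejected.
try:
    cur.execute("INSERT INTO employees VALUES (101, 'Omar', 'HR', 51000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# An index on a frequently filtered column lets lookups avoid a full table scan.
cur.execute("CREATE INDEX idx_employees_department ON employees (department)")
indexes = [row[1] for row in cur.execute("PRAGMA index_list('employees')")]  # column 1 is the index name

print(duplicate_rejected)  # True
print(indexes)
conn.close()
```

Indexes are not free: each one must be updated on every INSERT, UPDATE, and DELETE. They are usually reserved for columns that appear often in WHERE, JOIN, or ORDER BY clauses.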
SQL (Structured Query Language) commands are used to interact with these tables. You can use SQL to query (retrieve), update, insert, and delete records in a table. The SELECT statement is used to retrieve data by specifying the columns you want to see and optionally filtering the rows based on certain conditions using the WHERE clause. INSERT is used to add new rows to a table, UPDATE to modify existing rows, and DELETE to remove rows.
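The four core statements can be run in sequence against one table. Here is a minimal sketch with Python's sqlite3 module and a hypothetical customers table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# INSERT: add new rows (placeholders keep values separate from the SQL text).
cur.executemany("INSERT INTO customers (name, city) VALUES (?, ?)",
                [("Ana", "Lisbon"), ("Raj", "Pune"), ("Zoe", "Oslo")])

# UPDATE: modify existing rows that match the WHERE condition.
cur.execute("UPDATE customers SET city = ? WHERE name = ?", ("Porto", "Ana"))

# DELETE: remove rows that match the WHERE condition.
cur.execute("DELETE FROM customers WHERE city = ?", ("Oslo",))

# SELECT: retrieve chosen columns, sorted by name.
rows = cur.execute("SELECT name, city FROM customers ORDER BY name").fetchall()
print(rows)  # [('Ana', 'Porto'), ('Raj', 'Pune')]

conn.commit()
conn.close()
```

The ? placeholders are sqlite3's parameter style. MySQL's mysql.connector driver uses %s instead, but the pattern is the same: values are passed separately from the SQL text, which guards against SQL injection.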
The logical structure of a database, including its tables and their relationships, can be visually represented using an Entity-Relationship (ER) diagram. An ER diagram shows entities (which often correspond to tables) and their attributes (which correspond to columns) and the relationships between these entities. This helps in understanding the information to be stored in a database and serves as a blueprint for database design.
Understanding SQL: Core Concepts and Commands
SQL (Structured Query Language) is a domain-specific language that serves as the backbone of data management and analysis for relational databases. It is the standard language used by most databases to communicate with and manipulate data. Initially developed by IBM, SQL allows users to interact with databases to store, process, analyze, and manage data effectively. As businesses become increasingly data-driven, proficiency in SQL is a crucial skill for data analysts, developers, and database administrators.
Here are key aspects of the SQL query language based on the sources:
Core Functionality: SQL queries enable you to access any information stored in a relational database. This includes retrieving specific data, updating existing records, inserting new data, and deleting unwanted information.
Efficiency: SQL is designed to extract data from databases in a very efficient way. By specifying precisely what data you need and the conditions it must meet, you can minimize the amount of data processed and transferred.
Compatibility: SQL is supported, with some dialect differences, by all major relational database systems, ranging from Oracle Database and IBM Db2 to Microsoft SQL Server and open-source options like MySQL and PostgreSQL.
Ease of Use: SQL is designed to manage databases without requiring extensive coding. Its syntax is relatively straightforward, focusing on declarative statements that specify what data should be retrieved or modified, rather than how to perform the operation.
Applications of SQL: SQL has a wide range of applications, including:
Creating databases and defining their structure (e.g., creating tables with specific columns and data types).
Implementing and maintaining existing databases.
Entering, modifying, and extracting data within a database. For instance, you can use INSERT to add new records, UPDATE to change existing ones, and SELECT to retrieve data.
Serving as a client-server language to connect the front-end of applications with the back-end databases that store the application’s data.
Protecting databases from unauthorized access through Data Control Language (DCL) commands such as GRANT and REVOKE.
Types of SQL Commands: SQL commands are broadly categorized into four main types:
Data Definition Language (DDL): These commands are used to change the structure of database objects such as tables. Examples include CREATE (to create tables), ALTER (to modify table structure), DROP (to delete tables), and TRUNCATE (to remove all rows from a table). In systems such as MySQL and Oracle, DDL commands are auto-committed, meaning the changes are saved permanently and cannot be rolled back. PostgreSQL, by contrast, allows most DDL statements inside a transaction.
Data Manipulation Language (DML): These commands are used to retrieve and modify the data within the database. Examples include SELECT (to retrieve data), INSERT (to add new rows), UPDATE (to modify existing rows), and DELETE (to remove rows). DML changes made inside a transaction can be rolled back until they are committed. Note that MySQL runs in autocommit mode by default, so each statement is committed immediately unless a transaction is started explicitly. The SELECT command is also referred to as Data Query Language (DQL).
Data Control Language (DCL): These commands control access to data within the database, managing user privileges and permissions. Examples include GRANT (to give users access rights) and REVOKE (to remove access rights).
Transaction Control Language (TCL): These commands manage database transactions. Examples include COMMIT (to save changes permanently) and ROLLBACK (to undo changes).
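COMMIT and ROLLBACK are easiest to see with a small money transfer. This sketch uses Python's sqlite3 module with isolation_level=None, so that BEGIN, COMMIT, and ROLLBACK are issued explicitly as they would be in a MySQL session. The accounts table is hypothetical.

```python
import sqlite3

# isolation_level=None: the driver issues no implicit BEGIN, so transactions
# are controlled with explicit BEGIN / COMMIT / ROLLBACK statements.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")

# Transfer 30 from account 1 to account 2, then undo it.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
conn.execute("ROLLBACK")  # both updates are discarded together
after_rollback = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()

# The same transfer, this time made permanent.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")
conn.execute("COMMIT")
after_commit = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()

print(after_rollback)  # [(100,), (50,)]
print(after_commit)    # [(70,), (80,)]
conn.close()
```

Wrapping both UPDATE statements in one transaction is the point of the design. The transfer either happens completely or not at all, so money never leaves one account without arriving in the other.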
Basic SQL Command Structure: A typical SQL query follows a basic structure:
SELECT column1, column2, …
FROM table_name
WHERE condition(s)
GROUP BY column(s)
HAVING group_condition(s)
ORDER BY column(s) ASC|DESC;
The SELECT statement specifies the columns you want to retrieve. You can use SELECT * to select all columns.
The FROM statement indicates the table from which to retrieve the data.
The optional WHERE clause filters rows based on specified conditions. You can use comparison operators (e.g., >, =, <), logical operators (AND, OR, NOT), BETWEEN to select within a range, and IN to specify multiple values.
The optional GROUP BY clause groups rows that have the same values in one or more columns into summary rows, often used with aggregate functions.
The optional HAVING clause filters groups based on specified conditions (used with GROUP BY).
The optional ORDER BY clause sorts the result set in ascending (ASC) or descending (DESC) order based on one or more columns.
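A single query that uses every clause of the skeleton in the required order might look like the sketch below. It runs on Python's sqlite3 module as a stand-in database, and the orders table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, status TEXT, amount INTEGER)")
cur.executemany(
    "INSERT INTO orders (region, status, amount) VALUES (?, ?, ?)",
    [("North", "shipped", 120), ("North", "delivered", 300), ("North", "cancelled", 400),
     ("South", "shipped", 90), ("South", "delivered", 20),
     ("East", "shipped", 600), ("East", "delivered", 80),
     ("West", "shipped", 50)],
)

query = """
SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total   -- columns to return
FROM orders                                                 -- source table
WHERE status IN ('shipped', 'delivered')                    -- IN: any of the listed values
  AND amount BETWEEN 50 AND 500                             -- BETWEEN: inclusive range
GROUP BY region                                             -- one row per region
HAVING SUM(amount) > 60                                     -- keep only groups above 60
ORDER BY total DESC, region ASC                             -- largest total first
"""
result = cur.execute(query).fetchall()
print(result)  # [('North', 2, 420), ('South', 1, 90), ('East', 1, 80)]
conn.close()
```

West's order of exactly 50 passes the WHERE filter because BETWEEN includes both ends. The West group is then dropped by HAVING because its total of 50 is not above 60.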
Data Types: SQL supports various data types to define the kind of data each column can hold, including exact numeric (integer, smallint, decimal), approximate numeric (float, real), date and time (date, time, timestamp), string (char, varchar, text), and binary data types (binary, varbinary, image).
Operators: SQL uses different types of operators to perform operations in queries, such as arithmetic operators (+, -, *, /), logical operators (ALL, AND, ANY, BETWEEN, EXISTS, IN, LIKE, NOT, OR), and comparison operators (=, !=, >, <, >=, <=).
Functions: SQL provides built-in functions to perform various operations on data, including:
Aggregate functions: Calculate a single value from a set of rows (e.g., COUNT, SUM, AVG, MIN, MAX).
String functions: Manipulate text data (e.g., LENGTH, UPPER, LOWER, SUBSTRING, CONCAT, TRIM, POSITION, LEFT, RIGHT, REPEAT, REVERSE).
Date and time functions: Work with date and time values (e.g., CURDATE, DAY, NOW).
Joins: SQL allows you to combine data from two or more tables based on a related column. Different types of joins include INNER JOIN (returns rows only when there is a match in both tables), LEFT JOIN (returns all rows from the left table and matching rows from the right), RIGHT JOIN (returns all rows from the right table and matching rows from the left), and FULL OUTER JOIN (returns all rows from both tables, matching them where possible). The UNION operator can also be used to combine the result sets of two or more SELECT statements.
Subqueries: A subquery (or inner query) is a query nested inside another SQL query. Subqueries can be used in the WHERE, SELECT, and FROM clauses to retrieve data that will be used by the outer query.
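The sketch below places a subquery in each of the three clauses mentioned: WHERE, SELECT, and FROM. It assumes Python’s sqlite3 module and a hypothetical employees table. The derived table in FROM is given an alias, which MySQL requires.

```python
import sqlite3

# Hypothetical employees table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Asha', 'Sales', 40000), ('Ben', 'Sales', 52000),
    ('Chen', 'IT', 75000),    ('Dara', 'IT', 48000);
""")

# Subquery in WHERE: the inner query computes one value (the average salary)
# that the outer query compares each row against.
above_avg = conn.execute("""
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Subquery in SELECT: the same top salary is used to compute a value for every row.
gaps = conn.execute("""
    SELECT name, (SELECT MAX(salary) FROM employees) - salary AS gap_to_top
    FROM employees
    ORDER BY gap_to_top
""").fetchall()

# Subquery in FROM (a derived table): per-department averages are computed
# first, then the outer query filters that intermediate result.
dept_avgs = conn.execute("""
    SELECT department, avg_salary
    FROM (SELECT department, AVG(salary) AS avg_salary
          FROM employees
          GROUP BY department) AS d
    WHERE avg_salary > 50000
""").fetchall()
print(above_avg, gaps, dept_avgs, sep="\n")
```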
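Stored Procedures: A stored procedure is a named set of SQL statements saved in the database and executed as a single unit. Stored procedures can take parameters and return values. This helps encapsulate business logic and can improve performance, for example by reducing round trips between the application and the database server.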
Triggers: Triggers are special types of stored procedures that automatically run when a specific event occurs in the database server (e.g., before or after an INSERT, UPDATE, or DELETE operation on a table).
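As a sketch of a trigger, the code below logs every balance change to an audit table. It uses Python’s sqlite3 module. SQLite supports triggers but not stored procedures, so only the trigger is shown. MySQL’s trigger syntax differs slightly. The accounts and balance_audit tables are hypothetical.

```python
import sqlite3

# Hypothetical accounts and audit tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL);
CREATE TABLE balance_audit (account_id INTEGER, old_balance REAL, new_balance REAL);

-- Runs automatically after any UPDATE of the balance column, once per changed row.
CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON accounts
FOR EACH ROW
BEGIN
    INSERT INTO balance_audit VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO accounts VALUES (1, 'Ava', 100.0), (2, 'Raj', 50.0);
""")

# Only this UPDATE fires the trigger; the INSERTs above do not.
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.commit()

audit = conn.execute("SELECT * FROM balance_audit").fetchall()
print(audit)
```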
In summary, SQL is a powerful and versatile language essential for interacting with relational databases. It provides a structured way to define, manipulate, and retrieve data, making it a cornerstone of modern data management and analysis.
Essential Skills for Aspiring Data Analysts
Based on the sources, becoming a data analyst requires a combination of technical and soft skills. The document “01.pdf” outlines several key skill areas for aspiring data analysts.
According to the source, the first crucial step toward becoming a data analyst is building skills. These skills fall into six main areas:
Microsoft Excel Proficiency: While advanced tools exist, proficiency in Excel remains vital for data analysts. Its versatility in data manipulation, visualization, and modeling is unmatched, making it a foundational tool for initial data exploration and basic analysis.
Data Management and Database Management Skills: These skills are indispensable for data analysts because, as data volumes grow, managing and retrieving data from databases efficiently becomes critical. Proficiency in DBMSs and query languages like SQL ensures analysts can access and manipulate data seamlessly. As discussed earlier, SQL is the backbone of data management and analysis. It lets data analysts access any information stored in a relational database through queries, which includes writing queries, joining tables, and using subqueries.
Statistical Analysis: This skill allows analysts to uncover hidden trends, patterns, and correlations within data, facilitating evidence-based decision-making. It empowers analysts to identify the significance of findings, validate hypotheses, and make reliable predictions.
Programming Languages (e.g., Python, R): Proficiency in programming languages like Python is essential for data analysis. These languages enable data manipulation, advanced statistical analysis, and machine learning implementations. The source also lists the R programming language among the tools a data analyst should be familiar with.
Data Storytelling and Data Visualization: This skill is paramount for data analysts. Data storytelling bridges the gap between data analysis and actionable insights, ensuring that the value of data is fully realized. The ability to present insights clearly and persuasively is crucial as data complexity grows. Tools like Tableau and Power BI are mentioned as data visualization tools.
Problem Solving and Soft Skills: Strong problem-solving skills are important for data analysts when dealing with complex data challenges and evolving analytical methodologies. Analysts must excel in identifying issues, formulating hypotheses, and devising innovative solutions. In addition to technical skills, data analysts in 2025 will require strong soft skills to excel. These include:
Communication: Data analysts must effectively communicate their findings to both technical and non-technical stakeholders, presenting complex data in a clear and understandable manner.
Teamwork and Collaboration: Data analysts often work with multidisciplinary teams alongside data scientists, data engineers, and business professionals. Collaborative skills are essential for sharing insights, brainstorming solutions, and working cohesively towards common goals.
Domain Knowledge: Knowledge of the domain in which the analyst works (e.g., pharmaceuticals, banking, automotive) is very important. Without foundational domain knowledge, it can be difficult to produce accurate results.
In summary, a data analyst needs technical skills in four areas: data manipulation (including SQL and Excel), statistical analysis, programming, and data visualization. These go together with crucial soft skills in communication, teamwork, and problem-solving, complemented by domain knowledge. As covered earlier, SQL plays a fundamental role in a data analyst’s toolkit for interacting with databases [1].
Understanding Database Management and SQL
Based on the sources, a database is an organized collection of structured information or data, typically stored electronically in a computer system. A database is usually controlled by a Database Management System (DBMS). Database management covers how that stored collection of data is organized, controlled, and maintained.
Here are key aspects of database management as discussed in the sources:
Role of a DBMS: A DBMS is crucial for controlling and managing databases. It provides the necessary tools and functionalities to ensure data is easily retrieved, managed, and updated.
Relational Databases: A significant aspect of database management discussed in the source is the relational database. It stores data in tables organized into rows (records or tuples) and columns (fields).
Organization and Indexing: In relational databases, data can be organized into tables with specific structures. Furthermore, data can be indexed to make it easier to find relevant information. An index helps speed up data retrieval operations. A table consists of:
Column Names (Fields): These are the attributes of the data stored in the table (e.g., player ID, player name, country, goals scored). Each column should have a unique name within the table, and all values within a specific column should be of the same data type or domain.
Rows (Records or Tuples): Each row represents a single instance of the entity being described by the table (e.g., information about a specific player).
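The players table described above can be defined and indexed as follows. This is a minimal sketch using Python’s sqlite3 module. The column names mirror the course example: player ID, player name, country, and goals scored. The row for Daniel (ID 103, England, seven goals) comes from the course. The other rows and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column has a unique name and one declared type; each row describes one player.
conn.execute("""
    CREATE TABLE players (
        player_id    INTEGER PRIMARY KEY,
        player_name  TEXT NOT NULL,
        country      TEXT,
        goals_scored INTEGER
    )
""")

# Hypothetical index name: lets the database look up players by country
# without scanning every row.
conn.execute("CREATE INDEX idx_players_country ON players (country)")

conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?, ?)",
    [
        (101, "Luis", "Spain", 4),       # hypothetical
        (102, "Kenji", "Japan", 2),      # hypothetical
        (103, "Daniel", "England", 7),   # row from the course example
    ],
)
conn.commit()

row = conn.execute("SELECT * FROM players WHERE player_id = 103").fetchone()
# PRAGMA table_info returns one row per column; index 1 holds the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(players)")]
print(row, columns)
```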
SQL for Database Management: As noted earlier, SQL (Structured Query Language) is a domain-specific language used to communicate with databases [1]. It plays a vital role in database management by allowing users to:
Query databases to retrieve specific information.
Update databases to modify existing data.
Insert records to add new data.
Perform many other tasks related to managing and manipulating data.
Store, process, analyze, and manipulate databases.
Create a database and define its structure.
Maintain an already existing database.
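Here is a minimal sketch of the everyday tasks listed above: inserting, updating, and querying records, plus undoing an uncommitted change. It uses Python’s sqlite3 module. The table and rows are hypothetical. The same statements apply in MySQL, although whether changes are saved automatically depends on its autocommit setting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Create a table and define its structure (hypothetical columns).
conn.execute(
    "CREATE TABLE players (player_id INTEGER PRIMARY KEY, player_name TEXT, goals_scored INTEGER)"
)

# Insert: add new records. Placeholders (?) pass values separately from the SQL text.
conn.executemany("INSERT INTO players VALUES (?, ?, ?)",
                 [(103, "Daniel", 7), (104, "Marta", 5)])
conn.commit()

# Update: modify existing data, then make the change permanent.
conn.execute("UPDATE players SET goals_scored = goals_scored + 1 WHERE player_id = ?", (104,))
conn.commit()

# A change that has not been committed yet can be undone.
conn.execute("DELETE FROM players WHERE player_id = ?", (103,))
conn.rollback()

# Query: retrieve specific information.
players = conn.execute(
    "SELECT player_name, goals_scored FROM players ORDER BY player_id"
).fetchall()
print(players)
```

The DELETE is rolled back, so Daniel is still present. Marta’s goal count reflects the committed UPDATE.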
Popular Databases: The source lists several popular database systems, including:
MySQL.
Oracle Database.
MongoDB (a NoSQL database).
Microsoft SQL Server.
Apache Cassandra (a free and open-source NoSQL database).
PostgreSQL.
Database Management Skills for Data Analysts: The data analyst skills section above emphasized that data management and database management skills are indispensable for data analysts [3]. Growing data volumes make efficient management and retrieval from databases essential, so proficiency in DBMSs and query languages like SQL is critical. Data analysts need to be able to access and manipulate data seamlessly using SQL.
In essence, database management involves the strategic organization, storage, retrieval, and manipulation of data using a DBMS. Relational databases, structured in tables, are a common model, and SQL is the primary language used to interact with these systems for various management tasks. These skills are fundamental for professionals like data analysts who work with data to derive insights and support decision-making.
SQL for Data Analysis Functions
Data analysis is the process of inspecting, cleaning, transforming, and modeling data. Its goal is to discover useful information, inform conclusions, and support decision-making. When the data resides in relational databases, SQL plays a crucial role in performing many of these functions [1].
Here are some key data analysis functions that can be performed using SQL, as supported by the sources:
Data Retrieval and Selection: SQL’s SELECT statement is fundamental for retrieving specific data required for analysis. You can choose particular columns from one or more tables. For example, to analyze player performance, you might select player name and goals scored from a players table.
Filtering Data: To focus on relevant subsets of data, the WHERE clause in SQL allows you to filter records based on specified conditions. For instance, you might analyze data only for players from a specific country.
Sorting Data: The ORDER BY clause enables you to sort the retrieved data based on one or more columns, which can help in identifying trends or outliers. You could sort players by the number of goals scored in descending order to see the top performers.
Removing Duplicates: The DISTINCT keyword is used to retrieve only unique values from a column, which can be important for accurate analysis, such as finding the number of unique cities represented in a dataset.
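Here is how retrieval, filtering, sorting, and de-duplication look in practice. This is a minimal sketch assuming Python’s sqlite3 module. Apart from Daniel, the players and cities are hypothetical.

```python
import sqlite3

# Hypothetical players table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (player_name TEXT, country TEXT, city TEXT, goals_scored INTEGER);
INSERT INTO players VALUES
    ('Daniel', 'England', 'London',    7),
    ('Harry',  'England', 'London',    9),
    ('Marta',  'Brazil',  'Sao Paulo', 5),
    ('Luis',   'Spain',   'Madrid',    4);
""")

# Selection + filtering: two columns, only players from one country.
english = conn.execute(
    "SELECT player_name, goals_scored FROM players WHERE country = ? ORDER BY player_name",
    ("England",),
).fetchall()

# Sorting: top performers first.
top = conn.execute(
    "SELECT player_name, goals_scored FROM players ORDER BY goals_scored DESC"
).fetchall()

# Removing duplicates: each city is counted once, even though London appears twice.
unique_cities = conn.execute("SELECT COUNT(DISTINCT city) FROM players").fetchone()[0]
print(english, top, unique_cities, sep="\n")
```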
Aggregation: SQL provides aggregate functions that perform calculations on a set of rows and return a single summary value. These are essential for summarizing data:
COUNT(): To count the number of rows or non-null values. For example, counting the total number of employees.
SUM(): To calculate the total sum of values in a column. For example, finding the total salary of all employees.
AVG(): To calculate the average of values in a column. For example, finding the average age of employees.
MIN(): To find the minimum value in a column. For example, identifying the lowest salary.
MAX(): To find the maximum value in a column. For example, determining the highest salary.
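The five aggregate functions can run together in one query. This is a minimal sketch with Python’s sqlite3 module and a hypothetical employees table. Note that COUNT(column) skips NULLs, while COUNT(*) counts every row.

```python
import sqlite3

# Hypothetical employees table; two employees have no bonus (NULL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, age INTEGER, salary INTEGER, bonus INTEGER);
INSERT INTO employees VALUES
    ('Asha', 28, 40000, 2000),
    ('Ben',  35, 52000, NULL),
    ('Chen', 41, 75000, 5000),
    ('Dara', 24, 48000, NULL);
""")

summary = conn.execute("""
    SELECT COUNT(*)     AS employees,       -- counts rows
           COUNT(bonus) AS with_bonus,      -- counts non-NULL values only
           SUM(salary)  AS total_salary,
           AVG(age)     AS avg_age,
           MIN(salary)  AS lowest_salary,
           MAX(salary)  AS highest_salary
    FROM employees
""").fetchone()
print(summary)
```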
Grouping Data: The GROUP BY clause allows you to group rows that have the same values in one or more columns into summary rows. This is often used in conjunction with aggregate functions to perform analysis on different categories. For instance, finding the average salary for each department.
Filtering Groups: The HAVING clause is used to filter groups created by the GROUP BY clause based on specified conditions, often involving aggregate functions. For example, identifying countries where the average salary is greater than a certain threshold.
Joining Tables: When data for analysis is spread across multiple related tables, JOIN operations in SQL are used to combine data from these tables based on common columns. This allows you to bring together relevant information for a comprehensive analysis, such as combining customer information with their order details. As mentioned in the source, you can even join three or more tables.
Using Inbuilt Functions: SQL provides various inbuilt functions that can be used for data manipulation and analysis. These include:
Mathematical Functions: For performing calculations (e.g., ABS(), MOD(), SQRT(), POWER()).
String Functions: For manipulating text data (e.g., LENGTH(), CONCAT(), UPPER(), LOWER(), SUBSTRING(), REPLACE()).
Date and Time Functions: For working with temporal data (e.g., CURRENT_DATE(), NOW(), extracting day, year).
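The following sketch applies one or two functions from each category. It uses Python’s sqlite3 module, so some names are SQLite’s rather than MySQL’s; the comments note the usual MySQL equivalents. In SQLite, || concatenates strings, whereas in MySQL it is a logical OR by default, which is why CONCAT() is used there. The orders table is hypothetical.

```python
import sqlite3

# Hypothetical orders table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_no TEXT, customer TEXT, placed_on TEXT, delta INTEGER);
INSERT INTO orders VALUES ('ord-1001', 'ava  ', '2024-03-15', -12);
""")

row = conn.execute("""
    SELECT ABS(delta)                          AS abs_delta,    -- mathematical
           ABS(delta) % 5                      AS remainder,    -- MySQL also offers MOD()
           UPPER(TRIM(customer))               AS customer_uc,  -- string
           LENGTH(order_no)                    AS order_len,
           SUBSTR(order_no, 5)                 AS order_digits, -- MySQL: SUBSTRING()
           REPLACE(order_no, 'ord', 'ORD')     AS order_label,
           TRIM(customer) || ' / ' || order_no AS combined,     -- MySQL: CONCAT()
           strftime('%Y', placed_on)           AS order_year,   -- MySQL: YEAR()
           strftime('%d', placed_on)           AS order_day,    -- MySQL: DAY()
           CURRENT_DATE                        AS today         -- MySQL: CURDATE()
    FROM orders
""").fetchone()
print(row)
```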
Creating Calculated Fields: Using SQL, you can create new columns based on existing data through calculations or conditional logic. The CASE statement allows you to define different values for a new column based on conditions evaluated on other columns, enabling the categorization of data (e.g., creating a salary range category based on salary values).
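A calculated salary-range column built with CASE might look like this. This is a minimal sketch with Python’s sqlite3 module. The employees table and the band boundaries (45,000 and 60,000) are hypothetical.

```python
import sqlite3

# Hypothetical employees table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, salary INTEGER);
INSERT INTO employees VALUES ('Asha', 40000), ('Ben', 52000), ('Chen', 75000);
""")

# CASE returns the value of the first WHEN branch whose condition is true.
rows = conn.execute("""
    SELECT name,
           salary,
           CASE
               WHEN salary < 45000 THEN 'low'
               WHEN salary < 60000 THEN 'mid'
               ELSE 'high'
           END AS salary_range
    FROM employees
    ORDER BY salary
""").fetchall()
print(rows)
```

Because the branches are checked in order, the second condition only needs an upper bound.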
Subqueries (Nested Queries): SQL allows you to write queries within other queries, which can be used to perform more complex data retrieval and analysis. For example, selecting employees whose salary is greater than the average salary calculated by a subquery.
Views: Views are virtual tables based on the result of an SQL statement. They can simplify complex queries and provide a focused perspective on the data, making analysis easier by presenting a subset of data in a more manageable format.
Common Table Expressions (CTEs): CTEs are temporary, named result sets defined within the scope of a single query. They can break down complex analytical queries into smaller, more readable, and manageable parts.
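Below is a minimal sketch of a view and a CTE using Python’s sqlite3 module. The employees table and the it_staff view are hypothetical. MySQL supports CTEs from version 8.0.

```python
import sqlite3

# Hypothetical employees table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
    ('Asha', 'Sales', 40000), ('Ben', 'Sales', 52000),
    ('Chen', 'IT', 75000),    ('Dara', 'IT', 48000);

-- A view stores the query, not a copy of the data.
CREATE VIEW it_staff AS
    SELECT name, salary FROM employees WHERE department = 'IT';
""")

it_rows = conn.execute("SELECT name FROM it_staff ORDER BY name").fetchall()

# A row added after the view was created still shows up in the view.
conn.execute("INSERT INTO employees VALUES ('Eve', 'IT', 50000)")
it_after = conn.execute("SELECT name FROM it_staff ORDER BY name").fetchall()

# A CTE names an intermediate result that exists only for this one statement.
cte_rows = conn.execute("""
    WITH dept_avg AS (
        SELECT department, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department
    )
    SELECT e.name, e.department
    FROM employees AS e
    JOIN dept_avg AS d ON d.department = e.department
    WHERE e.salary > d.avg_salary
    ORDER BY e.name
""").fetchall()
print(it_rows, it_after, cte_rows, sep="\n")
```

The view stores the query rather than a copy of the data. That is why Eve, inserted after the view was created, appears the next time the view is queried.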
These SQL-based data analysis functions are crucial skills for a data analyst, as the earlier section on the skills this role requires highlighted. Proficiency in these SQL features allows data analysts to extract, manipulate, summarize, and analyze data stored in databases to derive meaningful insights.
SQL Full Course 2025 | SQL Tutorial for Beginners | SQL Beginner to Advanced Training | Simplilearn
The Original Text
Hello everyone, and welcome to the SQL full course by Simplilearn. Have you ever wondered how apps manage data or how businesses handle massive data sets? The answer lies in SQL. Structured Query Language is the backbone of data management and analysis, making it a must-have skill for data analysts, developers, and database administrators. As industries become more data-driven, the demand for SQL experts has skyrocketed. By 2025, job opportunities in fields like SQL development and data analysis will surge. Starting salaries reach around $50,000 in the US and around 4 to 8 lakh per annum in India, and experienced professionals earn around $100,000, or 20 lakh per annum in India. This course will take you from beginner level to SQL expert. You’ll learn how to write queries, join tables, use subqueries, and apply SQL for hands-on data analysis, and by the end you’ll be equipped to manage and manipulate data like a pro. So let’s get started. But before that, if you’re interested in making a career in data analytics, check out Simplilearn’s Postgraduate Program in Data Analytics. This comprehensive course is designed to transform you into a data analytics expert. The program covers essential skills such as data visualization, statistical analysis, and machine learning using industry-leading tools and technologies like Excel, R, Python, and even Tableau. The course link is mentioned in the description box below and in the pinned comment, so hurry up and enroll now.
In this session we are going to learn about databases and how data is stored in relational databases, and we’ll also look at some of the popular databases. Finally, we’ll understand various SQL commands on MySQL Server. Now let’s get started with what a database is. According to Oracle, a database is an organized collection of structured information or data that is typically stored electronically in a computer system. A database is usually controlled by a database management system, or DBMS, so it is a storage system that holds a collection of data. Relational
databases store data in the form of tables that can be easily retrieved, managed, and updated. You can organize data into tables, rows, and columns, and index it to make relevant information easier to find.
Now, talking about some of the popular databases. We have MySQL, and we also have Oracle Database. Then we have MongoDB, which is a NoSQL database. Next we have Microsoft SQL Server, and next Apache Cassandra, which is a free and open-source NoSQL database. Finally, we have PostgreSQL.
Now let’s learn what SQL is. SQL is a domain-specific language used to communicate with databases. SQL was initially developed by IBM. Most databases use Structured Query Language, or SQL, for writing and querying data. SQL commands help you store, process, analyze, and manipulate databases. With this, let’s look at what a table is. This is what a table in a database looks like. Here you can see the name of the table is players. At the top you can see the column names: the player ID, the player name, the country the player belongs to, and the goals scored by each player. These are also known as fields in a database. Each row represents a record, or a tuple. So for player ID 103, the name of the player is Daniel, he is from England, and the number of goals he has scored is seven. You can use SQL commands to query, update, and insert records and do a lot of other tasks.
Now we’ll see what the features of SQL are. SQL lets you access any information stored in a relational database. With SQL queries, data is extracted from the database in a very efficient way. Structured Query Language is compatible with all database systems, from Oracle and IBM to Microsoft, and it doesn’t require much coding to manage databases. Now we will see the applications of SQL. SQL is used to create a database, define its structure, implement it, and perform many functions. SQL is also used for maintaining an already existing database. SQL is a powerful language for
entering, modifying, and extracting data in a database. SQL is extensively used as a client-server language to connect the front end with the back end, thereby supporting the client-server architecture. SQL, when deployed as a data control language (DCL), helps protect your database from unauthorized access.
If you categorize the steps to become a data analyst, there are five important ones:
1. Focus on skills.
2. Get a proper qualification.
3. Test your skills by creating a personal, individual project.
4. Build your own portfolio to demonstrate your caliber to recruiters.
5. Target entry-level jobs or internships to get exposure to real-world data problems.

Now let’s begin with step one, that is, skills. Skills are basically categorized into six areas: data cleaning, data analysis, data visualization, problem solving, soft skills, and domain knowledge. These are the tools: Excel, MySQL, the R programming language, the Python programming language, and some data visualization tools like Tableau and Power BI. Next comes problem solving. These are basically the soft-skill parts: problem-solving skills and domain knowledge, meaning the domain in which you’re working, maybe a pharma domain, maybe the banking sector, maybe the automobile domain, etc. And lastly, you need to be a good team player so that you can actively work along with the team and solve problems collaboratively. Now let’s move ahead and discuss each of these in a bit more detail, starting with Microsoft Excel. While advanced tools are prevalent, proficiency in Excel remains vital for data analysts. Excel’s versatility in data manipulation, visualization, and modeling is unmatched, and it serves as a foundational tool for initial data exploration and basic analysis. Data management and database management skills are indispensable for data analysts: as data volumes soar, efficient management and retrieval from databases is
critical. Proficiency in DBMS systems and query languages like SQL ensures analysts can access and manipulate data seamlessly. After that, we have statistical analysis. Statistical analysis allows analysts to uncover hidden trends, patterns, and correlations within data, facilitating evidence-based decision-making. It empowers analysts to identify the significance of findings, validate hypotheses, and make reliable predictions. Next, we have programming languages. Proficiency in programming languages like Python is essential for data analysis; these languages enable data manipulation, advanced statistical analysis, and machine learning implementations. Next comes data storytelling, also known as data visualization. Data storytelling skill is paramount for data analysts. Data storytelling bridges the gap between data analysis and actionable insights, ensuring that the value of data is fully realized. In a world where data-driven communication is central to business success, data visualization skill is a cornerstone for data analysts. As data complexity grows, the ability to present insights clearly and persuasively is paramount. Next is managing your customers and problem solving. Managing all your customer data and company relationships is paramount. Strong problem-solving skills are important for data analysts facing complex data challenges and evolving analytical methodologies. Analysts must excel in identifying issues, formulating hypotheses, and devising innovative solutions. In addition to technical skills, data analysts in 2025 will require strong soft skills to excel in their roles. Here are the top ones. Data analysts must effectively communicate their findings to both technical and non-technical stakeholders; this includes presenting complex data in a clear and understandable manner. The next soft skill is teamwork and collaboration. Data analysts often work in multidisciplinary teams alongside data scientists, data engineers, and business professionals. Collaborative skills are
essential for sharing insights, brainstorming solutions, and working cohesively toward common goals. And last but not least, domain knowledge: knowledge of the domain in which you’re currently working is really important. It might be the pharmaceutical domain, the automobile domain, the banking sector, and much more. Unless you have basic foundational domain knowledge, you cannot continue in that domain with accurate results.
Now the next step, which is about qualifications. To become a data analyst, master’s courses, online courses, and boot camps provide strong, structured learning that helps you gain in-depth knowledge and specialized skills in data analysis. Master’s programs offer comprehensive, academically rigorous training and often include research projects, making you highly competitive in the job market. Online courses allow the flexibility to learn at your own pace while covering essential topics. Boot camps offer immersive hands-on training in a short period, focusing on practical skills. All three paths enhance your credibility, keep you updated on industry trends, and make you more attractive to potential employers. If you are looking for a well-curated all-rounder, then we have got you covered. Simplilearn offers a wide range of courses on data science and data analytics, from master’s programs and professional certifications to postgraduate programs and boot camps, with globally reputed and recognized universities. For more details, check out the links in the description box below and the comment section.
Now, proceeding ahead, we have projects for data analysts. Data analyst projects demonstrate practical skills in data cleaning, visualization, and analysis. They help build a portfolio showcasing your expertise and problem-solving abilities. Projects provide hands-on experience, bridging the gap between theory and real-world application. They also show domain knowledge, making you more appealing to employers in specific industries. Projects enhance your confidence and prepare you to
discuss real-world challenges in interviews. Proceeding ahead, the next step is the portfolio. For data analysts, a portfolio is a testament that demonstrates your skill and expertise through real-world projects, showcasing your ability to analyze and interpret data effectively. It provides tangible proof of your capabilities, making you stand out to employers. Additionally, it highlights your domain knowledge and problem-solving skills, giving you a competitive edge during job applications and interviews. Last but not least, data analyst internships. Internships provide hands-on experience with real-world data sets, tools, and workflows, bridging the gap between theoretical knowledge and practical application. They offer exposure to industry practices, helping you understand how data is used to drive decisions. Internships also build your professional network, enhance your resume, and improve your chances of securing a full-time data analyst role.
So let’s understand what an ER diagram is. An entity relationship diagram describes the relationships of the entities that need to be stored in a database. An ER diagram is mainly a structural design for the database: it is a framework made using specialized symbols to define the relationships between entities. ER diagrams are created based on three main components: entities, attributes, and relationships. Let’s understand the use of an ER diagram with the help of a real-world example. Here, a school needs all its student records to be stored digitally, so it approaches an IT company to do so. A person from the company will meet the school authorities, note all their requirements, describe them in the form of an ER diagram, and get it cross-checked by the school authorities. Once the school authorities approve the ER diagram, the database engineers carry out the further implementation. Let’s have a look at an ER diagram. The following diagram showcases two entities, student and course, and the relationship between them. The relationship described between student and course is many to
many, as a course can be opted for by several students and a student can opt for more than one course. Here, student is an entity and it possesses the attributes student ID, student name, and student age, and the course entity has attributes such as course ID and course name. Now that we have an understanding of ER diagrams, let us see why they have been so popular. The logical structure of the database provided by an ER diagram communicates the landscape of the business to different teams in the company, which is eventually needed to support the business. An ER diagram is a graphical representation of the logical structure of a database, which gives a better understanding of the information to be stored in the database. Database designers can use ER diagrams as a blueprint, which reduces complexity and helps them save time and build databases quickly. ER diagrams help you identify the entities that exist in a system and the relationships between those entities.
After knowing their uses, we should get familiar with the symbols used in an ER diagram:
- A rectangle represents an entity.
- An oval represents an attribute.
- A rectangle embedded in a rectangle represents a weak entity.
- A dashed oval represents a derived attribute.
- A diamond represents a relationship among entities.
- A double oval represents a multivalued attribute.

Now we should dive in and learn about the components of an ER diagram. There are three main components of an ER diagram: entity, attribute, and relationship. Entities include weak entities. Attributes are further classified into key attributes, composite attributes, multivalued attributes, and derived attributes. Relationships are classified into one to one, one to many, many to one, and many to many relationships. Let’s understand these components of an ER diagram, starting with entities. An entity can be either a living or a non-living component. An entity is shown as a rectangle in an ER diagram. Let’s understand this with the help of an ER
diagram. Here, both student and course are drawn as rectangles and are called entities, and the relationship between them, study, is drawn in a diamond shape. Let’s transition to weak entities. An entity that relies on another entity is called a weak entity. A weak entity is shown as a double rectangle in an ER diagram. In the example below, the school is a strong entity because it has a primary key attribute, school number. Unlike the school, the classroom is a weak entity because it does not have a primary key; the room number attribute here acts only as a discriminator, not a primary key. Now let us learn about attributes. An attribute exhibits the properties of an entity, and it is illustrated with an oval shape in an ER diagram. In the example below, student is an entity, and the properties of student such as address, age, name, and roll number are called its attributes. Let’s see our first classification under attributes, that is, the key attribute. A key attribute uniquely identifies an entity from an entity set, and the text of a key attribute is underlined. In the example below, we have a student entity with the attributes name, address, roll number, and age. Here, roll number can uniquely identify a student from a set of students; that’s why it is termed a key attribute. Now we will see the composite attribute. An attribute that is composed of several other attributes is known as a composite attribute. An oval shows the composite attribute, and the composite attribute oval is further connected to other ovals. In the example below, we can see an attribute, name, which can have further subparts such as first name, middle name, and last name. An attribute with such further subdivisions is known as a composite attribute. Now let’s have a look at the multivalued attribute. Attributes that can possess more than one value are called multivalued attributes, and they are represented as a double oval shape. In the example below, the student entity has the attributes phone number, roll number,
name, and age. Of these attributes, phone number can have more than one entry, and an attribute with more than one value is called a multivalued attribute. Let’s see the derived attribute. An attribute that can be derived from other attributes of the entity is known as a derived attribute. In an ER diagram, a derived attribute is represented by a dashed oval. In the example below, the student entity has both date of birth and age as attributes. Here, age is a derived attribute, as it can be derived by subtracting the student’s date of birth from the current date. Now, after knowing attributes, let’s understand relationships in an ER diagram. A relationship is shown by a diamond shape in an ER diagram, and it depicts the relationship between two entities. In the example below, student studies course: both student and course are entities, and study is the relationship between them. Now let’s go through the types of relationships. First is the one to one relationship: when a single element of an entity is associated with a single element of another entity, it is called a one to one relationship. In the example below, we have student and identification card as entities. A student has only one identification card, and an identification card is given to only one student, so it represents a one to one relationship. The second one is the one to many relationship: when a single element of an entity is associated with more than one element of another entity, it is called a one to many relationship. In the example below, a customer can place many orders, but a particular order cannot be placed by many customers. Now we will have a look at the many to one relationship: when more than one element of an entity is related to a single element of another entity, it is called a many to one relationship. For example, each student has to opt for a single course, but a course can be opted for by a number of students. Let’s see the many to many relationship: when more than one element of an entity is associated with more than one element of another
entity, it is called a many to many relationship. For example, an employee can be assigned to many projects, and many employees can be assigned to a particular project.
Now, having an understanding of ER diagrams, let us look at the points to keep in mind while creating an ER diagram:
- Identify all the entities in the system, embed each entity in a rectangle, and label it appropriately. This could be a customer, a manager, an order, an invoice, a schedule, etc.
- Identify the relationships between entities and connect them using a diamond in the middle illustrating the relationship. Do not connect relationships to each other.
- Connect attributes with entities and label them appropriately; attributes should be drawn as ovals.
- Ensure that each entity appears only a single time, and eliminate any redundant entities or relationships in the ER diagram.
- Make sure your ER diagram supports all the data provided to design the database.
- Make effective use of colors to highlight key areas in your diagrams.

There are mainly four types of SQL commands. First, we have data definition language, or DDL. DDL commands change the structure of a table, like creating a table, deleting a table, or altering a table. All DDL commands are auto-committed, which means they permanently save all changes in the database. We have CREATE, ALTER, DROP, and TRUNCATE as DDL commands. Next, we have data manipulation language, or DML. DML commands are used to modify a database and are responsible for all forms of changes to its data. DML commands are not auto-committed, which means their changes are not permanently saved until they are committed. We have SELECT, UPDATE, DELETE, and INSERT as DML commands; the SELECT command is also referred to as DQL, or data query language. Third, we have data control language, or DCL. DCL commands allow you to control access to data within the database. These DCL commands are normally used to create objects related to user access and to control the distribution of privileges among users, so we
Third, we have data control language, or DCL. DCL commands let you control access to the data in the database. They are normally used to create objects related to user access and to control how privileges are distributed among users. GRANT and REVOKE are examples of DCL commands.
Finally, we have transaction control language, or TCL. TCL commands let the user manage database transactions; COMMIT and ROLLBACK are examples of TCL commands.
Now let's look at the basic structure of an SQL query. First comes the SELECT statement, where you specify the column names you want to fetch. The table name is written in the FROM clause. Next, the WHERE clause filters rows based on one or more conditions (condition 1, condition 2 and so on). Then the GROUP BY clause takes one or more column names, for example GROUP BY column1, column2. The HAVING clause filters the grouped results, and finally the ORDER BY clause sorts the result in ascending or descending order.
Now let's talk about the data types in SQL. Exact numeric types include INTEGER, SMALLINT, BIT and DECIMAL. Approximate numeric types are FLOAT and REAL. Date and time types include DATE, TIME, TIMESTAMP and others. String types include CHAR, VARCHAR and TEXT. Finally, binary types include BINARY, VARBINARY and IMAGE (IMAGE is a SQL Server type; MySQL uses BLOB types instead).
Next, let's look at some of the operators available in SQL. The basic arithmetic operators are addition, subtraction, multiplication, division and modulus. Logical operators include ALL, AND, ANY, OR, BETWEEN, EXISTS and so on. Finally, comparison operators include equal to, not equal to, greater than, less than, greater than or equal to, less than or equal to, not less than and not greater than (the last two, written !< and !>, are SQL Server operators that MySQL does not support).
Now let me take you to MySQL Workbench, where we will write some of the important SQL commands and use the statements, functions, data types and operators we just learned.
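To make the command categories and the clause order concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, so it runs without a server. The orders table and its rows are hypothetical. One difference to keep in mind: SQLite allows DDL inside a transaction, whereas MySQL commits DDL implicitly, so only the DML rollback shown here mirrors MySQL behavior.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
# isolation_level=None turns off the sqlite3 module's implicit transactions,
# so BEGIN / COMMIT / ROLLBACK below are issued explicitly as TCL commands.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the structure of a hypothetical orders table.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")

# DML inside a transaction that we COMMIT, so the rows are kept.
conn.execute("BEGIN")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ann", 20.0), ("ann", 35.0), ("bob", 5.0), ("bob", 15.0), ("cat", 50.0)],
)
conn.execute("COMMIT")

# DML inside a transaction that we ROLLBACK, so the DELETE is undone.
conn.execute("BEGIN")
conn.execute("DELETE FROM orders")
conn.execute("ROLLBACK")
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Clause order: SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY
summary = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount > 10          -- filter individual rows first
    GROUP BY customer          -- then collapse them into one row per customer
    HAVING SUM(amount) > 30    -- then filter whole groups
    ORDER BY total DESC        -- finally sort the result
    """
).fetchall()

print(rows_after_rollback)  # 5
print(summary)              # [('ann', 55.0), ('cat', 50.0)]
```

Note that bob survives the WHERE filter with one 15.0 row but is then removed by HAVING, which is exactly the row-level versus group-level distinction between the two clauses.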
In this session, we will learn how to install MySQL Workbench and then run some commands. First, we visit the official MySQL website, mysql.com (MySQL is owned by Oracle), and go to the Downloads page. Scroll down and click MySQL Community (GPL) Downloads. Under Community Downloads, click MySQL Installer for Windows. The current versions are available to download; I'll choose this installer and click the Download button. On the next page, just click "No thanks, just start my download."
Once the installer has downloaded, open it. You may be prompted for permission; click Yes. This opens the installer, which asks you to choose a setup type. We will go with Custom and click Next.
Now select the products you want to install. We will install only MySQL Server, MySQL Shell and MySQL Workbench. Expand MySQL Servers by double-clicking it, choose the version you want to install and click the arrow. Do the same for Applications: expand it, choose the MySQL Workbench version you want and click the arrow. Then do the same for MySQL Shell, choosing the latest version and clicking the arrow. These are the products that will be installed on the system. Click Next, then click Execute to download and install them; this may take a while depending on your internet speed. When the download is complete, click Next.
Now you see the product configuration screen; click Next. We will configure the MySQL Server instance with the default settings and click Next. Under Authentication, select Use Strong Password Encryption for Authentication, which is recommended, and click Next. Now set the password for the root user. root is the default user and has access to everything. I'll set my password and click Next. Here too we keep the default settings and click Next. To apply the configuration, click Execute. Once all the configuration steps are complete, click Finish.
Now you will see that the installation is complete. After you click Finish, MySQL Workbench and MySQL Shell launch. To connect, click the root user connection and enter the password, and it connects successfully.
Now let's connect to the server from the command line. First, find the folder that contains the MySQL programs: This PC > Local Disk (C:) > Program Files > MySQL > MySQL Server 8.0 > bin. Copy this path. Open Command Prompt, type cd followed by a space, paste the path and press Enter. Then type mysql -u root -p and press Enter. When prompted, enter your password and press Enter. You are now connected to the server.
Now let's run some commands in MySQL Workbench. Open MySQL Workbench, click the Local instance MySQL80 connection and enter the password to connect to localhost. The first command is SHOW DATABASES;. Select the whole command and click the Execute button, and the Result Grid shows the databases that already exist on the server. There are four of them: information_schema, mysql, performance_schema and sys. Next, select one of them with USE mysql;. To see which tables are stored in the mysql database, run SHOW TABLES;. The result lists the tables in this database, such as columns_priv, component, db and many more.
Now let me open MySQL Workbench again. In the search bar I'll search for MySQL Workbench; you can see I'm using version 8.0. When I click it, the screen says Welcome to MySQL Workbench, and below, under Connections, you can see a connection I have already created, called Local instance, showing the root user, localhost and the port number. When I click it, you can see the service and the username, root; I'll enter my password and click OK. This opens the SQL editor, and this is what MySQL Workbench looks like. Here we'll learn some basic SQL commands.
First, let me show you the databases that are already present. The command is SHOW DATABASES (you can press Tab to autocomplete), followed by a semicolon. I'll select it and click the Execute button at the top. The output below says seven rows were returned, which means there are currently seven databases, and you can see their names.
Now let's say I want to see the tables inside the database called world. I'll run USE world;, where world is the database name. I'm now using the world database, so to display its tables I write SHOW TABLES; and press Ctrl+Enter to run it. There are three tables in total: city, country and countrylanguage.
To see the rows in one of the tables, use the SELECT command. I'll write SELECT *, where the asterisk means display all the columns, followed by FROM city, the table name. This displays the rows in the city table. When I press Ctrl+Enter, the message says 1000 rows were returned. MySQL Workbench limits result sets to 1000 rows by default, so the table itself may contain more rows than this. The columns are ID, Name, CountryCode, District and Population.
Similarly, you can check the structure of a table with the DESCRIBE command. I'll write DESCRIBE city; and run it. The Field column shows the column names: ID, Name, CountryCode, District and Population. Type shows the data type of each column; for example, District is CHAR(20), and ID and Population are integers. Null says YES or NO: NO means the column does not accept NULL values, and YES means it does. Key shows whether the column is part of a primary key, foreign key or other index, and Extra shows additional information.
Now let's learn how to create a table in MySQL with the CREATE TABLE command. Before that, let me create a database named sql_intro. The command is CREATE DATABASE sql_intro;, and I'll press Ctrl+Enter. The database has been created, and if I run SHOW DATABASES; again and scroll down, you can see the newly created sql_intro database.
Within this database we'll create a table called emp_details, which will hold the details of some employees. The command is CREATE TABLE followed by the table name, emp_details. Next come the column names, each followed by its data type. The first column is name, the employee name; since it is a text column, I'll use VARCHAR(25), so it can hold up to 25 characters. Next is age, which is always a whole number, so I'll use INT. Then we have the sex (gender) of the employee, represented as F for female and M for male, so I'm using the CHAR (character) data type with a length of 1. Then we have the date of joining, doj, of data type DATE. Next is city, the city the employee belongs to, which is VARCHAR(15). Finally, we have a salary column, which we'll keep as FLOAT since salaries can be decimal numbers. Then I'll add a semicolon.
To recap: first I wrote the CREATE command, then TABLE, which is also a keyword, followed by the table name emp_details, and then the column names, name, age, sex, doj, city and salary, each with its data type. When I run it, you can see we have successfully created our first table. Now you can use the DESCRIBE command to see its structure: DESCRIBE emp_details;. Under Field you can see the column names, then the data types, and Null shows whether each column accepts NULL values. The Default column is empty because we haven't set any default values.
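As a runnable sketch, the same table definition can be tried in SQLite through Python's sqlite3 module (an assumption for illustration; this is not MySQL). SQLite has no DESCRIBE statement, so the sketch uses PRAGMA table_info, which lists each column's name, declared type and NOT NULL flag. SQLite accepts MySQL-style type names such as VARCHAR(25) but does not enforce the length limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column definitions as the emp_details table described above.
conn.execute(
    """
    CREATE TABLE emp_details (
        name   VARCHAR(25),
        age    INT,
        sex    CHAR(1),
        doj    DATE,
        city   VARCHAR(15),
        salary FLOAT
    )
    """
)

# PRAGMA table_info returns one row per column:
# (cid, name, declared_type, notnull, default_value, pk)
structure = conn.execute("PRAGMA table_info(emp_details)").fetchall()
for cid, name, declared_type, notnull, default, pk in structure:
    print(f"{name:<8} {declared_type:<12} null_allowed={not notnull}")

columns = [row[1] for row in structure]
```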
Moving ahead, let's add data to our table with the INSERT command. I have already written my INSERT statement in a notepad, so let me copy it and explain it step by step. We use INSERT INTO followed by the table name, emp_details, and then the VALUES keyword followed by all the records. First we have Jimmy, the name of the employee; then 35, the age; then M, the sex; then the date of joining; then the city the employee belongs to; and finally the salary. This information represents one record, or tuple. Similarly, the next employees are Shane, Mary, Dwayne, Sara and Ammy, each with their own age and other details. When I run this statement, the values are inserted into the table, and you can see we have successfully inserted six records.
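Here is a minimal sketch of the same insert step in SQLite via Python's sqlite3 module, used as a stand-in for MySQL. The names and cities follow the walkthrough, but the ages, joining dates and salaries are hypothetical placeholders. Dates are written as 'YYYY-MM-DD' text, the literal format MySQL also uses for DATE values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp_details (name VARCHAR(25), age INT, sex CHAR(1), "
    "doj DATE, city VARCHAR(15), salary FLOAT)"
)

# Hypothetical records: (name, age, sex, doj, city, salary)
employees = [
    ("Jimmy", 35, "M", "2005-05-30", "Chicago", 70000),
    ("Shane", 30, "M", "1999-06-25", "Seattle", 55000),
    ("Mary", 28, "F", "2009-03-10", "Boston", 62000),
    ("Dwayne", 37, "M", "2011-07-12", "Austin", 57000),
    ("Sara", 32, "F", "2017-10-27", "New York", 72000),
    ("Ammy", 35, "F", "2014-12-20", "Seattle", 80000),
]

# One parameterized INSERT INTO ... VALUES (...) run once per record;
# the ? placeholders avoid building SQL strings by hand.
conn.executemany("INSERT INTO emp_details VALUES (?, ?, ?, ?, ?, ?)", employees)
conn.commit()

inserted = conn.execute("SELECT COUNT(*) FROM emp_details").fetchone()[0]
print(inserted)  # 6
```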
Now, to display the records, I'll use the SELECT statement: SELECT * FROM emp_details;. When I run it, you can see the table with its values under the name, age, sex, doj, city and salary columns.
Moving ahead, let's say you want to see the unique city names in the table. In this case, you can use the DISTINCT keyword with the column name in the SELECT statement. If you look at the table, the cities are Chicago, Seattle, Boston, Austin, New York and Seattle again. To print only the unique values, I write SELECT DISTINCT city FROM emp_details;. The query returns five rows: Chicago, Seattle (which appeared twice but is shown only once), Boston, Austin and New York.
Now let's see how to use the built-in aggregate functions in SQL. Suppose you want to count the number of employees in the table. You can use the COUNT function in the SELECT statement; since I want the total number of employees, I pass the name column: SELECT COUNT(name) FROM emp_details;. This returns the total number of employees in the table, which is six. In the result, the column header is COUNT(name), which is not very readable, so SQL lets you give an output column an alias. I can write SELECT COUNT(name) AS count_name FROM emp_details; and run it again. Now the resulting column is named count_name, which is the alias.
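The DISTINCT and COUNT queries can be sketched the same way in SQLite (a stand-in for MySQL) with hypothetical rows: names and cities follow the demo, the other values are made up. The cursor's description attribute exposes the output column name, which shows the effect of the AS alias. Without ORDER BY, the order of the distinct cities is not guaranteed, so the sketch compares them as a set.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_details (name TEXT, age INTEGER, sex TEXT, "
             "doj TEXT, city TEXT, salary REAL)")
# Hypothetical rows: names and cities follow the demo, other values are made up.
conn.executemany("INSERT INTO emp_details VALUES (?, ?, ?, ?, ?, ?)", [
    ("Jimmy", 35, "M", "2005-05-30", "Chicago", 70000),
    ("Shane", 30, "M", "1999-06-25", "Seattle", 55000),
    ("Mary", 28, "F", "2009-03-10", "Boston", 62000),
    ("Dwayne", 37, "M", "2011-07-12", "Austin", 57000),
    ("Sara", 32, "F", "2017-10-27", "New York", 72000),
    ("Ammy", 35, "F", "2014-12-20", "Seattle", 80000),
])

# SELECT DISTINCT keeps one copy of each city (Seattle appears twice).
distinct_cities = {row[0] for row in conn.execute("SELECT DISTINCT city FROM emp_details")}

# COUNT with an alias: the alias becomes the result column's name.
cur = conn.execute("SELECT COUNT(name) AS count_name FROM emp_details")
count_value = cur.fetchone()[0]
column_name = cur.description[0][0]

print(sorted(distinct_cities))   # ['Austin', 'Boston', 'Chicago', 'New York', 'Seattle']
print(column_name, count_value)  # count_name 6
```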
Now suppose you want the total of all salaries. You can use another aggregate function, SUM. This time, instead of COUNT I write SUM, and since I want the sum of salaries I pass the salary column: SELECT SUM(salary) FROM emp_details;. This adds up all the values in the salary column. To find the average salary, use the AVG function instead of SUM; it returns the average of the salary column, and you can give it an alias as well.
You can also select specific columns from a table by naming them in the SELECT statement. So far we have been selecting all the columns, where the asterisk means every column in the table. If you only want some of them, list their names instead. For example, SELECT name, age, city FROM emp_details; displays only the name, age and city columns, and when I run it, only these three columns are returned.
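A small SQLite sketch (standing in for MySQL) of SUM, AVG and selecting specific columns. The salary figures are hypothetical placeholders, so the totals only illustrate how the functions behave, not the values from the video.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_details (name TEXT, age INTEGER, sex TEXT, "
             "doj TEXT, city TEXT, salary REAL)")
# Hypothetical rows: names and cities follow the demo, other values are made up.
conn.executemany("INSERT INTO emp_details VALUES (?, ?, ?, ?, ?, ?)", [
    ("Jimmy", 35, "M", "2005-05-30", "Chicago", 70000),
    ("Shane", 30, "M", "1999-06-25", "Seattle", 55000),
    ("Mary", 28, "F", "2009-03-10", "Boston", 62000),
    ("Dwayne", 37, "M", "2011-07-12", "Austin", 57000),
    ("Sara", 32, "F", "2017-10-27", "New York", 72000),
    ("Ammy", 35, "F", "2014-12-20", "Seattle", 80000),
])

# SUM adds every value in the column; AVG divides that sum by the row count.
total = conn.execute("SELECT SUM(salary) AS total_salary FROM emp_details").fetchone()[0]
average = conn.execute("SELECT AVG(salary) AS avg_salary FROM emp_details").fetchone()[0]

# Naming columns instead of * returns only those columns.
cur = conn.execute("SELECT name, age, city FROM emp_details")
selected_columns = [d[0] for d in cur.description]
first_row = cur.fetchone()

print(total, average)    # 396000.0 66000.0
print(selected_columns)  # ['name', 'age', 'city']
```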
SQL has a WHERE clause to filter rows based on a condition; it comes after the table name. Suppose you want to find the employees older than 30. I'll write SELECT * FROM emp_details WHERE age > 30;. The output contains only the employees whose age is greater than 30; everyone aged 30 or younger is excluded, and there are four such employees.
Now suppose you want only the female employees. You can use a WHERE clause here too. Let's say I want only the name, sex and city columns: SELECT name, sex, city FROM emp_details WHERE sex = 'F';. Running this shows that our table has three female employees.
Next, suppose you want the details of the employees who belong to Chicago or Austin. In this case you can use the OR operator, which displays a record if any of the conditions separated by OR is true. I'll write SELECT * FROM emp_details WHERE city = 'Chicago' OR city = 'Austin';. In the output you can see all the employees from Chicago and Austin.
There is another way to write the same query: the IN operator lets you specify multiple values. I'll copy the query and, instead of using OR, write WHERE city IN ('Chicago', 'Austin'). This query is equivalent to the previous one, and running it gives the same output: Jimmy and Dwayne, who are from Chicago and Austin respectively.
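The WHERE, OR and IN filters can be checked side by side in a SQLite sketch (a stand-in for MySQL). The rows are hypothetical, chosen so the counts match the ones described in the demo; ORDER BY name is added only to make the results deterministic.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_details (name TEXT, age INTEGER, sex TEXT, "
             "doj TEXT, city TEXT, salary REAL)")
# Hypothetical rows: names and cities follow the demo, other values are made up.
conn.executemany("INSERT INTO emp_details VALUES (?, ?, ?, ?, ?, ?)", [
    ("Jimmy", 35, "M", "2005-05-30", "Chicago", 70000),
    ("Shane", 30, "M", "1999-06-25", "Seattle", 55000),
    ("Mary", 28, "F", "2009-03-10", "Boston", 62000),
    ("Dwayne", 37, "M", "2011-07-12", "Austin", 57000),
    ("Sara", 32, "F", "2017-10-27", "New York", 72000),
    ("Ammy", 35, "F", "2014-12-20", "Seattle", 80000),
])

def names(sql):
    """Run a query whose first column is name and return the names as a list."""
    return [row[0] for row in conn.execute(sql)]

older_than_30 = names("SELECT name FROM emp_details WHERE age > 30 ORDER BY name")
female = names("SELECT name FROM emp_details WHERE sex = 'F' ORDER BY name")

# OR and IN express the same condition, so they return the same rows.
with_or = names("SELECT name FROM emp_details "
                "WHERE city = 'Chicago' OR city = 'Austin' ORDER BY name")
with_in = names("SELECT name FROM emp_details "
                "WHERE city IN ('Chicago', 'Austin') ORDER BY name")

print(len(older_than_30), female)  # 4 ['Ammy', 'Mary', 'Sara']
print(with_or, with_in)            # ['Dwayne', 'Jimmy'] ['Dwayne', 'Jimmy']
```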
SQL also provides the BETWEEN operator, which selects values within a given range; the values can be numbers, text or dates. Suppose you want the employees whose date of joining is between 1 January 2000 and 31 December 2010. I'll write SELECT * FROM emp_details WHERE doj BETWEEN '2000-01-01' AND '2010-12-31';. Every employee who joined between these two dates (inclusive) is displayed. Two employees match: Jimmy and Mary, who joined in 2005 and 2009 respectively.
In a WHERE clause you can also use the AND operator to combine multiple conditions; AND displays a record only if all the conditions separated by AND are true. For example, SELECT * FROM emp_details WHERE age > 30 AND sex = 'M'; returns a row only when both conditions hold. Running it shows two male employees older than 30.
Now let's talk about the GROUP BY statement. GROUP BY groups rows that have the same values into summary rows, for example to find the average salary of employees in each department. It is often used with aggregate functions such as COUNT, SUM and AVG to group the result set by one or more columns. Let's say we want the total salary of employees by gender. I'll write SELECT sex, SUM(salary) AS total_salary FROM emp_details GROUP BY sex;. Running it returns two groups, male and female, with the total salary of each. The statement first grouped the employees by gender and then computed the total salary for each group.
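Here is a SQLite sketch (standing in for MySQL) of BETWEEN on dates, AND, and GROUP BY with SUM. It stores doj as ISO 'YYYY-MM-DD' text, an assumption that works because such strings sort in date order; the salaries are hypothetical.

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL (assumption). ISO date strings
# compare correctly as text, so BETWEEN works on them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_details (name TEXT, age INTEGER, sex TEXT, "
             "doj TEXT, city TEXT, salary REAL)")
# Hypothetical rows: names and cities follow the demo, other values are made up.
conn.executemany("INSERT INTO emp_details VALUES (?, ?, ?, ?, ?, ?)", [
    ("Jimmy", 35, "M", "2005-05-30", "Chicago", 70000),
    ("Shane", 30, "M", "1999-06-25", "Seattle", 55000),
    ("Mary", 28, "F", "2009-03-10", "Boston", 62000),
    ("Dwayne", 37, "M", "2011-07-12", "Austin", 57000),
    ("Sara", 32, "F", "2017-10-27", "New York", 72000),
    ("Ammy", 35, "F", "2014-12-20", "Seattle", 80000),
])

# BETWEEN is inclusive at both ends.
joined_2000s = [r[0] for r in conn.execute(
    "SELECT name FROM emp_details "
    "WHERE doj BETWEEN '2000-01-01' AND '2010-12-31' ORDER BY name")]

# AND keeps a row only when every condition is true.
male_over_30 = [r[0] for r in conn.execute(
    "SELECT name FROM emp_details WHERE age > 30 AND sex = 'M' ORDER BY name")]

# GROUP BY collapses rows with the same sex into one summary row each.
by_sex = conn.execute(
    "SELECT sex, SUM(salary) AS total_salary FROM emp_details "
    "GROUP BY sex ORDER BY sex").fetchall()

print(joined_2000s)  # ['Jimmy', 'Mary']
print(male_over_30)  # ['Dwayne', 'Jimmy']
print(by_sex)        # [('F', 214000.0), ('M', 182000.0)]
```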
SQL provides the ORDER BY keyword to sort the result set in ascending or descending order. ORDER BY sorts records in ascending order by default; to sort in descending order, use the DESC keyword. Let's say I want to sort the emp_details table by salary: SELECT * FROM emp_details ORDER BY salary;. This sorts the records in ascending order of salary, which is the default. To display the salaries in descending order, add DESC: SELECT * FROM emp_details ORDER BY salary DESC;. This time the rows are sorted by salary from highest to lowest, along with the other columns.
Now let me show you some basic operations you can do with the SELECT statement. Suppose I write SELECT 10 + 20 AS addition;. Running it returns 30, the sum of 10 and 20. Similarly, you can use the subtraction operator and change the alias: SELECT 10 - 20 AS subtract; returns -10.
SQL also has many built-in functions; I'll show you a few. To find the length of a text value or string, you can use the LENGTH function. I'll write SELECT LENGTH('India') AS total_length; (press Tab to autocomplete the function name). It returns 5 because India has five letters. Strictly speaking, MySQL's LENGTH counts bytes, which equals the number of characters for plain ASCII text like this.
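A SQLite sketch (a stand-in for MySQL) of ORDER BY in both directions, plus arithmetic in SELECT and LENGTH. The salaries are hypothetical. SQLite's length() counts characters for text, which for 'India' gives the same 5 as MySQL's LENGTH().

```python
import sqlite3

# SQLite in memory as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_details (name TEXT, age INTEGER, sex TEXT, "
             "doj TEXT, city TEXT, salary REAL)")
# Hypothetical rows: names and cities follow the demo, other values are made up.
conn.executemany("INSERT INTO emp_details VALUES (?, ?, ?, ?, ?, ?)", [
    ("Jimmy", 35, "M", "2005-05-30", "Chicago", 70000),
    ("Shane", 30, "M", "1999-06-25", "Seattle", 55000),
    ("Mary", 28, "F", "2009-03-10", "Boston", 62000),
    ("Dwayne", 37, "M", "2011-07-12", "Austin", 57000),
    ("Sara", 32, "F", "2017-10-27", "New York", 72000),
    ("Ammy", 35, "F", "2014-12-20", "Seattle", 80000),
])

# Ascending is the default; DESC reverses the sort.
ascending = [r[0] for r in conn.execute("SELECT name FROM emp_details ORDER BY salary")]
descending = [r[0] for r in conn.execute("SELECT name FROM emp_details ORDER BY salary DESC")]

# SELECT can evaluate expressions without any table.
addition, subtract, total_length = conn.execute(
    "SELECT 10 + 20 AS addition, 10 - 20 AS subtract, LENGTH('India') AS total_length"
).fetchone()

print(ascending)                         # ['Shane', 'Dwayne', 'Mary', 'Jimmy', 'Sara', 'Ammy']
print(descending[0])                     # Ammy
print(addition, subtract, total_length)  # 30 -10 5
```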
There's another function called REPEAT. I'll write SELECT REPEAT('@', 10);, putting the at sign in single quotes because it is a text character and repeating it 10 times. In the output, the at sign is printed 10 times; you can count them.
You can also convert a string to uppercase or lowercase. I'll write SELECT UPPER('India'); without an alias. My input had a capital I and everything else in lowercase, and the output converts it to all caps. Similarly, to print something in lowercase, use the LOWER function; this time the input is entirely in uppercase, SELECT LOWER('INDIA');, and it converts INDIA to lowercase.
Now let's explore a few date and time functions. To get the current date, there's a function that starts with C for current: CURDATE(). Running SELECT CURDATE(); returns the current date, which at the time of recording was 28 January 2021. To extract the day from a date value, use the DAY function: SELECT DAY(CURDATE()); returns 28, today's day of the month. Similarly, you can display the current date and time with the NOW function, which returns the current date followed by the current time.
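A rough SQLite equivalent of these functions, run through Python's sqlite3 module as a stand-in for MySQL. UPPER and LOWER exist in SQLite too, but CURDATE(), DAY() and NOW() do not, so the sketch swaps in SQLite's date('now'), strftime('%d', ...) and datetime('now'). SQLite has no built-in REPEAT, so that function is left out. SQLite reports these times in UTC, whereas MySQL uses the session time zone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

upper_value, lower_value = conn.execute("SELECT UPPER('India'), LOWER('INDIA')").fetchone()

# SQLite stand-ins for MySQL's CURDATE(), DAY(CURDATE()) and NOW().
# 'now' is fixed for the duration of one statement step, so the three values agree.
today, day_of_month, now = conn.execute(
    "SELECT date('now'), strftime('%d', 'now'), datetime('now')"
).fetchone()

print(upper_value, lower_value)  # INDIA india
print(today, day_of_month, now)  # e.g. a YYYY-MM-DD date, its day, and a timestamp
```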
This brings us to the end of our demo session, so let me scroll through what we have learned. First, I showed you how to see the databases present in MySQL. Then we used one of the databases and checked the tables in it. Then we created another database, sql_intro, for our demo, used it, and created the emp_details table with the columns name, age, sex, date of joining, city and salary. I showed you the structure of the table (let me run DESCRIBE again so you get an idea). Then we inserted records for six employees, with each employee's name, age, gender, date of joining, city and salary. Next, we saw how to use the SELECT statement to display all the columns in the table, how to display the unique city names, how to use aggregate functions like COUNT, AVG and SUM, and how to display specific columns. We learned how to use the WHERE clause, the OR operator, the IN operator, the BETWEEN operator and the AND operator for multiple conditions. Finally, we learned about GROUP BY, ORDER BY and some basic SQL operations.
Now it's time to explore some string functions in MySQL. I've added a comment, string functions. First, let's say you want to convert a string to uppercase. I'll write SELECT UPPER('India') AS uppercase;, passing the string to the UPPER function and giving an optional alias, and run it. My input was in sentence case, and the UPPER function converted everything to uppercase. Similarly, let me copy this to show that you can convert a string to lowercase with the LOWER function; running it returns everything in lowercase (of course, I need to change the alias to lowercase too). Instead of LOWER, MySQL also provides a function called LCASE. I'll edit the query to use LCASE and write INDIA in uppercase. Running it returns the same result.
Moving on, to find the length of a string you can use the CHARACTER_LENGTH function. I'll write SELECT CHARACTER_LENGTH('India') AS total_length; and this time press Ctrl+Enter to run it. It returns the correct result, 5, because India has five characters.
You can also apply these functions to a table. Let's say we already have a students table and want the length of each student's name. I pass the student name column to the function, keep the alias total_length, and add FROM students. Running this returns 20 rows, which isn't very readable on its own, so let me also display the student names so we can compare their lengths. Now you can see, for example, that Joseph has six characters, Vipul has five, Anubhab has seven, Raghav has six, Cummins has seven and Rabada has six, and so on. Instead of CHARACTER_LENGTH, you can also use CHAR_LENGTH, which works the same way and gives the same result.
Another very interesting function is CONCAT, which joins two or more expressions together. I'll write SELECT CONCAT('India', 'is', 'in', 'Asia'); and run it, and you can see it has concatenated everything. To make it more readable, I'll add a space before each word, so that it reads India is in Asia. If you want, you can give it an alias as well, let's say merged.
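A sketch of per-row string length and concatenation in SQLite, used here as a stand-in for MySQL. The students table, its column names and the ages are hypothetical, and only a few of the names mentioned above are included. SQLite's length() counts characters, like MySQL's CHAR_LENGTH(); CONCAT() only exists in recent SQLite releases, so the sketch joins strings with the standard || operator instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical students table; column names and ages are illustrative assumptions.
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, "
             "student_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students (student_name, age) VALUES (?, ?)",
    [("Joseph", 20), ("Vipul", 21), ("Anubhab", 22),
     ("Raghav", 20), ("Cummins", 23), ("Rabada", 21)],
)

# length() applied to every row, shown next to the name for comparison.
name_lengths = dict(conn.execute(
    "SELECT student_name, length(student_name) AS total_length FROM students"))

# || concatenates strings in SQLite (the spaces are part of the literals).
merged = conn.execute(
    "SELECT 'India' || ' is' || ' in' || ' Asia' AS merged").fetchone()[0]

print(name_lengths)  # {'Joseph': 6, 'Vipul': 5, 'Anubhab': 7, 'Raghav': 6, 'Cummins': 7, 'Rabada': 6}
print(merged)        # India is in Asia
```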
The same CONCAT operation can also be performed on a table. Using the same students table, let's say I want to return the student ID and the student name, followed by the student name merged with a space and the student's age, with the alias name_age, from the students table. The result shows the student ID, the student name and the concatenated column, name_age, which contains the student name followed by a space and the student's age; scrolling down shows the rest of the results.
Moving ahead, let's see how the REVERSE function works in MySQL. REVERSE returns a string with its characters in reverse order. Suppose I write SELECT REVERSE('India');. Running it prints all the characters in reverse order. You can perform the same operation on a table as well: passing the student name column to REVERSE, FROM students, returns all 20 student names printed in reverse order.
Now let's see what the REPLACE function does. REPLACE replaces all occurrences of a substring within a string with a new substring. I'll write SELECT REPLACE('Orange is a vegetable', 'vegetable', 'fruit');. The input string is purposely incorrect so that we can replace the word vegetable with fruit: REPLACE finds the word vegetable in the input string and replaces it with fruit. Running it gives the correct sentence, Orange is a fruit.
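Here is a SQLite sketch (a stand-in for MySQL) of concatenating table columns and of REPLACE, with a hypothetical students table. SQLite has REPLACE() but no REVERSE(), so the reversal is done in Python with slicing; that substitution is an assumption of this sketch, not how MySQL does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical students table; column names and ages are illustrative assumptions.
conn.execute("CREATE TABLE students (student_id INTEGER PRIMARY KEY, "
             "student_name TEXT, age INTEGER)")
conn.executemany("INSERT INTO students (student_name, age) VALUES (?, ?)",
                 [("Joseph", 20), ("Vipul", 21), ("Anubhab", 22)])

# Name, a space, then the age; SQLite converts the integer age to text for ||.
name_age = dict(conn.execute(
    "SELECT student_id, student_name || ' ' || age AS name_age FROM students"))

# REPLACE swaps every occurrence of the search string, not just the first.
fixed, every = conn.execute(
    "SELECT REPLACE('Orange is a vegetable', 'vegetable', 'fruit'), "
    "REPLACE('a-b-c', '-', '+')").fetchone()

# No REVERSE() in SQLite: reverse each fetched name in Python instead.
reversed_names = [name[::-1] for (name,) in
                  conn.execute("SELECT student_name FROM students ORDER BY student_id")]

print(name_age)        # {1: 'Joseph 20', 2: 'Vipul 21', 3: 'Anubhab 22'}
print(fixed, every)    # Orange is a fruit a+b+c
print(reversed_names)  # ['hpesoJ', 'lupiV', 'bahbunA']
```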
MySQL also provides some trim functions: LTRIM, RTRIM and TRIM. Let me show you how LTRIM works. LTRIM (left trim) removes the leading spaces from the string passed as an argument. I'll write SELECT LTRIM('     India       ');, purposely adding a few spaces at the beginning of the string, then the word India, then some spaces after it. Running it returns India, which is fair enough, but first let's find the length of the original string. I'll wrap the string, with its leading and trailing spaces, in the LENGTH function and run it: the entire string is 17 characters long. Now I apply LTRIM to the same string and take the LENGTH of the result, which gives 12. The difference is the number of spaces deleted from the left: five, which you can count, and 17 - 5 = 12.
Similarly, RTRIM removes the trailing spaces, the ones after the word, whereas LTRIM deletes the leading ones. If I replace LTRIM with RTRIM (right trim), the length is now 10, because seven spaces were deleted from the right of the string. You can also use the TRIM function, which deletes both the leading and the trailing spaces. With TRIM, the length is 5, because India is five characters long and all the leading and trailing spaces are gone.
There's also a function called POSITION in MySQL, which returns the position of the first occurrence of a substring in a string; if the substring is not found in the original string, the function returns 0. I'll write SELECT POSITION('fruit' IN 'Orange is a fruit') AS name;. At first there was an error here, which was fixed by putting the value in double quotes. Running it again, the result says the word fruit is at the 13th position in our string, Orange is a fruit.
The final function we are going to see is ASCII, which returns the ASCII value of a specific character. If I write SELECT ASCII('a'); for lowercase a, it returns 97, the ASCII value of a. For the character 4, it returns 52.
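The trim arithmetic above (17, 12, 10 and 5 characters) can be reproduced in SQLite, used here as a stand-in for MySQL. ltrim, rtrim, trim and length behave the same way for this input. SQLite has no POSITION() or ASCII(), so the sketch substitutes instr() and unicode(); MySQL also offers INSTR() with the same argument order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
padded = " " * 5 + "India" + " " * 7  # 5 leading and 7 trailing spaces: 17 characters

# The same named parameter :s is reused in all four expressions.
full, left, right, both = conn.execute(
    "SELECT length(:s), length(ltrim(:s)), length(rtrim(:s)), length(trim(:s))",
    {"s": padded},
).fetchone()

# instr(haystack, needle) gives the 1-based position, or 0 when not found;
# unicode() returns the code point, which matches ASCII for these characters.
position, code_a, code_4, missing = conn.execute(
    "SELECT instr('Orange is a fruit', 'fruit'), unicode('a'), unicode('4'), "
    "instr('Orange', 'xyz')"
).fetchone()

print(full, left, right, both)           # 17 12 10 5
print(position, code_a, code_4, missing) # 13 97 52 0
```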
In this session, we are going to learn two important, widely used SQL clauses: GROUP BY and HAVING. First we'll understand the basics of GROUP BY and HAVING, and then we'll jump into MySQL Workbench to implement them.
So what is GROUP BY in SQL? The GROUP BY statement, or clause, groups records into summary rows and returns one record for each group. It groups the rows that have the same values for the GROUP BY expressions and computes aggregate functions for each resulting group. A GROUP BY clause is part of a SELECT statement. All rows in a group share the same value for the grouping column or columns, and no two groups share the same value.
Below you can see the syntax of GROUP BY: first the SELECT statement followed by the column names, then FROM and the table name, followed by the WHERE condition. Next comes the GROUP BY clause with the grouping column names, and finally ORDER BY with the sorting column names.
Here is an example of the GROUP BY clause: we want to find the average salary of employees in each department. We have an employees table with the employee ID, employee name, age, gender, the date the employee joined the company, the department each employee belongs to, the city each employee belongs to, and the salary in dollars. We'll be using this employees table in MySQL Workbench as well. To find the average salary of employees in each department, the query selects the department and applies the aggregate function AVG to the salary column, with the alias average_salary, which appears in the output, FROM employees, grouped by department. The output lists each department name with the average salary of the employees in that department.
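The average-salary-per-department query can be sketched in SQLite (a stand-in for MySQL). The employees table here is hypothetical and trimmed to the columns the query needs; the dept column name, the employee names and the salaries are placeholders, not the table from the slides. ORDER BY dept is added only so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical employees table with only the columns this query uses.
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, "
             "emp_name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees (emp_name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Tech", 80000), ("Ben", "Tech", 60000), ("Chen", "HR", 50000),
     ("Dara", "Sales", 55000), ("Eli", "Sales", 65000)],
)

# One output row per department; AVG is computed within each group.
avg_by_dept = conn.execute(
    """
    SELECT dept, AVG(salary) AS average_salary
    FROM employees
    GROUP BY dept
    ORDER BY dept
    """
).fetchall()

print(avg_by_dept)  # [('HR', 50000.0), ('Sales', 60000.0), ('Tech', 70000.0)]
```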
Here you can see we have the employees table. It has the employee ID, the employee name, the age of the employee, the gender, the date on which the employee joined the company, the department each employee belongs to, the city each employee belongs to, and the salary in dollars. We'll actually be using this employees table in MySQL Workbench as well. To find the average salary of employees in each department, this is how your SQL query with the GROUP BY clause would look. We select department and then apply the aggregate function AVG (average) to the salary column. We give it the alias average_salary, which appears as the column name in the output, then write FROM employees GROUP BY department. In the output you can see the department names and the average salary of the employees in each department.
Now let me take you to MySQL Workbench, where we'll implement GROUP BY and solve specific problems. I'm in MySQL Workbench, so let me make my connection first. I'll enter the password, and this opens the SQL editor. First, let me check the databases I have with SHOW DATABASES. Let's run it. You can see a list of databases here. I'm going to use my sql_intro database, so I'll write USE sql_intro, which takes us inside this database, and run it. Now you can check the tables that are present in the sql_intro database. If I write SHOW TABLES, you can see the list of tables already in this database. For our demo of GROUP BY and HAVING, let me first create an employees table. I'll write CREATE TABLE employees. My first column is employee_id, the ID for each employee, with data type INT, and I'll make employee_id my primary key. Next I'll give the employee name column, and my
data type will be VARCHAR with a size of 25. My third column is age, which is obviously an integer. Then I have my gender column, for which I'll use the CHAR data type with a size of 1. Next we have the date of joining, whose data type is DATE. We have the department column as well, which is VARCHAR with a size of 20. Next we have the city column, which is the city the employee belongs to, and finally the salary column, which holds the salary of each employee. Now let me select and run this. You can see we have successfully created our table. To check the table's structure, you can use the DESCRIBE command. I'll write DESCRIBE employees, and you can see the structure of the table so far. Now it's time to insert a few records into the employees table. I'll write INSERT INTO employees and copy-paste the records I've already written in a notepad file. This is my EMP notepad file, and you can see I have already put in the information for all the employees, so let me copy it and paste it here. Let me go to the top and verify that all the records are fine, then run the INSERT query. You can see we have inserted 20 rows. Now let's check the records in our employees table. I'll write SELECT * FROM employees. If I run it, you can see the employee ID, employee name, age, gender, city, salary and so on. In total we have inserted 20 records. Now let me run a few SQL commands to explore the data in our table. Let's say I want to see the distinct cities present in the table, so I'll write SELECT DISTINCT city FROM employees. If I run this, you can see there are eight different cities in our employees table, including Chicago, Seattle, Boston, New York, Miami, and Detroit. Now let's
see, say you want to know the departments that are present. You can use SELECT DISTINCT department. If I run this, you can see seven rows returned, and here are the department names: Sales, Marketing, Product, Tech, IT, Finance, and HR. Now let me show you another SQL command, this time using an aggregate function. I want to find the average age of all the employees in the table. I can write SELECT AVG(age) FROM employees, where AVG is the aggregate function for average and I have passed it the age column. If I run this, the average age of all the employees in our table is 33.3. Now say you want to find the average age of employees in each department. For this you need the GROUP BY clause. I'll add a comment here, "average age in each department", and write SELECT department, AVG(age) FROM employees GROUP BY department. If I run this, you can see our seven departments on the left and, on the right, the average age of the employees in each of them. In the output the column header says AVG(age), which is not very readable, so I can give it the alias average_age. If you want, you can also round the values. I'll wrap the average in the ROUND function, which takes two parameters: the value, and the number of decimal places you want to round it to. If I run this, there you go: you can see the rounded average age of the employees in each department. Now suppose you want to find the total salary of all the employees in each department. You can write SELECT department and, since I want the total salary, use the SUM function on the salary column, then FROM employees GROUP BY department. Let's run this query. In the output you have the different departments and, on the right, the total salary of
all the employees in each of these departments. Here too you can give an alias, total_salary. Let's run it again, and you can see the output here. Moving ahead, you can also use the ORDER BY clause along with the GROUP BY clause. Let's say you want to find the total number of employees in each city and order the result by that employee count. I'll write SELECT COUNT(employee_id), city FROM employees GROUP BY city. Next I'll add the ORDER BY clause: ORDER BY COUNT(employee_id) DESC, where DESC stands for descending. If I run this query, you can see the count of employees on the left and the city names on the right. Chicago had the highest number of employees, four. Then we have Seattle, Houston, Boston, and Austin, and the remaining cities also had two employees. In this case we have ordered the result by the count of employee IDs in descending order, so the highest count appears at the top, followed by the lower ones. Now let's explore another example. Suppose we want to find the number of employees who joined the company each year. We can use the YEAR function on the date-of-joining column, count the employee IDs, and group the result by year. I'll write SELECT, extract the year from the date-of-joining column with YEAR, and give it the alias year. Then I'll add COUNT(employee_id) FROM employees, GROUP BY the YEAR of the date of joining, and a semicolon. Let's run this. In the result we have the year extracted from the date-of-joining column and, on the right, the total number of employees who joined the company that year. In 2005 there was one employee and in 2009 there were two, and if I scroll down you have information for other
years as well. If you want, you can also order this result by year or by count. Now, you can also use GROUP BY on two or more tables joined together. To show you this, let me first create a sales table. I'll write CREATE TABLE sales. The sales table will have a product_id column of integer type, the selling price of the product as a FLOAT value, and the quantity sold for each product as an integer. It will also have the state in which the item was sold, which I'll make VARCHAR with a size of 20. Let's run this to create our sales table. We have successfully created it. Next we need to insert a few values into the sales table. I have already written the records in a notepad file. Here is my sales text file, so let me copy this information and paste it into the query editor. Now let me run this INSERT command. You can see we have successfully inserted nine rows, so let me just
run through what we have inserted. The first column is the product ID. Then we have the selling price at which the product was sold, the quantity that was sold, and the state it was sold in: California, Texas, Alaska, and so on. Then we have another product ID, 123, and these are the states in which it was sold. Let me confirm with a SELECT statement. I'll write SELECT * FROM sales, and if I run this you can see the table holds the rows we inserted. Now suppose you want to find the revenue for both product IDs, 121 and 123, since we have just two product IDs here. For that you can use a SELECT query: SELECT product_id, and then the revenue. Revenue is simply the selling price multiplied by the quantity. I'll use the SUM function to find the total revenue, and inside SUM I'll multiply my selling price column by my quantity column. I'll give it the alias revenue, write FROM sales, and finally GROUP BY product_id. Let's run it. There you go: you can see the two product IDs, 121 and 123, and the revenue generated from each of them. Now suppose we have to find the total profit made from both products, 121 and 123. For that I'll create another table holding the cost price of both products. I'll write CREATE TABLE c_product, which stands for the cost price of the products. My first column is product_id, an integer, and my second column is cost_price, which will hold floating-point values. Let's run this. We have successfully created our product cost table. Now let me insert a few values into the c_product table. I'll write INSERT INTO c_product and give my values. For product 121, let's say the cost price was $270 per unit,
and next we have product 123. Let's say the cost price for product 123 was $250. Let's insert these two values. Next we'll join our sales table and the product cost table, which will give us the profit generated by each product. I'll write SELECT c.product_id, SUM((s.sell_price - c.cost_price) * s.quantity). Here c and s are alias names. Subtracting the cost price from the selling price gives the profit per unit, and I multiply that by s.quantity before closing the bracket. I'll give this the alias profit and write FROM sales AS s, where s stands for the sales table. Then INNER JOIN c_product AS c, where c is the alias, WHERE s.product_id = c.product_id. We are using product_id because it is the column common to both tables. Finally, I'm going to GROUP BY c.product_id. So let me tell you what I have done here. I'm selecting the product ID. Next I'm calculating the profit by subtracting the cost price from the selling price and multiplying by the quantity column. I'm using an inner join to connect the sales and product cost tables on the product_id column, and I have grouped the result by c.product_id. Let's run this. There you go: for product ID 121 we made a profit of $1,100, and for product ID 123 we made a profit of $840.
Now that we have learned GROUP BY in detail, let's learn about the HAVING clause in SQL. The HAVING clause operates on grouped records and returns only the groups whose aggregate function results match the given conditions. HAVING and WHERE are similar, but they act at different stages. WHERE filters individual rows before they are grouped and cannot contain aggregate functions, whereas HAVING filters the groups after aggregation. Here you can see the syntax of the HAVING clause. We have the SELECT statement followed by the column names, FROM and the table name, then the WHERE conditions, next GROUP BY, then HAVING, and at last ORDER BY with its column names. We have a question at hand: we want to find the cities where there are more than two employees. You can see the employees table we used for GROUP BY as well. To find the cities with more than two employees, this is how your SQL query should look. We select the count of employee IDs using the COUNT function and the city column FROM employees, GROUP BY city. Then we add our HAVING clause with the condition that COUNT(employee_id) should be greater than 2. In the output you see the city names where the count of employees was greater than two. All right, let's go to MySQL Workbench and implement HAVING. Suppose you want to find the departments where the average salary is greater than $75,000. You can use the HAVING clause for this. Let me first query my employees table. If I run this, you can see the 20 rows we had inserted, and the last column is salary. So the question is to find the departments where the average salary is greater than $75,000. Let me show you how to do it.
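The difference between WHERE and HAVING described above can be checked in a small sketch. It again assumes Python's sqlite3 module as a stand-in for MySQL and uses hypothetical rows, not the course's employees table. The string literal uses single quotes because that is standard SQL; MySQL's default mode also accepts the double quotes used in the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, city TEXT)")

# Hypothetical rows: three employees each in Chicago and Houston, two in Miami.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Chicago"), (2, "Chicago"), (3, "Chicago"),
     (4, "Houston"), (5, "Houston"), (6, "Houston"),
     (7, "Miami"), (8, "Miami")],
)

# HAVING filters whole groups after COUNT() has been computed per city.
more_than_two = conn.execute(
    "SELECT city, COUNT(employee_id) AS emp_count "
    "FROM employees GROUP BY city "
    "HAVING COUNT(employee_id) > 2 ORDER BY city"
).fetchall()

# WHERE filters individual rows first, so Houston never forms a group.
excluding_houston = conn.execute(
    "SELECT city, COUNT(*) AS emp_count "
    "FROM employees WHERE city <> 'Houston' "
    "GROUP BY city HAVING COUNT(*) > 2"
).fetchall()

# An aggregate inside WHERE is rejected, because WHERE runs before grouping.
try:
    conn.execute("SELECT city FROM employees WHERE COUNT(*) > 2 GROUP BY city")
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

print(more_than_two)
print(excluding_houston)
print(where_rejected)
```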
I'll write SELECT department and use the aggregate function AVG(salary) with the alias avg_salary, FROM employees. Next we'll use the GROUP BY clause to group by department, and then I'll write my HAVING clause with the condition AVG(salary) > 75000. Let's run it and see the output. There you go: there are three departments in the company, Sales, Finance, and HR, where the average salary is greater than $75,000. Next, let's say you want to find the cities where the total salary is greater than $200,000. This is again a simple SQL query. I'll write SELECT city, and since I want the total salary, I'll use the SUM function on the salary column with the alias total, FROM employees GROUP BY city. Then I'll use the HAVING clause with the condition SUM(salary) > 200000. Let's run this query. There you go: the cities where the total salary was greater than $200,000 are Chicago, Seattle, and Houston. Now suppose you want to find the departments that have more than two employees. I'll write SELECT department, and this time, since I want the number of employees, I'll use the COUNT function: COUNT(*) AS emp_count, which is my alias, FROM employees. Next I'll GROUP BY department, HAVING COUNT(*) > 2. Let's run this. Sales, Product, Tech, and IT are the departments with more than two employees. You can also use a WHERE clause along with the HAVING clause in one SQL statement. Suppose I want to find the cities that have more than two employees, apart from Houston. I can write my query as SELECT city, COUNT(*) AS emp_count FROM employees WHERE, and I'll give my condition
city is not equal to Houston. I'll put Houston in double quotes, since I don't want to see the information for Houston. Then I'll GROUP BY city HAVING the count of employees greater than 2. If I run this query, you see we have information for Chicago and Seattle only, and the information for Houston has been excluded. You may also use aggregate functions in the HAVING clause that do not appear in the SELECT clause. Say I want the number of employees in each department, but only for departments with an average salary greater than $75,000. I can write it like this: SELECT department, COUNT(*) AS emp_count FROM employees GROUP BY department. In the HAVING clause I'll use an aggregate that is not present in the SELECT list: HAVING AVG(salary) > 75000. This is another way to use the HAVING clause. Let's run this. You can see the departments Sales, Finance, and HR, and the employee count in each, where the average salary was greater than $75,000. So let me run you through what we did in our demo from the beginning. First we created a table called employees, then we inserted 20 records into it. Next we explored a few SQL commands like DISTINCT and AVG, and then we started with the GROUP BY clause. After that we used GROUP BY with another table, joining the sales and product cost tables to find the profit. Then you learned how to use the HAVING clause: we explored several different questions and learned how to use HAVING in SQL.
In this session we will learn about joins in SQL. Joins are really important when you have to deal with data that is spread across multiple tables. I'll help you understand the basics of joins and teach you the different types of joins with hands-on demonstrations in MySQL Workbench. So let's get started with what joins are in SQL. A SQL JOIN statement is used to fetch data present in multiple tables. SQL
joins are used to combine rows of data from two or more tables based on a common field or column between them. Consider this example, where we have two tables: an orders table and a customers table. The orders table has the order ID, which is unique here, and the order date, that is, when the order was placed. It also has the shipped date, the date on which the order was shipped, and the product name, which is the name of the product. Then there is the delivery status, which says whether the product was delivered or cancelled, the quantity, meaning the number of products ordered, and finally the price of each product. Similarly, we have another table called customers. This customers table has the order ID, which is the foreign key here, and the customer ID, which is the primary key for this table. It also has the phone number, customer name, and address of each customer. Now suppose you want to find the phone numbers of customers who have ordered a laptop. To solve this problem we need to join both tables. The phone numbers are in the customers table, as you can see here, while laptop, the product name, is in the orders table, which you can see here. Using a join statement, you can find the phone numbers of customers who have ordered a laptop. Let's see another problem, where you need to find the names of customers who have ordered a product in the last 30 days. In this case we want the customer name from the customers table. We also need the last 30 days of order information, which you can get from the order date column in the orders table. Now let's discuss the different types of joins one by one. First we have an inner join. The SQL INNER JOIN statement returns the rows from both tables for which the join condition is met. From the diagram of A and B, you can see that
there are two tables, A and B: A is the left table and B is the right table. The orange portion represents the output of an inner join, which means an inner join returns the common records from both tables. You can see the syntax here. We have the SELECT command and the list of columns, FROM table A, which here is the left table. That is followed by the INNER JOIN keyword, the name of table B, and ON with the common key column from both tables A and B. Now let me take you to MySQL Workbench and show you how an inner join works in practice. I'll type MySQL, and you can see I have MySQL Workbench version 8.0 installed. I'll click on it; it takes some time to open. I'll click on this local instance and give my password. This is what the SQL editor in MySQL Workbench looks like. First, let me create a new database. I'll write CREATE DATABASE followed by the name of the database, sql_joins, add a semicolon, and hit Ctrl+Enter. This creates a new database, and you can see one row affected. You can check whether the database was created using the SHOW DATABASES command. If I run it, you can see the sql_joins database has been created. Now I'll use this database, so I'll write USE sql_joins. To understand the inner join, consider a college with different teams for different sports, such as cricket, football, basketball, and others. Let's create two tables, cricket and football. I'll write CREATE TABLE cricket, and I'm going to create two columns in this table. The first column is the cricket ID, c_id, with data type INT and the AUTO_INCREMENT attribute. I'm using AUTO_INCREMENT so that MySQL generates each ID automatically, and since the cricket ID will be my primary key, every player gets a unique value. Then I'm going to add the names of the students who are part of the cricket team. For this I'll use the VARCHAR data type with a length of 30. I'll add another comma and assign the cricket ID as the primary key by giving c_id within brackets. The cricket ID is simply a unique identifier for each player, like roll numbers in college. Let me run it. We have successfully created our cricket table. Similarly, let me copy this and paste it here to create another table called football, which will hold all the students who are part of the football team. Instead of the cricket ID, I'm going to give this a football ID, f_id. The name column will hold the names of the students, and I'll change my primary key to the football ID. Let me run this. Now we have also created our football table. The next step is to insert a few player names into both tables, starting with the cricket table. I'll write INSERT INTO cricket, give the name column, and then VALUES. Here I'll give some names, let's say Stuart, then another comma, and the next player I'll choose is Michael. Similarly I'll add a few more: let's say Johnson, the fourth player Hayden, and finally Fleming. Now I'll add a semicolon and run this. Let me check that all the values were inserted properly with SELECT * FROM cricket. If I run it, you can see the table has five rows. Similarly, let's insert a few student names into our football table. I'll change this to football. Obviously some students will be part of both the cricket and football teams, so I'll keep a few repeated names: let's say Stuart, Johnson, and Hayden are part of both teams. Then we have Langer, and another player in the football team, Astral. I'll run it. You can see there are no errors.
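The two tables built so far, and the inner join the demo runs on them next, can be reproduced in a small sketch. It assumes Python's sqlite3 module in place of MySQL. SQLite spells MySQL's INT AUTO_INCREMENT primary key as INTEGER PRIMARY KEY AUTOINCREMENT. The column names c_id and f_id and the player names follow the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite equivalent of "c_id INT AUTO_INCREMENT ... PRIMARY KEY (c_id)" in MySQL.
conn.execute("CREATE TABLE cricket (c_id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(30))")
conn.execute("CREATE TABLE football (f_id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(30))")

# Only the name column is supplied; the IDs are generated automatically (1, 2, 3, ...).
conn.executemany("INSERT INTO cricket (name) VALUES (?)",
                 [(n,) for n in ["Stuart", "Michael", "Johnson", "Hayden", "Fleming"]])
conn.executemany("INSERT INTO football (name) VALUES (?)",
                 [(n,) for n in ["Stuart", "Johnson", "Hayden", "Langer", "Astral"]])

# Inner join on the shared name column: only students on both teams remain.
both = conn.execute(
    "SELECT c.c_id, c.name, f.f_id, f.name "
    "FROM cricket AS c INNER JOIN football AS f ON c.name = f.name "
    "ORDER BY c.c_id"
).fetchall()

for row in both:
    print(row)
```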
So we have successfully inserted values into our football table as well. Let me recheck it with SELECT * FROM football. We have five players in the football team too. Now suppose you want to find the students who are part of both the cricket and the football team. In this case you can use an inner join. I'll write SELECT * FROM cricket AS c, using the alias c for cricket, then INNER JOIN football AS f, where f is the alias for the football table. Then I'll use the ON keyword and give the common key, which is name here: c.name = f.name. The inner join is performed on this name column from both tables. Let's run it. There you go: Stuart, Johnson, and Hayden are the only three students who are part of both teams. You can also select each column individually from both tables. Let's say I write SELECT c.c_id, c.name, f.f_id, f.name FROM cricket AS c INNER JOIN football AS f ON c.name = f.name. If I run this, you see we get the same output here as well.
Now let's explore another example to learn more about inner joins. We have a database called classicmodels, so let me first use classicmodels and run this. Now let me show the tables that are part of classicmodels. There are tables like customers, employees, offices, orderdetails, orders, payments, products, and productlines. Let me use a SELECT statement to show the columns present in the products table. The products table has the product names and the product code, which is unique here. It also has the product vendor, a short description of the product, the quantity in stock, the buying price, and the MSRP. Let's see what we have in productlines. If I run it, you see the product line, which is the primary key for this table, and a text description for each product line. This is basically a kind of advertisement. Now suppose you want to find the product code, the product name, and the text description for each product. You can join the products and productlines tables. I'll write my SELECT statement and choose productCode, productName, and textDescription, FROM my first table, products. Then I'll write INNER JOIN productlines USING the common key column, productLine, in brackets, and add a semicolon. If I run it, there you go: you can see the different product codes, product names, and the text description for each product. We got this by joining the products table and the productlines table. Now suppose you want to
find the revenue generated from each product order along with the status of the order. To do this we need to join three tables: orders, orderdetails, and products. First let me show you the columns in these tables. You have already seen the products table, so let me show you orders and orderdetails. I'll write SELECT * FROM orders. If I run it, you can see it has the order number, the date on which the order was placed, and the shipped date. It also has the status column, which tells you whether the order was shipped or cancelled, a comments column, and the number of the customer who placed the order. Similarly, let's check what we have in orderdetails with SELECT * FROM orderdetails. If I run it, you can see the order number, the product code, the quantity ordered of each product, the price of each product, and the order line number. Using orders, orderdetails, and products, let's perform an inner join. I'll write SELECT o.orderNumber, o.status, and the product name, which I'll take from the products table, so p.productName. Here o and p are alias names for the orders and products tables, and I'll use od for orderdetails. Since we want to find the revenue, we need the quantity ordered multiplied by the price of each item. So I'll use the SUM function, put quantityOrdered multiplied by priceEach inside it, and give it the alias revenue. Then the FROM clause: FROM orders AS o INNER JOIN orderdetails AS od ON o.orderNumber = od.orderNumber. I'll use another inner join, this time with the products table: INNER JOIN products AS p ON p.productCode = od.productCode. Finally I'll use the GROUP BY clause to group by order number. All
right, let me run this. There's a mistake here that we need to debug. It says "You have an error in your SQL syntax; check the manual". I think the table name is actually orders, not order. Let's run it again. There's still an error: it says classicmodels.product doesn't exist. Again, the table name is products, not product, so let's run it again. There you go: we have the order number, the status, the product name, and the revenue, which we got using an inner join across three different tables.
Now let's talk about left joins. The SQL LEFT JOIN statement returns all the rows from the left table and the matching rows from the right table. In this diagram you can see all the rows from the left table, A, and only the matching rows from the right table, B, in the overlapping region. The syntax for a SQL left join looks like this. We have the SELECT statement and the list of columns, FROM table A, which is your left table. Then comes the LEFT JOIN keyword followed by table B, and ON with the common key column, so you write A.key = B.key. In our classicmodels database we have two tables, customers and orders. If you want to find each customer's name and their order IDs, you can use these two tables. First let me show you the columns present in customers; we have already seen orders. In the customers table you can see the customer number, the customer name, and the contact's last name and first name. There is also the phone number, two address columns, the city, the state, and other information as well. Similarly we have our orders table. I'll write SELECT * FROM orders, and if I run this you can see these
are the columns available in the orders table. Now let's perform a left join to find each customer's name and their order IDs. I'll write SELECT c.customerName, or rather, let's first choose c.customerNumber. Then I'll add the customer name, c.customerName, the orderNumber column from the orders table, and, let's say, the status as well. Then I'll give my left table, customers AS c, LEFT JOIN orders AS o ON c.customerNumber = o.customerNumber. Let's run it. Again there's a problem: the table name is customers. Let's run it again. There's another mistake here: in customerNumber a letter, the b, is missing. Let me run it now. Here you can see the customer number, the respective customer names, the order number, and the status of the shipment. If I scroll down, you'll notice a few rows with NULL values. This means that for customer number 125, this particular customer, there were no orders. If I scroll further you will find a few more NULL values. For example, there are no orders for customer numbers 168 and 169. To find the customers who haven't placed any orders, you can use the IS NULL operator. I'll continue with this query, add a WHERE clause, and write WHERE orderNumber IS NULL. Let me run this. Here you can see there are 24 customers in the table who don't have any orders in their names. Now let's talk about right joins. The SQL RIGHT JOIN statement returns all the rows from the right table and only the matching rows from the left table. Here our left table is A and the right table is B, so the right join returns all the rows from table B and only the matching rows from table A.
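The left join and IS NULL steps above can be sketched with Python's sqlite3 module standing in for MySQL. The customers and orders rows here are hypothetical, with column names borrowed from classicmodels, and they are chosen so that one customer has no orders. A right join is the mirror image of a left join: A RIGHT JOIN B returns the same rows as B LEFT JOIN A. The sketch therefore compares the two forms only when the installed SQLite is 3.39.0 or newer, the first release with RIGHT JOIN support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT)")
conn.execute("CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER, status TEXT)")

# Hypothetical rows: customer 3 has no orders.
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Alpha Toys"), (2, "Beta Models"), (3, "Gamma Gifts")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, "Shipped"), (11, 1, "Cancelled"), (12, 2, "Shipped")])

# LEFT JOIN keeps every customer; customers without orders get NULL (None) order columns.
left = conn.execute(
    "SELECT c.customerNumber, c.customerName, o.orderNumber, o.status "
    "FROM customers AS c LEFT JOIN orders AS o "
    "ON c.customerNumber = o.customerNumber "
    "ORDER BY c.customerNumber, o.orderNumber"
).fetchall()

# Keeping only the rows where the right-hand side is NULL finds customers with no orders.
no_orders = conn.execute(
    "SELECT c.customerNumber, c.customerName "
    "FROM customers AS c LEFT JOIN orders AS o "
    "ON c.customerNumber = o.customerNumber "
    "WHERE o.orderNumber IS NULL"
).fetchall()

# "orders RIGHT JOIN customers" should match "customers LEFT JOIN orders".
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right = conn.execute(
        "SELECT c.customerNumber, c.customerName, o.orderNumber, o.status "
        "FROM orders AS o RIGHT JOIN customers AS c "
        "ON c.customerNumber = o.customerNumber "
        "ORDER BY c.customerNumber, o.orderNumber"
    ).fetchall()
    print(right == left)

for row in left:
    print(row)
print(no_orders)
```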
here you can see we have the SELECT statement followed by the list of columns you want from table A, then RIGHT JOIN table B ON the common key column from both tables. To show how a right join works, I'll use two tables, customers and employees. Let's first see the rows in the customers table, so I'll write SELECT * FROM customers and run it. Here you have the customer number, the customer name, the phone number, the address of the customer, the country the customer belongs to, the postal code, and the credit limit as well. Similarly, for the employees table I'll change customers to employees and run it. We have the employee number, the last name, the first name, the extension, the email ID, the job title, and reportsTo, which here means the manager.
Based on these two tables, we'll find each customer's name and phone number along with the email address of the employee, joining customers and employees. I'll write SELECT c.customerName, c.phone, then the employee number from the employees table, e.employeeNumber, and e.email, FROM customers AS c RIGHT JOIN employees AS e ON e.employeeNumber = c.salesRepEmployeeNumber, because the common key is the employee number, which the customers table stores as the sales representative's employee number. I'm also going to order the result by the employee number. So I have the customer name and phone number from the customers table, and the employee number and email address from the employees table. Let me run it. There's a problem: the table name is actually customers. Let's run it once again. There you go: we have all the values selected from our right table, which is the
employees table. You can see RIGHT JOIN employees, which means the employees table is on the right, and we also have the customer names and phone numbers from the customers table, which is the left table. A few employee numbers, such as 1002 and 1056, don't have any customer name or phone number.
There's another popular join that is widely used in SQL, known as a self join. Self joins are used to join a table to itself. In our database we have a table called employees; let me show it first. Here we have the employee number, the last name, the first name, the email ID, and a column called reportsTo, which you can think of as the manager column. The way to read it: for employee number 1056 the manager is 1002, and if you check 1002, we have Diane Murphy. If I scroll down to employee number 1102, the manager is 1056, and at 1056 you have Mary Patterson. Similarly, for employee number 1188 the manager is 1143, and at 1143 we have Anthony Bow, so the employee Julie Firrelli reports to Anthony Bow.
Now suppose you want to know the reporting manager for each employee; for that you can use a self join. Let me show you how to join the employees table to itself. I'll write SELECT and then use a function called CONCAT. Within the brackets I'll start with my alias m, so m.lastName, followed by a comma within single quotes and then m.firstName. I'll close the bracket and give it the alias manager. Next I'll concatenate the same last name and first name, but this time with a separate alias, e, which stands for employee, so I'll write e.lastName, and within single quotes I'll give my comma
and then I'll write e.firstName, close the bracket, and give it the alias employee. Then FROM employees AS e INNER JOIN employees AS m ON m.employeeNumber = e.reportsTo, matching the employee number of the manager against the reportsTo column, and I'll order the result by manager. Let's run this. There you go: we have two columns, manager and employee. For the employee Loui Bondur, the manager is Gerard Bondur, and if I scroll down you can see multiple employees reporting to that manager. Similarly, Anthony Bow is the manager of several different employees, and so on.
Moving ahead, let's see what a full join is. The SQL FULL OUTER JOIN statement returns all rows when there is a match in either the left or the right table. Remember that MySQL does not support FULL OUTER JOIN, but there is a way to emulate it. This is how the standard full outer join syntax looks; the statement works on other SQL databases such as Microsoft SQL Server, but it won't work in MySQL, so I'll show you the right way to do it in MySQL Workbench. I'll first use a left join, then a right join, and finally combine them with the UNION operator, which combines the result sets of two or more SELECT statements. For this example I'm using the customers table and the orders table, and I just want each customer's name and the order numbers related to that customer. So I'll write SELECT c.customerName, o.orderNumber FROM customers AS c LEFT JOIN orders AS o ON c.customerNumber = o.customerNumber. Let me copy this, and after it I'll write the UNION operator, so basically this
stacks the two result sets vertically, and because UNION removes duplicate rows, the rows matched by both joins appear only once. Next I'll write the right join: here, instead of LEFT JOIN I'll write RIGHT JOIN, and the rest stays the same. Let me run it. There you go, we have successfully run our full outer join: you can see the different customer names and the orders each customer has placed.
That brings us to the end of our demo session, so let me run through what we did. First we created a database called SQL joins, then we created two tables, cricket and football, inserted a few rows into each, and used them to learn about inner join. Next we used a database called classicmodels, which has multiple tables, so we explored tables such as products, productlines, orders, customers, and employees, and learned how to use inner join, left join, self join, right join, and full outer join.
In this video we will learn what a subquery is and look at the different types of subqueries. We'll learn subqueries with the SELECT statement, followed by subqueries with the INSERT statement; moving further, we'll learn subqueries with the UPDATE statement, and finally subqueries with the DELETE statement. We'll do all of this in MySQL Workbench.
Let's start with what a subquery is. A subquery is a SELECT query that is enclosed inside another query. This is the basic structure of a subquery: whatever is inside the brackets is called the inner query, and whatever is outside is called the outer query. The inner query is executed first and its result is returned to the outer query, and then the outer query is performed. Let's see an example. The question at hand is to write a SQL query to display the department with the maximum salary from the employees table. So
this is how our employees table looks: it has the employee ID, the employee name, age, gender, date of joining, department, city, and salary. To solve this, my subquery would look like this: I'll select the department from the employees table using the condition salary equal to, and then pass in my inner query, which is SELECT MAX(salary) FROM employees. This first returns the maximum salary in the table, and then the outer query is executed based on the salary returned from the inner query. Here the output is the sales department, because one of the employees in the sales department earns the highest salary: if you look at our table, that employee is Joseph, who earns $115,000, and Joseph is in the sales department.
Now let's see how this query works, with another question: find the name of the employee with the maximum salary in the employees table. This is the same employees table we saw before, and to find the employee with the maximum salary my subquery would look like this: I select the employee name from the employees table where the salary is equal to my inner query, which selects the maximum salary. That returns a single value, the highest salary in the table, $115,000, and the employee with that salary is Joseph, which is our output. To break it down: SQL first executes the inner query inside the brackets, SELECT MAX(salary) FROM employees, whose result is $115,000, and then, based on that returned result, the outer query is executed, so the query becomes select employee name from employees
where salary = 115000, and that employee is Joseph.
Now let's learn the different types of subqueries. You can write subqueries with the SELECT, UPDATE, DELETE, and INSERT statements, and we'll explore each of these with an example in MySQL Workbench. Let's start with subqueries in the SELECT statement. Subqueries are mostly used with SELECT, and the syntax looks like this: you select the column names from the table name, then add a WHERE condition with the column you want to compare, the operator, and, inside brackets, the subquery. Here is an example that we'll perform in MySQL Workbench: we want to select all the employees who have a salary less than the average salary of all employees, and this is the output.
Let's do this in MySQL Workbench. Let me log into my local instance and give my password. Now I'm in MySQL Workbench, so let's start by writing our subquery using the SELECT statement. For this demo session we'll use a database called subqueries, which you can see here, and we'll create a few tables as well. I'll run USE subqueries, and now we are inside the subqueries database. Let me show you the tables in this database: I'll write SHOW TABLES and run it. There are two tables, employees and employees_b, which we'll use throughout our demonstration. For our SELECT subquery, we want to fetch the employee name, the department, and the salary of the employees whose salary is less than the average salary, so we'll use the employees table. Let me first show you the records and columns in the employees table: I'll write SELECT * FROM employees and run it. You can see we have 20 rows of information, with the employee name, employee ID, age, gender, date of joining, department, city, and salary.
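The scalar-subquery pattern described above, where the inner query returns a single value (the MAX or AVG of salary) and the outer query filters on it, can be tried without a MySQL server. The following is a minimal sketch using Python's built-in sqlite3 module on an in-memory database. The column names emp_name, dept, and salary are assumptions, and apart from Joseph's $115,000 the rows are invented for illustration; they are not the demo's 20-row table.

```python
import sqlite3

# Illustrative rows only (assumed column names; only Joseph's salary comes from the video).
ROWS = [
    ("Joseph", "Sales", 115000),
    ("John", "IT", 67000),
    ("Marcus", "IT", 50000),
    ("Maya", "HR", 60000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_name TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", ROWS)

# The inner query runs first and yields one value (the maximum salary);
# the outer query then keeps only the rows whose salary equals that value.
top_dept = conn.execute(
    "SELECT dept FROM employees "
    "WHERE salary = (SELECT MAX(salary) FROM employees)"
).fetchall()

top_name = conn.execute(
    "SELECT emp_name FROM employees "
    "WHERE salary = (SELECT MAX(salary) FROM employees)"
).fetchall()

# Same pattern with AVG and '<': employees earning less than the average salary.
below_avg = conn.execute(
    "SELECT emp_name, dept, salary FROM employees "
    "WHERE salary < (SELECT AVG(salary) FROM employees) "
    "ORDER BY emp_name"
).fetchall()

print(top_dept)   # [('Sales',)]
print(top_name)   # [('Joseph',)]
print(below_avg)  # [('John', 'IT', 67000.0), ('Marcus', 'IT', 50000.0), ('Maya', 'HR', 60000.0)]
```

The comparison operators = and < expect the inner query to return exactly one value, which is why an aggregate such as MAX or AVG fits here. In MySQL, comparing with = against a subquery that returns several rows raises a "Subquery returns more than 1 row" error; that case calls for IN instead.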
So this is the same table we saw in our slides. Now, for our subquery I'll write SELECT and choose the employee name, the department, and the salary (there should be a comma here instead of a period). Next I'll give my table name, employees, WHERE salary is less than, and after this I'll start my inner query, or subquery: SELECT AVG(salary), using the AVG function to find the average salary of all the employees, FROM employees. If I add a semicolon and run this, you'll see the output: there are 12 employees in the table whose salary is less than the average salary. If you want, you can also run the inner query on its own to check the average salary; these are the employees whose salary is below it. Now, moving back to our slides.
Let's see how you can use subqueries with the INSERT statement. The INSERT statement uses the data returned from the subquery to insert into another table. The syntax looks like this: you write INSERT INTO the table name, followed by SELECT with either the individual columns or a star FROM the table, then use the WHERE clause and give the operator followed by the inner query, or subquery. Here we'll explore a table called products: we're going to fetch a few records from the products table based on a condition, namely that the selling price of the product should be greater than $100, and only those records will be put into our orders table. Let's write this query in MySQL Workbench. I'll give my comment as update subquery. First, let's create the products table: I'll write CREATE TABLE products and then give our column names. The first column is the product ID, of type integer; then we have the item, or product, of type VARCHAR(30); next, the selling price of the product, of type FLOAT; and finally we have
another column called product type, again of data type VARCHAR with a size of 30. I'll close the bracket, give a semicolon, and run it. We have successfully created our products table, so now let's insert a few records: I'll write INSERT INTO products followed by VALUES and give four records. The first product ID is 101, the product is jewelry, the selling price is $1,800, and it's a luxury product. Next, product ID 102 is a t-shirt priced at $100, and it's non-luxury. I'll copy this line to save typing and edit it: the third product's ID is 103, the product is a laptop, the price is $1,300, and it's a luxury product. I'll paste again and enter the fourth product, a table priced at $400, which is non-luxury. I'll give a semicolon and insert these four records into our products table; you can see we have inserted four records. Let's print them: I'll write SELECT * FROM products, and if I run it you can see our four products are ready.
Now we need to create another table to hold some of the records from our products table, and that new table is the orders table. I'll write CREATE TABLE orders with three columns: the order ID, of type integer; then product_sold, of type VARCHAR with a size of 30; and finally the selling price column, of type FLOAT. Let's create our orders table. The table name should be orders, and there's a mistake here: we should close the brackets. Let me run it, and now our orders table is ready. Let's write our insert subquery: I'm going to insert into my orders table, and I'll select the product ID, the item, and the selling price, or the
sell price, from my products table WHERE the product ID is IN my inner query, which selects the product ID from products WHERE the selling price is greater than $1,000. Let me explain what I'm going to do: I'm going to insert into my orders table the product ID, the item name, and the selling price from my products table, wherever the product ID satisfies this condition. Let me first run the inner query for you, which selects the product IDs from products where the selling price is greater than 1,000. There's an issue here: the column name is actually prod_id. Let's run it again so we can see the IDs of the products with a selling price greater than 1,000: they are 101 and 103. Now let's run the entire query. There's another mistake in a column name in the outer query, so let's fix it and insert again. We have successfully inserted two records into our orders table. Let's see the orders table: I'll write SELECT * FROM orders and run it. There you go: the two products from our products table with a selling price greater than $1,000 were jewelry and laptop. The selling price for jewelry was $1,800, and for the laptop it was $1,300. This is how you can use a subquery with the INSERT statement. Now, going back to our slides again.
Let's see how you can use subqueries with the UPDATE statement. Subqueries can be used in conjunction with UPDATE, and either a single column or multiple columns in a table can be updated when using a subquery with the UPDATE statement. The basic syntax of an update subquery looks like this: you write UPDATE followed by the table name, SET the column name, give the WHERE clause and the operator, and then write your inner subquery. We'll see an example using this employees table, and using this employees table we will
update the salaries of the employees by multiplying them by a factor of 0.35, only for employees aged 27 or older. We're also going to use a new table called employees_b for this. Let's see how to do it. I'll give my comment as update subquery. Before we write the subquery, let's see what we have in the employees_b table: it is basically a replica of the employees table, and there you go, it has the same records as the employees table. We'll use both employees and employees_b to update our records. I'll write UPDATE employees, and on the next line SET salary = salary * 0.35 WHERE age IN, and then SELECT age FROM my other table, employees_b, WHERE age is greater than or equal to 27. Let me run through this query: I'm going to update the salary column of the employees table, checking whether the age is 27 or more, and for those employees we'll multiply the salary by a factor of 0.35. Let me run this and then we'll see the output. It says 18 rows affected, which means 18 of the 20 employees in the table are aged 27 or older. Now if I write SELECT * FROM employees and scroll to the right, you can see the difference: these are the updated salaries. If you check the employees who are younger than 27, for example Marcus, whose age is 25, his salary is the same; we haven't updated it. There's one more such employee, Maya, whose salary we also haven't updated because her age is less than 27. Now let's go back to our slides; as you can see, we got the same output in MySQL Workbench.
Now let's explore how to write subqueries with the DELETE statement. Subqueries can again be used in conjunction with DELETE, and the basic syntax of a delete query using a subquery looks like this: you write DELETE FROM the table name, then the WHERE clause with the column, the operator, and the inner query within brackets. Here we'll use the employees table and delete the employees whose age is greater than or equal to 27. Let's see how to do it. I'll give my comment as delete subquery and follow the syntax we saw: DELETE FROM employees WHERE age IN, and then I'll start my inner query, or subquery: SELECT age FROM employees_b WHERE age is, let's say, greater than or equal to 32, or rather, let's say age is less than or equal to 32. I'll close the bracket and give my semicolon. Let me first run the inner query so you get an idea of the employees who are 32 or younger: there are nine employees in the table aged 32 or younger, and those are the records we're going to delete. If I run this, it says nine records deleted.
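The INSERT, UPDATE, and DELETE subqueries from this demo share one idea: the inner SELECT produces a set of values, and the outer statement acts only on the rows whose column value is IN that set. The sketch below replays the sequence with Python's built-in sqlite3 module on an in-memory database, purely so it runs anywhere. The table names follow the video (products, orders, employees, employees_b), but the column names (prod_id, sell_price, and so on), product 104's ID, and the four employee rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# --- INSERT with a subquery (assumed column names) ---------------------
cur.execute(
    "CREATE TABLE products (prod_id INTEGER, item TEXT, "
    "sell_price REAL, product_type TEXT)"
)
cur.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (101, "jewelry", 1800, "luxury"),
        (102, "t-shirt", 100, "non-luxury"),
        (103, "laptop", 1300, "luxury"),
        (104, "table", 400, "non-luxury"),  # ID 104 is hypothetical
    ],
)
cur.execute(
    "CREATE TABLE orders (order_id INTEGER, product_sold TEXT, selling_price REAL)"
)

# Inner SELECT returns the IDs of products priced above 1,000;
# INSERT ... SELECT copies only those rows into orders.
cur.execute(
    "INSERT INTO orders (order_id, product_sold, selling_price) "
    "SELECT prod_id, item, sell_price FROM products "
    "WHERE prod_id IN (SELECT prod_id FROM products WHERE sell_price > 1000)"
)
orders = cur.execute("SELECT * FROM orders ORDER BY order_id").fetchall()

# --- UPDATE and DELETE with a subquery (invented employee rows) --------
cur.execute("CREATE TABLE employees (emp_name TEXT, age INTEGER, salary REAL)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Marcus", 25, 50000), ("Maya", 26, 60000),
     ("John", 30, 67000), ("Joseph", 35, 115000)],
)
# employees_b is an untouched copy that serves as the subquery source.
cur.execute("CREATE TABLE employees_b AS SELECT * FROM employees")

cur.execute(
    "UPDATE employees SET salary = salary * 0.35 "
    "WHERE age IN (SELECT age FROM employees_b WHERE age >= 27)"
)
updated = cur.rowcount  # rows changed by the UPDATE

cur.execute(
    "DELETE FROM employees "
    "WHERE age IN (SELECT age FROM employees_b WHERE age <= 32)"
)
deleted = cur.rowcount  # rows removed by the DELETE
remaining = cur.execute("SELECT emp_name, age, salary FROM employees").fetchall()

print(orders)            # [(101, 'jewelry', 1800.0), (103, 'laptop', 1300.0)]
print(updated, deleted)  # 2 3
print(remaining)
```

Reading the ages from a separate copy (employees_b) mirrors the demo. It is also a practical habit in MySQL, which rejects an UPDATE or DELETE whose subquery selects from the very table being modified (error 1093, "You can't specify target table ... for update in FROM clause").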
Now let's display what we have in the employees table. If I run this, there you go: if you look at the age column, we have 11 employees in total now, and all of them are older than 32, because we deleted all the employees aged 32 or younger.
Let me show you from the beginning what we did. First we used our subqueries database and our employees table, and we started by looking at how to use a subquery with the SELECT statement. Then (this comment should say insert instead of update) we learned how to write an insert subquery using two tables, products and orders. Moving ahead, we saw how to write subqueries with the UPDATE command: we updated the salaries of the employees by a factor of 0.35 for those aged 27 or older. Finally, we saw how to use a subquery with the DELETE statement, deleting the records of all employees aged 32 or younger.
So let's start with what normalization is. Normalization in a DBMS is a method used to organize data within a database to reduce repetition, by breaking large data sets down into smaller, more manageable tables and ensuring those tables are properly related. Normalization helps prevent issues like data redundancy, which means the unnecessary repetition or duplication of data within a database. When the same piece of data is stored in multiple places, it can lead to inconsistencies and take up more storage space than needed. As an example of data redundancy before normalization, look at the table above, with the order ID, customer ID, customer name, customer address, product, and quantity: some of the data is repeated again and again. In this table the customer address for John Doe is repeated three times. Suppose John Doe moves to a new address; every occurrence of his address in the table must then be updated, and if any instance is missed during the update, it leads to inconsistencies, and errors can occur in the
database. The solution is to reduce the redundancy through normalization. Let's see how: this is the normalized version, where we have created a customer table and an orders table. What are the benefits? The address for John Doe is stored only once, in the customer table, so if John Doe's address changes it needs to be updated in one place, ensuring consistency throughout the database. This reduces the risk of errors and maintains data integrity. The process involves multiple steps that transform data into a tabular format, remove duplicates, and establish clear connections between tables, making the database more efficient and reducing problems like errors during data insertion, updates, or deletion.
Let's now discuss the types of DBMS normal forms. Normalization rules are categorized into different normal forms, and the first one is 1NF. For a table to be in first normal form, it must satisfy four rules. First, single-valued (atomic) attributes: each column should contain only one value per row, which means there should be no repeating groups or arrays within a single column. Second, same domain values: all values stored in a specific column should be of the same data type or domain; for example, if a column is meant to store dates, all values in that column should be dates. Third, unique column names: each column in the table should have a unique name, which ensures clarity and avoids confusion when referring to a specific column. Fourth, the order of the data doesn't matter: the order in which rows are stored in the table should not affect the data or its integrity.
Let's check an example of first normal form. Consider the following unnormalized table with customer ID, customer name, and phone numbers; as you can see, some customers have two phone numbers in a single field. The problem with this original table is its non-atomic values: the phone numbers column contains multiple phone numbers separated by commas, which violates the atomicity rule of 1NF. To convert the table to first normal form, we must ensure that each column contains only atomic values, which involves splitting the rows that have multiple phone numbers. As you can see, we have split the data so that each row now has a single phone number, so the phone number column contains atomic values. Same domain values: all the values in the phone number column are consistent in format and type; they are all phone numbers. Unique column names: the columns customer ID, customer name, and phone number each have a unique name, satisfying that requirement. Order of data: the order in which the rows appear does not matter, because the data's meaning and integrity are preserved. By applying these rules, the table now conforms to first normal form, eliminating redundancy related to the phone numbers and storing the data in a more organized and efficient manner.
Let's go through each of these database normal forms step by step, with simple examples to help you grasp the concepts more easily. Let's talk about the second normal form. For a table to be in second normal form, it must satisfy the following conditions: first, it must be in 1NF; second, there must be no partial dependency, meaning every non-key attribute should be fully dependent on the entire primary key, not just part of it. This rule applies primarily to tables with composite primary keys. As an example, consider the following table, which is in 1NF, with the order ID, product ID, product name, quantity, and supplier name. The problem with this table is partial dependency: the product name and the supplier name depend only on the product ID, not on the entire composite primary key of order ID and product ID, and this violates 2NF. To convert to second normal form, we separate the data into two tables, an order table and a product table, to remove the partial dependencies. Now there is no partial dependency: in the order table, quantity is fully dependent on both order ID and
product ID, and in the product table, product name and supplier name depend only on the product ID. This ensures that each non-key attribute is fully dependent on the primary key, bringing the tables into 2NF.
Let's now talk about the third normal form, 3NF. For a table to be in third normal form, it must satisfy the following conditions: first, it must be in 2NF; second, there should be no transitive dependency, where non-key attributes depend on other non-key attributes rather than on the primary key. Let's check an example of third normal form. Consider the following table, which is in 2NF. The problem with this table is a transitive dependency: the instructor name depends on the course name, not directly on the student ID or course ID, and this violates 3NF. So how do we convert this into 3NF? To achieve 3NF, we split the table into a student-course table and a course table to remove the transitive dependency. Now there's no transitive dependency: in the student-course table, no non-key attribute depends on another non-key attribute, and the course table stores the course and instructor information separately. This structure eliminates the transitive dependency, ensuring the tables conform to 3NF.
Let's now talk about Boyce-Codd normal form, BCNF. BCNF is a stricter extension of the third normal form: a table is in BCNF if it is in 3NF and, for every functional dependency A → B, A is a superkey. Let's check an example of Boyce-Codd normal form. You can see this table consisting of employee ID, department, and manager. The problem with this table is a BCNF violation: department determines manager, but department is not a superkey, since employee ID is the primary key, and this violates BCNF. So how do we convert this to BCNF? To achieve BCNF, we split the table so that every determinant is a superkey, as you can see with the employee table and the department table. Now the superkey requirement is met: in the
employee table, employee ID is the primary key, and in the department table, department is now the primary key. The decomposition ensures that every functional dependency is satisfied by a superkey, meeting the requirements of BCNF.
Let's now talk about the fourth normal form, 4NF. A table is said to be in 4NF if it is in BCNF and has no multivalued dependencies. Let's consider an example: a table where an employee can have multiple skills and work on multiple projects, with the columns employee ID, skill, and project. The problem with this table is a multivalued dependency: an employee's skills are independent of their projects, but both are stored in the same table, which violates 4NF. To achieve 4NF, we separate the skills and the projects into different tables, an employee skills table and an employee projects table. Now there is no multivalued dependency: by separating the skills and the projects, we eliminate the multivalued dependencies, ensuring the tables conform to 4NF.
Now let's talk about the fifth normal form, 5NF. A table is said to be in fifth normal form if it is in 4NF and cannot be decomposed any further into smaller tables without losing information; the kind of dependency involved here is called a join dependency. Let's consider an example of fifth normal form: this table records the relationship between suppliers, parts, and projects. The problem with this table is a join dependency: the table holds a complex relationship between suppliers, parts, and projects that can be decomposed further. So how do we convert this into fifth normal form? To achieve 5NF, we break the table into smaller related
tables: the supplier-parts table, the supplier-projects table, and the parts-projects table. By decomposing the original table into these three smaller tables, we remove the complex relationship and eliminate the join dependency, ensuring the tables conform to 5NF.
So currently I am in MySQL Workbench. Let me connect to the local instance: I'll give my password and click OK. This is the MySQL Workbench query editor. First we're going to learn subqueries, so let me add a comment that says subqueries. First of all, let's understand what a subquery is: a subquery is a query within another SQL query, embedded within the WHERE clause, FROM clause, or HAVING clause. We'll explore a few scenarios where we can use subqueries, and for that I'll use my database sql_intro, so I'll run USE sql_intro. This database has a lot of tables, and I'll use the employees table inside sql_intro; if I expand it, you can see we have an employees table. Let me first show you its contents: I'll write SELECT * FROM employees and execute it. Here we have the employee ID, employee name, age, gender, date of joining, department, city, and salary, and if I scroll down you can see there are 20 employees in the table.
Let's say you want to find the employees whose salary is greater than the average salary; in such a scenario you can use a subquery. Let me show you how to write it. In the SELECT statement I'll pass the columns I want to display: the employee name, the department of the employee, and the salary of the employee, from my table employees. Next I'll use a WHERE condition, where the salary should be greater than the average salary of all the employees, so I'll write salary
greater than, and after this I’m going to write my subquery inside brackets: SELECT AVG(salary) FROM employees, followed by a semicolon. What this does is first find the average salary of all the employees in our table; once we have that number, the WHERE condition keeps only the employees whose salary is greater than it. Let me run the inner subquery first: it gives the average salary of all the employees. Now I want to display all the employees whose salary is greater than this average, so let’s run the whole query. There you go: there are eight employees in our table who have a salary greater than the average salary of all the employees. All right, next let’s see another example. Suppose this time you want to find the employees whose salary is greater than John’s salary. We have one employee named John; let me run the table once again. If I scroll down, you see employee ID 116 is John, and his salary is $67,000. I want to display all the employees whose salary is greater than John’s, so basically all the employees who are earning more than $67,000. Let’s see how to do it. I’ll write SELECT with the employee name, the gender of the employee, the department, and the salary, FROM employees WHERE salary is greater than, and inside brackets I’ll give my inner query: SELECT salary FROM employees WHERE the employee name is 'John', with John in single quotation marks, ending with a semicolon. Let me first run my inner query: it gives us John’s salary, which is $67,000. Now I want the employees who are earning more than $67,000, so let’s run the whole query. You can see 12 rows are returned, which means 12 employees in our table are earning more than $67,000.
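The subqueries in this demo can also be tried without a MySQL server. The sketch below swaps in Python’s built-in sqlite3 module with an in-memory database. Table and column names follow the demo where the transcript gives them, and it also previews the two-table IN form described next, using tiny stand-ins for the classicmodels products and orderdetails tables. Every row here (names, salaries, product codes, prices) is invented, so the counts will not match the 8 and 12 rows seen in Workbench.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_name TEXT, gender TEXT, dept TEXT, salary REAL);
INSERT INTO employees VALUES
    ('John', 'M', 'Marketing', 67000),
    ('Ana',  'F', 'Finance',   90000),
    ('Raj',  'M', 'Tech',      50000),
    ('Mia',  'F', 'Sales',     75000),
    ('Lee',  'M', 'HR',        68000);

CREATE TABLE products (productCode TEXT, productName TEXT, MSRP REAL);
INSERT INTO products VALUES
    ('S10_1', 'Classic Car', 95.70),
    ('S10_2', 'Motorcycle',  214.30),
    ('S10_3', 'Ship',        60.00);

CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT, priceEach REAL);
INSERT INTO orderdetails VALUES
    (1, 'S10_1', 81.35), (2, 'S10_2', 180.00), (3, 'S10_3', 55.10);
""")

# Scalar subquery in WHERE: the inner SELECT yields one number, the average.
above_average = conn.execute("""
    SELECT emp_name, dept, salary FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY salary DESC""").fetchall()

# Same shape, but the inner query looks up one employee's salary.
above_john = conn.execute("""
    SELECT emp_name, salary FROM employees
    WHERE salary > (SELECT salary FROM employees WHERE emp_name = 'John')
    ORDER BY salary DESC""").fetchall()

# Two tables: the inner query returns a list of codes that IN checks against.
cheap_products = conn.execute("""
    SELECT productCode, productName, MSRP FROM products
    WHERE productCode IN (SELECT productCode FROM orderdetails WHERE priceEach < 100)
    ORDER BY productCode""").fetchall()

print(above_average)   # Ana and Mia (the average here is 70,000)
print(above_john)      # Ana, Mia, and Lee
print(cheap_products)  # S10_1 and S10_3
```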
You can see here that all these employees have a salary greater than $67,000. Okay, now you can also use subqueries with two different tables: suppose you want to display information that is spread across two tables; you can use a subquery to do that. For this example we’ll use a database called classicmodels, which you can see in the list of databases, so I’ll write USE classicmodels. This database was downloaded from the internet, from the website mysqltutorial.org, which has very nice articles and blogs where you can learn MySQL in detail. They have a MySQL sample database page; if you click on it, it takes you to a link that says "Download MySQL Sample Database", and the name of the database is classicmodels. We are going to use this classicmodels database throughout our demo session. If I expand the tables section, you can see there are a lot of tables inside classicmodels: customers, employees, offices, orders, order details, and many more. For our subquery we’ll be using two tables, orderdetails and products. First let me show you the contents of the products table. If I run this, it says 110 rows returned, which means there are 110 different products in our table, with the product code, product name, product line, product vendor, description, quantity in stock, buy price, and MSRP. The other table we are going to use is orderdetails, which has the details of all the orders. Let me show you its records: okay, there are 1,000 records present in
this table. It has the order number, the product code, the quantity ordered, the price of each item, and the order line number as well. Now we want to know the product code, the product name, and the MSRP of the products whose price each is less than $100. For this scenario we are going to use two different tables and write a subquery. If you look here, the orderdetails table has a column called priceEach, and I want to display the product code, product name, and MSRP of the products that have a price each of less than $100. The way I’m going to do it is: SELECT productCode, productName. One thing to remember is that the product name is only present in the products table, while the product code is present in both tables, products and orderdetails; you can see the product code column here. Then MSRP, which again is present in the products table, FROM products WHERE productCode IN, followed by my inner query: SELECT productCode FROM orderdetails WHERE priceEach < 100. Let me run this. You can see there are 83 products in total that have a price each of less than $100; you can see the prices here.
Okay, now we’ll learn another advanced concept in SQL, known as stored procedures. I’ll add a comment: stored procedure. First let’s understand what a stored procedure is: it is SQL code that you can save so that the code can be reused over and over again. So if
you want to run a query over and over again, save it as a stored procedure and then call it to execute it. In this example I want to create a stored procedure that returns the list of players who have scored more than six goals in a tournament. I have a database called SQL IQ (these are a few databases I have already created), and it has a table called players. If I expand the tables option, you see the players table, with the columns player ID, the name of the player, the country the player belongs to, and the number of goals each player scored in a particular tournament. I’ll write a stored procedure that returns the list of top players who scored more than six goals. First, let me begin by using my SQL IQ database and run it; now we are inside it. Let me SELECT * FROM players to show the values in the players table: there are six players, with the player ID, the name, the country, and the goals they scored. Now I’ll write the stored procedure. The syntax starts with a DELIMITER statement, and as the delimiter I’ll use a double ampersand, &&. Next I’ll write CREATE PROCEDURE followed by the procedure name; let’s name my procedure top_player. The next statement is BEGIN, and after BEGIN I’ll write my SELECT statement: I want to select the name of the player, the country, and the goals each player scored FROM players WHERE goals > 6, ending with a semicolon. Then I’ll end my procedure with END followed by the delimiter, the double ampersand. Finally I’ll write DELIMITER ; to switch back to the default delimiter, the semicolon; note there must be a space between DELIMITER and the semicolon. Now let’s run it. There you go, we have successfully created our stored procedure. Now, the way to run a stored procedure
is to use the CALL statement with the procedure name, top_player in our case, followed by brackets and a semicolon. Let’s execute it. There is a problem here: we made a mistake while creating the procedure, because the name of the column is goals and not goal. Let me create the procedure again. It says the procedure top_player already exists, so let’s edit the procedure name: instead of top_player we’ll write top_players, and we’ll edit it here as well. Now let’s create it again. To call my procedure I’ll write CALL followed by the procedure name, top_players. If I run this, you can see we have two players in our table who scored more than six goals, so we consider them the top players of the tournament. All right, there are other methods you can use when creating a stored procedure. One of them is an IN parameter: when you define an IN parameter in a stored procedure, the calling program has to pass an argument to the stored procedure. I’ll add a comment: stored procedure using IN parameter. For this example I’ll create a procedure that fetches the top records of employees based on their salaries. We have a table in our SQL IQ database called employee details, which I’m going to use: it has the name of the employee, the age, sex, date of joining, city, and salary. Let me show you how to do it. I’ll write DELIMITER, and this time I’m going to use a forward slash. I’ll write CREATE PROCEDURE followed by the procedure name, sp_sort_by_salary (sp for stored procedure), and inside the brackets I’ll declare my parameter: IN var INT, a variable named var with the integer data type. Then I’ll write BEGIN followed by my SELECT statement, where I’ll
select the name, age, and salary FROM emp_details (employee details) ORDER BY salary DESC, and since I want to display a limited number of records, I use the LIMIT keyword with my variable var, which I declared here. I end my SELECT statement, end my stored procedure with END and the forward slash, and go back to my default delimiter, the semicolon. Let me run this; there should be a space here. All right, let’s run it. You can see we have successfully created our second stored procedure, sp_sort_by_salary. You can also check whether the stored procedure was created: here you have an option to see the stored procedures. Let me refresh it, and you can see the three stored procedures we have created so far: sp_sort_by_salary, plus top_player and top_players. Okay, now let’s call our stored procedure. I’ll write CALL followed by the stored procedure name, sp_sort_by_salary, and inside the brackets I’ll pass the value for my parameter var, which we used in LIMIT. Let’s say I want to display only the three employees with the highest salaries, so I’ll pass 3. Let me run it. There you go: Ammy, Sara, and Jimmy are the three employees with the highest salaries. So you saw how to use an IN parameter in a stored procedure: we declared a parameter, used it in our SELECT statement, and then called the stored procedure and passed in a value for it. Now, instead of a SELECT statement, you can also use other statements inside a stored procedure, say UPDATE. I’ll create a stored procedure to update the salary of a particular employee, so in this procedure we’ll use the UPDATE command instead of SELECT, and this time we’ll use the IN parameter twice. Let me show you how to do it. I’ll write my delimiter first, which is going to be a forward slash, then CREATE PROCEDURE, and the name of the procedure
is going to be update_salary. Inside the brackets I’ll write IN temp_name, a temporary name variable of type VARCHAR(20), and then IN again for my other variable, new_salary, of type FLOAT. I’ll write BEGIN and then my UPDATE statement: UPDATE emp_details SET salary = new_salary WHERE name = temp_name, my temporary variable. That is my UPDATE command, and I’ll end with the delimiter. Let’s run this. We have successfully created our stored procedure; if I refresh, you can see update_salary. First, let me display the records in the emp_details table: we have six rows. Let’s say you want to update Mary’s salary from $70,000 to $80,000. I’ll call my stored procedure update_salary, and this time I’m going to pass two arguments: the first is the employee name, and next, after a comma, the new salary. So the employee name is 'Mary' and the new salary is 80000. I’ll add a semicolon and run it; you can see it says one row affected. Let’s check our table once again. There you go: for Mary, we have successfully updated the salary to $80,000. Moving ahead, we’ll learn to create a stored procedure using an OUT parameter, so I’ll add a comment: stored procedure using OUT parameter. Suppose we want to get the count of female employees. We will create total_emps as an output parameter with the integer data type, and the count of the female employees is assigned to this output variable using the INTO keyword. Let me show you how to write a stored
procedure using the OUT parameter. First I’ll set my delimiter to a forward slash. I’ll write CREATE PROCEDURE followed by the procedure name, sp_count_employees, and inside the brackets my OUT parameter: the variable total_emps (total employees) with the INT data type. Next I’m going to write BEGIN followed by my SELECT statement: SELECT COUNT(name) INTO total_emps FROM emp_details WHERE sex = 'F', meaning female, with a semicolon. Then I’ll end it with END and the delimiter, and change the delimiter back to the default, the semicolon. Let me explain what I’m doing here. I’m creating a new stored procedure, sp_count_employees, which counts the total number of female employees in our emp_details table. I’ve declared an OUT parameter, total_emps, of type integer; in the SELECT statement I count the names of the employees and store the result in total_emps, and the WHERE condition keeps only the rows where sex is female. Let’s run this. We have created our stored procedure; let’s refresh, and you can see the new stored procedure sp_count_employees. To call it, I’ll write CALL sp_count_employees(@f_emp), passing the session variable @f_emp as the argument, followed by a semicolon. Then I’ll write SELECT @f_emp AS female_employees, where AS gives an alias. Let’s run these one by one: first I’ll call my procedure, and then we’ll display the total number of female employees. You can see that in our table we have three female employees.
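SQLite, used here for runnable sketches, has no CREATE PROCEDURE, so the closest analogue is a reusable, parameterized query wrapped in a Python function. The sketch below mirrors the four procedures above: a fixed query (top_players), an IN parameter feeding LIMIT (sort_by_salary), two IN parameters driving an UPDATE (update_salary), and an OUT-style result, which in Python is simply a return value rather than a session variable like @f_emp. The table layouts follow the demo, but the rows are hypothetical, and these functions are illustrations, not MySQL stored procedures.

```python
import sqlite3

# SQLite has no stored procedures, so each "procedure" below is a Python
# function wrapping a parameterized query. Rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (player_id INTEGER, name TEXT, country TEXT, goals INTEGER);
INSERT INTO players VALUES
    (1, 'Alex', 'Brazil', 8), (2, 'Ben', 'Spain', 4),
    (3, 'Carl', 'France', 7), (4, 'Dan', 'Italy', 6);

CREATE TABLE emp_details (name TEXT, age INTEGER, sex TEXT, salary REAL);
INSERT INTO emp_details VALUES
    ('Ammy', 30, 'F', 95000), ('Sara', 28, 'F', 90000),
    ('Jimmy', 35, 'M', 85000), ('Mary', 32, 'F', 70000),
    ('Tom', 40, 'M', 60000);
""")

def top_players(conn):
    """Like CALL top_players(): players with more than six goals."""
    return conn.execute(
        "SELECT name, country, goals FROM players WHERE goals > 6 ORDER BY goals DESC"
    ).fetchall()

def sort_by_salary(conn, var):
    """Like sp_sort_by_salary(IN var INT): the top `var` earners."""
    return conn.execute(
        "SELECT name, age, salary FROM emp_details ORDER BY salary DESC LIMIT ?",
        (var,),
    ).fetchall()

def update_salary(conn, temp_name, new_salary):
    """Like update_salary(IN temp_name, IN new_salary); returns rows affected."""
    cur = conn.execute(
        "UPDATE emp_details SET salary = ? WHERE name = ?", (new_salary, temp_name)
    )
    conn.commit()
    return cur.rowcount

def count_employees(conn):
    """Like sp_count_employees(OUT total_emps INT), except the count is
    returned to the caller instead of being written into an OUT variable."""
    (total_emps,) = conn.execute(
        "SELECT COUNT(name) FROM emp_details WHERE sex = 'F'"
    ).fetchone()
    return total_emps

print(top_players(conn))  # [('Alex', 'Brazil', 8), ('Carl', 'France', 7)]
```

Note that Dan, with exactly six goals, is excluded because the condition is goals > 6, not >= 6.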
All right, with this understanding, let’s move on to the next topic in this tutorial on advanced SQL. Now we are going to learn about triggers in SQL, so I’ll add a comment here: triggers in SQL. First let’s understand what a trigger is. A trigger is a special type of stored procedure that runs automatically when an event occurs in the database server. There are mainly three types of triggers in SQL: data manipulation (DML) triggers, data definition (DDL) triggers, and logon triggers. In this example we’ll learn how to use a BEFORE INSERT trigger. We will create a simple student table with each student’s roll number, age, name, and marks. Before records are inserted into the table, the trigger checks whether the marks are less than zero, and if so, it automatically sets the marks to a fixed default value, let’s say 50. Let’s go ahead and create the table. I’ll write CREATE TABLE student: it will have the student’s roll number, with the INT data type; the age of the student, again INT; the name of the student, of type VARCHAR(30), a variable-length character string; and finally the marks, stored as FLOAT. Let’s create the student table. Now I’ll write my trigger command, which starts with DELIMITER, just like our stored procedures. This time I’ll write CREATE TRIGGER followed by the name of the trigger, let’s say marks_verify. I’m going to use a BEFORE INSERT trigger, so I’ll write BEFORE INSERT ON student, followed by FOR EACH ROW IF NEW.marks < 0 THEN SET NEW.marks = 50. This is my condition: before inserting, if any student has marks less than zero, we assign the value 50 to that student, because marks are usually not below zero in any exam. I’ll write END IF with a semicolon and close with the delimiter. So this is my trigger
command. I’ll run it. It says the trigger already exists, so in this case we need to change the trigger name; I’ll call it marks_verify_st. Let’s run it again. There is an error here, because in our table the column name is mark and not marks, so we need to change it to mark. All right, let’s run it. We have created our trigger. Now let me insert a few records into the student table. I’ll write INSERT INTO student VALUES: 501, which is the roll number, age 10, name Ruth, and marks 75.0; after a comma, our second record: roll number 502, age 12, name Mike, and this time I’m purposely giving a value of -20.5; after another comma, the third record: roll number 503, age 13, name Dave, with marks of 90; and our final record: roll number 504, age 10, name Jacobs, and again I’m purposely giving negative marks, -12.5. I’ll close the bracket, add a semicolon, and run my INSERT statement. We have inserted four rows into our student table. Now let me run SELECT * FROM student, and you’ll see the difference. There you go: originally we inserted marks of -20.5 for 502 and -12.5 for 504 (Jacobs), and our trigger automatically converted those negative marks to 50, because when we created the trigger we set the marks to 50 whenever they were less than zero. So this is how a trigger works. You can also drop, or delete, a trigger: just write DROP TRIGGER followed by the trigger name, which in this case is marks_verify_st. I’ll paste it here; if you run this, it deletes your trigger. I’ll leave it as a comment.
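The same negative-marks check can be reproduced in SQLite, with one substitution: SQLite does not let a trigger assign to NEW (there is no SET NEW.mark = 50), so this sketch swaps the BEFORE INSERT trigger for an AFTER INSERT trigger with a WHEN clause that corrects the row just inserted. The table and trigger names follow the demo and use mark as the column name, as in the corrected trigger. Student 505 (Zoe) is a hypothetical extra row added to show behavior after DROP TRIGGER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (roll_no INTEGER, age INTEGER, name TEXT, mark REAL);

CREATE TRIGGER marks_verify_st
AFTER INSERT ON student
FOR EACH ROW
WHEN NEW.mark < 0
BEGIN
    UPDATE student SET mark = 50 WHERE rowid = NEW.rowid;
END;

INSERT INTO student VALUES
    (501, 10, 'Ruth', 75.0),
    (502, 12, 'Mike', -20.5),
    (503, 13, 'Dave', 90),
    (504, 10, 'Jacobs', -12.5);
""")

marks = conn.execute("SELECT roll_no, name, mark FROM student ORDER BY roll_no").fetchall()
print(marks)  # the two negative marks come back as 50.0

# After DROP TRIGGER, negative marks are stored unchanged.
conn.execute("DROP TRIGGER marks_verify_st")
conn.execute("INSERT INTO student VALUES (505, 11, 'Zoe', -1.0)")
unchecked = conn.execute("SELECT mark FROM student WHERE roll_no = 505").fetchone()[0]
print(unchecked)  # -1.0
```

In MySQL the BEFORE INSERT version is preferable, because the bad value never reaches the table; the AFTER INSERT form here is only a workaround for SQLite.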
Okay, now moving on, we are going to learn about another crucial and very widely used concept in SQL: views. Views are virtual tables that do not store any data of their own but display data stored in other tables; a view is defined by a query over one or more tables, which may be joined. I’ll add a comment: views in SQL. To learn views, I’m going to use a table in the classicmodels database which, as I mentioned, we downloaded from the internet. So first of all, let me write USE classicmodels to switch databases. All right, now we are inside classicmodels, and let me show you one of its tables, called customers: I’ll write SELECT * FROM customers. I missed the s here; let’s run it again. This is the customers table in the classicmodels database: it has the contact last name, the contact first name, the customer name, the customer number, the address, state, country, and other information. Now I’ll write a basic view command using this customers table. I’ll write CREATE VIEW followed by the view name, cust_details, then AS SELECT with a few columns from the original customers table: the customer name, the phone number, and the city (you have this information here), FROM customers. If I run this, my view cust_details will be created. There is an error here because the name of the table is customers, not customer; I’ll add an s and run it again. All right, you can see we have created our view, and to display its contents I can write SELECT * FROM followed by the view name, cust_details. Let’s run it. There you go: we have the customer name, phone number, and city of the different customers in our table. All right, now let’s learn how
you can create views using joins: we’ll join two different tables and create a view. For that I’m going to use the products table and the productlines table in the classicmodels database. Before I start, let me display the records in the products table; these are the different products you can see here. Now let’s see what we have in the productlines table: the product line, the text description, an HTML description, and an image. I’ll create a view by joining these two tables and fetching specific columns from both. Let me start by writing CREATE VIEW followed by the view name, product_description, AS SELECT productName, quantityInStock, and MSRP; these three columns are present in the products table. Next, from the productlines table, I want the text description of the products. So I’ll write FROM products with the alias p, INNER JOIN productlines with the alias, let’s say, pl, ON the common column, the product line: p.productLine = pl.productLine. So here we have used an inner join to fetch specific columns from both tables, and our view is named product_description. Let’s run it. All right, our view is ready. Now let me display what is in the product_description view: SELECT * FROM product_description. Let’s run it. There you go: we have the product name, the quantity in stock, the MSRP, and the text descriptions of the different products in the table. Okay, there are a few other operations you can perform. Let’s say you want to rename a view: instead of product_description, you want to give it some other name. I’ll add a comment: rename view.
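Before renaming anything, the two kinds of view from this demo, one over a single table and one over an inner join, can be sketched in SQLite. The column names follow the classicmodels tables used above, but the rows are invented. SQLite has no SHOW FULL TABLES, so the sketch lists views from its sqlite_master catalog instead, and it leaves out the rename step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerName TEXT, phone TEXT, city TEXT, country TEXT);
INSERT INTO customers VALUES
    ('Alpha Models', '555-0100', 'Nantes', 'France'),
    ('Beta Gifts', '555-0101', 'Melbourne', 'Australia');

CREATE TABLE productlines (productLine TEXT PRIMARY KEY, textDescription TEXT);
INSERT INTO productlines VALUES
    ('Classic Cars', 'Die-cast classic car models'),
    ('Ships', 'Scale ship models');

CREATE TABLE products (productName TEXT, productLine TEXT, quantityInStock INTEGER, MSRP REAL);
INSERT INTO products VALUES
    ('1952 Roadster', 'Classic Cars', 120, 95.70),
    ('Clipper', 'Ships', 40, 86.61);

-- A view over a single table: only three columns are exposed.
CREATE VIEW cust_details AS
    SELECT customerName, phone, city FROM customers;

-- A view over an inner join of two tables.
CREATE VIEW product_description AS
    SELECT p.productName, p.quantityInStock, p.MSRP, pl.textDescription
    FROM products AS p
    INNER JOIN productlines AS pl ON p.productLine = pl.productLine;
""")

print(conn.execute("SELECT * FROM cust_details ORDER BY customerName").fetchall())
print(conn.execute("SELECT * FROM product_description ORDER BY productName").fetchall())

# SQLite's counterpart to SHOW FULL TABLES WHERE table_type = 'VIEW':
views = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name")]
print(views)  # ['cust_details', 'product_description']

# Dropping a view removes only the stored query, never the underlying rows.
conn.execute("DROP VIEW cust_details")
```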
To rename a view, you can use the RENAME TABLE statement. I’ll write RENAME TABLE product_description, which is the old name, TO the new name; let’s say vehicle_description, since all our products are some kind of vehicle. Let’s run it. All right, I have renamed my view: if I refresh this panel and expand it, you can see we have the cust_details view and the vehicle_description view. You can either see all the views in this panel or use a command, so I’ll add the comment: display views. To show all the views, you can use SHOW FULL TABLES WHERE table_type = 'VIEW', with VIEW in single quotes; this command displays all the views in a database. There is an error here, so let’s debug it: instead of table_types it should be table_type. Let’s run it. You can see the two views we have: cust_details and vehicle_description. You can also delete a view using the DROP command: I’ll write DROP VIEW followed by the view name. Let’s say I want to delete the cust_details view: I’ll write DROP VIEW cust_details and run it. You can see we don’t have the cust_details view anymore.
All right, now moving to the final section of this demo: window functions. Window functions were introduced in MySQL in version 8.0, and they are useful for solving analytical problems. Using the employees table in the sql_intro database, we’ll find the total combined salary of the employees in each department. First let me switch my database to sql_intro and run it, and then display my employees table: we have 20 employees. Using this table, we are going to find the combined salary of the
employees for each department: we will partition our table by department and print the total salary using a window function in MySQL. I’ll write SELECT with the employee name, the age of the employee, and the department of the employee, then SUM(salary) OVER (PARTITION BY dept) with the alias total_salary, which creates a new column named total_salary, FROM employees. The output will look a little different this time, so let’s execute it and see the result. There you go: we have another column in our result, total_salary, which shows, next to each employee, the combined salary of that employee’s department. For example, the total for the Finance department is $155,000; similarly, if I scroll down, we have the totals for HR, IT, marketing, product, sales, and tech. All right, now we’ll explore a function called ROW_NUMBER, which gives a sequential integer to every row within its partition. Let me show you how to use it. I’ll write SELECT ROW_NUMBER() OVER (ORDER BY salary) AS row_num, then the employee name and the salary of the employee, FROM employees ORDER BY salary. Let’s see how the ROW_NUMBER function creates sequential integers. You can see we have a row number column, and each record has been given a row number: it starts from 1 and goes up to 20. The ROW_NUMBER function can also be used to find duplicate values in a table. To show that, first I’ll create a table with a random name, demo, which has a student ID of type
INT and a student name of type VARCHAR(20). I’ll create this small table with a few records; let’s create it first. Now we are going to insert a few records into our demo table, so I’ll write INSERT INTO demo VALUES: 101, with the name Shane; after a comma, the second student, 102, Bradley; this time for 103 we have two records, so I’ll copy that row and paste it again, which duplicates 103; next we have 104, a student named Nathan; and then for the fifth student, 105, Kevin, we again have two records, so I’ll copy and paste that row as well. Let me add a semicolon and insert these records into the demo table. All right, now let me run SELECT * FROM demo. If you look at this, some records are duplicated in our table: student ID 103 and student ID 105. Now I’m going to use the ROW_NUMBER function to find the duplicate records. I’ll write SELECT st_id, st_name, ROW_NUMBER() OVER (PARTITION BY st_id, st_name ORDER BY st_id) AS row_num FROM demo. Let’s run it. (Let me just delete the n from here and run it again.) All right, you can see there is just one record for Shane and one for Bradley, but for student 103 the second record has a row number of 2, which means there are two records for that student. If I scroll down, there is one record for Nathan and two records for Kevin, which means Kevin is also repeated.
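The three window-function patterns so far (a per-department SUM, ROW_NUMBER over the whole table, and ROW_NUMBER per group to flag duplicates) also run on SQLite 3.25 or newer, which Python’s bundled sqlite3 module normally ships with. The employee rows and the name for student 103 (Hal) are hypothetical; wrapping the duplicate-finding query in an outer WHERE row_num > 1 is one common way to turn the row numbers into a list of repeats.

```python
import sqlite3  # window functions need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (emp_name TEXT, dept TEXT, salary REAL);
INSERT INTO employees VALUES
    ('Jack', 'Finance', 82000), ('Nina', 'Finance', 71000),
    ('Marcus', 'HR', 60000), ('Lena', 'HR', 50000),
    ('Sam', 'Tech', 90000);

CREATE TABLE demo (st_id INTEGER, st_name TEXT);
INSERT INTO demo VALUES
    (101, 'Shane'), (102, 'Bradley'), (103, 'Hal'), (103, 'Hal'),
    (104, 'Nathan'), (105, 'Kevin'), (105, 'Kevin');
""")

# SUM(...) OVER (PARTITION BY dept): every row is kept, and each one
# also carries the total salary of its own department.
dept_totals = conn.execute("""
    SELECT emp_name, dept, SUM(salary) OVER (PARTITION BY dept) AS total_salary
    FROM employees ORDER BY dept, emp_name""").fetchall()

# ROW_NUMBER() numbers the rows 1, 2, 3, ... in salary order.
numbered = conn.execute("""
    SELECT ROW_NUMBER() OVER (ORDER BY salary) AS row_num, emp_name, salary
    FROM employees ORDER BY salary""").fetchall()

# Inside each (st_id, st_name) group, any row numbered above 1 is a repeat.
duplicates = conn.execute("""
    SELECT st_id, st_name FROM (
        SELECT st_id, st_name,
               ROW_NUMBER() OVER (PARTITION BY st_id, st_name ORDER BY st_id) AS row_num
        FROM demo)
    WHERE row_num > 1 ORDER BY st_id""").fetchall()

print(dept_totals)  # Finance rows show 153000.0, HR rows 110000.0, Tech 90000.0
print(numbered)
print(duplicates)   # [(103, 'Hal'), (105, 'Kevin')]
```

Unlike GROUP BY dept, the windowed SUM does not collapse the rows, which is why each employee still appears with the department total beside them.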
Okay, now we are going to see another window function in MySQL, called RANK. The RANK function assigns a rank to each row based on the values of a column, and there are gaps in the sequence of ranked values when two or more rows have the same rank. First of all, let me create a table; we’ll give it a random name, let’s say demo1, and it will have only one column, var_a, of type INT. We’ll create this table first. Now let’s insert a few records into demo1: the values 101, 102, and 103, with 103 repeated; I’m doing this on purpose so that in the output you can clearly see what the RANK function does. Next we have 104, 105, and 106, with 106 also repeated, and finally 107. Let me insert these values into demo1. This is done. Now I’ll write SELECT var_a, RANK() OVER (ORDER BY var_a) AS test_rank FROM demo1. Let me execute this and show you how the RANK function works. There you go. For var_a = 101 the test_rank is 1, and for 102 it is 2, but for 103 the rank 3 is repeated, because 103 appears twice, so rank 4 is skipped and 104 gets rank 5. Then 105 gets rank 6, and since 106 was also repeated, the RANK function assigned the same value, 7, to both rows of 106 and skipped rank 8, so the last value, 107, gets rank 9. All right, moving ahead, we’ll see our final window function, FIRST_VALUE, another important function in MySQL. This function returns the value of the specified expression for the first row in the window frame. What I’m going to do is select the employee name, the age, and the salary, then write FIRST_VALUE(emp_name) OVER (ORDER BY salary DESC) with the alias highest_salary, FROM employees. So let me run
this and see how the FIRST_VALUE function works. In our table, Joseph was the employee with the highest salary, $115,000, so the FIRST_VALUE function populated that same employee name in every row, as you can see here. You can also use the FIRST_VALUE function over a partition. Let’s say you want to display the name of the employee with the highest salary in each department; for that you can partition the window. I’ll write SELECT emp_name, dept, salary, FIRST_VALUE(emp_name) OVER (PARTITION BY dept ORDER BY salary DESC), partitioning by department because I want the top earner in each department and ordering by salary in descending order, and again I’ll give it the alias highest_salary, FROM employees. Let’s run this and see the difference in the output. As you can see, we now have the employee with the highest salary in each department: in Finance, Jack had the highest salary; in HR it was Marcus; in IT it was William; if I scroll down, in marketing it was John; in product it was Alice; in sales it was Joseph; and in tech it was Angela. This is how you can use the FIRST_VALUE function with a partition.
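RANK’s gap behavior and FIRST_VALUE, both with and without a partition, can be checked against the demo in SQLite 3.25 or newer. The demo1 values are the ones used above, so the ranks should come out as in Workbench; the employee rows are hypothetical stand-ins, so the names differ from Joseph, Jack, and the others.

```python
import sqlite3  # RANK and FIRST_VALUE need SQLite 3.25 or newer

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo1 (var_a INTEGER);
INSERT INTO demo1 VALUES (101), (102), (103), (103), (104), (105), (106), (106), (107);

CREATE TABLE employees (emp_name TEXT, dept TEXT, salary REAL);
INSERT INTO employees VALUES
    ('Jack', 'Finance', 82000), ('Nina', 'Finance', 71000),
    ('Marcus', 'HR', 60000), ('Lena', 'HR', 50000),
    ('Sam', 'Tech', 90000);
""")

# RANK(): tied values share a rank, and the following rank is skipped.
ranks = [r for (r,) in conn.execute(
    "SELECT RANK() OVER (ORDER BY var_a) AS test_rank FROM demo1 ORDER BY var_a")]
print(ranks)  # [1, 2, 3, 3, 5, 6, 7, 7, 9]

# FIRST_VALUE over the whole table: the top earner's name on every row.
overall = {name for (name,) in conn.execute(
    "SELECT FIRST_VALUE(emp_name) OVER (ORDER BY salary DESC) FROM employees")}

# FIRST_VALUE per partition: the top earner within each department.
per_dept = dict(conn.execute("""
    SELECT dept, FIRST_VALUE(emp_name) OVER (PARTITION BY dept ORDER BY salary DESC)
    FROM employees ORDER BY dept"""))

print(overall)   # {'Sam'}
print(per_dept)  # {'Finance': 'Jack', 'HR': 'Marcus', 'Tech': 'Sam'}
```

If the gaps are unwanted, DENSE_RANK() is the variant that would rank 104 as 4 instead of 5.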
well as the OUT parameter in stored procedures. After stored procedures, we learned another crucial concept in SQL, triggers, which are a special kind of stored procedure that runs automatically in response to an event on a table. We saw how to write a BEFORE INSERT trigger, and then how to delete a trigger. We also saw how to work with views in SQL. Views are virtual tables that you create from existing tables; we created a view from two different tables using an inner join, and we learned how to display views, rename them, and delete them. Finally, we explored a few window functions.
In this tutorial, we will learn how to work with databases and tables using SQL with Python. For this demo we will use Jupyter Notebook and MySQL Workbench: we will write our SQL queries in the notebook, wrapped in Python code. If you don’t have MySQL or Jupyter Notebook installed, please install them first. While installing MySQL, you’ll be asked to set a username and password. Let me show you: I’m on MySQL Workbench, and when you connect, it asks for the username and password. My username is root, and the password is whatever you set during installation. We will use the same username and password to make our connection from Python. So let’s get started with the hands-on part. First, let me import the necessary libraries; I’ll add the comment "import libraries". I’ll import mysql.connector, then from mysql.connector I’ll import Error (the error class), and finally I’ll import pandas as pd. Let’s run this. There’s an error here: Error should have a capital E, not a small one. Now the libraries are imported. Next, I’m going to create a function that will help us create a server connection, a user-defined function written with the def keyword. I’ll write
create_server_connection; that is the function name, and it takes three parameters: host_name, user_name, and user_password. After a colon, on the next line I’ll define a variable called connection and set it to None. Now we’ll use exception handling to connect to our MySQL server: the try block lets you test a block of code for errors, and the except block handles those errors. So I’ll write try, followed by a colon, and reassign connection to mysql.connector.connect(). This method sets up a connection, establishing a session with the MySQL server; if no arguments are passed, it uses the default values. Here we pass three arguments: host=host_name, user=user_name, and passwd=user_password. Then I’ll print "MySQL Database connection successful". After that comes the except block: except Error as err, followed by a colon, and then a print statement using an f-string, print(f"Error: '{err}'"). Finally, the function returns connection. Next, I’ll add a comment noting that we need to put in our MySQL password, the one you set while installing MySQL. I’ll write pw = "@12345", which is my password. Then I’ll set the database name with db, which holds the name of the database I want to
create, which is going to be mysql_python. Now I’ll write connection = create_server_connection("localhost", "root", pw): the host name is localhost, the username is root, and pw holds the password, @12345. Let’s run it. There’s an error: we need to remove this extra double quotation mark. I made another mistake here as well; this should be root. Now you can see "MySQL Database connection successful".
Next, we are going to create the mysql_python database, so I’ll add the comment "create mysql_python database". To create it, I’ll write another user-defined function with the def keyword, named create_database, which takes the parameters connection and query. After a colon, on the next line I’ll write cursor = connection.cursor(). The MySQLCursor class of MySQL Connector/Python instantiates objects that can execute operations such as SQL statements, so the cursor is what we use to communicate with the MySQL database. Next, I’ll use a try and except block again: inside try, I call cursor.execute(query) and then print "Database created successfully". In the except block, I’ll write except Error as err, followed by a colon, and print(f"Error: '{err}'"). Next, I’ll define a variable, create_database_query, and here I’m going to write my SQL query to create the database.
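The two helpers described so far share one pattern: do the database work inside try, catch the driver’s error class in except, and print the error instead of crashing. As a runnable sketch of that pattern only, the block below swaps in Python’s built-in sqlite3 module so it works without a MySQL server: sqlite3.connect and sqlite3.Error stand in for mysql.connector.connect and mysql.connector.Error, and the function names open_connection and run_statement are hypothetical, not part of the tutorial.

```python
import sqlite3


def open_connection(path):
    """Mirror create_server_connection: return a connection, or None on failure."""
    connection = None
    try:
        # sqlite3 stands in for mysql.connector here; ":memory:" needs no server.
        connection = sqlite3.connect(path)
        print("Database connection successful")
    except sqlite3.Error as err:
        print(f"Error: '{err}'")
    return connection


def run_statement(connection, query):
    """Mirror create_database: execute one statement and report the outcome."""
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        connection.commit()  # persist any change; harmless if nothing changed
        print("Statement executed successfully")
        return True
    except sqlite3.Error as err:
        # The error is printed and swallowed, like the tutorial's except block.
        print(f"Error: '{err}'")
        return False
    finally:
        cursor.close()


connection = open_connection(":memory:")
# SQLite has no CREATE DATABASE statement (a database is a file or ":memory:"),
# so this call exercises the except branch instead of raising.
created_db = run_statement(connection, "CREATE DATABASE mysql_python")
created_table = run_statement(connection, "CREATE TABLE demo (id INTEGER PRIMARY KEY)")
```

With MySQL Connector/Python the shape stays the same; only the connect call (host, user, passwd) and the exception class being caught change.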
I’ll write CREATE DATABASE followed by the database name, mysql_python. After this, I’ll call create_database and pass in two arguments: connection and create_database_query (let me just copy the variable name and paste it here). So what I’m doing here is defining a new function that creates a new database named mysql_python. The function takes two parameters, connection and query. I use connection.cursor() to get a cursor, which is what executes SQL statements from Python, and then the try and except blocks: the statements in the try block attempt to create the new database, and if that fails, the except block runs and prints the error. The SQL query itself is CREATE DATABASE followed by the database name, assigned to the variable create_database_query, and finally I call create_database with connection and that query. Let’s run it. You can see it has created the database successfully. You can verify this in MySQL Workbench or the MySQL shell: in Workbench, in the left panel under Schemas, there is a database called mysql_python. Let me expand it; we haven’t created any tables yet, so none are shown. The next step is to connect to the database we have just created, so I’ll add the comment "connect to database". To do that, I’ll again write a user-defined function with the def keyword, called create_db_connection, which takes the host name, the user name, the user password, and finally the database
name as parameters. After a colon, on the next line I’ll create the variable connection and assign it the value None. Then I’ll use exception handling again, starting with the try block, where I reassign connection using mysql.connector.connect(). This time the method takes host=host_name, user=user_name, passwd=user_password, and also database=db_name. Then I’ll print the message "MySQL Database connection successful". Finally, I’ll write the except block: except Error as err, a colon, and print(f"Error: '{err}'"). The function returns connection. Let’s run it. It runs successfully, so the function for connecting to our database is now defined. Now it’s time to execute SQL queries, so I’ll add the comment "execute SQL queries". For this I’ll write another user-defined function, execute_query, with the parameters connection and query. After a colon, I’ll write cursor = connection.cursor(); the cursor is the object that runs SQL statements over the connection. Next comes the try and except block: inside try I call cursor.execute(query), which takes the query as its one argument, and then connection.commit(), which makes the changes permanent. Then I’ll print a message; let’s say the message is "Query was
successful". Then we’ll write our except block, which prints the error if the try block throws one: the f-string contains curly braces with err inside, and then we close the double quotes. Let’s run it. We have now created all the functions we need to create a database, establish a connection, and execute our queries.
Now it’s time to create our first table inside the mysql_python database. To do that, I’ll write a CREATE TABLE command in SQL and assign it to a Python variable, using triple quotes to create a multi-line string. My variable name is create_orders_table; it is always a good idea to use meaningful variable names to keep the code readable. Inside the triple quotes I’ll write CREATE TABLE orders, since I’m creating an orders table, followed by the column definitions. The first column is order_id, of type INT, and I’ll make it the primary key. The second column is customer_name, a variable-length character column, so VARCHAR(30), and it is NOT NULL. The third column is product_name, VARCHAR(20), also NOT NULL. The fourth column is the date on which the product was ordered, date_ordered, of type DATE. Then I’ll create a quantity column of type INT to keep track of how many units were ordered. My next column
is unit_price, which holds the price of each unit of the product; it can be of type FLOAT. Finally, I’ll add the customer’s phone number, phone_number, which I’ll store as VARCHAR(20). Then comes a semicolon, and we close the triple quotes. This is what the syntax looks like. To run it, we first call our create_db_connection function, so I’ll add the comment "connect to the database" and write connection = create_db_connection("localhost", "root", pw, db): the host name is localhost, the username is root, then the password, and the database name, mysql_python, which is stored in db. Finally, let’s execute the query with the execute_query function we created earlier. It takes two parameters: connection, followed by the variable create_orders_table. Let’s run it. There’s an error; let’s see what it is. We’ve put four double quotes here, and it should be triple quotes. Let’s run it again. There’s another error, so let’s debug it. It says name 'cursor' is not defined. Let me scroll up to the earlier cell: in the execute_query function I misspelled cursor, because the r is missing. Let’s fix and rerun that cell, and then run this one again. There you go: "MySQL Database connection successful", and the query was also successful. If you want to double-check that the orders table was created, you can do it in MySQL Workbench. Under the mysql_python database there is an entry called Tables. I’ll right-click and select Refresh All, then click this arrow, and here you can see a table called orders. Now you can check the
columns as well: order_id, customer_name, product_name, date_ordered, quantity, unit_price, and phone_number. Now it’s time to insert a few records into the orders table. I’ll add the comment "insert data" and start with a variable, data_orders, followed by triple quotes. Inside, I’ll write INSERT INTO orders VALUES and then the records for each row. First, 101 as the order ID, then the customer’s name, let’s say Steve, and the product he ordered, let’s say a laptop. Then the date on which the item was ordered: 2018 as the year, 06 as the month, and 12 as the day, that is, '2018-06-12'. Then the quantity, which is 2; let’s say each laptop cost $800; and a random phone number, 6293730802. Similarly, I’m going to insert five more records for different customers and the items they purchased. I have the other five records in my notepad, so let me copy them and paste them into the cell to save some time. Let me check that everything is fine; I need to add a comma here. So we have six orders in our table, with order IDs from 101 to 106, for the customers Steve, Joe, Stacy, Nancy, Maria, and Danny. They bought different items: a laptop, books, trousers, T-shirts, headphones, and a smart TV. Each row also has the date of the order, the quantity ordered, the unit price, and a random phone number. Now let’s create the connection with create_db_connection, passing the same parameters (let me copy them from above): the host name localhost, root
as the username, then the password and the database name. Then I’ll use the same execute_query call as above: I’ll copy it and paste it here, and instead of the create_orders_table variable I’ll pass data_orders, which stores the INSERT INTO command. Let’s run it. There was a mistake here; again, this should be triple quotes, not four. Let me rerun it. There you go: "MySQL Database connection successful", and the query was also successful.
Now we’ll create another user-defined function that will help us read query results and display them. I’ll define it as read_query, with the two parameters connection and query. Then I’ll write cursor = connection.cursor(), set result = None, and use a try and except block. Inside try, I call cursor.execute(query) and then assign result = cursor.fetchall(); the fetchall() method returns all the rows of the query’s result set, and then I return result. Next comes the except block: except Error as err, a colon, and a print statement with the f-string f"Error: '{err}'". Let’s run it. Now we are all set. We are going to use the SELECT statement and the WHERE clause, then we’ll see how to use the ORDER BY clause and some built-in functions, and we’ll update and delete some records. Let’s start with our first query, which uses the SELECT statement. Suppose I want to display all the records that we have inserted into the orders table. I’ll assign my query to a variable, q1, and within triple quotes I’ll
write SELECT * FROM orders, followed by a semicolon and the closing triple quotes. Now we’ll establish the connection: let me go to the top, copy the line that connects to our database, and paste it here. Then we’ll create a variable, results, that stores the result of this query by calling read_query with two arguments, connection and the query variable q1. To display the rows, I’ll use a for loop: for result in results, print(result). Let’s run this query. There you go: we have successfully printed all the rows in the orders table, six records in total. Now we are going to explore a few more queries, so let me copy this cell and edit the query. Let’s say you want to display individual columns from the table rather than all of them. I’ll create the variable q2, and instead of * I’ll select only customer_name and phone_number. The rest remains the same; here, instead of q1, we’ll pass q2. Let’s run this cell. Now only two columns are displayed: the customer name and the respective phone number. Let me paste that query again. Now we are going to see how to use a built-in function. Our table has the order date, and from it we are going to display only the year of each order, using the YEAR function. I’ll edit this query and call it q3: SELECT YEAR(date_ordered) FROM orders, and I’ll pass q3 below. q1, q2, and q3 are basically
query 1, query 2, and query 3. Let’s run it. There you go: we have extracted the year from every value in the date_ordered column. If you want to display only the distinct, or unique, years present in the column, you can use the DISTINCT keyword in the SELECT statement: I’ll write SELECT DISTINCT YEAR(date_ordered) FROM orders, with the rest of the query unchanged, and call it q4. Let’s run it. You can see that 2018 and 2019 are the unique year values in the order date column. Moving ahead, let’s write our fifth query, q5, and this time explore the WHERE clause. Let’s say you want to display all the orders placed before 31 December 2018. To filter on that, I’ll write SELECT * FROM orders WHERE date_ordered < '2018-12-31', with the date value inside quotes, so all products ordered before 31 December 2018 will be displayed. Let’s run it. You can see there are three orders in our table placed before 31 December 2018. Next, we want to display all the orders made after 31 December 2018. I’ll copy the query above, and instead of less than I’ll use greater than, so every order placed after 31 December 2018 will be displayed. If you run it, you can see there are three orders in the table placed after that date. Moving ahead, let’s write a seventh query and see how the ORDER BY clause works in SQL; it lets you sort your results by a particular column. This will be query 7, and I’ll write it from scratch. Let’s say you want to display all the columns
from the table, so I’ll write SELECT * FROM orders ORDER BY unit_price, followed by a semicolon. Let’s run this query and look at the output. If you look at the unit_price column, the result is sorted in ascending order of unit price: it starts with the lowest price and ends with the highest. If you want it in descending order, you can use the DESC keyword; that puts the most expensive products at the top and the least expensive at the bottom. Next, let’s see how to create a DataFrame from this table. As you know, with Jupyter Notebook and pandas you can create DataFrames and work with them very easily, and we can do the same with this table. First, I’ll create an empty list, from_db = []. We are going to build a list of lists and then turn it into a pandas DataFrame. Next, I’ll write a for loop: for result in results, convert each result (a tuple) into a list with result = list(result), and append it to the empty list with from_db.append(result). Next, we need the column names that will be part of our DataFrame, so I’ll write columns = and pass a list: order_id, customer_name, product_name, date_ordered, quantity (let me continue on the next line), unit_price, and finally phone_number. Then I’ll assign a DataFrame to df using pd.DataFrame, the function that converts a list into a DataFrame. The first argument I pass is from_db, and I’ll write
my next argument as columns=columns, the variable name. Finally, let’s display the DataFrame, df. To recap: I create an empty list first, then loop over the results and append each one to the list, then create the list of column names, and use pd.DataFrame to convert the list into a DataFrame. If I run this, there’s an error: I had misspelled append. Let me fix it. Now our DataFrame is ready: there’s an index column that starts from 0, followed by the different columns.
Now let’s see how to use the UPDATE command. Suppose you want to change the unit price of one of the orders. First, I’ll create a variable, update, and open triple quotes. Then I’ll write the UPDATE command: UPDATE orders SET unit_price = 45 WHERE order_id = 103. I want to change the unit price of the trousers from $50 to $45, so this query updates the third row in our table, order ID 103, from $50 to $45. I’ll close the triple quotes and paste in the connection code again, remove the three lines that read and print results, and instead call execute_query with its usual two parameters: connection followed by the variable update. Let’s run it. You can see "MySQL Database connection successful" and "Query was successful". Now you can check the result. Let me go to the top, copy our first query, q1, and paste it here. I’ll rename it q8 and change it to SELECT * FROM orders WHERE order_id = 103 to check the unit
price of order 103. Instead of $50, it’s now $45. The last command we are going to see is how to delete a record from the table. I’ll add the comment "delete command". To delete a record, I’ll first create a variable, delete_order, with the query in triple quotes: DELETE FROM orders, followed by a WHERE clause, WHERE order_id = 105. Let me go to the top and explain: order ID 105 belongs to the customer Maria, who had ordered headphones, and we want to remove this record completely. Now that the DELETE query is ready, let me create the connection and run it: I’ll go to the top, copy the connection code that also contains the execute_query call, paste it here, and change update to delete_order. Everything looks good, so let’s run it. You can see the query was successful. Now let me show you the result: I’ll copy the read code again, paste it here, and make this q9 to verify whether order ID 105 was deleted. Instead of the previous statement I’ll write SELECT * FROM orders, and pass q9 below. If I run this, you can see that order ID 105 was deleted; it no longer appears in the table.
That brings us to the end of this demo session on SQL with Python, so let me scroll through what we did. First, we imported the necessary libraries: mysql.connector, its Error class, and pandas as pd. We learned how to create a server connection to the MySQL database, created a new database, mysql_python, and then connected to it. We created a function to execute our queries, saw how to write a CREATE TABLE command, and then inserted a few records into our
orders table. We created a read_query function to run queries and return their results, and then explored different SQL commands one by one: a SELECT query, selecting a few individual columns from the table, the built-in YEAR function, and the DISTINCT keyword. After that, we used the WHERE clause to filter the table based on specific conditions, saw how to sort the results by a particular column, and converted the table into a DataFrame with the pd.DataFrame function. Finally, we learned how to use the UPDATE and DELETE commands.
PostgreSQL is a very popular and widely used database in the industry. In this tutorial we will learn PostgreSQL in detail, with an extensive demo session. In today’s video we will learn what PostgreSQL is and look at its history. We will learn the features of PostgreSQL and then run PostgreSQL commands in the SQL Shell (psql) and pgAdmin. So let’s begin by understanding what PostgreSQL is. PostgreSQL is an open-source object-relational database management system. It stores data in rows, with columns representing the different data attributes. According to the DB-Engines ranking, PostgreSQL is currently ranked fourth in popularity among hundreds of databases worldwide. It allows you to store, process, and retrieve data safely, and it was developed by a worldwide team of volunteers. Now let’s look at the history of PostgreSQL. From 1977 onwards, the Ingres project was developed at the University of California, Berkeley. In 1986, the Postgres project was led by Professor Michael Stonebraker. In 1987, the first demo version was released, and in 1994 a SQL interpreter was added to Postgres. The first PostgreSQL release was version 6.0, on January 29, 1997, and since then PostgreSQL has continued to be developed by the PostgreSQL Global Development Group, a diverse group of companies and many thousands of individual
contributors. Now let’s look at some of the important features of PostgreSQL. PostgreSQL describes itself as the world’s most advanced open-source relational database, and it is free to download. It is cross-platform, supporting operating systems such as Windows, Linux, and macOS. It is highly secure, robust, and reliable. It supports multiple programming interfaces, such as C, C++, Java, and Python. It works with a wide range of data types: primitives such as integers, numerics, strings, and Booleans; structured types such as date and time, arrays, and ranges; and documents such as JSON and XML. Finally, PostgreSQL supports multiversion concurrency control, or MVCC. With this theory in place, let’s look at the PostgreSQL commands we will cover in the demo. We will start with basic commands such as SELECT, UPDATE, and DELETE. We will learn how to filter data using the WHERE and HAVING clauses, how to group data with the GROUP BY clause, and how to order the result with the ORDER BY clause. You will learn how to deal with NULL values and get an idea of the LIKE operator and logical operators such as AND and OR. We will also explore some popular built-in mathematical and string functions. Finally, we’ll see some advanced concepts in PostgreSQL: CASE statements, subqueries, and user-defined functions. So let’s head over to the demo. First, we’ll connect to PostgreSQL using the psql shell. In the Windows search box ("Type here to search") I’ll search for psql; this is the SQL Shell, so I’ll click Open and maximize the window. For Server I’ll just press Enter, and for Database I’ll press Enter as well. The port number is already filled in as 5432, so I hit Enter; the username is already given; and now it asks for the password. I’ll enter my password so that I can connect to my PostgreSQL database. It shows a warning, but we have successfully connected to PostgreSQL.
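The values the SQL Shell prompts for (server, database, port, username) are ordinary libpq connection parameters, and they can also be written as a single keyword/value connection string. The sketch below builds such a string in Python; the default values (localhost, postgres, 5432, postgres) are assumed to be the usual installer defaults rather than taken from this demo, the function name build_conninfo is hypothetical, and the password is left out on purpose so that psql keeps prompting for it.

```python
def build_conninfo(host="localhost", port=5432, dbname="postgres", user="postgres"):
    """Return a libpq keyword/value connection string (password deliberately omitted)."""

    def quote(value):
        text = str(value)
        # libpq rule: empty values or values with spaces must be single-quoted,
        # and single quotes or backslashes inside a value are backslash-escaped.
        if text == "" or any(ch in text for ch in " '\\"):
            escaped = text.replace("\\", "\\\\").replace("'", "\\'")
            return f"'{escaped}'"
        return text

    params = {"host": host, "port": port, "dbname": dbname, "user": user}
    return " ".join(f"{key}={quote(val)}" for key, val in params.items())


print(build_conninfo())
# host=localhost port=5432 dbname=postgres user=postgres
```

A string like this can be passed to psql in place of the database name, for example psql "host=localhost port=5432 dbname=postgres user=postgres", and Python drivers such as psycopg2 accept the same format in their connect call.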
Now, to check that everything is fine, you can run a simple command that shows the version of PostgreSQL we have installed: SELECT version(); with the two parentheses and a semicolon. I’ll hit Enter. You can see the version: PostgreSQL 13.2. Next, let me show you the command that displays all the existing databases: if I type \l and hit Enter, it lists the databases that are already there. We have postgres, template0, template1, and a test database as well. For our demo, I’ll create a new database with CREATE DATABASE sql_demo; and hit Enter. The message CREATE DATABASE means we have successfully created the sql_demo database. To connect to it, you can use \c sql_demo, and psql confirms that you are now connected to the sql_demo database. Here we can now create tables and perform insert, select, update, delete, and alter operations, and much more. Now I’ll show you how to connect to PostgreSQL using pgAdmin. When you install PostgreSQL, you get the SQL Shell and, along with it, pgAdmin. I’ll search for pg, and pgAdmin shows up; I’ll click Open. It opens in a web browser, Chrome in my case, and this is what the pgAdmin interface looks like. It’s a fairly simple interface: at the top you have the File, Object, Tools, and Help menus, then the Dashboard, Properties, SQL, Statistics, Dependencies, and Dependents tabs, and on the left panel you have Servers. Let me expand this so that it connects to the server. If I go back to the shell, you’ll see that when I ran \l to display the databases, it showed postgres and test, and here in pgAdmin we have the postgres database and the test database. All
Now, we also created one more database, sql_demo, so let me show you how to work with pgAdmin and its Query Tool. I'll right-click on sql_demo and select Query Tool, and I'll show you how to run a few commands there. Let's say you want to see the version of PostgreSQL that you are using. You can use the same command we ran in the psql shell: SELECT version(), with the parentheses and a semicolon. I'll select this, and here you can see the Execute button. If I click Execute or press F5, it runs the query. You can see the output at the bottom: PostgreSQL 13.2, compiled by Visual C++, 64-bit. Okay, now let me show you how to perform a few basic operations using PostgreSQL commands. Let's say I write SELECT 5 * 3, followed by a semicolon. I'll select this and hit F5. This runs the query and returns the product of 5 and 3, which is 15. Similarly, let's edit this to 5 + 3 + 6. I'll select it and hit F5 to run it, and it gives me the sum of 5 + 3 + 6, which is 14. You can do the same thing in the shell as well; let me show you. Let's say I write SELECT 7 * 10. You know the result should be 70, and if I hit Enter, it gives me 70. Don't worry about the ?column? heading for now; we'll deal with that later. All right, let me go back to pgAdmin and do one more operation. This time I'll write SELECT 5 * (3 + 4), with 3 + 4 inside parentheses, and a semicolon. SQL first evaluates the expression inside the parentheses, 3 + 4, which is 7, and then multiplies 7 by 5. Let me select this and execute it. You can see 7 * 5 is 35. All right, now we'll go back to our shell, and I'll show you how to create a table. We are going to create a table called movies in the SQL Shell, psql.
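The arithmetic queries above can be reproduced with a minimal sketch, again assuming SQLite as a stand-in for PostgreSQL. A SELECT with no FROM clause simply evaluates its expression. The last query gives the result an alias with AS, which names the output column instead of PostgreSQL's default ?column? heading. The helper `evaluate` is hypothetical, written only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression: str):
    """Run SELECT <expression>; and return its single value.

    The expression is pasted into the SQL text, so only use trusted literals.
    """
    return conn.execute(f"SELECT {expression};").fetchone()[0]

product = evaluate("5 * 3")       # SQL writes multiplication with *
total = evaluate("5 + 3 + 6")
nested = evaluate("5 * (3 + 4)")  # the parenthesised part is evaluated first

# An alias names the result column; PostgreSQL would otherwise label it ?column?.
cursor = conn.execute("SELECT 7 * 10 AS result;")
label = cursor.description[0][0]
value = cursor.fetchone()[0]
print(product, total, nested, label, value)
```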
So here we will learn how you can create a table and then enter some data into it. Let me just scroll down a bit. Okay, my CREATE command goes like this: I'll write CREATE TABLE followed by the table name, movies. Next, my movies table will have a few columns. Let's say I want the movie ID. After the column name we need to give the data type, so I'll make movie_id an INTEGER. INTEGER is one of the data types provided by PostgreSQL. The second column of the table will be the name of the movie, so I'll write movie_name. Column names should follow SQL naming rules, so there shouldn't be any spaces in them; I have used an underscore to keep the name readable. movie_name will be of type VARCHAR, short for variable character or character varying, and I'll give the size as 40 so that it can hold at most 40 characters. Next, my third column will hold the genre of the movie, so I'll write movie_genre. Again, movie_genre is of type VARCHAR, and I'll give the size as, let's say, 30. My final column will hold the IMDb ratings, so I'll write imdb_ratings. The ratings will be of type REAL, since they can have decimal values. I'll close the parenthesis, add a semicolon, and hit Enter. There you go: we have successfully created a table called movies. Now let me go back to pgAdmin. Here I have my database, sql_demo. I'll right-click on it and click Refresh. Now let me go to Schemas; I'll scroll down a bit. Under Schemas we have something called Tables; let me expand this. Okay, you can see we now have a table called movies in the sql_demo database, and here you can check the columns that we just added: our movies table has movie_id, movie_name, movie_genre, and imdb_ratings. All right, there is another way to create a table. Last time we created it using the SQL Shell; now I'll show you how to create a table using pgAdmin.
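The movies table created in psql above can be sketched the same way, assuming SQLite as a stand-in. SQLite accepts these PostgreSQL type names but enforces them far more loosely. For example, PostgreSQL rejects a movie_name longer than 40 characters, while SQLite stores it anyway. `PRAGMA table_info` plays roughly the role of pgAdmin's Columns node or psql's `\d movies`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column layout as the psql example.
conn.execute("""
    CREATE TABLE movies (
        movie_id     INTEGER,
        movie_name   VARCHAR(40),
        movie_genre  VARCHAR(30),
        imdb_ratings REAL
    );
""")

# PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column;
# keep the column name and its declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(movies);")]
for name, declared_type in columns:
    print(f"{name:<13} {declared_type}")
```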
So here, under Tables, I'll right-click, and I have the option to create a table, so I'll select Table. Okay, it's asking me for the name of the table. This time we are going to create a table called students, so I'll write the table name as students. I'll leave these settings at their defaults. Now I'll go to the Columns tab, where you can create as many columns as you want. You can see a plus sign on the right; I'll select it to add a new row. My first column will be the student roll number, so I'll write student_roll_number. Again, the column name should follow SQL naming rules. The data type I'm going to select is integer. Now, if you want, you can set constraints such as Not NULL, so that the student_roll_number column will not accept null values, and I'll also check Primary key, which means every roll number must be unique. All right, if you want to add another column, you can just click the plus sign again. This time I want the student name as my second column, so I'll write student_name. It will be of type character varying; if you want, you can specify the length as well, let's say 40. I'll click the plus sign again to add my final column, which will be gender. This time I'll make gender of type character. Okay, now you can click Save, and that creates your students table. There you go: on the left panel, earlier we had only one table, movies, and now we have two, since students has been added. If I expand it, under Columns you can see the three columns: student_roll_number, student_name, and gender. You can also check the constraints, which tell you whether the table has any; you can see there is one primary key. All right, now let me run a SELECT statement to show the columns that we have in the movies table.
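The students table built through pgAdmin's dialog above corresponds roughly to the CREATE TABLE statement in this sketch. The sketch assumes SQLite as a stand-in, and the student names are hypothetical. It shows what the primary key buys you: a second row reusing roll number 1 is rejected. In PostgreSQL, a plain `character` column is `character(1)`, which is why gender is declared as CHAR(1) here. The `add_student` helper is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Roll number is NOT NULL and the primary key, as ticked in the pgAdmin dialog.
conn.execute("""
    CREATE TABLE students (
        student_roll_number INTEGER NOT NULL PRIMARY KEY,
        student_name        VARCHAR(40),
        gender              CHAR(1)
    );
""")

def add_student(roll_number, name, gender):
    """Insert one student; return False if the primary key rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO students VALUES (?, ?, ?);",
                         (roll_number, name, gender))
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical sample rows: the second one reuses roll number 1.
first_insert = add_student(1, "Asha", "F")
duplicate_insert = add_student(1, "Ravi", "M")
student_count = conn.execute("SELECT COUNT(*) FROM students;").fetchone()[0]
print(first_insert, duplicate_insert, student_count)
```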
So I'll write SELECT * FROM movies, followed by a semicolon, and execute it. Okay, at the bottom you can see the columns movie_id, movie_name, movie_genre, and imdb_ratings. The next command we are going to learn is how to delete a table. One way is the SQL command DROP TABLE followed by the table name. Let's say you want to delete students: you can write DROP TABLE students, and that deletes the table from the database. You just select it and run it. The other way is to right-click on the table name, where you have Delete/Drop. If I select this, I get a prompt: "Are you sure you want to drop table students?" I'll select Yes, and you can see we have successfully deleted our students table. All right, now let's perform a few operations and learn a few more commands in PostgreSQL. To do that, I'm going to insert a few records into my movies table using the INSERT command. I have my INSERT query written in a notepad; I'll copy it and paste it into my query editor. Let me just scroll down. Here you can see I have written INSERT INTO, the name of the table, movies, and the columns movie_id, movie_name, movie_genre, and imdb_ratings, followed by the records, or rows. The first record has movie ID 101. The movie is the very popular Vertigo, its genre is Mystery and Romance, and its current IMDb rating is 8.3. Similarly, we have The Shawshank Redemption, 12 Angry Men, The Matrix, Se7en, Interstellar, and The Lion King. There are eight records in total that we are going to insert into our movies table. Let me select this and hit Execute. Okay, you can see it has successfully inserted eight records. Now if I run SELECT * FROM movies, you can see the records that are present in the table.
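Dropping a table and inserting several rows in one statement can be sketched as follows, again assuming SQLite as a stand-in for PostgreSQL. Only Vertigo's ID, genre, and rating, and the ID of The Lion King, come from the video. The other IDs, genres, and ratings are illustrative placeholders, not real figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_roll_number INTEGER PRIMARY KEY);")
conn.execute("""CREATE TABLE movies (movie_id INTEGER, movie_name VARCHAR(40),
                                    movie_genre VARCHAR(30), imdb_ratings REAL);""")

# DROP TABLE removes the table together with all of its rows.
conn.execute("DROP TABLE students;")

# One multi-row INSERT, as in the video; most values are placeholders.
conn.execute("""
    INSERT INTO movies (movie_id, movie_name, movie_genre, imdb_ratings) VALUES
        (101, 'Vertigo', 'Mystery, Romance', 8.3),
        (102, 'The Shawshank Redemption', 'Drama', 9.3),
        (103, '12 Angry Men', 'Drama', 9.0),
        (108, 'The Lion King', 'Animation', 8.5);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table';")]
row_count = conn.execute("SELECT COUNT(*) FROM movies;").fetchone()[0]
print(tables, row_count)
```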
So I'll write SELECT * FROM movies, select it, and execute it. There you go, at the bottom you
can see "8 rows affected". If I scroll down, you have the eight records in the movies table. All right, if you want to describe the table, you can go to the SQL Shell and write \d followed by the table name, movies. This describes the table: you get the column names and their data types, and it also shows whether a column allows null values and any constraints, such as a default, primary key, or foreign key. Let me go back to pgAdmin. Okay, first let me show you how to update records in a table. Suppose you have an existing table and you entered some wrong values by mistake, and you want to correct those records later: you can use the UPDATE query for that. I'm going to update my movies table and change the genre of movie ID 103, which is 12 Angry Men, from Drama to Drama, Crime. In our current table, the genre for 12 Angry Men is only Drama, and I'm going to update the movie_genre column to Drama, Crime. Let me show you how to do it. I'll write UPDATE followed by the name of the table, movies. On the next line I'll write SET, then the column name, movie_genre, equal to 'Drama, Crime'; earlier it was only Drama. Then I'll give my condition using the WHERE clause (we'll learn about the WHERE clause in a bit), so I'll write WHERE movie_id = 103. Here movie_id uniquely identifies each movie, so PostgreSQL first finds movie ID 103 and then changes its genre to Drama, Crime. Notice that right now, 12 Angry Men has Drama as its genre. Now if I run this UPDATE statement, you can see we have successfully updated one record. Let me run the SELECT statement again. If I scroll down, there you go: movie ID 103, 12 Angry Men, now has the genre Drama, Crime.
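The UPDATE above can be sketched like this, assuming SQLite as a stand-in for PostgreSQL. The row count the driver reports matches the command tag that psql prints as UPDATE 1. The query uses `?` placeholders, which are SQLite's parameter style, rather than pasting values into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, movie_name VARCHAR(40), "
             "movie_genre VARCHAR(30), imdb_ratings REAL);")
conn.execute("INSERT INTO movies VALUES (103, '12 Angry Men', 'Drama', 9.0);")

# The WHERE clause limits the change to movie_id 103; without it,
# every row's genre would be overwritten.
cur = conn.execute(
    "UPDATE movies SET movie_genre = ? WHERE movie_id = ?;",
    ("Drama, Crime", 103),
)
rows_updated = cur.rowcount  # psql reports the same count as "UPDATE 1"

new_genre = conn.execute(
    "SELECT movie_genre FROM movies WHERE movie_id = 103;"
).fetchone()[0]
print(rows_updated, new_genre)
```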
Okay, now let me tell you how you can delete records from a table. For that you use the DELETE command: you write DELETE FROM followed by the table name, movies, and then a WHERE condition. Let's say I want to delete movie ID 108, which is The Lion King, so I'll write WHERE movie_id = 108. That is one way to delete this particular movie; alternatively, you could write WHERE movie_name = 'The Lion King'. Let me select this and execute it. Now if I run my SELECT query again, you can see it returns seven rows this time, and there is no movie with ID 108, The Lion King, because we deleted it. All right, next we are going to learn about the WHERE clause in PostgreSQL, using the same movies table again. Let's say we want only the records for which the IMDb rating is greater than 8.7. This is my updated table, and I want to display only the movies whose IMDb rating is greater than 8.7. We'll display 12 Angry Men, which is 9; The Dark Knight, which is again 9; and The Shawshank Redemption, which has 9.3. The rest of the movies have an IMDb rating of 8.7 or lower, so we are not going to display those. All right, let me show you how to write a WHERE clause. I'll write SELECT * FROM movies WHERE, then my column name, imdb_ratings, then the greater-than symbol, and then my value, 8.7. I'll give a semicolon and run it with F5. There you go: it returned The Shawshank Redemption, The Dark Knight, and 12 Angry Men, because only these movies have IMDb ratings greater than 8.7. Okay, now let's say you want to return only the movies with IMDb ratings between 8.5 and 9. For that I'm going to use another operator, BETWEEN, along with the WHERE clause. Let me show you how: I'll write SELECT * FROM movies WHERE imdb_ratings BETWEEN 8.5 AND 9.0, with the AND keyword separating the two bounds.
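Deleting a row and filtering with `>` and BETWEEN can be checked in a small sketch, assuming SQLite as a stand-in for PostgreSQL. The ratings for Vertigo, The Shawshank Redemption, 12 Angry Men, and The Dark Knight follow the video. The remaining IDs, genres, and ratings are placeholders. The helper `titles` is hypothetical and returns sorted movie names so the results are easy to compare.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, movie_name VARCHAR(40), "
             "movie_genre VARCHAR(30), imdb_ratings REAL);")
# Values not stated in the video are placeholders for illustration.
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?);", [
    (101, "Vertigo", "Mystery, Romance", 8.3),
    (102, "The Shawshank Redemption", "Drama", 9.3),
    (103, "12 Angry Men", "Drama, Crime", 9.0),
    (104, "The Dark Knight", "Action", 9.0),
    (105, "The Matrix", "Sci-Fi", 8.7),
    (108, "The Lion King", "Animation", 8.5),
])

def titles(sql, params=()):
    """Return the movie names produced by a query, sorted alphabetically."""
    return sorted(row[0] for row in conn.execute(sql, params))

deleted = conn.execute("DELETE FROM movies WHERE movie_id = ?;", (108,)).rowcount

above = titles("SELECT movie_name FROM movies WHERE imdb_ratings > ?;", (8.7,))
# BETWEEN includes both bounds, so ratings of exactly 9.0 qualify.
between = titles(
    "SELECT movie_name FROM movies WHERE imdb_ratings BETWEEN ? AND ?;", (8.5, 9.0))
print(deleted, above, between)
```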
So all the movies rated between 8.5 and 9.0 will be displayed. Let me select this and run it. There you go: it returned The Dark Knight, The Matrix, Se7en, Interstellar, and 12 Angry Men. BETWEEN includes both bounds, which is why the movies rated exactly 9.0 appear. A few movies were left out, such as Vertigo, which has 8.3, and there's one more. All right, moving ahead, let's say you want to display the movies whose genre is Action. You can see that our table has a movie whose genre is Action, so let's do that as well. I'll write SELECT * FROM movies WHERE movie_genre = 'Action'. I'm writing it on one line this time, but you can break it into two lines as well. movie_genre is my column name, and I'm putting Action within single quotes. Why single quotes? Because Action is a string, and strings need to be put in single quotes. If I run this, there you go: we have one movie in our table whose genre is Action, The Dark Knight. Okay, you can also select particular columns from the table by specifying the column names. In all the examples we just saw, we used *, which selects all the columns in the table. If you want only specific columns, you can list the column names in the SELECT statement. Let me show you: say you want to display the movie name and the movie genre. You can write SELECT movie_name, movie_genre FROM movies WHERE imdb_ratings < 9.0. This time the result will show only two columns, movie_name and movie_genre. Let me run it. There you go: these are the names and genres of the movies with an IMDb rating below 9.0. All right, just like the BETWEEN operator, there is one more operator that you can use with the WHERE clause: the IN operator, which works like an OR clause or OR operator.
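String comparison, column selection, and the IN operator can be compared side by side in this sketch, assuming SQLite as a stand-in for PostgreSQL. As before, values not stated in the video are placeholders. `IN (8.7, 9.0)` is shorthand for `imdb_ratings = 8.7 OR imdb_ratings = 9.0`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, movie_name VARCHAR(40), "
             "movie_genre VARCHAR(30), imdb_ratings REAL);")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?);", [
    (101, "Vertigo", "Mystery, Romance", 8.3),
    (102, "The Shawshank Redemption", "Drama", 9.3),
    (103, "12 Angry Men", "Drama, Crime", 9.0),
    (104, "The Dark Knight", "Action", 9.0),
    (105, "The Matrix", "Sci-Fi", 8.7),
])

# String literals go in single quotes; in PostgreSQL, double quotes denote identifiers.
action = conn.execute(
    "SELECT movie_name FROM movies WHERE movie_genre = 'Action';").fetchall()

# Listing columns instead of * returns only those columns.
cur = conn.execute(
    "SELECT movie_name, movie_genre FROM movies WHERE imdb_ratings < 9.0;")
selected_columns = [column[0] for column in cur.description]
below_nine_names = sorted(row[0] for row in cur.fetchall())

in_list = sorted(row[0] for row in conn.execute(
    "SELECT movie_name FROM movies WHERE imdb_ratings IN (8.7, 9.0);"))
print(action, selected_columns, below_nine_names, in_list)
```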
So let's say I want to select all the columns from my movies table WHERE imdb_ratings IN (8.7, 9.0). If I run this, it displays only the records whose IMDb rating is 8.7 or 9.0. All right, so far we have looked at basic operations in SQL. You saw mathematical operations and how a SELECT statement works. We created a few tables, inserted a few records into them, and saw how to delete a table from the database. We performed operations like UPDATE and DELETE, and we saw how a WHERE clause works. Now it's time to load an employee CSV file, or CSV data set, into PostgreSQL, and I'll show you how to do that. But before loading or inserting the records, we need to create an employees table, so let me first create a new table called employees in our sql_demo database. I'll write CREATE TABLE, with employees as the name of the table. Next I'll give my column names. My first column will be the employee ID, of type INTEGER. It is not going to contain any null values, so I'll write NOT NULL, and I'll give it the PRIMARY KEY constraint. As you know, the employee ID is unique for every employee in a company, so once I write PRIMARY KEY, it ensures there are no repeated employee IDs. Okay, next I'll have my employee name, which will be of type VARCHAR with a size of 40. Next we'll have the email address of the employee; again, it will be VARCHAR with a size of 40. I'll add another comma, and this time we'll have the gender of the employee, which is again VARCHAR with a size of, let's say, 10. Okay, now let's include a few more columns. We'll have the department column, so I'll write department VARCHAR with a size of, let's say, 40. Then we'll have another column called address, which will hold the country names of the employees; address is also VARCHAR. And finally, we have the
salary of the employee. I'm going to make salary type REAL, which allows decimal, or floating-point, values. Okay, now let me select this CREATE TABLE statement and execute it. All right, we have successfully created our table. If you want, you can check it using SELECT * FROM employees; let me select this and execute it. You can see we have the employee ID as the primary key, then the employee name, email, gender, department, address, and salary, but there are no records in these columns yet. Now it's time to insert some records into our employees table, and to do that I'm going to use a CSV file. Let me show you what the CSV file looks like. Okay, I'm in Microsoft Excel now, and at the top you can see this is my employee data CSV file. Here we have the employee ID, employee name, email, gender, department, address, and salary. This data was generated using a simulator, so it has not been validated, and you can see it has a few missing values. Under the email column, some employees don't have an email ID, and the department column also has some missing values. All right, we'll import the records in this CSV file into PostgreSQL. In the left panel, under Tables, let me right-click and first refresh. There you go: initially we had only the movies table, and now we also have the employees table. Now I'll right-click again, and here you can see the Import/Export option. Let me click on this. I don't want to export, I need to import, so I'll switch it to Import. Now it's asking me for the file location, so let me show you how to get it. This is my file location: my Excel file is on my E: drive, under the data analytics folder. Inside it I have another folder called PostgreSQL, and within the PostgreSQL folder I have my CSV file, that is, my employee data
CSV file. So I'll just select this; you can either type the path like this or browse to the file. Okay, my format is csv. Next I'm going to set Header to Yes, and then let me go to Columns and check that everything is fine. All right, I have all my columns here, so let's click OK. You can see a message here that says Import/Export, and it shows the import completed successfully. We can verify this using SELECT * FROM employees again. If I run this, and let me close this, there you go: it says 150 rows affected, which means we have inserted 150 rows of information into our employees table. You can see the employee IDs, which are all unique, the employee names, the emails, the addresses, and the salaries. Let me scroll down: you can see we have 150 rows, which means we have 150 employees in our table. Okay, now we are going to use this employees table to explore some more advanced SQL commands. There is an operator called DISTINCT. If I write SELECT address FROM employees, it's going to give me the addresses of all 150 employees. There's a problem here: I made a spelling mistake, and address should have another d. If I run it again, the query returns 150 rows, and you can see the different country names under address: Russia, France, the United States, Germany, and I think we have Israel as well. Yes. Now suppose you want to display only the unique addresses, or country names. You can use the DISTINCT keyword before the column name. If I write SELECT DISTINCT address FROM employees, it displays only the unique country names in the address column. If I run this, you can see it has returned six rows of information: Israel, Russia, Australia, United States, France, and Germany. All right, as I said, some fields are null, meaning they don't have any information, and you can use the IS NULL operator in SQL to display the rows that have them. Suppose I want to display all the employee names where the email ID has a null value.
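The CSV import and the DISTINCT query above can be approximated in Python, assuming SQLite as a stand-in for PostgreSQL and pgAdmin. The column identifiers and the four CSV rows are hypothetical, since the video names the columns only aloud and the real file has 150 rows. Treating empty fields as NULL mirrors PostgreSQL's default for CSV input, where an unquoted empty field is read as NULL. The `import_csv` helper is illustrative, not pgAdmin's mechanism.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Column identifiers are assumptions for this sketch.
conn.execute("""
    CREATE TABLE employees (
        employee_id   INTEGER NOT NULL PRIMARY KEY,
        employee_name VARCHAR(40),
        email         VARCHAR(40),
        gender        VARCHAR(10),
        department    VARCHAR(40),
        address       VARCHAR(40),
        salary        REAL
    );
""")

# Hypothetical stand-in for the CSV file; empty fields are missing values.
CSV_TEXT = """employee_id,employee_name,email,gender,department,address,salary
1,Asha Rao,asha@example.com,Female,Sales,France,52000
2,Daniel Kim,,Male,,Germany,81000
3,Maria Lopez,maria@example.com,Female,Sales,France,67000
4,Ivan Petrov,ivan@example.com,Male,HR,Russia,94000
"""

def import_csv(connection, text):
    """Load CSV rows into employees; the first row holds the column names."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # like setting Header to Yes in the import dialog
    # Empty fields become NULL (None), matching PostgreSQL's CSV default.
    rows = [[field if field != "" else None for field in record] for record in reader]
    placeholders = ", ".join("?" for _ in header)
    with connection:  # the header comes from our own trusted file
        connection.executemany(
            f"INSERT INTO employees ({', '.join(header)}) VALUES ({placeholders});",
            rows,
        )
    return len(rows)

imported = import_csv(conn, CSV_TEXT)
every_address = conn.execute("SELECT address FROM employees;").fetchall()
unique_addresses = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT address FROM employees;"))
print(imported, len(every_address), unique_addresses)
```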
So I'll write SELECT * FROM employees WHERE email IS NULL, which is another way to use your WHERE clause. If I select and run this, there you go: for all these employees, there is no email ID in the table. It has returned 16 rows of information, so around 10% of the employees do not have an email ID. If you look closely, a few of them don't have an email ID and also don't have a department. If you want to find the employees who do not have a department, you can write WHERE department IS NULL instead of WHERE email IS NULL. Now if I select this, it returns nine rows of information, which means around 6% of the employees do not have a department. Moving ahead, let me show you how the ORDER BY clause works in SQL. ORDER BY is used to sort your result in a particular order, such as ascending or descending. Let's say I want to select all the employees from my table, ordered by their salary, so I'll write SELECT * FROM employees ORDER BY salary. Let me select and run it. Okay, there is a problem: I made a spelling mistake, and this should be employees. Let me run it again. Now, if you look at the output, the result has been sorted in ascending order: the employees with the lowest salaries appear at the top and the employees with the highest salaries appear at the bottom. This means PostgreSQL sorts in ascending order by default. Now let's say you want to display the salaries in descending order, so that the top earners appear at the top. You can use the DESC keyword, which means descending. If I run this, you can see the difference: the employees with the highest salaries appear at the top, while those with the lowest salaries appear at the bottom. This is how you can use an ORDER BY clause.
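The IS NULL filter and ORDER BY can be tried on a few hypothetical rows, assuming SQLite as a stand-in for PostgreSQL. The sketch also shows why the video writes IS NULL rather than `= NULL`. Comparing anything with NULL using `=` yields unknown rather than true, so such a WHERE clause matches no rows in either database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name VARCHAR(40), email VARCHAR(40), "
             "department VARCHAR(40), salary REAL);")
# Hypothetical rows; None is stored as NULL.
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?);", [
    ("Asha Rao", "asha@example.com", "Sales", 52000),
    ("Daniel Kim", None, None, 81000),
    ("Maria Lopez", "maria@example.com", "Sales", 67000),
    ("Ivan Petrov", "ivan@example.com", None, 94000),
])

def names(sql):
    """Return employee names in the order the query produces them."""
    return [row[0] for row in conn.execute(sql)]

no_email = names("SELECT employee_name FROM employees WHERE email IS NULL;")
no_department = names("SELECT employee_name FROM employees WHERE department IS NULL;")
# = NULL evaluates to unknown, never true, so it matches nothing.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE email = NULL;").fetchone()[0]

ascending = names("SELECT employee_name FROM employees ORDER BY salary;")
descending = names("SELECT employee_name FROM employees ORDER BY salary DESC;")
print(no_email, no_department, equals_null, ascending, descending)
```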
Okay, now I want to make a change to my existing table. If you look at the address column, it contains only country names, so it would be better to rename the address column to country. You can rename a column using the ALTER command in PostgreSQL, so let me show you how to rename the address column. I'll write ALTER TABLE followed by the table name, employees, then RENAME COLUMN address TO country. If I add a semicolon and hit Execute, it changes the column name to country. You can verify this by running the SELECT statement again. There you go: earlier it was the address column, and now we have successfully renamed it to country. Okay, let me scroll down; now it's time to explore a few more commands. This time I'm going to show you how the AND and OR operators work in SQL. You use them along with the WHERE clause. Let's say I want to see the employees who are from France and whose salary is less than $80,000. Let me show you how to do it. I'll write SELECT * FROM employees WHERE, and since I'm going to give two conditions, I'll use the AND operator. First I'll write country = 'France'. Note that I'm not using address here, because we just changed the column name from address to country. My next condition is that the salary needs to be less than 80,000, so I'll write AND salary < 80000, followed by a semicolon. Let me run this. All right, it has returned 19 rows of information, and you can see that every country is France and every salary is less than $80,000. This is how you can give multiple conditions in a WHERE clause using the AND operator. Now let's say you want to use the OR operator to find the employees who are from Germany or whose department is Sales. I'll write SELECT * FROM employees WHERE country = 'Germany' OR department = 'Sales'; that is, instead of AND, I'm using OR.
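Renaming a column and combining conditions with AND and OR can be sketched on hypothetical rows, assuming SQLite as a stand-in for PostgreSQL. One caveat: SQLite only supports `ALTER TABLE ... RENAME COLUMN` from version 3.25 onward, so very old Python builds may reject that statement, whereas PostgreSQL supports it directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name VARCHAR(40), department VARCHAR(40), "
             "address VARCHAR(40), salary REAL);")
# Hypothetical rows for illustration.
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?);", [
    ("Asha Rao", "Sales", "France", 52000),
    ("Luc Martin", "HR", "France", 91000),
    ("Daniel Kim", "IT", "Germany", 81000),
    ("Ivan Petrov", "Sales", "Russia", 94000),
    ("Chen Wei", "HR", "Australia", 60000),
])

# Rename address to country (needs SQLite 3.25+ in this sketch).
conn.execute("ALTER TABLE employees RENAME COLUMN address TO country;")
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees);")]

def names(sql):
    """Return the employee names a query produces, sorted alphabetically."""
    return sorted(row[0] for row in conn.execute(sql))

# AND: a row qualifies only if both conditions hold.
france_under_80k = names(
    "SELECT employee_name FROM employees WHERE country = 'France' AND salary < 80000;")
# OR: a row qualifies if at least one condition holds.
germany_or_sales = names(
    "SELECT employee_name FROM employees WHERE country = 'Germany' OR department = 'Sales';")
print(columns, france_under_80k, germany_or_sales)
```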
Okay, now let's see the output; I'll hit F5 this time to run it. All right, we have 23 rows of information. Let me scroll to the right: you can see that in each row either the country is Germany or the department is Sales. For the first record the country is Germany, for the second and third records the department is Sales, and for the fourth record the country is Germany again. This is how the OR condition works: if at least one of the conditions is true, the row is returned, so both conditions don't need to be satisfied. Now, PostgreSQL has another feature called LIMIT. LIMIT is an optional clause of the SELECT statement that restricts the number of rows returned by the query. Suppose you want to display the top five rows of a table: you can use LIMIT. Suppose you want to skip the first five rows and then display the next five: you can do that using LIMIT together with OFFSET. Let's explore how LIMIT and OFFSET work. I'll write SELECT * FROM employees, and I'll use my ORDER BY clause: ORDER BY salary DESC LIMIT 5. This displays the five employees with the highest salaries. If I run this, there you go: it has given us five rows of information, and these are the top five employees by salary. That is one way of using the LIMIT clause. Now, in case you want to skip a number of rows before returning the result, you can add an OFFSET clause. I'll write SELECT * FROM employees ORDER BY salary DESC, and this time I'm going to use LIMIT 5 OFFSET 3. This query skips the first three rows and then returns the next five rows. If I run this, there you go. This is how the result looks.
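LIMIT and OFFSET can be sketched with ten hypothetical employees whose salaries are all different, so the ordering is unambiguous. The sketch assumes SQLite as a stand-in for PostgreSQL. Without ORDER BY, "the top five" is not well defined, because the database may return rows in any order. SQLite does not accept PostgreSQL's `FETCH FIRST ... ROWS ONLY` spelling, so only LIMIT/OFFSET is shown. The `page` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name VARCHAR(40), salary REAL);")
# Hypothetical employees with distinct salaries 1000.0 ... 10000.0.
conn.executemany("INSERT INTO employees VALUES (?, ?);",
                 [(f"Employee {i}", 1000.0 * i) for i in range(1, 11)])

def page(limit, offset=0):
    """Highest-paid employees: skip `offset` rows, then return at most `limit`."""
    sql = ("SELECT employee_name, salary FROM employees "
           "ORDER BY salary DESC LIMIT ? OFFSET ?;")
    return conn.execute(sql, (limit, offset)).fetchall()

top_five = [name for name, _ in page(5)]
skip_three_next_five = [name for name, _ in page(5, offset=3)]
print(top_five)
print(skip_three_next_five)
```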
Okay, now there is another clause called FETCH; let me show you how it works. I'll copy my previous SQL query and paste it here, and after DESC I'm going to write FETCH FIRST 3 ROWS ONLY. FETCH gives me the first three rows from the top. There you go: it has given us the first three rows, which are the three employees with the highest salaries, since we ordered by salary in descending order. All right, you can also use OFFSET along with the FETCH clause. I'll copy this again and paste it here, and after DESC I'm going to write OFFSET 3 ROWS FETCH FIRST 5 ROWS ONLY. This query skips the first three rows and then displays the next five, so it works exactly the same as the LIMIT 5 OFFSET 3 query we just saw. Let me run it. There you go: these are the five rows that come after the top three. All right, PostgreSQL has another operator called LIKE, which is used for pattern matching. Suppose you have a table of employee names and you have forgotten an employee's full name, but you remember a few letters: you can use the LIKE operator to work out which employee it is. Let's explore some examples of how the LIKE operator works in PostgreSQL. Suppose you want to find the employees whose names start with A. I want to display the employee name and, let's say, the email ID, from the table, which I'll write as employee. Since I want the employees whose names start with A, I'll write WHERE employee name LIKE, followed by the pattern. The pattern goes within single quotes: I'll write A followed by a percent sign, 'A%'. This means the employee name should have an A at the beginning, and the percent sign means any other characters can follow the A, but the starting letter
should be A. If I run this, there is an error: the name of the table is employees, not employee. Let's run it again. There you go: there are 16 employees in our table whose names start with A, and in the employee name column you can see all of them have the letter A at the beginning. Okay, now let me copy this query and paste it here. Let's say this time you want the employees whose names start with S, so instead of A I'll write S. This means the starting letter should be S, followed by any other letters. If I run this, there are 10 employees in the table whose names start with S. Okay, let's copy the query again, and this time I want the employees whose names end with d. Instead of 'A%', this time I'll write '%d', which means the name can begin with any letters, but the last letter should be d. Let me copy and run this: there are 13 employees in the table whose names end with d, as you can see here. All right, now let's say you want to find the employees whose names contain ish. I'll copy this, and instead of 'A%' I'll write '%ish%'. This means there can be any letters at the beginning and any letters toward the end, but ish should appear somewhere within the name. Let me run it and show you. Okay, there is one employee whose name contains ish; you can see the ish in this employee's last name. All right, now suppose you want the employee names that have u as the second letter: any letter can come first, but the second letter should be u. I'll copy this, and instead of 'A%' I'll write '_u%': an underscore, followed by u and a percent sign. You can think of the underscore as a blank that takes exactly one character, so the name can start with A, B, C, D, or any of the 26 letters; then it must have u as the second letter, followed by any other letters.
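The four LIKE patterns above can be tried on a handful of hypothetical names, assuming SQLite as a stand-in for PostgreSQL. In a LIKE pattern, `%` matches any run of characters, including none, and `_` matches exactly one character. One real difference: SQLite's LIKE ignores case for ASCII letters by default, while PostgreSQL's LIKE is case-sensitive (PostgreSQL offers ILIKE for case-insensitive matching). The sample names are chosen so that both systems return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name VARCHAR(40));")
# Hypothetical names chosen so case sensitivity does not change the results.
conn.executemany("INSERT INTO employees VALUES (?);", [
    ("Asha Rao",), ("Sam Reid",), ("Ruth Bond",),
    ("Ahmed Parish",), ("Julia Stone",), ("Luke Ward",),
])

def names_like(pattern):
    """Return the sorted employee names that match a LIKE pattern."""
    rows = conn.execute(
        "SELECT employee_name FROM employees WHERE employee_name LIKE ?;", (pattern,))
    return sorted(row[0] for row in rows)

starts_with_a = names_like("A%")     # A, then anything
starts_with_s = names_like("S%")
ends_with_d = names_like("%d")       # anything, then a final d
contains_ish = names_like("%ish%")   # ish anywhere in the name
second_letter_u = names_like("_u%")  # any one character, then u, then anything
print(starts_with_a, starts_with_s, ends_with_d, contains_ish, second_letter_u)
```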
Let me run this. Okay, there are 10 employees in the table whose names have u as the second letter; you can see them here. Now, moving ahead, let me show you how to use basic SQL functions, or built-in functions, and we'll explore a few mathematical functions. Let's say you want to find the total salary of all the employees. For that you can use the SUM function, which is available in SQL. I'll write SELECT SUM, and inside the SUM function I'll give my column name, salary, FROM my table, employees. Let's see the result; this returns a single value. There you go: this is the total salary, and since the value is very large, it's shown in scientific notation, with an e. One thing to note here: if you look at the output, the column heading just says sum, with the type real, which is not very readable. SQL has a way to fix this called an alias. Since we are summing the salary column, we can give this expression an alias using the AS keyword. If I write SUM(salary) AS total_salary, that becomes the name of my output column; you can see the difference if I run this. Okay, now in the output we have total_salary, which is much more readable than before. This is a feature of SQL that lets you give alias names to your columns or results. Similarly, let's say you want to find the average salary of all the employees. SQL has a function called AVG that calculates the mean, or average. I'll write AVG, and I can change my alias name as well, let's say to mean_salary. Let's run it: you can see the average salary of all the employees is around $81,000. Okay, there are two more important functions that SQL provides: MAX and MIN. If I write SELECT MAX(salary), where MAX is the function name, AS maximum instead of total_salary, this returns the maximum salary among the employees.
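The aggregate functions from this part of the video can be combined into one query on hypothetical rows, assuming SQLite as a stand-in for PostgreSQL. The query covers SUM, AVG, and MAX, along with MIN and COUNT(DISTINCT ...), which follow the same pattern. Each AS alias becomes a readable column heading. The last query illustrates a NULL detail: SELECT DISTINCT keeps NULL as a row of its own, but COUNT(DISTINCT ...) skips NULL. That difference is why a column with 12 real departments plus some missing values shows 13 distinct rows but counts as 12.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name VARCHAR(40), "
             "department VARCHAR(40), salary REAL);")
# Hypothetical rows; one employee has no department (NULL).
conn.executemany("INSERT INTO employees VALUES (?, ?, ?);", [
    ("Asha Rao", "Sales", 52000.0),
    ("Daniel Kim", None, 81000.0),
    ("Maria Lopez", "Sales", 67000.0),
    ("Ivan Petrov", "HR", 94000.0),
])

cur = conn.execute("""
    SELECT SUM(salary)                AS total_salary,
           AVG(salary)                AS mean_salary,
           MAX(salary)                AS maximum,
           MIN(salary)                AS minimum,
           COUNT(DISTINCT department) AS total_departments
    FROM employees;
""")
headings = [column[0] for column in cur.description]
summary = dict(zip(headings, cur.fetchone()))

# DISTINCT returns NULL as its own row; COUNT(DISTINCT ...) ignored it above.
distinct_departments = conn.execute(
    "SELECT DISTINCT department FROM employees;").fetchall()
print(summary)
print(distinct_departments)
```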
Let’s run it and see the maximum salary present in the salary column: the highest salary of any employee is $119,618. Similarly, you can use the MIN function. I’ll write MIN and change the alias to minimum; this returns the lowest salary in the table, which is $4,680.

Now suppose you want to count the departments in the employees table. You can use the COUNT function, and since I want the number of distinct department names, I’ll write DISTINCT inside it: SELECT COUNT(DISTINCT department) AS total_departments FROM employees. Running this, it says there are 12 departments.

Let me show you one more thing. If I write SELECT department FROM employees and run it, it returns 150 rows. But if I place the DISTINCT keyword just before the column name, I can see every department: there are 13 rows, and one of them is NULL. That explains the difference: COUNT(DISTINCT department) ignores NULL values, so it reports 12, while SELECT DISTINCT lists NULL as a row of its own.

Next, we’ll replace this NULL with a department name by updating the table. Wherever the department is NULL, we’ll assign a new department called Analytics. We learned the UPDATE command earlier, so here it is again: UPDATE employees SET department = 'Analytics' WHERE department IS NULL. Note that the condition must use IS NULL; writing = NULL would match no rows. Let’s run this.
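The NULL behaviour above is easy to get wrong, so here is a small sketch with hypothetical data, again using sqlite3 in place of PostgreSQL. It shows COUNT(DISTINCT ...) skipping NULL, SELECT DISTINCT keeping it, = NULL matching nothing, and the IS NULL update filling the gaps.

```python
import sqlite3

# Hypothetical rows: employees 4 and 6 have no department (NULL).
rows = [(1, "Sales"), (2, "HR"), (3, "Sales"), (4, None), (5, "IT"), (6, None)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)

# COUNT(DISTINCT ...) skips NULLs, while SELECT DISTINCT keeps NULL as its own row.
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT department) FROM employees").fetchone()[0]
distinct_rows = conn.execute("SELECT DISTINCT department FROM employees").fetchall()

# A comparison with NULL is never true, so "= NULL" matches nothing; use IS NULL.
eq_null_matches = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE department = NULL").fetchone()[0]
conn.execute("UPDATE employees SET department = 'Analytics' WHERE department IS NULL")

departments_after = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT department FROM employees"))
print(count_distinct, len(distinct_rows), eq_null_matches)  # 3 4 0
print(departments_after)  # ['Analytics', 'HR', 'IT', 'Sales']
```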
The query returned successfully. If I run the SELECT DISTINCT query again, you can see the difference: we still have 13 rows, but there is no NULL department anymore; instead we have the new Analytics department.

Now we are going to explore two more crucial clauses in SQL: GROUP BY and HAVING. Let’s first learn how the GROUP BY clause works in PostgreSQL. The GROUP BY statement groups rows that have the same values into summary rows; for example, you can find the average salary of employees in each country, city or department. GROUP BY is used together with the SELECT statement to arrange identical data into groups.

Suppose you want the average salary of employees for each country. I’ll write SELECT country, then the average function AVG with the salary column inside it, give it the alias average_salary, and select FROM employees. Next comes the GROUP BY clause: since I want the average salary per country, I’ll write GROUP BY country, add a semicolon, and run it with F5. On the left you can see the country names, Israel, Russia, Australia, United States, France and Germany, and in the second column the average salary for each of these countries.

You can also order the result however you want. To arrange the results by average salary, add an ORDER BY clause after the GROUP BY clause. Here you can use the alias average_salary, and to sort in descending order I’ll write DESC: ORDER BY average_salary DESC. Let’s run this and look at the difference in the average salary column.
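A minimal sqlite3 sketch of GROUP BY followed by an ORDER BY on the alias, assuming a tiny invented employees table. Only the country names echo the tutorial; the salaries are not from its data.

```python
import sqlite3

# Hypothetical (country, salary) rows.
data = [
    ("United States", 95000), ("United States", 85000),
    ("Russia", 90000), ("Russia", 76000),
    ("France", 70000), ("Germany", 60000), ("Germany", 64000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (country TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", data)

# One output row per country; the alias can be reused in ORDER BY.
result = conn.execute(
    """
    SELECT country, AVG(salary) AS average_salary
    FROM employees
    GROUP BY country
    ORDER BY average_salary DESC
    """
).fetchall()
for country, avg in result:
    print(f"{country:15} {avg:10.2f}")
```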
As per our result, the United States has the highest average salary, and if I scroll down, Germany has the lowest.

Let’s see one more GROUP BY example. Suppose this time you want the maximum salary of male and female employees. I’ll write SELECT gender, then the MAX function on salary with the alias maximum_salary, FROM employees GROUP BY gender. Running this, the highest salary among female employees is $119,618, while the highest salary among male employees is somewhat lower.

Now suppose you want the number of employees in each country. You can use the COUNT function along with the GROUP BY clause: I’ll select the country column, then COUNT(e_id) on the employee ID column, FROM employees GROUP BY country. This gives the total number of employees from each country: Israel has 4 employees, Australia 4, Russia 80, France 31, the United States 27, and so on.

Now it’s time to explore one more very important clause in PostgreSQL: HAVING. The HAVING clause works like the WHERE clause, with one difference: WHERE filters individual rows before they are grouped and cannot contain aggregate functions, whereas HAVING is used with GROUP BY to keep only the groups that meet a condition. Suppose you want the countries in which the average salary is greater than $80,000. You can use the GROUP BY and HAVING clauses together to get the result.
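The per-gender maximum and the per-country head count from this part can be reproduced on a few hypothetical rows. As before, sqlite3 stands in for PostgreSQL, and the column names emp_id, gender, country and salary are assumptions for illustration.

```python
import sqlite3

# Hypothetical rows: (emp_id, gender, country, salary).
data = [
    (1, "Female", "Russia", 119618), (2, "Male", "Russia", 98000),
    (3, "Female", "France", 72000), (4, "Male", "France", 101000),
    (5, "Male", "Israel", 66000),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER, gender TEXT, country TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", data)

# Each query returns (group, aggregate) pairs, which dict() turns into a mapping.
max_by_gender = dict(conn.execute(
    "SELECT gender, MAX(salary) AS maximum_salary FROM employees GROUP BY gender"))
count_by_country = dict(conn.execute(
    "SELECT country, COUNT(emp_id) FROM employees GROUP BY country"))
print(max_by_gender)
print(count_by_country)
```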
I’ll write my SELECT statement as SELECT country, AVG(salary) AS average_salary FROM employees, and then group it by country with GROUP BY country. Since I only want the countries in which the average salary is greater than 80,000, I’ll add a HAVING clause after the GROUP BY clause: HAVING AVG(salary) > 80000. This condition can’t go in the WHERE clause, because aggregate functions are not allowed in WHERE; that’s why we need HAVING. Running it, Russia and the United States are the countries where the average salary is greater than $80,000.

Now suppose you want the number of employees in each country, but only for countries with fewer than 30 employees. First I’ll select the country column, then use the COUNT function with the employee ID inside it, so that we count the employees in the employees table. You can use an alias here as well, but I’ll skip it for now. I’ll write GROUP BY country, and then HAVING COUNT(e_id) < 30. This returns the countries with fewer than 30 employees: Israel, Australia, the United States and Germany.

If you want, you can add an ORDER BY clause as well. I’ll write ORDER BY COUNT(e_id), which arranges the result in ascending order of the employee count, and you can see the result is now sorted that way.

Next, we are going to explore one more feature of PostgreSQL: the CASE statement. In PostgreSQL, the CASE expression is the equivalent of an if-else statement in any other programming language.
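The HAVING queries above can be sketched on invented data with sqlite3 as the engine. The sketch also shows that putting AVG(salary) in WHERE is rejected, which is the reason HAVING exists; the head-count threshold of 3 is scaled down from the tutorial’s 30 to fit the toy table.

```python
import sqlite3

# Hypothetical (country, emp_id, salary) rows.
data = [("Russia", 1, 90000), ("Russia", 2, 84000), ("Russia", 3, 81000),
        ("Germany", 4, 60000), ("Germany", 5, 65000),
        ("Israel", 6, 70000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (country TEXT, emp_id INTEGER, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", data)

# HAVING filters whole groups after the aggregate has been computed.
rich = conn.execute(
    """SELECT country, AVG(salary) AS average_salary
       FROM employees
       GROUP BY country
       HAVING AVG(salary) > 80000"""
).fetchall()

# Countries with fewer than 3 employees, smallest head count first.
small = conn.execute(
    """SELECT country, COUNT(emp_id)
       FROM employees
       GROUP BY country
       HAVING COUNT(emp_id) < 3
       ORDER BY COUNT(emp_id)"""
).fetchall()

# The same aggregate condition in WHERE is an error.
try:
    conn.execute(
        "SELECT country FROM employees WHERE AVG(salary) > 80000 GROUP BY country")
    where_failed = False
except sqlite3.Error:
    where_failed = True

print(rich, small, where_failed)
```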
The CASE expression allows you to add if-else logic to a query, which makes for powerful queries. Let me scroll down and show you how to use it. It’s very similar to the IF function you use in Excel, or to the if-else statements in C++, Python or any other programming language.

I’m going to write a SQL query that creates a new column called salary_range. If the salary is greater than $45,000 and less than $55,000, the new salary_range column will hold the value Low salary. If the salary is greater than $55,000 and less than $80,000, it will hold Medium salary, and if the salary is greater than $80,000, it will hold High salary. We’ll do all of this with a CASE expression in PostgreSQL.

Before I start the SELECT statement, let me show you how to write a comment in PostgreSQL. You start a comment with a double dash (--). Comments are very helpful because they make your code or scripts readable; here I’ll write -- CASE expression in PostgreSQL. Similarly, you can go to the top and add a comment such as -- HAVING clause.

Now for the query. I’ll write SELECT department, country, salary, then a comma, and start the CASE statement: CASE WHEN salary > 45000 AND salary < 55000 THEN 'Low salary'. This is exactly like an if-else condition. Next comes another branch: WHEN salary > 55000 AND salary < 80000 THEN 'Medium salary'. Finally, the last condition: WHEN salary > 80000 THEN 'High salary', which I’ll write on a single line. One thing to
remember: in PostgreSQL, SQL keywords are case-insensitive, so you can write your SELECT statement in capitals, lower case or sentence case; similarly, CASE can be written with a small c or a capital C.

After the last branch, I’ll write END and give the alias salary_range; this becomes the new column in the output. Then comes the table name, FROM employees, and I’ll order the rows with ORDER BY salary DESC. So we select the department, country and salary columns from the employees table and create a new salary_range column, with three conditions for low, medium and high salary. Let’s run this. You can see the new salary_range column, and since we ordered by salary in descending order, the highest salaries appear at the top. If I scroll down, you can see the medium salaries, and further down the low salaries. CASE statements are really useful when you want to create a new column based on conditions on an existing table.

Moving ahead, we’ll now see how to write subqueries in PostgreSQL. In a subquery, we write a query inside another query, which is why it’s also called a nested query. Suppose we want the employee name, department, country and salary of the employees whose salary is greater than the average salary. In such cases you can use a subquery. First I’ll write the SELECT statement: I’ll select the employee name, the department, the country name and the salary FROM employees WHERE the salary is greater than the average salary. After WHERE salary >, I’ll open parentheses and write the subquery: SELECT AVG(salary) FROM employees.
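The CASE expression and the above-average subquery can both be tried on a toy table. This is a sketch under assumptions: sqlite3 replaces PostgreSQL, the rows are invented, and the bands deliberately use >= where the tutorial used >. With the tutorial’s strict comparisons, a salary of exactly 55,000 or 80,000 matches no WHEN branch, so CASE returns NULL for it; the ELSE branch here is an added safety net.

```python
import sqlite3

# Hypothetical (emp_name, salary) rows, including the boundary values 55,000 and 80,000.
data = [("Asha", 48000), ("Bilal", 55000), ("Chen", 62000),
        ("Dana", 80000), ("Eli", 97000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", data)

# CASE builds a new column from conditions; >= closes the gaps at the band edges.
banded = conn.execute(
    """SELECT emp_name, salary,
              CASE
                WHEN salary >= 45000 AND salary < 55000 THEN 'Low salary'
                WHEN salary >= 55000 AND salary < 80000 THEN 'Medium salary'
                WHEN salary >= 80000 THEN 'High salary'
                ELSE 'Unclassified'
              END AS salary_range
       FROM employees
       ORDER BY salary DESC"""
).fetchall()

# The inner SELECT runs first and yields one value (here 68,400) to compare against.
above_avg = conn.execute(
    """SELECT emp_name, salary FROM employees
       WHERE salary > (SELECT AVG(salary) FROM employees)
       ORDER BY salary"""
).fetchall()

for row in banded:
    print(row)
print(above_avg)
```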
Let me break it down for you. The inner statement runs first and finds the average salary from the table. We then compare this average with the salary of every employee, and for each employee whose salary is greater than the average, we display the name, department, country and original salary. If you want, you can run the inner statement on its own: let me select it and run it. It returns the average salary of all the employees, roughly $81,400. Now let me run the whole query to see how many employees earn more than that: around 75 employees have a salary greater than the average salary.

Moving ahead, I’m going to show you some built-in functions; we’ll learn a few mathematical and string functions that are available in PostgreSQL. First, I’ll add a comment. There’s another way to write a comment: instead of a double dash, you can start with a forward slash and an asterisk (/*), write your text, say SQL functions, and close it with an asterisk and a forward slash (*/). This is also a comment in PostgreSQL.

Let’s start with a few math functions. The ABS function returns the absolute value of a number. If I write SELECT ABS(-100), it returns positive 100, because taking the absolute value removes the negative sign: our input was -100, and the absolute value of -100 is 100.

Next is the GREATEST function, which returns the greatest value in a list of numbers. Suppose I write SELECT GREATEST, and inside the
GREATEST function I’ll pass in a few numbers, chosen at random: 2, 4, 90, 56.5 and 70, followed by a semicolon. When I run this, GREATEST returns the greatest number in the list we provided, which in this case is 90. Again, you can use an alias for each of these statements. Like GREATEST, there is also a LEAST function, which returns the smallest number in a list. If I run it, the result is 2, because 2 is the smallest number in this list.

There’s a function called MOD, which returns the remainder of a division. It takes two parameters; suppose I write SELECT MOD(54, 10). As you can guess, the remainder of 54 divided by 10 is 4, and so is our result.

Now let’s see how to use the POWER function. I’ll write SELECT POWER(2, 3), which is 2 cubed, that is 8. Running it, the result is 8. You can also check POWER(5, 3), which should be 125.

Next, you can use the SQRT function to find the square root of a number. I’ll write SQRT(100); you can guess the output should be 10, and if I run it, the output is 10. Let’s say I want the square root of 144; again, you can guess the result should be 12. The first attempt throws an error, so let me fix it and run it again: it is 12.

There are also a few trigonometric functions: SIN, COS and TAN. Let’s say I want the sine of 0. If you have studied high-school mathematics, you know the sine of 0 is 0.
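For readers following along in Python, the math functions from this part map onto the standard library roughly as below. This is an analogy rather than an emulation of PostgreSQL’s numeric types; the dictionary keys are only labels, and CEILING and FLOOR, which the demo covers next, are included for completeness.

```python
import math

# Rough Python counterparts of the PostgreSQL calls used in the demo.
results = {
    "ABS(-100)": abs(-100),
    "GREATEST": max(2, 4, 90, 56.5, 70),
    "LEAST": min(2, 4, 90, 56.5, 70),
    "MOD(54, 10)": 54 % 10,
    # math.fmod keeps the dividend's sign, which matches MOD for negative inputs.
    "MOD(-54, 10)": math.fmod(-54, 10),
    "POWER(2, 3)": 2 ** 3,
    "POWER(5, 3)": 5 ** 3,
    "SQRT(100)": math.sqrt(100),
    "SQRT(144)": math.sqrt(144),
    "SIN(0)": math.sin(0),
    # Like PostgreSQL's SIN, math.sin expects radians, so 90 is not 90 degrees.
    "SIN(90)": round(math.sin(90), 3),
    # Converting degrees to radians first gives sin(90 degrees) = 1, like SIND(90).
    "SIND(90)": math.sin(math.radians(90)),
    "CEILING(6.45)": math.ceil(6.45),
    "FLOOR(6.45)": math.floor(6.45),
}
for expr, value in results.items():
    print(f"{expr:15} -> {value}")
```

One caveat: Python’s % operator takes the sign of the divisor (-54 % 10 is 6), while PostgreSQL’s MOD takes the sign of the dividend, which is why math.fmod is the closer match when negative numbers are involved.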
You can see the result is 0. Now let’s say you want SIN(90). If I run it, the output is about 0.89, not 1, because PostgreSQL’s trigonometric functions take their argument in radians; for degrees, PostgreSQL provides SIND, COSD and TAND, and SIND(90) returns 1.

There are other functions like CEILING and FLOOR. Let me show you what they do. I’ll write CEILING and pass the decimal value 6.45. Running it, CEILING returns the smallest integer greater than or equal to the value, which is 7 in this case. Now let’s see what FLOOR does: running it returns the largest integer less than or equal to the value, which is 6 here.

Now that we’ve seen the mathematical functions, let’s explore some string functions available in PostgreSQL. I’ll add the comment string functions and scroll down. There’s a function called CHARACTER_LENGTH that gives you the length of a text string. Suppose I write SELECT CHARACTER_LENGTH and pass in the text 'India is a democracy'. Running this, the result is 20, since there are 20 characters in the string, spaces included.

There’s another function called CONCAT, which merges or combines multiple strings. I’ll write SELECT CONCAT and, inside the brackets, pass the strings to combine: 'PostgreSQL ' with a trailing space, then a comma, then 'is ', another comma, and the final word, 'interesting'. CONCAT does not add any separators itself, which is why the spaces are part of the strings. Let’s run it and expand the output: we have concatenated the three strings successfully, and the output is PostgreSQL is interesting.

Next, there are functions like
LEFT and RIGHT in PostgreSQL. (PostgreSQL has no MID function; SUBSTRING does that job.) The LEFT function extracts the number of characters you specify from the left of a string. I’ll write SELECT LEFT, copy the text 'India is a democracy' and paste it in, and say I want the first five characters, so I’ll pass 5. It counts five characters from the left, 1, 2, 3, 4 and 5, so if I run this, it should print India, and that’s exactly what it returns.

Similarly, you can use the RIGHT function to extract characters from the right of a string. Let’s say I want 12 characters from the right, so I’ll change LEFT to RIGHT and pass 12. Running it, it counts 12 characters from the right and returns ' a democracy', including the leading space.

Next is the REPEAT function, which repeats a string the number of times you specify. I’ll write SELECT REPEAT, pass in 'India', and ask for it to be displayed five times. Add a semicolon and run it: in the output, India has been printed five times.

There is another string function in PostgreSQL called REVERSE, which returns the input string in reverse order. I’ll write SELECT REVERSE and pass in the same string, 'India is a democracy', by copying and pasting it, then close the quotes and the brackets. Running it, you can see 'India is a democracy' printed in reverse order.

So far we’ve explored a few built-in functions that already exist in PostgreSQL, but PostgreSQL also lets you write your own user-defined functions.
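Similarly, the string functions shown here have close Python counterparts. The mapping below is a sketch for comparison, not an exact reproduction of PostgreSQL behaviour. Note that SQL string positions start at 1 while Python slices start at 0, which is why SUBSTRING(text FROM 7 FOR 2) corresponds to text[6:8].

```python
text = "India is a democracy"

# Rough Python counterparts of the PostgreSQL string calls from the demo.
results = {
    "CHARACTER_LENGTH": len(text),                      # CHARACTER_LENGTH(text)
    # CONCAT adds no separators, so the spaces live inside the pieces.
    "CONCAT": "".join(["PostgreSQL ", "is ", "interesting"]),
    "LEFT": text[:5],                                   # LEFT(text, 5)
    "RIGHT": text[-12:],                                # RIGHT(text, 12)
    "REPEAT": "India" * 5,                              # REPEAT('India', 5)
    "REVERSE": text[::-1],                              # REVERSE(text)
    "SUBSTRING": text[6:8],                             # SUBSTRING(text FROM 7 FOR 2)
}
for name, value in results.items():
    print(f"{name:17} -> {value!r}")
```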
So now we’ll learn how to write a function of our own in PostgreSQL. Let’s create a function that counts the total number of email IDs present in our employees table. First, I’ll add the comment -- user-defined function.

This is the syntax for writing a function in PostgreSQL. I’ll write CREATE OR REPLACE FUNCTION, then the function name, count_emails, followed by brackets for its parameter list, which is empty here. Then I’ll write RETURNS integer, the return type. After that comes AS followed by a dollar-quoted body: I’ll write $total_emails$, which opens the function body (the tag between the dollar signs is just a label for the quotes). Then I declare a variable with DECLARE total_emails integer. Next comes BEGIN, and inside it I’ll write the SELECT statement: SELECT COUNT(email) INTO total_emails FROM employees, followed by a semicolon. Then I’ll write RETURN total_emails; user-defined functions usually return a value, which is why we need the RETURN statement. I end the function body with END; and close the dollar quote with $total_emails$ again. Finally, I specify the language, written as plpgsql (PL/pgSQL), followed by a semicolon.

To recap: we created a function called count_emails that returns an integer. We declared the variable total_emails as an integer, and between BEGIN and END we select the count of email IDs in the employees table, store that value in total_emails using the INTO keyword, and return total_emails.
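SQLite has no PL/pgSQL, so this sketch wraps the same COUNT(email) query in an ordinary Python function instead. The table, the rows and the function name count_emails are hypothetical stand-ins modelled on the tutorial. It also shows why the result can be lower than the row count: COUNT(column) skips NULLs, whereas COUNT(*) counts every row.

```python
import sqlite3

# Hypothetical rows: the third employee has no email (NULL).
data = [(1, "a@example.com"), (2, "b@example.com"), (3, None), (4, "d@example.com")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (emp_id INTEGER, email TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", data)


def count_emails(connection):
    """Stand-in for the PL/pgSQL count_emails(): COUNT(email) ignores NULL emails."""
    (total_emails,) = connection.execute(
        "SELECT COUNT(email) FROM employees").fetchone()
    return total_emails


total_rows = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(count_emails(conn), "of", total_rows, "employees have an email")  # 3 of 4
```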
Let’s run this. There’s a problem: a typo, where this should read integer. Let me fix it and run it once again. We’ve now successfully created a user-defined function. The final step is to call it: I’ll use a SELECT statement with the function name, SELECT count_emails();, and execute it. Here you can see there are 134 email IDs present in our employees table. One thing to note: there are 150 employees in total in the table, but only 134 of them have email IDs; the rest have NULL in the email column, and COUNT(email) skips NULL values.

That brings us to the end of this demo session of the PostgreSQL tutorial. Let me go to the top, since we’ve explored a lot. We started by checking the version of PostgreSQL, then saw how to perform basic mathematical operations such as addition, subtraction and multiplication. Then we saw how to create a table, movies, inserted a few records into it, used the SELECT clause, updated a few values and deleted one row. Then we learned how to use the WHERE clause, the BETWEEN operator and the IN operator. Scrolling down, we created a table called employees and learned how the DISTINCT keyword works. We learned how to use IS NULL with the WHERE clause, learned about the ORDER BY clause, and saw how to alter a table to rename a column. Then we explored a few more examples of the WHERE clause using the AND and OR operators, and learned how to use LIMIT and OFFSET as well as the FETCH keyword in PostgreSQL. Moving further, we learned about the LIKE operator in SQL, which is used for pattern matching. Then we saw how to use basic built-in PostgreSQL functions like SUM, AVG, MIN, COUNT,
and MAX. Next, we saw how to update a value in a column using the PostgreSQL UPDATE command. We also learned how to use GROUP BY, then the HAVING clause, and then CASE expressions in PostgreSQL, which work like if-else in any other programming language. We explored a few mathematical and string functions, and finally we wrote our own user-defined function. That brings us to the end of this tutorial on PostgreSQL.

In this session, we will learn how to join three or more tables in SQL. So far we have a fundamental understanding of how to join two tables, but in some situations you might have to extract data by joining three or more tables, and that’s exactly what we’ll discuss today. Without further delay, let’s get started.

Let’s jump into MySQL Workbench, where our query is ready. We’ll be using three tables: employee details, employee register and employee joining register. We want the employee name, contact number and joining date. The joining date is available in the joining register, the contact number in the employee register, and the employee name in employee details, so we’re using all three tables and joining them to extract these three columns.

For the employee name, I’m providing the table name along with the column name, so that MySQL Workbench isn’t confused about which table’s employee name column to access. With the table name as a prefix, MySQL will use the employee details table and extract the employee name column from it. The contact number is present in only one table, so MySQL will go to the employee register for it, and the joining date is also present in only one table, so there’s no need to specify a table name for either. To be on the safer side you can still add the table names, which is good practice, and it also makes a good impression on your interviewer, since it shows you know the syntax.
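The three-table query described here can be sketched on toy data with sqlite3 standing in for MySQL. The table names emp_details, emp_register and joining_register and their columns are hypothetical stand-ins for the tutorial’s tables; each additional JOIN ... ON adds one more table on the shared emp_id key, and the table prefixes (via short aliases) make every column reference unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; emp_id is the key shared by all three.
conn.executescript(
    """
    CREATE TABLE emp_details (emp_id INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE emp_register (emp_id INTEGER PRIMARY KEY, contact TEXT);
    CREATE TABLE joining_register (emp_id INTEGER PRIMARY KEY, joining_date TEXT);
    INSERT INTO emp_details VALUES (1, 'Anita'), (2, 'Ravi'), (3, 'Sara');
    INSERT INTO emp_register VALUES (1, '555-0101'), (2, '555-0102'), (3, '555-0103');
    INSERT INTO joining_register VALUES (1, '2021-04-01'), (2, '2022-01-15');
    """
)

rows = conn.execute(
    """
    SELECT d.emp_name, r.contact, j.joining_date
    FROM emp_details AS d
    JOIN emp_register AS r ON d.emp_id = r.emp_id
    JOIN joining_register AS j ON d.emp_id = j.emp_id
    ORDER BY d.emp_id
    """
).fetchall()
for row in rows:
    print(row)
```

Because these are inner joins, Sara, who has no row in joining_register, drops out of the result; a LEFT JOIN on the third table would keep her with a NULL joining date.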
I’ll just follow the normal syntax the way I’m going now. Joining here is the same operation you use to join two tables: you use the JOIN clause, give the second table name, and state with ON what the join is based on. I’m joining these two tables on the primary key, employee ID, since both tables have the same common employee ID column. Then you use the JOIN clause once again, give the third table name, and specify the common key, which is again the employee ID, since it’s common to both tables and unique. You can follow the same syntax to join yet another table. Let’s execute this query: it runs, and you get a table with the employee name, contact number and joining date.

Let’s get back to MySQL Workbench. Suppose you want to create a table and load data into it. If the data is small, maybe 10 or 15 rows, you can manually create a table and insert the rows with INSERT commands. But for a data set like mine, which is in an Excel spreadsheet and has about 10,000 rows, would you want to write 10,000 INSERT commands? No, it would be really time-consuming. For situations like this, MySQL Workbench lets developers load data from spreadsheets in a few steps.

Before we get started, let’s check our column headers. As you can see, they aren’t compatible with MySQL Workbench or SQL commands: the Row ID header contains a space, and the word row is a keyword in MySQL. We don’t want that confusion, so we’ll modify the headers, for example by using underscores. For Row ID, you can remove the space and use an underscore instead, and when it comes to
Order ID, it’s the same. Let’s quickly change all the column headers so that they are SQL-compatible. We’ve now replaced the spaces in all the headers with underscores, so let’s save the file. When you save it, make sure the Excel file name is also SQL-compatible: here the file is called load CSV to MySQL, which contains spaces, so let’s change it to the lowercase name excel_data. Now it is SQL-compatible.

Let’s go back to MySQL Workbench. First, create a new schema: right-click here, or use the toolbar button, and name the new schema after your data set. I’ll type excel and click Apply, then Apply again. Now you have the excel schema right here. Expand it to see Tables, right-click Tables, and choose Table Data Import Wizard. Then browse your folders for the file.

One more note: you need to save your Excel spreadsheet as a comma-separated values (CSV) file. Let’s quickly go back and do that: open your spreadsheet, go to File, Save As, choose the comma-separated (CSV) format, and save. Back in Workbench, you should now be able to find the file. Open it and click Next. Make sure you select Drop table if exists, to be on the safe side, and check all the column names. Here we have a problem with the row ID column, but that’s something we can fix afterwards; every other column name is fine, so we can simply alter the table later, which is not a big deal. Click Next, and it starts importing. The data file is imported; it took a little while because there are about 10,000 rows, which is normal. On the next step, you can see that 9,987 records were imported successfully. Click Finish, and it should be done shortly.
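Outside the Workbench wizard, the same header clean-up and bulk load can be scripted. This is a minimal sketch, assuming a small invented CSV in place of the real 10,000-row export and SQLite in place of MySQL: headers are lower-cased with spaces replaced by underscores, and all rows go in with one executemany call instead of thousands of hand-written INSERT statements.

```python
import csv
import io
import re
import sqlite3

# Invented stand-in for the exported spreadsheet.
raw_csv = """Row ID,Order ID,Order Date,Customer Name
1,ORD-1,2016-11-08,Alice Example
2,ORD-2,2016-11-08,Bob Example
"""


def sanitize(header):
    """Lower-case a header and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", header.strip()).strip("_").lower()


reader = csv.reader(io.StringIO(raw_csv))
headers = [sanitize(h) for h in next(reader)]
rows = list(reader)

conn = sqlite3.connect(":memory:")
# Safe to interpolate: sanitize() leaves only letters, digits and underscores.
columns = ", ".join(f"{h} TEXT" for h in headers)
conn.execute(f"CREATE TABLE excel_data ({columns})")
placeholders = ", ".join("?" for _ in headers)
conn.executemany(f"INSERT INTO excel_data VALUES ({placeholders})", rows)

imported = conn.execute("SELECT COUNT(*) FROM excel_data").fetchone()[0]
print(headers)
print(imported, "rows imported")
```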
Let’s close the wizard and go to the query tab. Refresh the schema list so that the excel schema appears here. Now let’s switch to that database with USE excel; we are now in the excel database. The table we’re looking for is excel_data, without a space, so I’ll write SELECT * FROM excel_data; and run it. We should shortly see the whole data set here.

For the first column, row ID, we can simply use an ALTER TABLE statement to change the name. I’ll write ALTER TABLE with the table name, then RENAME COLUMN, copy the current field name, and write TO followed by the new name. There’s a small syntax error there, so let me sort it out and run it again. Now let’s run the SELECT command again: we have row_id, order_id, order_date, ship_date, ship_mode, customer_id, customer_name and so on, and everything is as expected. That’s how you can load Excel data into MySQL Workbench.

Next, we will look at the top five SQL interview questions you must know to crack business analytics interviews. Without further ado, let’s get started and jump into MySQL Workbench. Here we have a database named SLP, so we’ll use the USE command to access it. Now that we’re in, let’s check the tables with SHOW TABLES: we have book collection, book order, employee details, joining register and employee register. Let’s go with the employee details table, so I’ll run SELECT * FROM emp_details. Now we have our employee table, and we’ll use this data set to run a few queries from our interview questions. So, getting back to the
interview questions, these are the ones you will most commonly be asked. The first question asks you to find the names of employees that start with a vowel, that is, A, E, I, O or U: they’ll ask for the list of names that start with any of these five letters. Here you use the LIKE and NOT LIKE operators. To get names starting with a vowel, you use LIKE in a command such as SELECT employee name (that’s the column name) FROM the employee register or employee details table WHERE the employee name is LIKE 'A%', which means the name starts with A followed by any number of characters, and similarly for the other vowels.

Let’s quickly check what we have in the employee register. It’s a similar table to the employee details table we used before, with the same details, so that’s not a problem. Now let’s extract the names of the employees whose names start with a vowel, using the WHERE clause and the LIKE operator. Run the command, and there you go: we have three names. If you’re able to answer this question, they might ask a similar one with a small modification, such as giving the list of names that do not start with a vowel. Then you just replace LIKE with NOT LIKE. Either version, starting or not starting with a vowel, is one of the common questions.

The second question also starts simple: they’ll ask for the details of the employee with the highest salary, or just for the highest salary itself. You can simply use the MAX aggregate to get the maximum salary, which here is $87,000. If you answer that, they’ll sometimes make it trickier and ask for the second-highest salary.
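The first two interview questions can be rehearsed on a toy table. In this sketch the rows are invented and sqlite3 stands in for MySQL. The vowel check is written as LIKE conditions joined with OR and negated with NOT for the opposite question, and the second-highest salary uses a nested MAX: the largest salary strictly below the overall maximum.

```python
import sqlite3

# Hypothetical (emp_name, salary) rows for interview practice.
data = [("Aditi", 52000), ("Esha", 87000), ("Rahul", 78000),
        ("Uday", 61000), ("Karan", 45000), ("Omar", 70000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_details (emp_name TEXT, salary REAL)")
conn.executemany("INSERT INTO emp_details VALUES (?, ?)", data)

# emp_name LIKE 'A%' OR emp_name LIKE 'E%' OR ... for every vowel.
vowel_filter = " OR ".join(f"emp_name LIKE '{v}%'" for v in "AEIOU")
starts_with_vowel = [r[0] for r in conn.execute(
    f"SELECT emp_name FROM emp_details WHERE {vowel_filter} ORDER BY emp_name")]
not_vowel = [r[0] for r in conn.execute(
    f"SELECT emp_name FROM emp_details WHERE NOT ({vowel_filter}) ORDER BY emp_name")]

highest = conn.execute("SELECT MAX(salary) FROM emp_details").fetchone()[0]
# The subquery finds the maximum; the outer MAX picks the largest value below it.
second_highest = conn.execute(
    """SELECT MAX(salary) FROM emp_details
       WHERE salary < (SELECT MAX(salary) FROM emp_details)"""
).fetchone()[0]
print(starts_with_vowel, not_vowel, highest, second_highest)
```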
salary. You could use OFFSET, and there are other possibilities, but here’s the simplest: the same query with one extra condition, SELECT MAX(salary) FROM employee_details WHERE salary < (SELECT MAX(salary) FROM employee_details). The subquery executes first and returns the maximum salary; the outer MAX then picks the largest salary that is less than it, which is the second-highest. Let’s execute the query: the second-highest salary is $78,000. Moving on to the third question: sometimes they’ll ask you to use the UPDATE command. Here we have the salary details of our employees. Let’s say it’s appraisal season and every employee gets a 15% hike, so you need to update the salary column: UPDATE employee_details SET salary = salary + salary * 0.15. We express the hike as the decimal 0.15, so each salary grows by 15%. Run the command, then query the details again with SELECT * FROM employee_details, and there you have the updated salary list. The next question on our list is to select employee names and salaries within a given range. If they ask for the employees whose salary lies between 50,000 and 70,000, use the BETWEEN operator with that range, BETWEEN 50000 AND 70000 (both endpoints are included). Run it, and there you go: two employees have a salary between 50,000 and 70,000. For the last question, they might ask you to extract details for a certain department, and this often turns into explaining the difference between the HAVING and WHERE clauses. This is one of the most common interview questions, where they will ask the difference
between the HAVING and WHERE clauses. WHERE filters individual rows before any grouping happens. HAVING filters the groups produced by GROUP BY, so it’s the clause to use when the condition involves an aggregate function such as COUNT, SUM, MIN or MAX. When there’s no GROUP BY, you can simply use WHERE. Here I’m extracting the number of people in the finance department; I’m not grouping by department, so I can just use a WHERE clause and run it. When I do need GROUP BY and want to filter on the aggregated results, I use HAVING instead of WHERE. That’s the fundamental difference between HAVING and WHERE, and along the way you’ve also seen how GROUP BY works.
Imagine walking into a giant library. This isn’t just any library; it’s huge, with rows and rows of shelves, each packed with thousands of books. But you don’t need to read every book. You’re looking for specific ones, like books about space or stories about superheroes, and finding them in such a huge library is going to be tricky. This massive library is like a database: a huge collection of information, stored neatly and ready to be used. It holds everything: names, addresses, grades, prices, whatever data you can think of. But all that information can be overwhelming, and you don’t want to sift through everything every time you need something specific. That’s where views come in. In this video, we’ll explore views in SQL: what they are and how they simplify working with databases. We’ll cover how to create views and manage them by updating, deleting and listing them. We’ll also introduce the different types of views: simple, complex, read-only, and views WITH CHECK OPTION. Then we’ll dive into materialized views, which store data for faster queries. By the end, you’ll understand how views make managing data easier and more efficient. We will also
look at a quiz question to check your understanding. So what exactly is a view? Back to our library: imagine a magic window that shows only the books about space or superheroes that you’re interested in. You no longer have to wander through the entire library; you just look through the magic window and it gives you exactly what you need. That magic window is what a view is in the world of databases: a virtual window into the data that shows only what you need to see. Best of all, a view doesn’t store any new data; it gives you a filtered look at the underlying tables. Think of a view as a shortcut that makes your life a whole lot easier. Now let’s get into the demo: how to create a view, and the types of views in SQL. We start by creating a table. As you can see, I’ve logged into an online compiler. We’ll first create a table and then move on to creating views and the types of views. To create a table, write CREATE TABLE followed by the table name, something like student_details. The first column is the student ID, which we’ll call s_id, with type INT. Since it’s the primary key, we mark it PRIMARY KEY.
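To try the demo end to end, here is a minimal sketch of the student_details table and the s_id < 5 view described in this walkthrough. It uses SQLite through Python’s built-in sqlite3 module, which is an assumption and a stand-in for the online SQL Server/MySQL compiler in the video. The student names and addresses are invented placeholders. The last step illustrates the point that a view stores no data of its own.

```python
import sqlite3

# In-memory SQLite database, a stand-in for the engine used in the video
# (assumption: these basic statements behave the same way in SQLite).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The demo table: student ID (primary key), name and address.
cur.execute("""
    CREATE TABLE student_details (
        s_id    INTEGER PRIMARY KEY,
        name    VARCHAR(255),
        address VARCHAR(255)
    )
""")

# Placeholder rows; these names and cities are invented for illustration.
cur.executemany(
    "INSERT INTO student_details (s_id, name, address) VALUES (?, ?, ?)",
    [
        (1, "Ana", "Delhi"),
        (2, "Ben", "Mumbai"),
        (3, "Cara", "Pune"),
        (4, "Dev", "Chennai"),
        (5, "Eli", "Kolkata"),
    ],
)

# The view: a stored query exposing only name and address where s_id < 5.
# GO is a SQL Server client batch separator, not SQL, so it is not needed here.
cur.execute("""
    CREATE VIEW detail_view AS
    SELECT name, address
    FROM student_details
    WHERE s_id < 5
""")

def view_names():
    """Names currently visible through detail_view, sorted alphabetically."""
    return [row[0] for row in cur.execute("SELECT name FROM detail_view ORDER BY name")]

print(view_names())  # ['Ana', 'Ben', 'Cara', 'Dev']  (Eli has s_id 5, so it is filtered out)

# The view stores no rows of its own: a new qualifying row in the base table
# shows up in the view immediately, without recreating the view.
cur.execute("INSERT INTO student_details VALUES (0, 'Fay', 'Jaipur')")
print(view_names())  # ['Ana', 'Ben', 'Cara', 'Dev', 'Fay']
```

The sketch queries the view rather than the table, just as the demo runs SELECT * FROM detail_view instead of selecting from student_details.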
The next column is name, of type VARCHAR, where you choose the maximum length; I’ve used 255. Then a comma, and the address, again a VARCHAR. So we’ve created this table with s_id as an INT primary key, plus name and address. The next step is to insert data into the table. The basic command is INSERT INTO student_details, followed by the columns we created (s_id, name and address), then VALUES with the details for each student. These are the student records I’ve inserted, and you can enter yours the same way. To display the records, I use SELECT * FROM student_details and click Run. Now you can see the output: the s_id, name and address for each student. The table is created and the output is shown. Now for the main step, creating a view. As I said in the intro, a view is like a window that lets you see specific data. Let’s say we only care about students with s_id less than 5, and instead of running the same query every time, we want to create a view. Here’s the command: CREATE VIEW
detail_view AS SELECT name, address FROM student_details WHERE s_id < 5. student_details is the table we created earlier, and we want only students with an ID less than 5. Click Run, and we get an error: 'CREATE VIEW' must be the first statement in a query batch. To resolve this, we separate the SQL batches with the GO command and make sure the view creation is syntactically correct. We haven’t used GO yet, so after the statements that insert our records, type GO; that ends the first batch. The CREATE VIEW statement then runs in a new batch, and at the end of that second batch we write GO again. Next, to see the output, I write SELECT * FROM detail_view. Note that this is the view name, not the student_details table name. Add a semicolon and run it, and the view is created. Why did we use GO? GO makes the CREATE VIEW command execute in its own batch, which lets SQL Server separate the commands properly and avoid conflicts. SELECT * FROM detail_view then fetches data from the view, which includes only students with s_id less than 5, and displays their names and addresses. That’s our output. Now let’s talk about managing views, starting with updating a view. Let’s say you later want the view to also include the students’ age. Instead of deleting and recreating the view, you can use CREATE OR REPLACE VIEW (in SQL Server, CREATE OR ALTER VIEW) to
update it. First, we add the age column to the table and fill in each student’s age. To add a new column age to the student_details table, use ALTER TABLE student_details ADD age INT; Then, to set each student’s age, use an UPDATE statement: UPDATE student_details SET age = 19 WHERE s_id = 1; and update every other row the same way. After updating, click Run. We’ve now filled in the age for every student. Because the rows already exist, we use UPDATE statements rather than INSERT. To display the result, write SELECT * FROM student_details, and end the batch with GO. Now you can see the name, address and age for each student. Don’t forget the GO. The next topic is deleting a view. To delete a view, simply write DROP VIEW IF EXISTS detail_view, giving the name of the
view. Just click Run, and you can see that the view has been dropped. Keep in mind that this command deletes only the view; the data in the original student_details table is not affected. Next, listing all the views. In MySQL you can write SHOW FULL TABLES WHERE Table_type = 'VIEW' (in SQL Server, you can query sys.views instead). This returns the names of the views in the current database, so you can see every view you’ve created. Now let’s move on to the main part: the types of views in SQL. First, what is a simple view? A simple view is created from a single table. It’s straightforward and doesn’t involve complex logic like joins or subqueries. For example, to create a simple view, write CREATE VIEW student_names AS SELECT name FROM student_details; and then query it with SELECT * FROM student_names. Don’t forget to put GO after the CREATE VIEW statement and again at the end, then click Run. The output is the list of student names, and that’s how a simple view works. Now the second type: the complex view. A complex view involves multiple tables or complex logic. Let’s say we have another table that stores student marks.
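The view-management steps described above can be sketched together with the simple view: adding and filling an age column, redefining a view, listing views and dropping a view. This sketch again assumes SQLite through Python’s sqlite3 module as a stand-in engine. SQLite has no CREATE OR REPLACE VIEW, so the sketch redefines the view by dropping and recreating it, and it lists views from SQLite’s sqlite_master catalog instead of MySQL’s SHOW FULL TABLES. The rows are invented placeholders.

```python
import sqlite3

# Fresh in-memory SQLite database (a stand-in for the engine in the video).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE student_details (s_id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
cur.executemany(
    "INSERT INTO student_details (s_id, name, address) VALUES (?, ?, ?)",
    [(1, "Ana", "Delhi"), (2, "Ben", "Mumbai"), (3, "Cara", "Pune")],  # placeholder rows
)

# Add the age column, then fill it in row by row with UPDATE (the rows already exist).
cur.execute("ALTER TABLE student_details ADD COLUMN age INTEGER")
cur.executemany(
    "UPDATE student_details SET age = ? WHERE s_id = ?",
    [(19, 1), (20, 2), (21, 3)],
)

# SQLite has no CREATE OR REPLACE VIEW, so "updating" a view means drop + create.
cur.execute("DROP VIEW IF EXISTS detail_view")
cur.execute("""
    CREATE VIEW detail_view AS
    SELECT name, address, age FROM student_details WHERE s_id < 5
""")

# A simple view: one table, no joins or subqueries.
cur.execute("CREATE VIEW student_names AS SELECT name FROM student_details")

def list_views():
    """List view names. SQLite records them in sqlite_master; MySQL would use
    SHOW FULL TABLES WHERE Table_type = 'VIEW' and SQL Server sys.views."""
    return sorted(row[0] for row in cur.execute(
        "SELECT name FROM sqlite_master WHERE type = 'view'"))

print(list_views())  # ['detail_view', 'student_names']

# Dropping a view removes only the stored query; the base table keeps its rows.
cur.execute("DROP VIEW IF EXISTS detail_view")
print(list_views())  # ['student_names']
print(cur.execute("SELECT COUNT(*) FROM student_details").fetchone()[0])  # 3
```

The final count shows that dropping detail_view left all three rows of student_details in place, which is the behavior the walkthrough describes for DROP VIEW.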
So let’s say we’ve created this student_marks table and inserted data values, a student ID and marks for each student, for example student 1 with 93 marks. Now let’s create a view that pulls data from both student_details and student_marks. As you can see, we’ve created a complex view that joins the two tables, and the output shows the student ID and the marks. The third type is the read-only view. A read-only view ensures that no one can modify the data through the view. This is useful when you want users to see the data but not change it. How you make a view read-only depends on your database engine, and it’s often done with permissions. In SQL Server, we can’t directly enforce read-only behavior with the CREATE VIEW statement, but we can control access to the view using permissions. Here’s how: first, create the view normally. Then REVOKE the INSERT, UPDATE and DELETE permissions on that view from users, so they can only read the data. That’s it for the read-only view. The fourth type we’re discussing today is the view WITH CHECK OPTION. WITH CHECK OPTION ensures that any INSERT or UPDATE performed through a view complies with the conditions in the view’s WHERE clause. This means you cannot insert or update records through the view if they violate the view’s own condition. Let’s go through creating a view WITH CHECK OPTION, with an example and its expected output. As you can see, we have
the expected output for a valid insertion. We created a view named sample_view that selects the student ID and name from the student_details table, but only where the name is NOT NULL. WITH CHECK OPTION ensures that any insert or update through the view must satisfy that condition, name IS NOT NULL. The first insert into sample_view adds a student with ID 6 and a valid name, and here is the output: the student ID and the name. That’s WITH CHECK OPTION. Now let’s talk about materialized views. A materialized view differs from a regular view because it stores the actual result data in the database. The data is precomputed and doesn’t have to be fetched from the underlying tables every time you query it. This makes reading from a materialized view much faster, especially for complex queries. As you can see, we first created three tables (order_details, product_details and customer_details) and inserted sample data into all three. Next, the materialized view fast_order_summary is created to summarize orders by joining the three tables and computing the total cost. Then the materialized view is queried, and because the data is precomputed, it returns results very quickly. When new data is added to the order_details table, the materialized view is refreshed with REFRESH MATERIALIZED VIEW so that it includes the new data; that’s why the refresh step is needed. (REFRESH MATERIALIZED VIEW is PostgreSQL syntax; SQL Server’s closest equivalent is an indexed view.) Finally, the materialized view is deleted; here we
used DROP MATERIALIZED VIEW fast_order_summary. That’s how you create, refresh and drop a materialized view. You might be wondering how materialized, complex and simple views differ; I’ll show you that shortly. First, let’s discuss why views are so useful. By now you might be wondering what the big deal with views is. Here’s why. They make life easier: you don’t have to keep writing complex queries over and over; you create a view once and it saves you tons of time. They simplify data: instead of pulling everything from your database, a view narrows it down to exactly what you need. They improve security: if you want to show only certain parts of the data to certain people, views let you control what others can see without giving them access to the raw tables. They let you rename columns for clarity: you can rename confusing columns in the view without changing the original table, making the data easier to understand. Now let’s discuss the differences between simple, complex and materialized views. Here’s a quick summary. A simple view pulls data from a single table with no complex logic or joins. It doesn’t store data, so its query runs every time the view is accessed. A complex view combines data from multiple tables using joins, aggregates or other complex logic. It also doesn’t store data and runs its query every time. A materialized view stores the result of its query, so data can be retrieved without running the query again. Because it uses precomputed data, it’s much faster to read. Now it’s time for the quiz. What is the key difference between a regular view and a materialized view in SQL? A) Regular views store data but materialized views don’t. B)
Materialized views store data but regular views don’t. C) Both regular and materialized views store data. D) Neither of them stores data. If you want to answer, write it in the comment section below. And that’s it: views are an incredible tool in SQL that can simplify your queries, improve security and make your life easier.
SQL Server is a powerful relational database management system developed by Microsoft and widely used for managing and storing data. Its benefits include high scalability, robust security features, and seamless integration with other Microsoft tools and technologies. SQL Server provides efficient data management through advanced features like indexing, full-text search and in-memory processing. It also offers excellent support for large datasets, making it ideal for enterprise applications. Its built-in business intelligence tools help organizations gain valuable insights from their data. SQL Server’s high availability and disaster recovery features ensure continuous operation with minimal downtime. With strong data integrity and transactional support, it ensures reliable and consistent data management across applications. In this SQL Server tutorial, we’ll cover the SQL basics: how to create a table, how to insert data, how to retrieve data from tables, and so on. We’ll also go through other fundamentals, including sorting in SQL Server, followed by GROUP BY and ORDER BY. Next, we’ll learn another important part, conditional statements, including CASE statements in SQL. Then we’ll get into joins in SQL, where we combine two or more tables
in SQL Server. After that comes the HAVING clause in SQL Server, then the BETWEEN operator, followed by pattern matching. Next, we’ll cover the date and time functions available in SQL Server, and after that, temporary (temp) tables. Then we have one of the most important parts, common table expressions (CTEs). The last part of this SQL Server tutorial is creating views and executing a query to extract the data in a view. These are the foundational skills you need before becoming a pro in SQL Server, and this tutorial covers those fundamentals and basic operations. Now, without further delay, let’s get started with a compiler that can execute SQL Server queries. We’re using one of the SQL Server compilers available online. If you’re having difficulty setting up SQL Server Management Studio on your PC, you can use this one instead. We also have a separate tutorial on how to download, install and configure SQL Server Management Studio. The link will be in the description box below, so refer to it if you want to run these same queries in SQL Server Management Studio. We’ll be working with two data tables: customer data and dealership data. The customer table has the order ID, order date, delivery date, dealership code, product category, car fuel type and so on. After that, we will insert some
rows, about 15 to 20, into the customer table. Then we have the dealership table, with the order ID, date, state, region, customer ID, customer name, primary and foreign keys and so on, and we’ll insert about 20 entries into it. Don’t worry if you have many more entries, say 2,000 or 20,000. That’s not a big deal in SQL Server Management Studio, where you can use the import wizard to ingest the data from your source into SQL Server, or use the SSIS tool to import it all. Then you don’t have to create and insert the data manually. We’re doing it by hand here only to learn the basic process of creating a table and inserting data. So far so good. We’ve also inserted the data into the next table, the dealership data. Now let’s query the data from our tables. Execute, and there you go: we have the data table right here. Let’s copy this query and paste it to query the customer table as well, replacing dealership with customer. If you look closely, I’m using uppercase for keywords like SELECT and FROM and lowercase for table and column names, so keywords are easy to tell apart from identifiers. There you go: we’ve executed it, and here is the customer data, with order ID, order date, delivery date, days to deliver, category and so on. Now let’s write the use cases as comments, using a double dash (--) or the /* */ symbols, and then write and execute the queries. The first use case is filtering customers from a specific region. Let’s close this comment and proceed with
writing the query. We use the SELECT keyword, and since we want customers from a specific region, let’s write the customer name. Actually, the customer name isn’t in the customer data, but we have the order ID instead, so let’s select that. We select the order ID, the dealership name, the state, the region (the important part), the product they bought and the revenue the dealership made from that customer, FROM customer. The next line is the join: JOIN dealership. The two tables have one common column, the order ID, so the join condition is order ID equals order ID. But that isn’t enough. Since we’re combining two tables, we need to qualify the column names, dealership.order_id = customer.order_id, so SQL Server knows which two tables we’re mapping when it combines the data. Now, to filter customers from a specific region, we use a WHERE clause. Let’s go with the West region. Since this is a text value, we enclose it in single quotes: 'West'. (In SQL Server, double quotes denote identifiers such as column names, not strings.) Let’s run the query. We get an error: ambiguous column name order_id. To fix it, we put the customer table name in front of the column, and there you go. The issue is that order_id exists in both the customer and dealership tables, so SQL Server couldn’t tell which table to take it from. Once you prefix it with customer or dealership, it uses that table’s column. Now we have the order ID and the dealership, and all the rows we selected are from the West region. Now let’s proceed with the second use case. Actually, let’s continue in the same place so that it is clearly
visible for us. Now let’s sort products by revenue, with the highest revenue at the top. We’ll write the use case comment, sort products by revenue, and the query right here. For columns, I want the product, so I’ll remove everything else, and I want the revenue as well. Actually, I don’t want the individual revenue but the total revenue, and for a total we use the aggregate function SUM. That gives the sum of all the revenue each product has earned over the years, or in that financial year. Which table? Both product and revenue are in the customer table, so I’ll select from customer alone. I don’t need a join, so I’ll remove the JOIN, and instead of the WHERE clause, I’ll use GROUP BY product. We also want the result in descending order, so the highest-grossing product is at the top and the lowest-grossing at the bottom. That way we can make sure our inventory is stocked with the products that bring in high revenue. So we give SUM(revenue) the alias total_revenue and write ORDER BY total_revenue DESC. That’s how you sort the products. Let’s execute the query, and there you go: car model J gives us the highest revenue, followed by TM and SNG. If you go back to the table, you’ll see we aren’t using actual car brands or car names, to avoid any copyright issues. We’re just using random names so as not to
be too specific. Now let’s proceed with the next query of the session, where we group by state and calculate the total revenue. Let’s edit the same comment. By the way, if you want this demo document, we’ll link it with edit and view rights so all viewers can take a look and run these queries on their own local systems. Let me type the comment: group by state and calculate total revenue. Now let’s edit the same query. We want revenue by state from the customer table, and we need to join the dealership table because that’s where the state data is; you can see we inserted states like California and Texas there. So instead of product, we select state, and we keep SUM(revenue) AS total_revenue. From the customer table, we perform a join, so let’s make some space: JOIN dealership ON the order ID. Let’s copy the table name so we don’t make any typos. As in the first use case, we avoid ambiguity by writing table_name.column_name: ON dealership.order_id = customer.
order_id. Now we perform the GROUP BY operation. We don’t need ORDER BY here, just GROUP BY state. If you wanted to order by the highest-grossing state you could add ORDER BY, but our use case doesn’t require it. Click Execute, and you’ll see each state with its respective revenue. Now let’s proceed with the next use case: using the conditional CASE expression for custom calculations. Let’s say revenues run in the thousands of dollars, and we set a benchmark: $10,000 is the minimum revenue you want from a product. If a product isn’t yielding at least $10,000 for your dealership, it isn’t selling much, and you can free up space in your dealership for products that bring higher revenue. So we’ll label revenue greater than 10,000 as high revenue, revenue between 5,000 and 10,000 as medium revenue, and revenue less than 5,000 as low revenue. You get the idea: three segments (high, medium and low), so you can drop the low-revenue products from your inventory. We want the order ID, so copy it and paste it here, and we also want the product. We don’t need the revenue total here, so let’s remove it and add product instead. The FROM clause goes at the end; first comes the CASE expression. Type CASE, followed by WHEN and a
Then the first branch: WHEN revenue is greater than 10,000 THEN label it 'High Revenue'. (Let’s write the keywords in capitals rather than sentence case.) Copy and paste that line for the second branch: WHEN revenue is BETWEEN 5,000 AND 10,000, using the AND operator, THEN label it 'Medium Revenue'. For the last segment we don’t need another comparison such as “revenue is less than 5,000”. We can simply use ELSE, so everything else is labeled 'Low Revenue'. Finally, I want to extract all this data from the customer table, so FROM customer goes at the end.

Let’s execute the code and see if we get any errors. We get an “Incorrect syntax” error near CASE, so we made a mistake: we forgot the comma before the CASE expression. We had also accidentally picked the ALTER keyword from the autocomplete suggestions, so let’s remove that too. Executing again gives another error: we never closed the CASE with END. Not a problem. Add END, and then AS to give the result a name. Since we are splitting the products into three revenue categories, let’s call it revenue category. This is how you learn: make a mistake once and you won’t make it the next time.

Now every car model is categorized into a revenue category. So far, all the cars fall into high revenue, which is not a bad deal; they are all performing well. If we changed the numbers, for example by using one lakh (100,000) instead of 10,000 as the threshold, we would probably see a couple of cars in the other categories. So far so good; this is how the query works.
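If you want to try the CASE logic without a SQL Server instance, here is a minimal sketch that runs the same kind of query through Python’s built-in sqlite3 module. The customer table, its column names (order_id, product, revenue), and the rows are made up for illustration. Only the CASE ... WHEN ... ELSE ... END AS structure mirrors the walkthrough.

```python
import sqlite3

# Hypothetical sample rows; the tutorial's real customer table is not shown,
# and these column names are assumptions.
SAMPLE_ORDERS = [
    (1, "Car Model A", 12500.0),
    (2, "Car Model B", 7300.0),
    (3, "Car Model C", 4100.0),
    (4, "Car Model D", 10000.0),
]

# Same shape as the T-SQL above: CASE ... WHEN ... THEN ... ELSE ... END AS alias.
CATEGORY_QUERY = """
SELECT order_id,
       product,
       CASE
           WHEN revenue > 10000 THEN 'High Revenue'
           WHEN revenue BETWEEN 5000 AND 10000 THEN 'Medium Revenue'
           ELSE 'Low Revenue'
       END AS revenue_category
FROM customer
ORDER BY order_id
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (order_id INTEGER, product TEXT, revenue REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", SAMPLE_ORDERS)
    return conn


def revenue_categories(conn: sqlite3.Connection) -> list:
    """Return (order_id, product, revenue_category) for every order."""
    return conn.execute(CATEGORY_QUERY).fetchall()


if __name__ == "__main__":
    for row in revenue_categories(build_db()):
        print(row)
```

The first matching WHEN wins, so order 4, at exactly 10,000, lands in Medium Revenue. It fails the strict > 10000 test but passes BETWEEN 5000 AND 10000, which includes both endpoints.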
Now let’s switch to the next use case: combining data from multiple tables using a JOIN. We already did this earlier, but for the sake of practice we’ll do it again, so let’s name this use case “Combine data from multiple tables using a JOIN”.

From the customer table we want the order ID, the product, and the revenue. We also want some data from the dealership table, since that is where the state data lives: dealership state and dealership region. Remove the CASE expression from the previous query. Let’s keep the select list on the first line so it’s clear where the query is going. The second line is the JOIN: join dealership to customer on the order ID. Which order ID equals which? The first table’s column, customer’s order ID, equals the second table’s column, dealership’s order ID. Qualifying them this way tells SQL Server exactly which columns from which tables are being matched.

So far we have specified the columns we want in the output and joined two tables on a condition. We can also narrow the data down. Earlier we wrote a query for a specific region, so let’s continue that idea with WHERE region = 'West', closing the string with a single quote. This query returns the order ID, product, and revenue from the customer table together with the state and region from the dealership table, joined on the order ID, and keeps only the rows in the West region. Let’s execute it; don’t worry if we hit any errors. There we have the result: all the orders from the West region. Let me expand the results pane so we have a better view.
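Here is a sketch of the same join-and-filter pattern, again using sqlite3 as a stand-in for SQL Server. The two tables, their columns, and the rows are hypothetical. The point is the shape of the query: qualified column names, JOIN ... ON the shared order ID, then WHERE on a column from the joined table.

```python
import sqlite3

# Hypothetical schema and rows standing in for the tutorial's customer and
# dealership tables; only the JOIN ... ON ... WHERE shape follows the walkthrough.
SETUP = """
CREATE TABLE customer (order_id INTEGER PRIMARY KEY, product TEXT, revenue REAL);
CREATE TABLE dealership (order_id INTEGER PRIMARY KEY, state TEXT, region TEXT);
INSERT INTO customer VALUES (1, 'Car Model A', 12500), (2, 'Car Model B', 7300), (3, 'Car Model C', 4100);
INSERT INTO dealership VALUES (1, 'California', 'West'), (2, 'Texas', 'South'), (3, 'Oregon', 'West');
"""

# Qualifying each column with its table name removes any doubt about
# which table's order_id is meant.
WEST_ORDERS = """
SELECT customer.order_id, customer.product, customer.revenue,
       dealership.state, dealership.region
FROM customer
JOIN dealership ON customer.order_id = dealership.order_id
WHERE dealership.region = 'West'
ORDER BY customer.order_id
"""


def build_db() -> sqlite3.Connection:
    """Create both hypothetical tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def west_orders(conn: sqlite3.Connection) -> list:
    """Return joined rows for orders whose dealership is in the West region."""
    return conn.execute(WEST_ORDERS).fetchall()


if __name__ == "__main__":
    for row in west_orders(build_db()):
        print(row)
```

String literals such as 'West' take single quotes. In SQL Server, double quotes normally delimit identifiers rather than strings.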
Now let’s go with the sixth query of today’s session, which uses the HAVING clause in SQL Server. We’ll filter the states so that only those whose total revenue is greater than 50,000 are shown. Earlier we used a CASE expression with $10,000 as the benchmark. Now let’s raise it to $50,000 and keep only the states that bring in at least that much revenue. Let’s rename the use case “Filter groups with the HAVING clause”.

We want the state, and we also want to count the orders. So we take customer’s order ID and wrap it in COUNT, the aggregate function here, to count the order IDs. We’ll name that total orders.
Do we have revenue? Yes. We don’t need the product here, so let’s remove it. The aggregation for revenue will be SUM. Let’s put state first in the select list: state, COUNT of order ID AS total orders, and SUM of revenue AS total revenue. The aliases add clarity. Without an alias, the column header doesn’t describe the result in plain terms. A data engineer or data analyst can read the aggregate function and understand that we are counting the orders and summing the revenue. For the business person who just wants to see the report, though, “total orders” and “total revenue” are the simplest language.

We extract this data from the customer table and join the dealership table again, because the state column lives in the dealership table. Then we add GROUP BY state. For the condition, revenue greater than 50,000, we use the HAVING clause: HAVING SUM of revenue greater than 50,000. You might try using the alias, total revenue, instead, so let’s execute and see if the alias works. It doesn’t. SQL Server doesn’t accept a SELECT-list alias in HAVING, so we put the actual expression, SUM of revenue, back in its place.

Let’s execute again. We had grouped by both region and state, but we only want state, so let’s remove region. Next we get an “Ambiguous column name” error for order ID, because both tables have an order ID column. We could qualify it with the dealership table, but the customer table holds the order details, so let’s qualify it as customer’s order ID. There you go: these are all the states with revenue greater than 50,000.
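A minimal sqlite3 sketch of GROUP BY plus HAVING, under the same assumptions (hypothetical tables and rows). It repeats SUM(customer.revenue) in HAVING rather than using the alias, because that is the form SQL Server accepts; SQLite itself is more permissive about aliases. It also qualifies customer.order_id, which avoids the ambiguous-column error.

```python
import sqlite3

# Hypothetical rows: California totals 55,000, Texas 35,000, Oregon 60,000.
SETUP = """
CREATE TABLE customer (order_id INTEGER PRIMARY KEY, product TEXT, revenue REAL);
CREATE TABLE dealership (order_id INTEGER PRIMARY KEY, state TEXT, region TEXT);
INSERT INTO customer VALUES
    (1, 'Car Model A', 30000), (2, 'Car Model B', 25000),
    (3, 'Car Model C', 20000), (4, 'Car Model D', 15000),
    (5, 'Car Model E', 60000);
INSERT INTO dealership VALUES
    (1, 'California', 'West'), (2, 'California', 'West'),
    (3, 'Texas', 'South'), (4, 'Texas', 'South'),
    (5, 'Oregon', 'West');
"""

# HAVING filters groups after GROUP BY; WHERE would filter rows before grouping.
STATES_OVER_50K = """
SELECT dealership.state,
       COUNT(customer.order_id) AS total_orders,
       SUM(customer.revenue)    AS total_revenue
FROM customer
JOIN dealership ON customer.order_id = dealership.order_id
GROUP BY dealership.state
HAVING SUM(customer.revenue) > 50000
ORDER BY dealership.state
"""


def build_db() -> sqlite3.Connection:
    """Create the hypothetical customer and dealership tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def states_over_50k(conn: sqlite3.Connection) -> list:
    """Return (state, total_orders, total_revenue) for states above 50,000."""
    return conn.execute(STATES_OVER_50K).fetchall()


if __name__ == "__main__":
    for row in states_over_50k(build_db()):
        print(row)
```

Texas is grouped and summed like the other states, but HAVING discards its group because 35,000 is not above 50,000.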
Now let’s proceed to the next query, which uses the range operator BETWEEN in SQL Server. Let’s name the comment “Using the range operator BETWEEN” (no need to capitalize it); BETWEEN is the keyword. For the use case, we’ll reuse the same query without major changes to identify states whose revenue is greater than 50,000 but less than one lakh (100,000). That makes a good use case for BETWEEN. We keep the columns as they are, maybe include product as well, and still join the two tables. This time we don’t need GROUP BY or HAVING. In their place we write a WHERE clause: WHERE SUM of revenue BETWEEN 50,000 AND 100,000, where 100,000 is one followed by five zeros.

Let’s execute and see if it works. There is an error: “An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.” WHERE filters individual rows before any grouping happens, so it cannot contain an aggregate; a condition on the summed revenue per state would belong in HAVING, as in the previous query. Here, though, let’s simplify and filter individual orders instead. We drop the join and the state, and we don’t count anything, since the count was never needed and was one source of the error. We select the order ID, the product, the dealership name, and the revenue from the customer table, WHERE revenue is BETWEEN these two numbers. Run it, and it works: these are the dealerships and products with revenue between 50,000 and one lakh. Note that BETWEEN includes both endpoints. The extra aggregate functions had only complicated the query.
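The sketch below, with hypothetical data in sqlite3, shows the simplified query that worked. It also confirms that an aggregate inside WHERE is rejected, mirroring the SQL Server error above (SQLite’s wording differs). The dealership_name column is an assumed name.

```python
import sqlite3

# Hypothetical rows; dealership_name is an assumed column name.
SETUP = """
CREATE TABLE customer (order_id INTEGER PRIMARY KEY, dealership_name TEXT,
                       product TEXT, revenue REAL);
INSERT INTO customer VALUES
    (1, 'Dealer North', 'Car Model A', 45000),
    (2, 'Dealer East',  'Car Model B', 50000),
    (3, 'Dealer West',  'Car Model C', 75000),
    (4, 'Dealer South', 'Car Model D', 100000),
    (5, 'Dealer Hub',   'Car Model E', 100001);
"""

# BETWEEN is inclusive: revenue of exactly 50000 or 100000 matches.
IN_RANGE = """
SELECT order_id, dealership_name, product, revenue
FROM customer
WHERE revenue BETWEEN 50000 AND 100000
ORDER BY order_id
"""

# An aggregate in WHERE is invalid, as in SQL Server.
AGGREGATE_IN_WHERE = (
    "SELECT order_id FROM customer WHERE SUM(revenue) BETWEEN 50000 AND 100000"
)


def build_db() -> sqlite3.Connection:
    """Create the hypothetical customer table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def orders_in_range(conn: sqlite3.Connection) -> list:
    """Return the order IDs whose revenue falls in the inclusive range."""
    return [row[0] for row in conn.execute(IN_RANGE)]


def aggregate_in_where_rejected(conn: sqlite3.Connection) -> bool:
    """Return True if the engine refuses SUM() inside WHERE."""
    try:
        conn.execute(AGGREGATE_IN_WHERE)
    except sqlite3.Error:
        return True
    return False


if __name__ == "__main__":
    conn = build_db()
    print(orders_in_range(conn), aggregate_in_where_rejected(conn))
```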
Now let’s do some pattern matching. Say we are looking for sales from California, or for a particular car model, but we don’t know the full name or might misspell it. We have California in the data, but suppose we only know its first three or four letters. In that case we can match the pattern we have against the values in the table, and every row where it finds a match is returned. That is how pattern matching works. Let’s run a query to understand it better.

We’ll keep the same query but change the use case comment to “Using pattern matching in SQL Server”. We keep order ID, dealership name, and product from the customer table, and we add a condition on product using LIKE. (Its opposite is NOT LIKE.) Let’s check the car models first. They all share the words “Car Model” and differ only by a letter such as M, so pattern matching on product wouldn’t tell us much. Instead of product, let’s filter on fuel type with LIKE and the hybrid fuel type. Imagine we don’t know the full spelling of “hybrid” and only know its first four letters, hybr. Since it’s text, we wrap it in single quotes. If the pattern matches values in the fuel type column, the query returns those rows.

Let’s execute. One important thing we missed is the percent sign. Writing 'hybr%' tells SQL Server that the text must begin with hybr and that anything may follow, so every value starting with hybr is returned. Execute again and there you have it. Remember the percent symbol. If you only knew that the value contains “ybr” somewhere, '%ybr%' accepts anything before ybr and anything after it, and it returns the same hybrid rows. Likewise for petrol: suppose I don’t know the full spelling and type only the first letters, p-e-t-r, followed by %. Run the query and we get all the vehicles whose fuel type is petrol. This is how you can use pattern matching in SQL Server.
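Here is a small sqlite3 sketch of the LIKE patterns from the walkthrough, using hypothetical fuel type values. SQLite’s LIKE is case-insensitive for ASCII letters by default, which matches how SQL Server behaves under a case-insensitive collation (the usual default), so 'hybr%' matches 'Hybrid'.

```python
import sqlite3

# Hypothetical fuel_type values for a handful of orders.
SETUP = """
CREATE TABLE customer (order_id INTEGER PRIMARY KEY, product TEXT, fuel_type TEXT);
INSERT INTO customer VALUES
    (1, 'Car Model A', 'Hybrid'),
    (2, 'Car Model B', 'Petrol'),
    (3, 'Car Model C', 'Diesel'),
    (4, 'Car Model D', 'Electric'),
    (5, 'Car Model M', 'Hybrid');
"""


def build_db() -> sqlite3.Connection:
    """Create the hypothetical customer table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def match_fuel(conn: sqlite3.Connection, pattern: str) -> list:
    """Return order IDs whose fuel_type matches a LIKE pattern.

    % matches any run of characters, including none. The pattern is passed
    as a bound parameter instead of being pasted into the SQL string.
    """
    rows = conn.execute(
        "SELECT order_id FROM customer WHERE fuel_type LIKE ? ORDER BY order_id",
        (pattern,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db()
    for pattern in ("hybr", "hybr%", "%ybr%", "petr%"):
        print(pattern, match_fuel(conn, pattern))
```

The first pattern, 'hybr' with no %, matches nothing, because without a wildcard LIKE compares the whole value.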
Now let’s perform some date and time calculations. Let’s rename the comment to “Date and time functions and mathematical calculations”. Suppose you are the owner of a dealership and you are losing customers for a specific car. It used to perform very well, with many customers coming in for that car, but now it receives fewer orders, and you want to find out why.
After a general survey, you learn that the number of days you take to deliver that car keeps growing. Say you used to deliver it in 2 days, but now it takes two months. That might be the reason, but you want solid proof, the actual number of days, to show your sales team why deliveries are taking so long.

For that, start with SELECT order ID and order date: copy the order date column name and paste it in, then do the same for the delivery date. Remember that order date and delivery date are of the date data type. To calculate the difference, use the function DATEDIFF. People pronounce it either “date diff” or “dated if”; I prefer “date diff”, since it means date difference. Open a bracket and supply the details. The first argument is the unit, and since I want to count days, I pass day. Then I pass the two dates separated by a comma, order date and then delivery date, because I want the difference between them. You can give the result an alias if you like, say days to deliver. Add FROM customer, end with a semicolon, then select the statement and execute it.

Now you have the number of days you or your sales team take to deliver each vehicle, and on average it is around 10 days. You can show this to your sales team and let them know it isn’t good: you want delivery within 4 to 5 days, or ideally 2 to 3 days. If this continues, sales might drop, which would be a problem.
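SQLite has no DATEDIFF, so this sketch computes the same day count as the difference of two julianday() values. For plain dates this equals SQL Server’s DATEDIFF(day, order_date, delivery_date). The dates are hypothetical, and storing them as ISO YYYY-MM-DD text is an assumption of the sketch.

```python
import sqlite3

# Hypothetical orders with ISO-8601 date strings (YYYY-MM-DD).
SETUP = """
CREATE TABLE customer (order_id INTEGER PRIMARY KEY, order_date TEXT, delivery_date TEXT);
INSERT INTO customer VALUES
    (1, '2023-01-02', '2023-01-12'),
    (2, '2023-02-01', '2023-02-03'),
    (3, '2023-03-30', '2023-04-15');
"""

# julianday(end) - julianday(start) is the number of days between two dates,
# the SQLite counterpart of DATEDIFF(day, start, end) for date-only values.
DAYS_TO_DELIVER = """
SELECT order_id,
       CAST(julianday(delivery_date) - julianday(order_date) AS INTEGER) AS days_to_deliver
FROM customer
ORDER BY order_id
"""

AVERAGE_DAYS = (
    "SELECT AVG(julianday(delivery_date) - julianday(order_date)) FROM customer"
)


def build_db() -> sqlite3.Connection:
    """Create the hypothetical customer table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def days_to_deliver(conn: sqlite3.Connection) -> list:
    """Return the delivery time in days for each order, in order_id order."""
    return [row[1] for row in conn.execute(DAYS_TO_DELIVER)]


def average_days(conn: sqlite3.Connection) -> float:
    """Return the mean delivery time in days across all orders."""
    return conn.execute(AVERAGE_DAYS).fetchone()[0]


if __name__ == "__main__":
    conn = build_db()
    print(days_to_deliver(conn), round(average_days(conn), 2))
```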
Now let’s proceed to the next use case of today’s session: temporary tables. Why temporary tables? Sometimes you just want to run some numbers, something exploratory and not critical, and you don’t want to touch the original table. In that case you can create a temporary table, a copy of the data you need. In SQL Server, a local temporary table (its name starts with #) lives in tempdb and is dropped automatically when the session that created it ends, for example when you close the connection, so nothing happens to your original table. That is where temporary tables come into the picture.

Let’s build a simple use case. We select order ID, product, and revenue INTO a temporary table called temp order. Where does the data come from? From the customer table. You can put the FROM clause on the same line or on a new line; I’ll keep it on a new line. Then the filter. Remember, this is exploratory, something you just want to take a look at, which is why you chose a temporary table. I want the products with revenue greater than one lakh (100,000) dollars.

Run it. It completes, but where is the data? Write a simple query, SELECT * FROM the temporary table, and run it. We get an “Invalid object name” error for temp order, so something in the statements is off. Let’s check whether we missed anything in the WHERE condition, revenue greater than this number. We don’t want a semicolon in the middle there, but we did miss one at the end of the statement. Let’s also lower the threshold to 10,000, since no single product earned more than one lakh. Execute again, and these are the products giving us at least 10,000 in sales.
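SQLite has no SELECT ... INTO #table syntax, so this sketch uses CREATE TEMP TABLE ... AS SELECT, its closest counterpart, to show the two properties that matter. The temporary copy can be queried like any table, and it disappears when the session that created it ends, leaving the original table untouched. The table and rows are hypothetical, and a throwaway database file stands in for the server.

```python
import os
import sqlite3
import tempfile

# Hypothetical rows for the customer table.
ROWS = [(1, "Car Model A", 12500.0), (2, "Car Model B", 7300.0), (3, "Car Model C", 15800.0)]


def temp_table_demo(db_path: str):
    """Copy high-revenue orders into a TEMP table, then reconnect to show it is gone."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE customer (order_id INTEGER, product TEXT, revenue REAL)")
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", ROWS)
    conn.commit()

    # SQLite counterpart of T-SQL's SELECT ... INTO #temp_order FROM customer WHERE ...
    conn.execute(
        "CREATE TEMP TABLE temp_order AS "
        "SELECT order_id, product, revenue FROM customer WHERE revenue > 10000"
    )
    temp_rows = conn.execute("SELECT * FROM temp_order ORDER BY order_id").fetchall()
    conn.close()  # ending the session drops every TEMP table it created

    # A new session still sees the original table, but not temp_order.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("SELECT * FROM temp_order")
        temp_survived = True
    except sqlite3.OperationalError:  # "no such table: temp_order"
        temp_survived = False
    customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
    conn.close()
    return temp_rows, temp_survived, customer_count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(temp_table_demo(os.path.join(folder, "dealership.db")))
```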
Now let’s proceed to the next important part of today’s session: Common Table Expressions, also known as CTEs, which play a major role in real-time data analytics. Let’s create a fairly simple CTE to learn how it works technically. A CTE starts with the keyword WITH, followed by a name for the result set; I’ll call it sales data. Then comes AS and an opening bracket, and inside the brackets you write the actual query.

Let’s tidy up a few things, since we no longer want the temp order table. Inside the CTE I want state, not order ID, and I want revenue, so let’s sum the revenue and alias it as total revenue, FROM customer. Then we add a JOIN to the dealership table (copy the dealership table name) based on the order ID, the column the two tables have in common. Which order ID are we talking about? The customer table’s and the dealership table’s, so we write ON dealership’s order ID equals customer’s order ID. Since we selected state alongside an aggregate, we add GROUP BY state. That isn’t the end of the statement, so the semicolon has to wait; this is only the first part.

This entire construct is the CTE: the output of the inner query is exposed under the name sales data. Now we use the CTE in another query. SELECT state and total revenue (reuse the same alias we created, so we don’t make a mistake) FROM sales data, the CTE we just created, WHERE total revenue is greater than $50,000. Now comes the semicolon. Execute it, and there are the states whose total revenue exceeds $50,000.
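The same CTE, sketched in sqlite3 against hypothetical customer and dealership tables (the same made-up rows as a HAVING example would use). Notice that the outer WHERE can filter on total_revenue directly, because to the outer query it is simply a column of sales_data.

```python
import sqlite3

# Hypothetical rows: California totals 55,000, Texas 35,000, Oregon 60,000.
SETUP = """
CREATE TABLE customer (order_id INTEGER PRIMARY KEY, product TEXT, revenue REAL);
CREATE TABLE dealership (order_id INTEGER PRIMARY KEY, state TEXT, region TEXT);
INSERT INTO customer VALUES
    (1, 'Car Model A', 30000), (2, 'Car Model B', 25000),
    (3, 'Car Model C', 20000), (4, 'Car Model D', 15000),
    (5, 'Car Model E', 60000);
INSERT INTO dealership VALUES
    (1, 'California', 'West'), (2, 'California', 'West'),
    (3, 'Texas', 'South'), (4, 'Texas', 'South'),
    (5, 'Oregon', 'West');
"""

# WITH name AS (inner query) followed by an outer query that reads from it.
CTE_QUERY = """
WITH sales_data AS (
    SELECT dealership.state AS state, SUM(customer.revenue) AS total_revenue
    FROM customer
    JOIN dealership ON dealership.order_id = customer.order_id
    GROUP BY dealership.state
)
SELECT state, total_revenue
FROM sales_data
WHERE total_revenue > 50000
ORDER BY state
"""


def build_db() -> sqlite3.Connection:
    """Create the hypothetical tables in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def high_revenue_states(conn: sqlite3.Connection) -> list:
    """Return (state, total_revenue) rows above 50,000 using the CTE."""
    return conn.execute(CTE_QUERY).fetchall()


if __name__ == "__main__":
    print(high_revenue_states(build_db()))
```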
What is PostgreSQL? PostgreSQL is an open-source object-relational database management system. It stores data in rows, with columns as the different data attributes. According to the DB-Engines ranking, PostgreSQL was ranked fourth in popularity among hundreds of databases worldwide at the time of recording. It lets you store, process, and retrieve data safely, and it was developed by a worldwide team of volunteers.

Now let’s look at the history of PostgreSQL. From 1977 onwards, the Ingres project was developed at the University of California, Berkeley. In 1986, the Postgres project was led by Professor Michael Stonebraker. In 1987 the first demo version was released, and in 1994 an SQL interpreter was added to Postgres. The first PostgreSQL release, known as version 6.0, came out on January 29, 1997. Since then, PostgreSQL has continued to be developed by the PostgreSQL Global Development Group, a diverse group of companies and many thousands of individual contributors.

Now let’s look at some of the important features of PostgreSQL. PostgreSQL describes itself as the world’s most advanced open source database, and it is free to download.
It is cross-platform, supporting multiple operating systems such as Windows, Linux, and macOS, and it is highly secure, robust, and reliable. PostgreSQL supports multiple programming interfaces, such as C, C++, Java, and Python. It is compatible with various data types. It works with primitives such as integer, numeric, string, and Boolean; it supports structured types such as date and time, array, and range; and it can also work with documents such as JSON and XML. Finally, PostgreSQL supports multiversion concurrency control, or MVCC.

With this theory in place, let’s look at the PostgreSQL commands we’ll cover in the demo. We’ll start with basic commands such as SELECT, UPDATE, and DELETE. We’ll learn how to filter data using the WHERE clause and the HAVING clause, how to group data using the GROUP BY clause, and how to order the result using the ORDER BY clause. You’ll learn how to deal with NULL values and get an idea of the LIKE operator and of logical operators such as AND and OR. We’ll also explore some popular built-in mathematical and string functions. Finally, we’ll see some advanced concepts in PostgreSQL: CASE statements, subqueries, and user-defined functions. Let’s head over to the demo.

First we’ll connect to PostgreSQL using the psql shell. In the “Type here to search” box I’ll search for psql; this is the SQL Shell (psql). I’ll click Open and maximize the window. For Server I’ll just press Enter, and for Database I’ll press Enter. The port is already filled in as 5432, so I’ll press Enter, and the username is already given. Now it asks for the password, so I’ll enter mine to connect to my PostgreSQL database. It shows a warning, but we have successfully connected to PostgreSQL. To check that everything is fine, you can run a simple command that shows the version of PostgreSQL you have installed: SELECT version(); that is, version followed by a pair of brackets and a semicolon. I’ll press Enter, and you can see the version: PostgreSQL 13.2.
Now let me show you the command that displays all the databases that already exist. If I type \l and press Enter, it lists them: we have postgres, template0, template1, and a test database as well.

For our demo I’ll create a new database. I’ll write CREATE DATABASE followed by the database name, sql_demo, add a semicolon, and press Enter. The CREATE DATABASE message confirms that we have successfully created our sql_demo database. To connect to that database, use \c followed by a space and sql_demo. There you go: it says you are now connected to database sql_demo. Here we can now create tables and perform insert, select, update, delete, alter, and much more.

Next I’ll show you how to connect to PostgreSQL using pgAdmin. When you install PostgreSQL you get the SQL shell, and along with it you also get pgAdmin. I’ll search for pgAdmin and click Open. It opens in a web browser, Chrome in this case, and this is what the pgAdmin interface looks like. It’s a fairly basic interface. Along the top you have the File, Object, Tools, and Help menus; in the main area you have tabs such as Dashboard, Properties, SQL, Statistics, Dependencies, and Dependents; and on the left panel you have Servers. Let me expand this so it connects to the server. Remember that when I ran \l to display the databases, it showed postgres and test; here you can see the postgres database and the test database as well. We also created one more database, sql_demo. Let me show you how to work with pgAdmin and its query tool.
I’ll right-click sql_demo and select Query Tool, and I’ll show you how to run a few commands in it. Say you want to see the version of PostgreSQL you’re using. You can use the same command as in the psql shell, SELECT version(); with the brackets and a semicolon. Select it and click the Execute button, or press F5, to run the query. The output appears at the bottom: PostgreSQL 13.2, compiled by Visual C++, 64-bit.

Now let me show you how to perform a few basic operations with PostgreSQL commands. I’ll write SELECT 5 * 3; then select it and press F5. The query returns the product of 5 and 3, which is 15. Let’s edit it to SELECT 5 + 3 + 6; select it, and press F5; it returns the sum, 14. You can do the same in the psql shell. SELECT 7 * 10; should give 70, and when I press Enter it does return 70. The column header reads ?column?, which we’ll deal with later. Back in pgAdmin, one more operation: SELECT 5 * (3 + 4); with a semicolon. PostgreSQL first evaluates the expression inside the brackets, 3 + 4, which is 7, and then multiplies 7 by 5. Select it and execute, and you can see 7 * 5 is 35.
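These expressions behave the same way in most SQL engines. The sketch below evaluates them through sqlite3 and adds AS aliases (an addition, not part of the demo) so each result column gets a readable name instead of PostgreSQL’s ?column? placeholder. The last column, without brackets, shows that multiplication binds tighter than addition.

```python
import sqlite3

# The expressions from the demo, plus 5 * 3 + 4 to contrast with 5 * (3 + 4).
# The AS aliases are additions so each result column has a readable name.
ARITHMETIC = """
SELECT 5 * 3       AS five_times_three,
       5 + 3 + 6   AS five_plus_three_plus_six,
       7 * 10      AS seven_times_ten,
       5 * (3 + 4) AS with_brackets,
       5 * 3 + 4   AS without_brackets
"""


def evaluate() -> dict:
    """Run the SELECT and map each alias to its computed value."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.execute(ARITHMETIC)
        names = [col[0] for col in cur.description]
        return dict(zip(names, cur.fetchone()))
    finally:
        conn.close()


if __name__ == "__main__":
    print(evaluate())
```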
Now let’s go back to the psql shell. There I’ll show you how to create a table called movies and then enter some data into it. Let me scroll down a bit.

My CREATE command goes like this: CREATE TABLE followed by the table name, movies. The movies table will have a few columns. The first is the movie ID. After each column name we give its data type, and movie_id will be an integer, one of the data types PostgreSQL provides. The second column is the name of the movie, so I’ll write movie_name. Column names should follow SQL naming rules, so there shouldn’t be any spaces in them; I’ve used an underscore to keep the name readable. movie_name is of type varchar, or character varying, with a size of 40, so it can hold at most 40 characters. The third column holds the genre of the movie, movie_genre, again of type varchar, with a size of, say, 30. The fourth and last column holds the IMDb ratings, imdb_ratings, of type real, since ratings can have decimal values. I’ll close the bracket, add a semicolon, and press Enter. There you go: we have successfully created a table called movies.

Now let me go back to pgAdmin. Here is our database, sql_demo. I’ll right-click it and click Refresh. Under Schemas (let me scroll down a bit) there’s a Tables node; let me expand it. You can see the movies table in the sql_demo database now, and you can check the columns we just added: movie_id, movie_name, movie_genre, and imdb_ratings.
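The same DDL runs almost unchanged in SQLite, so here is a sqlite3 sketch that creates the movies table and lists its columns, roughly what pgAdmin’s Columns node shows. One caveat: SQLite accepts VARCHAR(40) but does not enforce the length, whereas PostgreSQL rejects longer values.

```python
import sqlite3

# The movies table from the demo; column names and types follow the walkthrough.
MOVIES_DDL = """
CREATE TABLE movies (
    movie_id     INTEGER,
    movie_name   VARCHAR(40),
    movie_genre  VARCHAR(30),
    imdb_ratings REAL
)
"""


def describe_movies() -> list:
    """Create the table in memory and return (column_name, declared_type) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(MOVIES_DDL)
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(movies)")]
    finally:
        conn.close()


if __name__ == "__main__":
    for name, declared_type in describe_movies():
        print(name, declared_type)
```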
There is another way to create a table. Last time we used the SQL shell; now I’ll show you how to create one with pgAdmin. Under Tables, I’ll right-click, choose Create, and select Table. It asks for the name of the table. This time we’re going to create a table called students, so I’ll enter students as the table name and leave the other settings at their defaults.

Now I’ll go to the Columns tab, where you can create as many columns as you want. On the right there is a plus sign; I’ll click it to add a new row. My first column is the student roll number, student_roll_number. Again, the column name should follow SQL naming rules. The data type I’m going to select is integer. If you want, you can add constraints such as Not NULL, so that the student_roll_number column won’t accept NULL values, and I’ll also check Primary key, which means every roll number must be unique. To add another column, click the plus sign again. This time I’ll add the student name as the second column, student_name, of type character varying; you can specify a length if you want, say 40. I’ll click the plus sign once more for the final column, gender, which I’ll give the type character.

Now click Save, and that creates the students table. On the left panel, earlier we had only one table, movies, and now we have two, since students has been added. If I expand it, under Columns you can see the three columns: student_roll_number, student_name, and gender. You can also check the Constraints node, which lists any constraints; it shows one primary key on student_roll_number.
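For reference, the table built in the pgAdmin dialog corresponds roughly to the DDL below. PostgreSQL’s character with no length means a single character, written here as CHAR(1). The sketch runs it in sqlite3 and checks that the primary key rejects a duplicate roll number. The student rows are hypothetical.

```python
import sqlite3

# Approximate DDL for the table built in the pgAdmin dialog.
STUDENTS_DDL = """
CREATE TABLE students (
    student_roll_number INTEGER NOT NULL PRIMARY KEY,
    student_name        VARCHAR(40),
    gender              CHAR(1)
)
"""


def primary_key_columns() -> list:
    """Return the names of the columns flagged as primary key."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(STUDENTS_DDL)
        # In PRAGMA table_info rows, index 5 is the primary-key position (0 = not part of it).
        return [row[1] for row in conn.execute("PRAGMA table_info(students)") if row[5]]
    finally:
        conn.close()


def duplicate_roll_number_rejected() -> bool:
    """Insert the same roll number twice and report whether the second insert fails."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(STUDENTS_DDL)
        # Hypothetical student rows.
        conn.execute("INSERT INTO students VALUES (1, 'Asha', 'F')")
        try:
            conn.execute("INSERT INTO students VALUES (1, 'Ravi', 'M')")
        except sqlite3.IntegrityError:  # UNIQUE constraint on the primary key
            return True
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    print(primary_key_columns(), duplicate_roll_number_rejected())
```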
Now let me run a SELECT statement to show the columns we have in the movies table. I’ll write SELECT * FROM movies; with a semicolon and execute it. At the bottom you can see the movie_id, movie_name, movie_genre, and imdb_ratings columns.

The next command we’ll learn is how to delete a table. One way is the SQL command DROP TABLE followed by the table name. To delete students you write DROP TABLE students; then select it and run it, which deletes the table from the database. The other way is to right-click the table name and choose Delete/Drop. When I select it, I get the prompt “Are you sure you want to drop table students?”. I’ll select Yes, and the students table has been deleted.

Now let’s perform a few operations and learn a few more PostgreSQL commands. To do that, I’m going to insert a few records into the movies table using the INSERT command. I have my INSERT query written in Notepad, so I’ll copy it and paste it into the query editor, then scroll down. Here you can see the command: INSERT INTO the table name, movies, followed by the columns movie_id, movie_name, movie_genre, and imdb_ratings, and then the records, or rows. The first record has movie ID 101. The movie is the very popular Vertigo, its genre is mystery (it is also a romance movie), and its current IMDb rating is 8.3. We also have The Shawshank Redemption, 12 Angry Men, The Matrix, Se7en, Interstellar, and The Lion King, among others, so there are eight records in total to insert into the movies table. Let me select this and execute it; it reports that eight records were inserted successfully. Now if I run SELECT * FROM movies; and execute it, the bottom panel shows eight rows affected, and if I scroll down you can see the eight records in the movies table.
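A minimal sketch of a multi-row INSERT followed by SELECT *, in sqlite3. Only values stated in the walkthrough are used: Vertigo’s row, and 12 Angry Men with ID 103 and genre drama. How the demo formats Vertigo’s two genres is an assumption. 12 Angry Men’s rating isn’t given, so it’s inserted as NULL, and the demo’s other rows are left out rather than guessed.

```python
import sqlite3

SETUP = """
CREATE TABLE movies (
    movie_id     INTEGER,
    movie_name   VARCHAR(40),
    movie_genre  VARCHAR(30),
    imdb_ratings REAL
);
-- Only values stated in the walkthrough; the 'Mystery, Romance' formatting is
-- assumed, and 12 Angry Men's rating is not given there, so it is NULL.
INSERT INTO movies (movie_id, movie_name, movie_genre, imdb_ratings)
VALUES (101, 'Vertigo', 'Mystery, Romance', 8.3),
       (103, '12 Angry Men', 'Drama', NULL);
"""


def load_movies() -> list:
    """Create and fill the movies table in memory, then return SELECT * rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute("SELECT * FROM movies ORDER BY movie_id").fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in load_movies():
        print(row)
```

SQL NULL comes back in Python as None.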
three similarly we have sank Redemption we have 12 Angry Men there’s the Matrix 7 inter staler and The Lion King so there are total eight records that we are going to insert into our movies table so let me just select this and hit execute okay you can see it has returned successfully eight records now if I run select star from movies you can see the records that are present in the table so I’ll write select star from movies I’ll select this and I’ll execute it there you go at the bottom you can see eight rows affected if I scroll this down you have the eight records of information in the movies table all right now if you want to describe the table you can go to the SQL cell and here if you write back SL D and the name of the table that is movies this will describe the table so here you have the column names this has the data type and here you can specify if there are any null values or any constraints like default constant or primary key or foreign key and others let me go back to my PG admin okay now first and foremost let me tell you how to update records in a table so suppose you have an existing table and by mistake you have entered some wrong values and you want to update those records later you can use the update query for that so I’m going to update my movies table and I’ll set the genre of movie ID 103 which is 12 Angry Men from drama to drama and crime so in our current Table we only have Jon as drama for 12 Angry Men I’m going to update this column which is the movie genre to drama and crime okay so let me show you how to do it I’ll write update followed by the name of the table that is movies go to the next line I’ll write set then I’ll give the column name which is moviecore Jer equal to I’m going to set it as drama comma crime earlier it was only drama and I’ll give my condition using the wear Clause we’ll learn where clause in a bit so I’ll write where moviecore ID is equal to 103 so here our movie ID is the unique identifier so it will first look for 
movie ID 103, locate that movie and change its genre to drama and crime. Earlier, 12 Angry Men had drama as its genre; now if I run this UPDATE statement, you can see we have successfully updated one record. Let me run the SELECT statement again and scroll down. There you go: movie ID 103, 12 Angry Men, now has the genre drama, crime.
Now let me show you how to delete records from a table using the DELETE command. You write DELETE FROM followed by the table name, movies, and a condition. Let's say I want to delete movie ID 108, which is The Lion King, so I'll write WHERE movie_id = 108. That is one way to delete this particular movie; alternatively, you could write WHERE movie_name = 'The Lion King'. Let me select this and execute it. Now if I run my SELECT query again, you can see it returns seven rows, and the movie with ID 108, The Lion King, is gone, so we have deleted it.
Next, we are going to learn about the WHERE clause in PostgreSQL. To learn the WHERE clause I'll use the same movies table again. Let's say we want to filter only
those records for which the IMDb rating is greater than 8.7. This is my updated table, and I want to display only the movies whose IMDb rating is greater than 8.7. So we'll display 12 Angry Men, which has a rating of 9, The Dark Knight, which also has 9, and The Shawshank Redemption, which has 9.3. The rest of the movies have an IMDb rating below 8.7, so we won't display those.
Let me show you how to write a WHERE clause. I'll write SELECT * FROM movies WHERE, then my column name, imdb_ratings, then the greater-than symbol and my value, 8.7, followed by a semicolon. Let's run it; I'll hit F5. There you go: it returned The Shawshank Redemption, The Dark Knight and 12 Angry Men, because only these movies have IMDb ratings greater than 8.7.
Now let's say you want to return only the movies with IMDb ratings between 8.5 and 9. For that I'm going to use another operator, BETWEEN, along with the WHERE clause. I'll write SELECT * FROM movies WHERE imdb_ratings BETWEEN 8.5 AND 9.0, so all the movies with ratings from 8.5 to 9.0 will be displayed (BETWEEN includes both end values). Let's select this and run it. There you go: it returned The Dark Knight, The Matrix, Se7en, Interstellar and 12 Angry Men. The records it left out include, I think, Vertigo, which has 8.3, and one more.
Moving ahead, let's say you want to display the movies whose genre is action. I'll write SELECT * FROM movies WHERE, and this time I'm writing it on one line, although you can break it into two lines as well, movie_genre, which is my column name, equal to, and then I'll give, within single
quotes, action. Why single quotes? Because action is a string, so it needs to be enclosed in single quotes. If I run this, there you go: we have one movie in our table whose genre is action, and that is The Dark Knight.
You can also select particular columns from the table by specifying their names. In all the examples we just saw we used *, which selects all the columns in the table. If you want specific columns, you list the column names in the SELECT statement. Let me show you: say you want to display the movie name and the movie genre. You can write SELECT movie_name, movie_genre FROM movies WHERE imdb_ratings < 9.0; this time the result will show only two columns, movie name and movie genre. Let me run it. There you go: these are the names and genres of the movies with an IMDb rating below 9.0.
Just like the BETWEEN operator we saw, there is one more operator you can use with the WHERE clause: the IN operator. The IN operator works like a series of OR conditions. Let's say I want to select all the columns from my movies table where the IMDb rating is IN (8.7, 9.0). If I run this, it displays only the records whose IMDb rating is 8.7 or 9.0.
Up to now we have looked at basic operations in SQL, such as mathematical operations; you saw how a SELECT statement works, we created a few tables, inserted a few records into them, saw how to delete a table from the database, performed operations like UPDATE and DELETE, and saw how a WHERE clause works. Now it's time to load an employee CSV file, or CSV data set, into PostgreSQL, and I'll show you how to do that. But first, before loading or inserting the records, we
need to create an employees table. So let me first go ahead and create a new table called employees in our SQL demo database. I'll write CREATE TABLE and name the table employees. Next I'll give my column names. My first column is the employee ID, of type integer; it will not contain any null values, so I'll write NOT NULL, and I'll add the PRIMARY KEY constraint. As you know, the employee ID is unique for every employee in a company, and the primary key ensures there are no repeated employee IDs. Next is the employee name, of type varchar with a size of 40. Next we'll have the employee's email address, again varchar with a size of 40. After another comma, we'll have the employee's gender, also varchar, with a size of, let's say, 10. Now let's include a few more columns. We'll have the department column, so I'll write department varchar with a size of 40. Then we'll have another column called address, which will hold the employees' country names; address is also varchar. Finally, we have the employee's salary, which I'm going to keep as type REAL; REAL stores floating-point values, so it can hold decimals.
Now let me select this CREATE TABLE statement and execute it. We have successfully created our table, and if you want you can check it with SELECT * FROM employees. Let me select this and execute it. You can see the employee ID as the primary key, followed by employee name, email, gender, department, address and salary, but there are no records in these columns yet. Now it's time to insert records into our employees table, and to do that I'm going to use a CSV file. Let me show you what the CSV file looks like. Okay, so now I am on my Microsoft Excel sheet, and
at the top you can see this is my employee data CSV file. Here we have the employee ID, employee name, email, gender, department, address and salary. This data was generated using a simulator, so it has not been validated, and you can see it has a few missing values: under the email column, some employees don't have an email ID, and the department column has some missing values as well. We'll import the records in this CSV file into PostgreSQL.
So here in the left panel, under Tables, let me right-click and first refresh. There you go: initially we had only the movies table, and now we also have the employees table. Next I'll right-click it again, and you can see the option to import or export data. Let me click on this. I don't want to export, I want to import, so I'll switch it to Import. Now it asks me for the file location. My CSV file is on my E drive, under the Data Analytics folder, in another folder called PostgreSQL, and within that folder is my employee data CSV file, so I'll select it. You can either type the path like this or browse to it. My format is CSV, and I'm going to set Header to Yes. Then let me go to Columns and check that everything is fine. I have all my columns here, so let's click OK. You can see a message about the import/export, and it says it completed successfully.
We can verify this by running SELECT * FROM employees again. If I run this and close the message, there you go: it says 150 rows affected, which means we have inserted 150 rows into our employees table. You can see the employee IDs, which are all unique, along with the employee name, email, address and salary. If I scroll down, you can see 150 rows, which means we have 150 employees in our table.
Now we are going to use this employees table to explore some more advanced SQL commands. There is an operator called DISTINCT. If I write SELECT address FROM employees, it will give me the addresses of all 150 employees. There's a problem here: I made a spelling mistake, and there should be another d. If I run it again, the query returns 150 rows, and under address you can see the different country names: Russia, France, the United States, Germany, and I think Israel as well. Yes. Now suppose you want to display only the unique addresses, or country names. You can use the DISTINCT keyword before the column name: SELECT DISTINCT address FROM employees displays only the unique country names in the address column. If I run this, it returns six rows: Israel, Russia, Australia, the United States, France and Germany.
As I said, some rows have null values, meaning they contain no information. You can use the IS NULL operator in SQL to find them. Suppose I want to display all the
employee names where the email ID is null. I'll write SELECT * FROM employees WHERE email IS NULL, which is another way to use the WHERE clause. If I select this and run it, there you go: for all these employees there is no email ID in the table. It returned 16 rows, so around 10 percent of the employees do not have an email ID, and you can see that a few of them don't have a department either. If you want to find the employees without a department, you can simply write WHERE department IS NULL instead of WHERE email IS NULL. If I select this, it returns nine rows, which means about 6 percent of the employees do not have a department.
Moving ahead, let me show you how the ORDER BY clause works in SQL. ORDER BY is used to sort your result in a particular order, either ascending or descending. Let's say I want to select all the employees from my table and sort them by salary, so I'll write SELECT * FROM employees ORDER BY salary. Let me select and run it. Okay, there is a problem: I made a spelling mistake, and this should be employees. Let me run it again. Now if you look at the output, the result has been sorted in ascending order: the employees with the lowest salaries appear at the top and those with the highest salaries appear at the bottom, because PostgreSQL sorts in ascending order by default. Now let's say you want to display the salaries in descending order, so that the highest-paid employees appear at the top. For that you use the DESC keyword, which stands for descending. If I run this, you can see the difference: the employees with the highest salaries now appear at the top, and those with the lowest salaries appear at the bottom. So this is how you can use an
ORDER BY clause. Okay, now I want to make a change to my existing table. If you look at the address column, it contains only country names, so it would be better to rename the address column to country. You can rename a column using the ALTER command in PostgreSQL. I'll write ALTER TABLE followed by the table name, employees, then RENAME COLUMN address TO country. If I add a semicolon and execute it, the column name changes to country. You can verify this by running the SELECT statement again: there you go, what was the address column is now the country column.
Let me scroll down. Now it's time to explore a few more commands. This time I'm going to show you how the AND and OR operators work in SQL; you can use them along with the WHERE clause. Let's say I want to select the employees who are from France and whose salary is less than $80,000. I'll write SELECT * FROM employees WHERE, and since I'm giving two conditions I'll use the AND operator: country = 'France' (notice I'm not using address, because we just renamed that column to country) AND salary < 80000, followed by a semicolon. Let me run this. It returned 19 rows, and you can see that every country is France and every salary is less than $80,000. So this is how you can give multiple conditions in a WHERE clause using the AND operator.
Now let's say you want to use the OR operator to find the employees who are either from Germany or in the Sales department. I'll write SELECT * FROM employees WHERE country
is equal to Germany, and instead of AND I'm going to use OR: department equal to Sales. Let's see the output; this time I'll hit F5 to run it. We have 23 rows. If I scroll to the right, you can see that in each row either the country is Germany or the department is Sales: for the first record the country is Germany, for the second and third records the department is Sales, and for the fourth record the country is Germany again. So this is how the OR condition works: if either condition is true, the row is returned; both conditions do not need to be satisfied.
PostgreSQL also has a feature called LIMIT. LIMIT is an optional clause of the SELECT statement that restricts the number of rows returned by the query. Suppose you want to display the top five rows of a table: you can use LIMIT. And if you want to skip the first five rows and then display the next five, you can do that using LIMIT and OFFSET. Let's explore how they work. I'll write SELECT * FROM employees ORDER BY salary DESC LIMIT 5. This displays the five employees with the highest salaries. If I run this, there you go: it returned five rows, and these are the five highest-paid employees. So this is one way of using the LIMIT clause.
In case you want to skip a number of rows before returning the result, you can add an OFFSET clause, which skips that many rows before the limit is applied. I'll write SELECT * FROM employees ORDER BY salary DESC, and this time LIMIT 5 OFFSET 3. This query skips the first three rows and then returns the next five. If I run it, there you go, so this is how the result looks
like. Okay, now there is another clause called FETCH; let me show you how it works. I'll copy my previous SQL query and paste it here, and after DESC I'm going to write FETCH FIRST 3 ROWS ONLY. FETCH gives me the first three rows from the top. There you go: it returned the first three rows, which are the three highest-paid employees, since we sorted in descending order of salary. You can also use OFFSET along with FETCH. I'll copy this again and paste it, and after DESC I'll write OFFSET 3 ROWS FETCH FIRST 5 ROWS ONLY. This query skips the first three rows and then displays the next five, so it works exactly the same as the LIMIT and OFFSET query we just saw. Let me run it. There you go: these are the five rows that follow the top three.
Another operator in PostgreSQL is LIKE, which is used for pattern matching. Suppose you have a table of employee names, and you have forgotten an employee's full name but remember a few letters of it; you can use the LIKE operator to work out which employee it is. Let's look at some examples of how LIKE works in PostgreSQL. Suppose you want to find the employees whose names start with A. I want to display the employee name and, let's say, the email ID, from the table employee, where the name starts with A, so I'll write employee name LIKE and then the pattern in single quotes: A followed by a percent sign. This means the employee name must have an A at the beginning; the percent sign stands for any sequence of characters after it, but the first letter should be
A. If I run this, there is an error: the name of the table is employees, not employee. Let's run it again. There you go: there are 16 employees in our table whose names start with A, and in the employee name column all of them begin with the letter A. Note that LIKE is case-sensitive in PostgreSQL, so this pattern matches a capital A only.
Now let me copy this query and paste it here. This time, say you want the employees whose names start with S, so instead of A I'll write S. This means the first letter should be S, followed by any other letters. If I run this, there are 10 employees in the table whose names start with S.
Let's copy the query again. This time I want the employees whose names end with d. Instead of A followed by a percent sign, I'll write a percent sign followed by d, which means the name can begin with anything but its last letter must be d. Let me run this: there are 13 employees in the table whose names end with d, as you can see here.
Now let's say you want to find the employees whose names contain ish. I'll copy the query and replace the pattern with percent, ish, percent. This means any letters can come before and after, but the sequence ish must appear somewhere in the name. Let me run it and show you. There is one employee whose name contains ish; you can see it in the employee's last name.
Now suppose you want the employee names that have u as the second letter. Any letter can come first, but the second letter must be u. I'll copy the query and, instead of A followed by a percent sign, write an underscore, then u, then a percent sign. You can think of the underscore as a blank that matches exactly one character, so the first letter can be A, B, C, D
or any other of the 26 letters of the alphabet, then u must be the second letter, followed by any other letter or letters. Let me run this. There are 10 employees in the table whose names have u as the second letter; you can see them here.
Moving ahead, let me show you how to use basic built-in SQL functions; we'll explore a few mathematical functions now. Let's say you want to find the total salary of all the employees. For that you can use the SUM function: I'll write SELECT SUM(salary), passing the column name salary inside the function, FROM employees. Let's see the result. This returns a single value, and there you go: this is the total salary. Since the value is very large, it is displayed in scientific notation, with an e. One thing to note: if you look at the output, the column header says sum real, which is not very readable. SQL lets you fix this with an alias. Since we are summing the salary column, we can give this operation an alias using the AS keyword: if I write SUM(salary) AS total_salary, that becomes my output column name. Let me run this so you can see the difference: the output now shows total_salary, which is much more readable than before. So this is a feature in SQL that lets you give alias names to your columns or results.
Similarly, let's say you want to find the average salary of all the employees. SQL has a function called AVG that calculates the mean. I'll write AVG and also edit my alias, say to mean_salary. Let's run it: the average salary across all employees is around $81,000. There are two more important functions SQL provides, MAX and MIN. If I write SELECT MAX(salary) AS, let's say, instead of total
I'll write maximum, this will return the maximum salary among the employees. Let's run it and see the highest value in the salary column: the highest salary of any employee is $119,618. Similarly, you can use the MIN function: I'll write MIN, which returns the minimum salary in the table, and I'll change the alias to minimum. Now let's run it; the minimum salary in our table is $4,685.
Now let's say you want to count the departments in the employees table. You can use the COUNT function. I want the number of distinct department names, so I'll write SELECT COUNT(DISTINCT department) AS total_departments FROM employees. Let's run this; it returns the total number of departments. Note that COUNT ignores null values, so employees without a department are not counted.
Let me show you one more thing. If I write SELECT department FROM employees and run it, it returns 150 rows. But I'll place the DISTINCT keyword just before the column name to check how many departments there are in total. There you go: there are 13 distinct values, and one of them is null. Next, we'll replace this null with a department name by updating the table.
So let's update our department column: wherever the department is null, we'll assign a new department called Analytics. We learned how to use the UPDATE command earlier, but I'll show it again. I'll write UPDATE followed by the table name, employees, then SET department = 'Analytics' WHERE department IS NULL. So wherever the department is null, we'll replace it with Analytics. Let's run this.
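To make the aggregate functions, the COUNT(DISTINCT ...) behaviour and the NULL-replacing UPDATE described above easy to try without a database server, here is a small self-contained sketch. It is only an approximation of the walkthrough: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL (assuming these particular standard SQL statements behave the same in both engines), and the trimmed-down table layout and the four sample rows are hypothetical, not the tutorial's 150-row CSV.

```python
import sqlite3

# Sketch only: an in-memory SQLite database stands in for PostgreSQL here;
# the statements below are standard SQL that behaves the same in both engines.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, trimmed-down version of the tutorial's employees table.
cur.execute("""
    CREATE TABLE employees (
        emp_id     INTEGER NOT NULL PRIMARY KEY,
        emp_name   VARCHAR(40),
        department VARCHAR(40),
        salary     REAL
    )
""")

# Made-up sample rows; one employee has no department (NULL).
rows = [
    (1, "Asha", "Sales", 50000.0),
    (2, "Boris", "Finance", 90000.0),
    (3, "Chen", None, 70000.0),
    (4, "Dana", "Sales", 60000.0),
]
cur.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", rows)

# SUM, AVG, MAX and MIN, each given a readable alias with AS.
cur.execute("""
    SELECT SUM(salary) AS total_salary,
           AVG(salary) AS mean_salary,
           MAX(salary) AS maximum,
           MIN(salary) AS minimum
    FROM employees
""")
totals = cur.fetchone()

# COUNT(DISTINCT ...) ignores NULLs, so the missing department is not counted.
cur.execute("SELECT COUNT(DISTINCT department) AS total_departments FROM employees")
dept_count_before = cur.fetchone()[0]

# Replace NULL departments; the test must be IS NULL, since "= NULL" never matches.
cur.execute("UPDATE employees SET department = 'Analytics' WHERE department IS NULL")
updated = cur.rowcount  # number of rows changed by the UPDATE
conn.commit()

cur.execute("SELECT DISTINCT department FROM employees ORDER BY department")
departments = [row[0] for row in cur.fetchall()]
conn.close()

print(totals)             # (270000.0, 67500.0, 90000.0, 50000.0)
print(dept_count_before)  # 2
print(updated)            # 1
print(departments)        # ['Analytics', 'Finance', 'Sales']
```

The last two results mirror the two points from the walkthrough: before the update, the distinct-department count skips the NULL, and afterwards the NULL no longer appears among the distinct departments because it has become Analytics.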
You can see the query returned successfully. Now if I run the SELECT DISTINCT department command again, you can see the difference: there you go, we still have 13 rows, but there is no null department; instead we have the new Analytics department.
Now we are going to explore two more crucial clauses in SQL: GROUP BY and HAVING. Let's learn how the GROUP BY clause works in PostgreSQL. The GROUP BY statement groups rows that have the same values into summary rows; for example, you can find the average salary of employees in each country, city or department. The GROUP BY clause is used together with the SELECT statement to arrange identical data into groups. Suppose you want to find the average salary of the employees in each country; you can use GROUP BY. Let me show you how. I'll write SELECT country, then the average function, AVG, passing in the salary column, with an alias of, let's say, average_salary, FROM employees. Next comes my GROUP BY clause: since I want the average salary for each country, I'll write GROUP BY country. Let's add a semicolon and run it with F5. There you go: on the left you can see the country names, Israel, Russia, Australia, the United States, France and Germany, and in the second column you can see the average salary for each country.
You can also sort the result however you want. Suppose you want to arrange the results by average salary: you can add an ORDER BY clause after the GROUP BY clause. I'll write ORDER BY and use the alias, average_salary, and since I want it in descending order I'll write DESC. Now let's run this. You can notice the difference in the average
salary column. There you go: according to our result, the average salary is highest in the United States, and if I scroll down, it is lowest in Germany.
Let's see one more example using GROUP BY. Suppose this time you want to find the maximum salary of male and female employees. I'll write SELECT, and since we want the maximum salary by gender, I'll select my gender column, then a comma, then the MAX function on salary with the alias maximum_salary, FROM employees GROUP BY gender. Let's run this. There you go: the highest salary among female employees is $119,618, while among male employees it is $117,654.
Now suppose you want to count the employees in each country. You can use the COUNT function along with the GROUP BY clause. Since I want to count employees by country, I'll first select my country column and then use COUNT(emp_id) FROM employees, grouped by country. This query gives the total number of employees from each country: you can see there are four employees in Israel, four in Australia, 80 in Russia, 31 in France, 27 in the United States, and so on. Let me scroll down.
Now it's time to explore one more very important clause in PostgreSQL: HAVING. The HAVING clause works like the WHERE clause; the difference is that WHERE filters individual rows before they are grouped, so it cannot use aggregate functions, whereas HAVING is used with a GROUP BY clause to keep only the groups that meet a condition. Suppose you want to find the countries in which the average salary is greater than $80,000; you can use the GROUP BY clause
and the HAVING clause to get the result. I'll write my SELECT statement as SELECT country, AVG(salary) with the alias average_salary, FROM employees. Now I'm going to group by each country, so GROUP BY country. Since I want the countries where the average salary is greater than $80,000, I'll add a HAVING clause after the GROUP BY clause: HAVING AVG(salary) > 80000. This condition cannot go in the WHERE clause, because you cannot use aggregate functions in WHERE; that's why we need HAVING. Let me run it. There you go: Russia and the United States are the countries where the average salary is greater than $80,000.
Now let's say you want to count the employees in each country, keeping only the countries with fewer than 30 employees. For this I'm going to use the COUNT function again. First I'll select the country column, then COUNT, passing in the employee ID so that we count the number of employees, FROM employees. You can use an alias for this as well if you want, but I'm skipping it for now. I'll write GROUP BY country, and then HAVING COUNT(emp_id) < 30. This returns the countries with fewer than 30 employees. Let's run it: Israel, Australia, the United States and Germany are the countries with fewer than 30 employees. If you want, you can use ORDER BY as well: I'll write ORDER BY COUNT(emp_id), which sorts the result in ascending order of the employee count. There you go, the result is now sorted in ascending order of employee count.
Next we are going to explore one more feature of PostgreSQL: the CASE statement. In PostgreSQL, the CASE expression is like an if-else statement in any other
programming language: it allows you to add if-else logic to a query, which makes for a powerful query. Let me scroll down and show you how to use it. It is very similar to the IF logic you use in Excel or the if-else statements in C++, Python, or any other programming language. I'm going to write a SQL query that creates a new column; let's call it salary_range. If the salary is greater than $45,000 and less than $55,000, the new salary_range column gets the value 'low salary'. If the salary is greater than $55,000 and less than $80,000, it gets 'medium salary', and if the salary is greater than $80,000, it gets 'high salary'. We are going to do all of this with a CASE expression in PostgreSQL.
Before I write the SELECT statement, let me show you how to write a comment in PostgreSQL. You start a comment with a double dash (--). Comments are very helpful because they make your code or scripts readable. I'll write: -- case expression in PostgreSQL. Similarly, if you want, you can go to the top and write a comment such as -- having clause.
Let's come down. In my SELECT statement I want the department, the country, and the salary columns. Then I'll give a comma and start my CASE expression: CASE WHEN salary > 45000 AND salary < 55000 THEN 'low salary', with the result inside single quotes. This is exactly like an if-else condition. Next I'll add another WHEN: WHEN salary > 55000 AND salary < 80000 THEN 'medium salary'. And finally my last condition: WHEN salary > 80000 THEN 'high salary'. Let me write this on a single line. Now, one thing to remember in
PostgreSQL is that keywords are case-insensitive, so you can write your SELECT statement in capitals, in lowercase, or in sentence case; similarly, you can write CASE with a small c or a capital C. Moving ahead, after the last condition I'm going to write END and give the alias salary_range; this is going to be my new column in the output. After this we need to give our table name, FROM employees, and I'll order by salary descending. So what I'm doing here is first selecting the department, country, and salary columns from my employees table, and then creating a new column, salary_range, where I specify three ranges: low salary, medium salary, and high salary. Let's run this and see the output. There you go: we have added a new column called salary_range, and we have ordered the salaries in descending order, so all the highest salaries appear at the top. If I scroll down, you can see the medium salaries, and if I scroll down further, you can see the low salaries. CASE statements are really useful when you want to create a new column based on conditions on the existing table.
Moving ahead, we are now going to see how to write subqueries in PostgreSQL. In a subquery, we write a query inside another query, which is also known as a nested query. Suppose we want to find the employee name, department, country, and salary of those employees whose salary is greater than the average salary; in such cases you can use a subquery. First I'll write the SELECT statement: I'm going to select the employee name, the department, the country, and the salary FROM the employees table WHERE the salary is greater than the average salary. So after WHERE salary >, I'm going to open brackets and write my subquery, that is SELECT AVG(salary) FROM employees. Now let me
break it down for you. First, the inner query selects the average salary from the employees table. We then compare this average with the salary of every employee, and for each employee whose salary is greater than the average we display their name, department, country, and salary. If you want, you can run the inner statement on its own as well; let me select it and run it. You can see it returns the average salary of all the employees, which is nearly $81,400, and running the full query returns the employees whose salary is greater than that average.
Moving ahead, this time I'm going to show you some built-in functions: some of the mathematical functions and string functions available in PostgreSQL. First I'll add a comment. There's another way to write a comment: instead of a double dash, you can start with a forward slash and an asterisk, write the text, say SQL functions, and close it with another asterisk and a forward slash. This is also a comment in PostgreSQL.
First we'll explore a few math functions. The ABS function returns the absolute value of a number, so if I write SELECT ABS(-100), it returns positive 100, because the absolute value removes the negative sign. There you go: our input was -100, and the absolute value of -100 is 100. Next, the GREATEST function returns the greatest number in a list of numbers. Suppose I write SELECT GREATEST and pass in a few numbers, say 2, 4, 90, 56.5, and 70, followed by a semicolon. When I run this, you will see that the GREATEST function returns
the greatest number present in the list we provided. In this case 90 is the largest number, so we get 90 as the result. Again, you can use an alias for each of these statements. Like GREATEST, there is also a function called LEAST, which returns the smallest number in a list. If I run it, the result is 2, because 2 is the smallest number in this list. Next, the MOD function returns the remainder of a division. It takes two parameters, so let's write SELECT MOD(54, 10). As you can guess, 54 divided by 10 leaves a remainder of 4, and that is our result. If I scroll down, let's see how to use the POWER function: SELECT POWER(2, 3) is 2 cubed, which is 8. Let me run this. There you go, the result is 8. You can also check POWER(5, 3), which should be 125. Next, you can use the SQRT function to find the square root of a number. If I write SQRT(100), you can guess the output should be 10, and if I run it, you can see the output is 10. For the square root of 144, the result should be 12. Let's verify it: there was an error at first, so let me run it again, and there you go, it is 12. There are a few trigonometric functions as well: SIN, COS, and TAN. Let's say I want to know the sine of 0; if you have studied high school mathematics, you know the sine of 0 is 0, and you can see the result is zero. Now let's try SIN(90). If I run it, the output is about 0.89, because these functions take their argument in radians, not degrees (PostgreSQL provides SIND, COSD, and TAND for degrees). There are other functions like CEILING and FLOOR, so let me show you what the
CEILING and FLOOR functions do. I'll write CEILING and pass the decimal value 6.45, and let me run it. You can see the ceiling function returns the next highest integer, which is 7 in this case, since the next integer above 6.45 is 7. Now let's see what FLOOR does; let me run it. As you can see, the floor function returns the next lowest integer, which is 6 in this case: the nearest integer below the provided decimal value.
Now that we've seen how to use the mathematical functions, let's explore a few of the string functions available in PostgreSQL. I'll add a comment, string functions, and scroll down. There's a function called CHARACTER_LENGTH that gives you the length of a text string. I'll write SELECT CHARACTER_LENGTH and pass in the text 'India is a democracy'. Let me run this. You can see the result is 20, since there are 20 characters in the string I provided. Next there's the CONCAT function, which is used to merge or combine multiple strings. I'll write SELECT CONCAT and, within brackets, pass the text strings: I want to combine 'PostgreSQL' followed by a space, then 'is' followed by a space, and finally the word 'interesting'. Inside the CONCAT function we have passed three separate strings, and CONCAT merges them. Let's run it and see the result; let me expand this. You can see we have concatenated the three strings successfully, and the output is 'PostgreSQL is interesting'. Next there are the LEFT and RIGHT functions in PostgreSQL (MID is a MySQL function; in PostgreSQL you would use SUBSTRING for that). The LEFT function extracts the number of characters you specify from the left of a string. Let's write SELECT LEFT and pass in my text string, India is a
democracy. I'll copy this and paste it here. Let's say I want to extract the first five characters from my string, so I'll pass 5. It will count five characters from the left, 1, 2, 3, 4, and 5, so if I run this it should print 'India'. There you go, it has printed India for us. Similarly, you can use the RIGHT function to extract characters from the right of a string. Let's say I want 12 characters from the right, so it will count 12 characters from the end. I'll change LEFT to RIGHT, select this, and run it. You can see the output: counting 12 characters from the right, it has returned ' a democracy'.
Next there is a function called REPEAT, which repeats a string the number of times you specify. Let's say I write SELECT REPEAT, pass in 'India', and ask for it to be displayed five times. I'll add a semicolon and run it, and in the output you can see India printed five times. Let's scroll down. There is another string function in PostgreSQL called REVERSE, which prints the string passed as input in reverse order. I'll write SELECT REVERSE and pass in the same string, 'India is a democracy'; I'll copy it, paste it here, and close the quotes and the brackets. Let's run this. You can see 'India is a democracy' has been printed in reverse order.
So far we have explored a few built-in functions that already exist in PostgreSQL. PostgreSQL also lets you write your own user-defined functions, so now we will learn how to write a function of our own. Let's create a function that counts the total number of email IDs present in our employees table.
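Here is a rough sketch of the function, put together from the pieces dictated in the walkthrough that follows. It is a minimal PostgreSQL sketch, assuming the employees table has an email column that is NULL for employees without an email. The dollar-quote tag $total_emails$ only mirrors the demo, and a plain $$ would work the same way.

```sql
-- Minimal sketch of the demo's email-counting function (PostgreSQL, PL/pgSQL).
-- Assumes a table named employees with a nullable email column.
CREATE OR REPLACE FUNCTION count_emails()
RETURNS integer
AS $total_emails$                -- dollar-quote tag that opens the function body
DECLARE
    total_emails integer;        -- local variable that holds the result
BEGIN
    -- COUNT(column) ignores NULLs, so employees without an email are not counted
    SELECT COUNT(email) INTO total_emails
    FROM employees;

    RETURN total_emails;
END;
$total_emails$                   -- the same tag closes the body
LANGUAGE plpgsql;

-- Call it like any built-in function
SELECT count_emails();
```

Because COUNT(email) skips NULLs, the demo's call returns 134 rather than the 150 rows in the table.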
So for this we'll write a user-defined function; let me add the comment -- user defined function. This is the syntax to write a function in PostgreSQL. I'll write CREATE OR REPLACE FUNCTION, then the function name, count_emails, and since functions have brackets, count_emails(). Then I'll write RETURNS with the return type integer. Then comes AS, followed by the function body between dollar-quote tags; I'll use $total_emails$ as the tag, since the function returns the total number of email IDs in my table. Then I'm going to declare a variable named total_emails of type integer. I'll write BEGIN, and inside BEGIN I'll write my SELECT statement. Since I want to count the email IDs, I'll pass the column name email to COUNT, then INTO total_emails FROM my table, employees, followed by a semicolon. Then we write RETURN total_emails; user-defined functions often return a value, so we include a RETURN statement. Now I'm going to end the function body with END. Let me scroll down. Here I'll close the dollar quote with $total_emails$ again, and then specify the language. For PostgreSQL's procedural language, you write LANGUAGE plpgsql, followed by a semicolon. So this is the user-defined function I have written: a function named count_emails that returns an integer, with its body enclosed in $total_emails$ quotes. We declared the variable total_emails as an integer. The BEGIN block holds the SELECT statement that counts the email IDs in the employees table and stores the value in total_emails using the INTO keyword, and the function returns total_emails. Let's run this. There is a problem: a typo, and this should be integer. Let me run it once
again. There you go, we've successfully created a user-defined function. The final step is to call it. To call this function, I'll use a SELECT statement with the function name, count_emails(), followed by a semicolon. Let's execute this. There you go: there are 134 email IDs present in our employees table. One thing to note is that there are 150 employees in total, but only 134 of them have email IDs; the rest don't, so they have NULL values.
That brings us to the end of this demo session of the PostgreSQL tutorial. Let me go to the top; we have explored a lot. We started by checking the PostgreSQL version, and then we saw how to perform basic mathematical operations such as addition, subtraction, and multiplication. Then we saw how to create a table, movies, inserted a few records into it, and used the SELECT clause. We updated a few values, deleted one row, and learned how to use the WHERE clause, the BETWEEN operator, and the IN operator. Let me scroll down. We created a table called employees and learned how the DISTINCT keyword works, how to use IS NULL with the WHERE clause, and how to use the ORDER BY clause. We saw how to alter or rename a column, explored a few more examples of the WHERE clause, and learned about the AND and OR operators. Then we learned how to use LIMIT and OFFSET, as well as the FETCH keyword in PostgreSQL. Moving further, we learned about the LIKE operator, which is used for pattern matching. We saw how to use basic built-in PostgreSQL aggregate functions like SUM, AVG, MIN, COUNT, and MAX, and how to update a value in a column with the UPDATE command. We learned how to use GROUP BY and the HAVING clause, and then how to use CASE expressions in PostgreSQL, so we saw how the CASE
expression is similar to if-else in any other programming language. We explored a few mathematical and string functions, and finally we wrote our own user-defined function. That brings us to the end of this tutorial on PostgreSQL. If you want the SQL file we used in the demo, you can give your email ID in the comment section, and our team will share the file with you over email.
So what exactly is a CTE, you ask? If you are a beginner in SQL, let's say you want to combine two tables, or maybe three or four. You will be using the JOIN keyword, right? You may also have to write a query that combines different tables, extracts results from one table into another, and finally produces an output table. This might sound a little complex. Basically, a CTE acts as a temporary table. You can write a query and save it as a CTE, and its resulting table is not created in the database; it exists only as temporary, intermediate data. Whenever you want to use it in a join, or reuse the same query inside brackets somewhere else in your query, you can simply use the name of the CTE along with the columns you require, and you will get the data. Since this is hard to grasp from words alone, let's go through the formal definition of a CTE and then move on to practical examples. CTE stands for common table expression; some people also call it a WITH expression, because its keyword is WITH. A common table expression in SQL is a
temporary result set that you can define within a query. As I said, it helps break down complex queries, makes the code more readable, and lets you reuse the result set multiple times within the same query: you just use the name of the CTE wherever you need it. It reduces code length, and depending on the database it can sometimes reduce execution time as well. CTEs are defined using the WITH keyword, as we discussed, followed by the CTE name. Just as you give a name to every column or other object in your query, you also need to give a CTE a name. That name is then used in the places where you would otherwise put a subquery. So you give a name and the query that generates the result set. The CTE is available only during the execution of that specific query. The result table created for the CTE is not stored as a permanent table in the database. It is a temporary, intermediate result that exists only while the query using the CTE is running. Now let's go to the demonstration, create some simple queries, and understand how exactly a CTE can be beneficial in those situations.
Let's go to MySQL Workbench. This is my MySQL Workbench, and I have a lot of databases here: credit card, sakila, SLP, sys, superstore, world, and so on. We will be using the superstore database, so first we need to write a query saying that I am going to use it: USE superstore. I prefer lowercase for database and column names and uppercase for keywords; for example, here USE is uppercase and superstore is the name of the database. That is for easy
readability: you can tell which is a keyword and which is a name. Let's execute this query to get access to the superstore database. In the superstore database I have one table called excel_data; let's quickly check what it contains with SELECT * FROM excel_data. Here we have the row ID, order ID, order date, ship mode, customer, and so on, along with region, sales, quantity, discount, and profit. There are many reports we could generate, but let's keep it simple and find the unique regions. If we simply selected the region column, we would get all 10,000-plus rows, so let's use DISTINCT instead. This dataset has about 10,000 or more rows, and it is not real data. It was made up entirely with ChatGPT and covers roughly 30 years, from 2000 or 2001 up to 2030 or 2031. So let's write SELECT DISTINCT region FROM excel_data and execute it to get the regions. There you go: as expected, we have five regions: North, East, South, West, and Central. You can also select the average, maximum, and total sales, and this brings us somewhere: we can find region-wise sales, grouped by year, over the 30 years. For example, how much did we sell in 2001? Going from 2000 or 2001 all the way up to 2030 or 2031, we can see whether year-on-year sales increase or decrease, and identify the best-performing and worst-performing years. This sounds like a good use case, so now let's
go to the code where I've written this as a CTE and understand the workflow. Here I have named my CTE sales_cte, and I'm starting it with the keyword WITH: WITH sales_cte AS, followed by our query. In the dataset the order date contains the year, month, and day, but we want just the year, so we use the YEAR function to extract the year from the date, AS sales_year. Then we select the region and SUM(sales), since we want the total sales for that particular year, AS total_sales, FROM excel_data. We order it by year, because we want it in increasing order, from 2000 all the way up to 2030 or 2031. All of this is saved as a CTE named sales_cte. Now I want to select some parts of that CTE: sales_year, region, and total_sales FROM sales_cte, ordered by sales year and region. Let's copy this code and run it in our workbench. Let me close this panel so that we have a complete view of the code, select all of it, and run it to see the output. There you go, we have output, but something is wrong: we did not get the years, so the 30 years of data are missing. The result is grouped by region, which works, but we don't want region; we want to group by year. More importantly, we need to fix the year, so maybe something is wrong with the order date. Since this data was generated by ChatGPT, maybe the order date column does not have the DATE data type; it may be a string. We might have to do some type casting to change the data type of the order date, so let's quickly do that. Now we have updated the year expression: what we have done is just add a CAST here.
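Below is a minimal sketch of the CTE once the cast and the WHERE filter described around this point are in place. It is written for MySQL 8.0 or later, since earlier MySQL versions do not support WITH. The table and column names (excel_data, order_date, sales) are assumptions based on the narration; a name with a space, such as Order Date, would need quoting. If the dates are stored as text in another format such as MM/DD/YYYY, CAST may return NULL, and STR_TO_DATE(order_date, '%m/%d/%Y') would be needed instead.

```sql
-- Minimal sketch: year-on-year total sales through a CTE (MySQL 8.0+).
-- Assumed names: table excel_data, text column order_date in 'YYYY-MM-DD' form,
-- numeric column sales. Adjust these to match your own schema.
USE superstore;

WITH sales_cte AS (
    SELECT
        YEAR(CAST(order_date AS DATE)) AS sales_year,  -- cast the text to DATE, then take the year
        SUM(sales) AS total_sales                      -- total sales for that year
    FROM excel_data
    WHERE order_date IS NOT NULL                       -- ignore rows with no date
    GROUP BY sales_year
)
SELECT sales_year, total_sales
FROM sales_cte                                         -- the CTE is read like a table
ORDER BY sales_year;                                   -- first year to last year
```

The ORDER BY sits in the outer query, because ordering inside a CTE is not guaranteed to carry through to the final result.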
So we have changed the string type of the date to a proper date, which follows year, month, and day. This is a simple type cast, and everything else stays the same. We've also added a WHERE condition so that rows where the date is NULL are ignored. We have also removed region, from both the GROUP BY and the ORDER BY, because we want the result ordered by year, from the first year in the dataset to the last. Let's select the entire CTE query, run it, and check our output. There you go: we have the year-on-year sales from 2001 all the way up to 2030 and 2031. That's how a CTE, or common table expression, also called the WITH query in SQL, can be used.
Welcome to the demo part of the SQL project. In this project we will do a digital music store analysis, and it is meant for beginners. So what's the objective of this project? It will teach you how to analyze the music playlist database. You examine the dataset with SQL and help the store understand its business growth by answering simple questions. As you can see, I have three sets of questions: the first set is easy, the second is moderate, and the third is advanced. The easy set has five questions and the other two have three each, so 5 + 3 + 3 gives us 11 questions to solve. From these you will understand how to analyze data with SQL and how to extract information from a database. One more thing: I will show you the schema of the particular
dataset, which we will restore soon. We have these tables: artist, album, track, media_type, genre, invoice_line, invoice, customer, employee, playlist, playlist_track, and so on. This is the music playlist database schema. Without any further ado, let me create a database: just right-click, choose Create > Database, name it music, and save. Our database is now created. If you go to Schemas and then Tables, there are no tables in it: the database exists, but it is empty. So now just right-click on your database, and you can see the Restore option. Leave the format as it is. For the file name, select the music store database file (I will put the link to this database in the description box below), click Open, and then Restore. The restore process starts and completes. Some people will see the process fail. To fix that, go to File > Preferences and set the binary path. I am using version 15, and I have set both paths. The PostgreSQL path is the important one; if you don't set the EDB Advanced Server path, that's fine, but I have added both for future reference. To find the path, go to This PC, then the OS drive, then Program Files, where you will find PostgreSQL, then 15, then bin. Copy this path, paste it here, select it, and save. After that the restore won't fail, and the process will complete. Now let's move forward and look at the tables. The list is still empty, so just refresh it, and now you can see all the tables. To check everything, let me close this and run one query; I will write here
SELECT * FROM album and run it. As you can see, the table is working fine and everything seems good. Now we will solve the questions one by one. Let's look at the first question, an easy one: who is the senior-most employee based on job title? I'll write the question down first. We have a table called employee, so we will select that table first: SELECT * FROM employee. You should know which table to select from. Here the question contains the word "employee", which points us to the employee table. Let me select this and run it. The employee table has the employee ID, last name, first name, title, reports to, levels, birth date, hire date, and other details for each employee. You can also see the levels: level 1, level 2, and so on. To find the senior-most employee based on job title, I'll write ORDER BY levels DESC. Now you can see the levels in descending order, from senior to junior, L7 down to L1. We want only one employee, so I'll add LIMIT 1 and run it. Now you can see the last name is Madan and the first name is Mohan, so Mohan Madan is the senior-most employee based on job title. Question one is done.
The second question is: which countries have the most invoices? Let me write down the question. For this, we first have to check from
which table we will get the solution. Here you can see the word invoices, and we have two tables, invoice and invoice_line; we have to select from these. I'll write SELECT * FROM invoice. We have customer ID, invoice date, billing address, billing city, billing state, and so on. There is also a billing country column, and since we need the country name, that is the column we will use. So I'll change the query to SELECT COUNT(*), billing_country FROM invoice GROUP BY billing_country. Why am I using GROUP BY? As you can see, USA appears multiple times, and so do Canada and the other countries. Grouping collapses them so that each country appears once, and from this we get the count. After this I'll write ORDER BY the count, descending. Let me run it. Now you can see the invoice counts by billing country: USA has 131, Canada 76, and Brazil 61. If I add LIMIT 1 again, we get only the USA, so the USA is the country with the most invoices. If you remove the limit, you get the other countries as well: second is Canada, third is Brazil, and so on.
The third question is: what are the top three values of total invoices? Again we need the same table. Let me write down the question first. I know I could solve this from the second query, but I want to do it from the start. First I'll write SELECT * FROM invoice and run it. Then we sort the data with ORDER BY total, the last column in this table, in descending order. We need just the top three, so, as you all
know, LIMIT 3. I used the star here, which is why it's returning all the columns. If I want only this value, I can write SELECT total FROM invoice ORDER BY total DESC LIMIT 3. Let me run it. We get totals of about 23.76, 19.8, and 19.8; these are the top three values of total invoices.
The fourth question is: which city has the best customers? We would like to throw a promotional music festival in the city where we made the most money. Write a query that returns the one city that has the highest sum of invoice totals, and return both the city name and the sum of all invoice totals. Let me write the question down for your better understanding. First, let's SELECT * FROM invoice. We have to focus on two columns here: billing city and total. So I'll write SELECT SUM(total) AS invoice_total, billing_city FROM invoice. This time we will GROUP BY billing_city, because we need the city names, and then I'll add ORDER BY invoice_total DESC. That looks good, so let me select this and run it. As you can see, the billing city with the highest total is Prague, and the
Sorry for the mispronunciation of Prague earlier. That is how we solved the fourth question: the result gives both the city name and the invoice total. Moving on to the fifth question, which is again a long one: who is the best customer? The customer who has spent the most money will be declared the best customer. Write a query that returns the person who has spent the most money.
For this we need the customer table data, so I will write SELECT * FROM customer and run it. This is the customer data: country, fax, email, state, city, address, last name and first name. As you can see, there is no detail about invoices or the money spent by the customer, so let's look at the schema. When we can't solve a question with one table, we have to join it to another table, so here we join the customer table to the invoice table. Both tables have a customer_id column, so we can join them on customer_id, and with the help of total we will find that customer. I will write SELECT customer.customer_id, customer.first_name, customer.last_name (because we need the full name), SUM(invoice.total) AS total FROM customer JOIN invoice ON customer.customer_id = invoice.customer_id GROUP BY customer.customer_id. Then I order the result in descending order so the customer who spent the most comes first: ORDER BY total DESC LIMIT 1. Let me run it. An error comes up at first; after fixing it, you can see the customer ID is 5, the first name is R and the last name is Madhav.
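A minimal sketch of the best-customer join, assuming sqlite3 and tiny invented customer and invoice tables. One deliberate change: the sum is aliased total_spent rather than total, so the alias can't be confused with the invoice.total column. Selecting first_name and last_name while grouping only by customer_id works here because SQLite accepts bare columns in a grouped query, and PostgreSQL accepts these particular ones because customer_id is the customer table's primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Ben', 'Ray'), (3, 'Cy', 'Fox');
    INSERT INTO invoice (customer_id, total) VALUES (1, 10.0), (1, 5.0), (2, 20.0), (3, 3.0);
    """
)

# Join on the shared customer_id, add up each customer's invoices, keep the biggest spender.
# total_spent (not 'total') avoids clashing with the invoice.total column.
best_customer = conn.execute(
    """
    SELECT customer.customer_id, customer.first_name, customer.last_name,
           SUM(invoice.total) AS total_spent
    FROM customer
    JOIN invoice ON customer.customer_id = invoice.customer_id
    GROUP BY customer.customer_id
    ORDER BY total_spent DESC
    LIMIT 1
    """
).fetchone()

print(best_customer)  # (2, 'Ben', 'Ray', 20.0)
```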
R Madhav has spent the highest amount, about 144.54, so R Madhav is the best customer. That finishes the easy set of questions; now let's jump into the moderate ones. These analytics skills help you become a data analyst or a data scientist.
Moderate question one: write a query to return the email, first name, last name and genre of all rock music listeners, and return your list ordered alphabetically by email, starting with A. If we run SELECT * FROM customer, there is no genre column. Looking at the schema, the genre table is here and the customer table is here. We need the first name, last name, email and genre, and the genre should be Rock. So I can connect genre to track through genre_id, track to invoice_line through track_id, invoice_line to invoice through invoice_id, and invoice to customer through customer_id. That is the path I have to follow. So, following the steps, I will write SELECT DISTINCT email, first_name, last_name FROM customer JOIN invoice ON customer.customer_id = invoice.customer_id JOIN invoice_line ON invoice.invoice_id = invoice_line.invoice_id WHERE track_id IN (SELECT track_id FROM track JOIN genre ON track.genre_id = genre.genre_id WHERE genre.name LIKE 'Rock'). The condition on genre.name is the important part, because the question asks for rock music listeners. Finally, ORDER BY email.
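Here is a runnable sketch of the rock-listener query, assuming sqlite3 and a hypothetical five-table miniature of the schema (invented rows and example.com emails). Every join column is table-qualified; an unqualified invoice_id in the invoice_line join is ambiguous because both invoice and invoice_line have that column. DISTINCT matters because one customer can buy several rock tracks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT,
                           last_name TEXT, email TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE invoice_line (invoice_line_id INTEGER PRIMARY KEY, invoice_id INTEGER,
                               track_id INTEGER);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT, genre_id INTEGER);
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);

    INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
    INSERT INTO track VALUES (1, 'Song A', 1), (2, 'Song B', 2), (3, 'Song C', 1);
    INSERT INTO customer VALUES
        (1, 'Ann', 'Lee', 'ann@example.com'),
        (2, 'Bob', 'Ray', 'bob@example.com'),
        (3, 'Cy', 'Fox', 'cy@example.com');
    INSERT INTO invoice VALUES (1, 2), (2, 1), (3, 3), (4, 2);
    INSERT INTO invoice_line (invoice_id, track_id) VALUES (1, 1), (1, 3), (2, 1), (3, 2), (4, 3);
    """
)

# Bob bought three rock tracks, Ann one, Cy only jazz; DISTINCT keeps one row per listener.
rock_listeners = conn.execute(
    """
    SELECT DISTINCT email, first_name, last_name
    FROM customer
    JOIN invoice ON customer.customer_id = invoice.customer_id
    JOIN invoice_line ON invoice.invoice_id = invoice_line.invoice_id
    WHERE invoice_line.track_id IN (
        SELECT track.track_id
        FROM track
        JOIN genre ON track.genre_id = genre.genre_id
        WHERE genre.name LIKE 'Rock'
    )
    ORDER BY email
    """
).fetchall()

print(rock_listeners)  # [('ann@example.com', 'Ann', 'Lee'), ('bob@example.com', 'Bob', 'Ray')]
```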
Before running it, let me show you the track table with SELECT * FROM track: it has the name, track_id, album_id, media_type_id, genre_id, composer, milliseconds, bytes and unit_price. Now let me run the query. My first attempt failed because I had left invoice_id unqualified in the invoice_line join; that name is ambiguous, so it must be written invoice_line.invoice_id. The next error was on the genre join, because I had misspelled genre_id, and there was one more spelling mistake in genre.name. Sorry, my bad; it happens. Now you can see all the people who love rock music, with their email, first name and last name: Aaron Mitchell, Alexandre Rocha, Astrid Gruber and so on. There are 59 rock music listeners in this database.
Question two: let's invite the artists who have written the most rock music in our dataset. Write a query that returns the artist name and the total track count of the top 10 rock bands. What do we need here? First the artist, second rock music, which is the genre, and for the total track count we need the track table.
So we have the track table and the artist table. Now let's look at the schema. We need genre for rock music, and genre connects to track through genre_id. Track connects to album through album_id, and album connects to artist through artist_id, because we need the artist name. That is how we have to connect the tables. So, following the steps, I will write SELECT artist.artist_id, artist.name, COUNT(artist.artist_id) AS number_of_songs (because we need the total number of rock songs each artist wrote) FROM track. Then JOIN album ON album.album_id = track.album_id, and then JOIN artist ON artist.artist_id = album.artist_id, which joins the artist table to the album table. Then I join genre to the track table: JOIN genre ON genre.genre_id = track.genre_id.
Then I add WHERE genre.name LIKE 'Rock', GROUP BY artist.artist_id, ORDER BY number_of_songs DESC and, since I need only the top 10 rock bands, LIMIT 10. Let me run it; after fixing a typo in album, it runs. Now you can see that Led Zeppelin, artist ID 22, wrote the most rock songs, followed by U2 and then Deep Purple with 92, and so on. That is how we solved the second question.
The third question: return all the track names that have a song length longer than the average song length. Return the name and milliseconds for each track, ordered by song length with the longest songs listed first. We'll do this in two steps: first find the average track length, then use it in a WHERE clause to keep only the longer songs. So I will write SELECT name, milliseconds FROM track WHERE milliseconds > (SELECT AVG(milliseconds) AS avg_track_length FROM track) ORDER BY milliseconds DESC. Let me run it and read the question again: return all the track names that have a song length longer than the average song length, with the name and milliseconds for each track, ordered with the longest songs first.
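Both moderate queries can be checked on a small invented dataset. The sketch below assumes sqlite3 and a trimmed schema (artist, album, genre and track with only the columns these queries use). The first query follows the track to album to artist join chain and counts rock tracks per artist; the second compares each track with an uncorrelated subquery that returns a single average length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (album_id INTEGER PRIMARY KEY, artist_id INTEGER);
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, name TEXT, album_id INTEGER,
                        genre_id INTEGER, milliseconds INTEGER);

    INSERT INTO artist VALUES (1, 'Band A'), (2, 'Band B'), (3, 'Band C');
    INSERT INTO album VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO genre VALUES (1, 'Rock'), (2, 'Jazz');
    INSERT INTO track VALUES
        (1, 'a1', 10, 1, 300000), (2, 'a2', 10, 1, 200000), (3, 'a3', 10, 1, 250000),
        (4, 'b1', 20, 1, 400000), (5, 'b2', 20, 2, 100000), (6, 'c1', 30, 2, 150000);
    """
)

# Moderate question 2: track -> album -> artist, keep rock tracks, count per artist.
top_rock_artists = conn.execute(
    """
    SELECT artist.artist_id, artist.name, COUNT(artist.artist_id) AS number_of_songs
    FROM track
    JOIN album ON album.album_id = track.album_id
    JOIN artist ON artist.artist_id = album.artist_id
    JOIN genre ON genre.genre_id = track.genre_id
    WHERE genre.name LIKE 'Rock'
    GROUP BY artist.artist_id
    ORDER BY number_of_songs DESC
    LIMIT 10
    """
).fetchall()

# Moderate question 3: the subquery yields one average; WHERE keeps tracks above it.
long_tracks = conn.execute(
    """
    SELECT name, milliseconds
    FROM track
    WHERE milliseconds > (SELECT AVG(milliseconds) FROM track)
    ORDER BY milliseconds DESC
    """
).fetchall()

print(top_rock_artists)  # [(1, 'Band A', 3), (2, 'Band B', 1)]
print(long_tracks)       # [('b1', 400000), ('a1', 300000), ('a3', 250000)]
```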
This is the longest song, and these are all the songs longer than the average song length.
Now we move on to the advanced set of questions. Advanced question one: find how much amount is spent by each customer on artists. Write a query to return the customer name, artist name and total spent. How do we solve it? First let me show you the schema. We need the artist name, the customer name and the total spent, and we need invoice_line as well, because that is where the quantity is. So we will join the artist, customer, invoice and invoice_line tables, going through track and album. Here are the steps. First, find which artist has earned the most according to the invoice lines. Second, use that artist to find which customer spent the most on that artist. For this query you will need the invoice, invoice_line, track, customer, album and artist tables. Remember, this one is tricky: the total in the invoice table may cover more than one product, which is why we need the quantity from invoice_line to find out how many of each product was purchased, and then multiply it by the unit price. This is the lengthy one, so I will write it and get back to you. Here it is; note the column positions (such as 5) used at the end, and you can write it like this.
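The full query is written off-screen in the video, so the following is a reconstruction of the two steps described, not the author's exact query. It assumes sqlite3 and invented rows; the CTE name best_selling_artist and the short table aliases are illustrative. Step one ranks artists by SUM(unit_price * quantity) over invoice lines; step two totals what each customer spent on that artist's tracks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE album (album_id INTEGER PRIMARY KEY, artist_id INTEGER);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, album_id INTEGER);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE invoice_line (invoice_line_id INTEGER PRIMARY KEY, invoice_id INTEGER,
                               track_id INTEGER, unit_price REAL, quantity INTEGER);

    INSERT INTO artist VALUES (1, 'Artist X'), (2, 'Artist Y');
    INSERT INTO album VALUES (1, 1), (2, 2);
    INSERT INTO track VALUES (1, 1), (2, 2);
    INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO invoice VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO invoice_line (invoice_id, track_id, unit_price, quantity) VALUES
        (1, 1, 1.0, 2), (2, 1, 1.0, 1), (3, 1, 1.0, 3), (2, 2, 2.0, 1);
    """
)

spend = conn.execute(
    """
    WITH best_selling_artist AS (           -- step 1: the top-earning artist
        SELECT artist.artist_id, artist.name AS artist_name,
               SUM(invoice_line.unit_price * invoice_line.quantity) AS total_sales
        FROM invoice_line
        JOIN track ON track.track_id = invoice_line.track_id
        JOIN album ON album.album_id = track.album_id
        JOIN artist ON artist.artist_id = album.artist_id
        GROUP BY artist.artist_id
        ORDER BY total_sales DESC
        LIMIT 1
    )
    SELECT c.customer_id, c.first_name, c.last_name, bsa.artist_name,
           SUM(il.unit_price * il.quantity) AS amount_spent   -- step 2: spend per customer
    FROM invoice i
    JOIN customer c ON c.customer_id = i.customer_id
    JOIN invoice_line il ON il.invoice_id = i.invoice_id
    JOIN track t ON t.track_id = il.track_id
    JOIN album alb ON alb.album_id = t.album_id
    JOIN best_selling_artist bsa ON bsa.artist_id = alb.artist_id
    GROUP BY 1, 2, 3, 4
    ORDER BY 5 DESC
    """
).fetchall()

print(spend)  # [(1, 'Ann', 'Lee', 'Artist X', 5.0), (2, 'Bob', 'Ray', 'Artist X', 1.0)]
```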
We took the artist name, then SUM(invoice_line.unit_price * invoice_line.quantity), which multiplies the price by the quantity I showed you. Then we joined track with invoice_line, album with track, and artist with album. Now let's run it. You can see Hugh O'Reilly spent about 27 on Queen, with his customer ID, then Niklas Schröder about 18, and so on. We have everything the question asks for: the customer name, the artist name and the total amount spent.
Now let's move on to question two: we want to find out the most popular music genre for each country. We determine the most popular genre as the genre with the highest amount of purchases. Write a query that returns each country along with the top genre; for countries where the maximum number of purchases is shared, return all genres. There are two parts to this question: first the most popular music genre, and second, the data needs to be at the country level. We can do it with two methods, using a CTE or using a recursive query, and I will use the CTE. So you write WITH popular_genre AS (SELECT COUNT(invoice_line.quantity) AS purchases, customer.country, genre.name, genre.genre_id, and then ROW_NUMBER() OVER (PARTITION BY customer.country ORDER BY COUNT(invoice_line.quantity) DESC) AS row_no.
All of this comes FROM invoice_line, and then I join the tables: JOIN invoice ON invoice.invoice_id = invoice_line.invoice_id, JOIN customer ON customer.customer_id = invoice.customer_id, JOIN track ON track.track_id = invoice_line.track_id, and JOIN genre ON genre.genre_id = track.genre_id. Then GROUP BY 2, 3, 4 and ORDER BY 2 ASC, 1 DESC, which closes the CTE. Now I write SELECT * FROM popular_genre WHERE row_no <= 1. Let me run it and read the question again: we have to find the most popular music genre for each country. For Argentina the most popular genre is Alternative & Punk, shown with its genre ID, row number and purchases; for Australia it is Rock, and it is Rock for many other countries too, including the USA. That is how you can find the most popular music genre for each country.
Now the last question: write a query that determines the customer that has spent the most on music for each country. Return the country along with the top customer and how much they spent; for countries where the top amount spent is shared, provide all customers who spent this amount. This is similar to the previous question, and again there are two parts: find the most spent on music for each country, then filter the data for the respective customers. It's very easy. I'll write the solution (you can check the question above) as a CTE called customer_with_country, which starts by selecting customer.customer_id, first_name and last_name.
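Both of the last two questions use the same "rank inside each country, keep the top row" pattern. Below is a minimal sketch of it on the genre question, assuming sqlite3 (window functions need SQLite 3.25 or later) and invented rows. It differs from the video in two deliberate ways: the counting and the ranking are split into two CTEs for readability, and it compares ROW_NUMBER() with RANK(). ROW_NUMBER() always keeps exactly one row per country, even when two genres tie, whereas RANK() gives tied rows the same rank, which is what the "if the maximum is shared, return all genres" wording asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genre (genre_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE track (track_id INTEGER PRIMARY KEY, genre_id INTEGER);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE invoice_line (invoice_line_id INTEGER PRIMARY KEY, invoice_id INTEGER,
                               track_id INTEGER, quantity INTEGER);

    INSERT INTO genre VALUES (1, 'Rock'), (2, 'Pop'), (3, 'Jazz');
    INSERT INTO track VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO customer VALUES (1, 'Argentina'), (2, 'Australia');
    INSERT INTO invoice VALUES (1, 1), (2, 2);
    -- Argentina: Rock 2, Pop 2 (a tie), Jazz 1.  Australia: Rock 3, Jazz 1.
    INSERT INTO invoice_line (invoice_id, track_id, quantity) VALUES
        (1, 1, 1), (1, 1, 1), (1, 2, 1), (1, 2, 1), (1, 3, 1),
        (2, 1, 1), (2, 1, 1), (2, 1, 1), (2, 3, 1);
    """
)


def top_genres(ranker):
    # Only these two fixed function names are ever pasted into the SQL text.
    assert ranker in ("ROW_NUMBER", "RANK")
    sql = f"""
    WITH genre_counts AS (          -- purchases per (country, genre)
        SELECT customer.country AS country, genre.name AS genre_name,
               COUNT(invoice_line.quantity) AS purchases
        FROM invoice_line
        JOIN invoice ON invoice.invoice_id = invoice_line.invoice_id
        JOIN customer ON customer.customer_id = invoice.customer_id
        JOIN track ON track.track_id = invoice_line.track_id
        JOIN genre ON genre.genre_id = track.genre_id
        GROUP BY customer.country, genre.name
    ),
    ranked AS (                     -- position of each genre within its country
        SELECT country, genre_name, purchases,
               {ranker}() OVER (PARTITION BY country ORDER BY purchases DESC) AS row_no
        FROM genre_counts
    )
    SELECT country, genre_name, purchases
    FROM ranked
    WHERE row_no <= 1
    ORDER BY country, genre_name
    """
    return conn.execute(sql).fetchall()


with_row_number = top_genres("ROW_NUMBER")  # one row per country; a tie is broken arbitrarily
with_rank = top_genres("RANK")              # every genre tied for first place

print(with_rank)  # [('Argentina', 'Pop', 2), ('Argentina', 'Rock', 2), ('Australia', 'Rock', 3)]
```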
Then it selects billing_country and SUM(total) AS total_spending, followed by the same ROW_NUMBER() we wrote before: ROW_NUMBER() OVER (PARTITION BY billing_country ORDER BY SUM(total) DESC) AS row_no. The data comes FROM invoice, and again we join the tables: JOIN customer ON customer.customer_id = invoice.customer_id. Then GROUP BY 1, 2, 3, 4 and ORDER BY 4 ASC, 5 DESC. Finally, SELECT * FROM customer_with_country WHERE row_no <= 1. Let me run it. We have the customer ID, first name, last name, billing country, total spending and row number. Let me check the question: it asks for the customer that has spent the most on music for each country, along with the country and how much they spent, so we have everything here, for example the top customer from Brazil, and so on. That is how you can solve these questions. By now you have good data analytics skills, and this will help you in data analyst, data science or any other SQL interviews.
Picture this: you are in an interview, and the interviewer asks, "Can you write a query to find the top five sales records?" You freeze for a moment, wondering whether you are ready. Don't worry: SQL might sound complicated, but it's actually a super useful tool that lets you interact with databases. Have you ever wondered how all those apps and websites store and organize their data?
Well, that's where SQL comes in. SQL, which stands for Structured Query Language, is a universal language for talking to databases. It's super powerful and lets you pull out specific information, add new data, update existing data, or delete things you don't need; it's basically your key to managing huge amounts of information with ease. If you're aiming for a career in tech, whether as a database administrator, data analyst or software developer, SQL is a must-have skill. Databases are at the heart of almost every application, so knowing SQL can unlock some really exciting opportunities. Now here's the exciting part: this video is your secret weapon for mastering SQL interviews. We have packed it with 45 carefully chosen SQL interview questions covering everything you need to ace those tough questions. We'll start with the basics, like how databases work, then dive into advanced query challenges, and by the end you'll be fully prepared to tackle any SQL question thrown your way. So let's dive in and get you closer to your dream job.
We'll cover every question from the basic level to the advanced level, starting with a very basic one: what is SQL? SQL stands for Structured Query Language, and it is the language used to talk to databases. Think of it as giving instructions to a computer system that stores and organizes data. For example, if you want to find all the customers who ordered a specific product, SQL can do that with a simple command. You can also use SQL to add new data, like entering a new customer's details into the database. If you want to update someone's phone number, SQL has you covered, and if you want to delete old records that are no longer needed, SQL can handle that too. Here's a quick example: to find all the customers in New York, you could write SELECT * FROM customers WHERE city = 'New York'.
The * means all columns, so use it when you want every column from the table. To add a new customer, you can write INSERT INTO customers (name, city) VALUES ('John', 'New York'), giving the name and the location. SQL works the same way across many popular databases like MySQL, PostgreSQL and SQL Server, which is why it's such an important skill for anyone working with data.
Our second question is: what are the different types of SQL commands? SQL commands are instructions you give a database to tell it what to do, and each type has a specific purpose. If an interviewer asks you this, explain it with the proper keywords, proper definitions and simple language. DDL, data definition language, defines the structure of the database: for example, CREATE, ALTER and DROP for tables. DML, data manipulation language, deals with the actual data: INSERT, UPDATE and DELETE. DCL, data control language, manages permissions and access control: GRANT provides access rights and REVOKE removes them. TCL, transaction control language, manages transactions: COMMIT saves changes, ROLLBACK undoes changes, and SAVEPOINT creates intermediate points in a transaction.
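A small sketch of DDL, DML and TCL working together, assuming Python's sqlite3 module. SQLite has no user accounts, so the DCL commands GRANT and REVOKE can't be shown here. The connection is opened with isolation_level=None so that BEGIN, SAVEPOINT, ROLLBACK TO and COMMIT are issued explicitly rather than by the driver; the names and cities are invented.

```python
import sqlite3

# isolation_level=None: the driver opens no transactions on its own; we issue BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the table structure.
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")

# TCL around DML: one transaction with a savepoint inside it.
conn.execute("BEGIN")
conn.execute("INSERT INTO customers (name, city) VALUES ('John', 'New York')")
conn.execute("INSERT INTO customers (name, city) VALUES ('Maria', 'Boston')")
conn.execute("SAVEPOINT before_delete")
conn.execute("DELETE FROM customers WHERE city = 'Boston'")
conn.execute("ROLLBACK TO before_delete")  # undoes only the DELETE
conn.execute("COMMIT")                      # keeps both INSERTs

# Query the committed data.
new_york = conn.execute("SELECT * FROM customers WHERE city = 'New York'").fetchall()
total = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

print(new_york)  # [('John', 'New York')]
print(total)     # 2
```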
For instance, in a schema with a customers table and an orders table, DDL commands define the tables; DML commands such as SELECT, INSERT, UPDATE and DELETE read and change customer or order data; DCL controls access; and TCL manages transactions. That's it, very simple.
Now the third question: what is a primary key in SQL? A primary key is like a unique ID for each record in a table. It ensures that no two rows in the table have the same value in that column, and the primary key column can't contain NULL (empty) values. Those are the basic criteria for a primary key. For example, if a customers table has a customer_id column as its primary key, each customer has a unique ID like 1, 2, 3 and so on, which makes it easy to identify and retrieve specific customers. Here we create the table with CREATE TABLE customers, giving customer_id as the primary key and name and city as VARCHAR columns. The primary key ensures that every customer_id is unique and that none is left blank. Primary keys are also important when linking tables: an orders table can use customer_id as a reference to connect each order to a specific customer, which helps maintain data integrity across the database.
The fourth question: what is a foreign key? A foreign key is a connection or link between two tables: a field in one table that refers to the primary key in another table. This creates a relationship between the tables and ensures that the data stays consistent.
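The primary-key and foreign-key rules above can be seen rejecting bad rows in a sketch like this one, assuming sqlite3 and invented tables. One SQLite-specific detail: foreign keys are only enforced after PRAGMA foreign_keys = ON, which most other databases don't require. The rejected function is a hypothetical convenience helper for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name VARCHAR(50), city VARCHAR(50))"
)
conn.execute(
    """
    CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount REAL
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Ann', 'Delhi')")


def rejected(sql):
    """Hypothetical helper: True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_pk_rejected = rejected("INSERT INTO customers VALUES (1, 'Bob', 'Pune')")  # same customer_id
orphan_sale_rejected = rejected("INSERT INTO sales (customer_id, amount) VALUES (99, 10.0)")  # no customer 99
valid_sale_rejected = rejected("INSERT INTO sales (customer_id, amount) VALUES (1, 10.0)")

print(duplicate_pk_rejected, orphan_sale_rejected, valid_sale_rejected)  # True True False
```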
For example, let's suppose you have two tables: a customers table with a primary key called customer_id, and a sales table with a customer_id field that is a foreign key linking back to customer_id in the customers table. In the table definitions shown, customer_id in the sales table is the foreign key, and customer_id in the customers table is the primary key it refers to.
Our fifth question: what is the difference between the DELETE and TRUNCATE commands? Both remove data from a table, but they work differently. DELETE removes specific rows based on a condition, for example all customers from a specific city. It lets you be selective, but it's slower because it logs each row deletion, which also makes it possible to roll back the changes if you're using transactions. TRUNCATE removes all the rows from a table at once, without any condition; just write TRUNCATE TABLE customers. It's much faster because it doesn't log individual row deletions and simply clears the entire table in one go. Whether it can be rolled back depends on the database: MySQL and Oracle commit a TRUNCATE immediately, so it can't be undone, while PostgreSQL and SQL Server can roll it back inside a transaction. So the key differences are: DELETE is for specific rows, while TRUNCATE clears the entire table; TRUNCATE is faster because it uses fewer system resources; DELETE can be rolled back within a transaction, while TRUNCATE often cannot; and DELETE logs each row deletion, while TRUNCATE doesn't.
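A sketch of a selective DELETE being undone by a rollback, assuming sqlite3 and invented rows. SQLite has no TRUNCATE statement, so the last step uses DELETE without a WHERE clause, which is SQLite's way of emptying a table; it is shown only to contrast the two forms, not as a speed measurement.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/ROLLBACK below
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ann", "Delhi"), ("Bob", "Pune"), ("Cy", "Delhi")],
)


def row_count():
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


# DELETE with a condition, inside a transaction, then undone.
conn.execute("BEGIN")
conn.execute("DELETE FROM customers WHERE city = 'Delhi'")
after_delete = row_count()    # only Bob is left inside the transaction
conn.execute("ROLLBACK")
after_rollback = row_count()  # all three rows are back

# No TRUNCATE in SQLite: DELETE without WHERE removes every row.
conn.execute("DELETE FROM customers")
after_clear = row_count()

print(after_delete, after_rollback, after_clear)  # 1 3 0
```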
So if the interviewer asks you such a question, simply explain this. Now the sixth question: what is a join in SQL, and what are its types? This is one of the most important questions, and you'll be asked it in interviews. A join combines data from two or more tables based on a related column, a common key that links them together. Like connecting puzzle pieces, joins help you see the bigger picture by merging related data. For example, with a customers table and a sales table, you can use a join to see which customer placed which order by linking them through a common column such as customer_id. There are four main types of join: inner join, left join, right join and full outer join. An INNER JOIN combines rows from both tables where there is a match in the common column. Think of it as the overlapping section of a Venn diagram: only rows that exist in both tables are included. A LEFT JOIN, or left outer join, retrieves all the rows from the left table and only the matching rows from the right table; if there's no match, the result contains NULLs in the right table's columns. Think of it as the entire left circle of the Venn diagram plus any matches in the right circle. A RIGHT JOIN, or right outer join, is the mirror image: all the rows from the right table and the matching rows from the left table, with NULLs in the left table's columns when there's no match. Think of it as the entire right circle of the Venn diagram plus any matches in the left circle.
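The INNER versus LEFT difference shows up clearly on three customers, one of whom has no sales. This sketch assumes sqlite3 and invented rows. RIGHT and FULL joins are left out because SQLite only supports them from version 3.39; a RIGHT JOIN can always be written as a LEFT JOIN with the tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO sales VALUES (101, 1, 50.0), (102, 1, 20.0), (103, 2, 30.0);
    """
)

# INNER JOIN: only customers that have at least one matching sale.
inner = conn.execute(
    """
    SELECT c.name, s.sale_id
    FROM customers c
    INNER JOIN sales s ON s.customer_id = c.customer_id
    ORDER BY c.customer_id, s.sale_id
    """
).fetchall()

# LEFT JOIN: every customer; sale columns are NULL (None in Python) when nothing matches.
left = conn.execute(
    """
    SELECT c.name, s.sale_id
    FROM customers c
    LEFT JOIN sales s ON s.customer_id = c.customer_id
    ORDER BY c.customer_id, s.sale_id
    """
).fetchall()

print(inner)  # [('Ann', 101), ('Ann', 102), ('Bob', 103)]
print(left)   # [('Ann', 101), ('Ann', 102), ('Bob', 103), ('Cy', None)]
```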
Then we have the FULL JOIN, which combines rows when there's a match in either table; when no match is found, it fills in NULLs for the missing values from either side. Think of it as both circles of the Venn diagram: everything from both tables is included.
The seventh question: what do you mean by a NULL value in SQL? It's easy: NULL means a column has no data; the value is missing or unknown. It's not the same as an empty string or the number zero, which are actual values, while NULL represents no value at all. For example, if one row in a customers table has no phone number, the phone number column for that row is NULL, as the example table shows.
The next question: define a unique key in SQL. A unique key ensures that all values in a column, or in a combination of columns, are unique, so no duplicates are allowed. It's a rule that no two rows in the table can have the same value in that column. For example, in a users table the email column can have a unique key so that no two users can register with the same email address. Remember the key points: unlike a primary key, a table can have more than one unique key, and unique keys allow NULL values while primary keys do not. If you're asked for the difference between a primary key and a unique key, say exactly that. In the example, CREATE TABLE users defines user_id as an INT primary key and email as a VARCHAR unique key, so each email must be different.
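A sketch of NULL and UNIQUE behaviour, assuming sqlite3 and an invented users table. It shows that an empty string is not NULL, that NULL must be tested with IS NULL (a plain = NULL comparison is never true), and that SQLite lets several rows hold NULL in a UNIQUE column; SQL Server, by contrast, allows only one NULL under a unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, email VARCHAR(100) UNIQUE, phone TEXT)"
)
conn.execute("INSERT INTO users (email, phone) VALUES ('a@example.com', '555-0100')")
conn.execute("INSERT INTO users (email, phone) VALUES (NULL, NULL)")  # phone unknown
conn.execute("INSERT INTO users (email, phone) VALUES (NULL, '')")    # empty string, not NULL

try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:  # UNIQUE constraint failed: users.email
    duplicate_rejected = True


def count(where):
    # The conditions passed in below are fixed literals, not user input.
    return conn.execute(f"SELECT COUNT(*) FROM users WHERE {where}").fetchone()[0]


missing_phone = count("phone IS NULL")  # 1: the '' row does not count
equals_null = count("phone = NULL")     # 0: a comparison with NULL is never true
null_emails = count("email IS NULL")    # 2: UNIQUE still allows several NULLs in SQLite

print(duplicate_rejected, missing_phone, equals_null, null_emails)  # True 1 0 2
```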
Now let's look at our next question: what is a database? A database is an organized way to store and manage data. Think of it as a digital filing cabinet where information is neatly arranged in tables with rows and columns: each row represents a record, and each column represents a specific detail about that record. For example, a library database might have a table for books, where the rows represent individual books and the columns include the title, author and publication year. The main purpose of a database is to make it easy to store, manage and quickly retrieve data whenever you need it, and databases are used in everything from apps and websites to banking systems and e-commerce platforms.
Question number 10: explain the differences between SQL and NoSQL databases. Here's a simple explanation. SQL databases are structured, which means they store data in tables with rows and columns, like a spreadsheet. They follow a predefined schema: the structure of the data is fixed, and you need to define it before adding any data. These databases are great when you need consistent and reliable data, as in banking systems or inventory management; examples are MySQL, Oracle, MS SQL Server and PostgreSQL. SQL databases are also known as RDBMS, relational database management systems. NoSQL databases, on the other hand, are flexible and do not use tables; they can handle unstructured or semi-structured data. A NoSQL database is dynamic, with data stored as JSON documents, key-value pairs, graph nodes and so on, without a fixed structure. Such databases are generally not preferred for complex query operations, and examples include MongoDB, CouchDB and Elasticsearch.
Now question number 11: what are a table and a field in SQL? A table is like a spreadsheet that stores data in an organized way using rows and columns, and each table contains records and their details. For example, a table named employees could store information about the employees in a company.
A field, on the other hand, is a column in a table, and it represents a specific attribute or property of the data; in the employees table, the fields could be employee_id, name and department. In the example table you can see the fields, or columns, and the records, or rows. The entire table is called employees, each row (record) stores information about one employee, and each column (field) represents a specific detail like employee_id, name or department.
Question number 12: describe the SELECT statement. The SELECT statement retrieves data from one or more tables; it's like asking the database, "Show me this specific information." Here's how it works. You can specify which columns you want to see: to retrieve all customer names from a customers table, write SELECT name FROM customers. To retrieve all the data, write SELECT * FROM customers; as I said in the first question, * returns every column, while naming a field returns just that column. You can also filter the results with a WHERE clause, for example SELECT name FROM customers WHERE city = 'New York'. And you can sort the results with ORDER BY: to sort customers by name, write SELECT name FROM customers ORDER BY name ASC, where ASC means ascending order. In short, the SELECT statement lets you choose exactly what data you want to see.
Now, what is a constraint in SQL? Name a few. If you're asked this, simply answer: a constraint is a rule applied to a table that ensures the data stored is accurate and consistent. It also helps maintain data integrity by restricting what values can be added or modified in a table.
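The common constraints can all be declared on one table. This sketch assumes sqlite3 and an invented employees table; violates is a hypothetical helper that reports whether an insert was rejected by a constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,           -- unique identifier
        name TEXT NOT NULL,                        -- must be supplied
        department TEXT DEFAULT 'General',         -- used when no value is given
        salary REAL CHECK (salary > 0)             -- must satisfy the condition
    )
    """
)
conn.execute("INSERT INTO employees (name, salary) VALUES ('Ann', 50000)")
default_department = conn.execute(
    "SELECT department FROM employees WHERE name = 'Ann'"
).fetchone()[0]


def violates(sql):
    """Hypothetical helper: True if a constraint rejects the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


negative_salary = violates("INSERT INTO employees (name, salary) VALUES ('Bob', -1)")  # CHECK
missing_name = violates("INSERT INTO employees (salary) VALUES (1000)")               # NOT NULL

print(default_department, negative_salary, missing_name)  # General True True
```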
helps maintain data integrity by restricting what values can be added or modified in a table. Here are some common constraints: PRIMARY KEY, FOREIGN KEY, UNIQUE, CHECK, NOT NULL and DEFAULT. We have already discussed the primary key: it ensures that each row in a table has a unique identifier, and the column can't contain null values. A foreign key links a column in one table to the primary key of another table to maintain the relationship between them. A unique key ensures all the values in a column are distinct, that is, there are no duplicates. CHECK ensures that data meets a specific condition before being inserted or updated. NOT NULL ensures that a column cannot have null values, and DEFAULT supplies a value for a column when none is provided. These constraints are essential for maintaining reliable and valid data in your database.
Now let's talk about what normalization in SQL is. Normalization is a process used to organize data in a database to make it more efficient and reliable. The goal is to reduce redundancy, which is duplicate data, and ensure data consistency. This is done by splitting a large table into smaller related tables and then linking them using relationships such as primary and foreign keys. For example, imagine a single table that stores customer details and orders. If the same customer places multiple orders, their information, like name and address, would be repeated for each order. Using normalization, you would separate this into two tables: a customers table, which stores customer details like customer ID, name and address, and an orders table, which stores order details like order ID, customer ID and order date. By linking these tables through customer ID, you reduce duplication and ensure that any change to a customer's details is made in just one place.
Now let's talk about question number 15, which is: how do you use the WHERE clause? It's very easy; just answer that the WHERE clause in SQL queries serves the purpose of selectively filtering
rows according to a specified condition, enabling you to fetch exclusively those rows that match the criteria you define. For example: SELECT * FROM employees WHERE department = 'HR'.
Now let's move on to question number 17, which is: what is the difference between UNION and UNION ALL? UNION is used to merge the results of two structurally compatible queries into a single combined result set. The difference is that UNION omits duplicate records, whereas UNION ALL includes them. The performance of UNION ALL will typically be better than UNION, since UNION requires the server to do the additional work of removing duplicates. So in cases where we are certain there are no duplicates, or where having duplicates is not a problem, UNION ALL is recommended for performance.
Now let's move on to question number 18. Here you are given sample runners and races tables and asked what the following query returns: SELECT * FROM runners WHERE id NOT IN (SELECT winner_id FROM races). Given the sample data provided, the result of this query is an empty set. The reason is as follows: if the set being evaluated by the SQL NOT IN condition contains any NULL values, the outer query returns an empty set, even if there are runner IDs that do not appear among the winner IDs in the races table. This happens because comparing a value with NULL yields unknown rather than true.
Question number 19 is: what are indexes in SQL? Indexes in SQL are like a shortcut for quickly finding data in a table. Instead of searching through every row one by one, an index creates a sorted structure based on one or more columns, making data retrieval much faster. Think of the index in a book: if you're looking for a specific topic, you go to the index at the back and find the page number instead of flipping through every page. Similarly, in a database an index
helps the system quickly locate the rows you need. Here's how it works: if you often search for customers by name, creating an index will speed up those queries. You can write CREATE INDEX idx_customer_name ON customers (name). Then, when you run a query such as SELECT * FROM customers WHERE name = 'John', the database can use the index to find the rows where the name is John much faster.
Let's move on to question number 20, which is: explain GROUP BY in SQL. The GROUP BY clause in SQL groups rows that have the same values in a column, allowing you to apply aggregate functions like SUM, COUNT or AVG to each group. For example, in a sales table, to find total sales by region you can write: SELECT region, SUM(amount) AS total_sales FROM sales GROUP BY region. This groups the sales by region and calculates the total for each. It's a quick way to summarize data by category.
Now let's talk about question number 21, which is: what is an SQL alias? An SQL alias is a temporary name you give to a table or a column in a query to make it easier to read or work with; it's like giving something a nickname for clarity. For example, if you have a column named first_name, you can use an alias to display it as "First Name" in the query results: SELECT first_name AS "First Name", last_name AS "Last Name" FROM employees. Here the AS keyword assigns the alias, and the output shows the columns as First Name and Last Name. Aliases are also useful for tables: for example, SELECT e.first_name FROM employees AS e shortens the table name for easier referencing. Aliases are not permanent; they only exist while the query is running.
Now let's talk about question number 22, which is: explain ORDER BY in SQL. You can answer this question as follows: the ORDER BY clause in SQL is used to sort the result set of a query based
on one or more columns. You can specify each column's sorting order, ascending or descending: use ASC for ascending and DESC for descending. For example: SELECT * FROM products ORDER BY price DESC.
Now let's talk about question number 23, which is: what are the differences between WHERE and HAVING in SQL? The WHERE clause is used to filter individual rows before they are grouped, for example filtering rows prior to a GROUP BY operation. Conversely, the HAVING clause is used to filter groups of rows after they have been grouped, for example filtering groups based on aggregate values. HAVING is normally used together with GROUP BY (without it, the whole result is treated as a single group), whereas the WHERE clause specifies the criteria that individual records must meet to be selected by the query and can be used with or without GROUP BY.
Question number 24 is: what is a view in SQL? This is one more important question. An SQL view is essentially a virtual table that derives its data from the result of a SELECT query. Views serve multiple purposes, including simplifying complex queries, enhancing data security through an added layer between users and the tables, and presenting targeted subsets of data to users, all while keeping the underlying table structure hidden.
Now let's move on to question number 25, which is: what is a stored procedure? If you're asked this question, simply say that an SQL stored procedure consists of precompiled SQL statements that can be executed together as a single unit. Stored procedures are commonly used to encapsulate business logic, improve performance and ensure consistent data manipulation practices. That's it.
Now let's move on to question number 26, another important question: what are triggers in SQL? An SQL trigger is a predefined sequence of actions that is executed automatically when a particular event occurs, such as when an INSERT or DELETE operation is performed on a table. Triggers are employed to ensure data consistency, conduct
auditing and streamline various tasks. Depending on the event, you can use an INSERT trigger, an UPDATE trigger or a DELETE trigger.
Now let's talk about aggregate functions: what are they, and can you name a few? It's very easy to answer. Aggregate functions in SQL perform calculations on a set of values and return a single result. First we have MIN, which returns the minimum value from the result set. Then we have MAX, which returns the maximum value. SUM returns the sum of the values, AVG returns the simple average, and COUNT returns the number of records in the result set.
Now let's talk about question number 28, which is: how do you update a value in SQL? The UPDATE statement serves the purpose of altering existing records within a table. It involves specifying the target table for the update, the specific columns to be modified and the new values to be applied. For example: UPDATE employees SET salary = 6000 WHERE department = 'IT'.
Now we'll move on to some intermediate SQL interview questions and answers. One of the questions is: what is a self join, and how would you use it? I'd like to repeat that these join questions are very important and are often asked in interviews. A self join in SQL is a type of join where a table is joined with itself. It's useful for comparing rows within the same table or exploring hierarchical relationships, such as finding employees and their managers in an organization. Imagine you have an employees table with employee ID, name and manager ID. If you want to find each employee and their manager, you can use a self join: SELECT e.name AS employee, m.name AS manager FROM employees e LEFT JOIN employees m ON e.manager_id = m.employee_id. I've already discussed with you before the meaning of
left, right and self joins. Here, the table is joined with itself using manager_id to link each employee to their manager. A self join is helpful for comparing rows in the same table or for working with hierarchical data.
Now let's move on to question number 30, which is: explain the different types of joins with examples. First we have the INNER JOIN, which returns rows that have matching values in both tables. Then we have the RIGHT JOIN, which returns all the rows from the right table and any matching rows from the left table. The LEFT JOIN returns all the rows from the left table and any matching rows from the right table. And the FULL JOIN returns all rows where there's a match in either table, including unmatched rows from both tables. Very easy.
Now let's move on to question number 31, which is: what is a subquery? Give an example. A subquery is a query embedded within another query; it fetches information that the enclosing outer query then uses as a condition or value. For example: SELECT name FROM employees WHERE salary > (SELECT AVG(salary) FROM employees).
The next question is: how do you optimize SQL queries? The answer would be something like this: SQL query optimization involves improving the performance of SQL queries by reducing resource usage and execution time. Strategies include using appropriate indexes, optimizing query structure and avoiding costly operations like full table scans.
Now let's talk about question number 33, which is: what are correlated subqueries? A correlated subquery is a type of subquery that references columns from the surrounding outer query. It is executed repeatedly, once for each row processed by the outer query, and its result depends on the values in that outer row.
Now we'll talk about what a transaction in SQL is. It's a very important one, one of
the most important questions, asked in almost every SQL interview. A transaction in SQL is a group of one or more SQL commands that are treated as a single unit. It ensures that all the operations in the group either succeed completely or fail entirely, which guarantees the integrity of the database. Imagine you're transferring money from your bank account to a friend's account: the bank first deducts the amount from your account and then adds the same amount to your friend's account. These two steps together form a transaction. If one of them fails, for example the system crashes after deducting money from your account but before adding it to your friend's account, the entire transaction is rolled back, meaning no money is transferred and the database returns to its original state. Explaining this question with an example like this makes your answer clearer to the interviewer.
Now let's talk about the ACID properties in SQL. ACID stands for atomicity, consistency, isolation and durability; these are the key properties that ensure database transactions are reliable and maintain data integrity.
Atomicity: think of it as all or nothing. A transaction is a single unit of work; if any part of the transaction fails, the entire transaction is rolled back and no changes are made to the database. For example, if you're transferring money between two accounts, either both the debit and the credit happen or neither does.
Consistency: the database must always be in a valid state. A transaction takes the database from one valid state to another, following all the rules and constraints. For example, if a transaction adds a record that violates a rule, like a duplicate primary key, the transaction fails, keeping the database consistent.
Isolation: transactions don't interfere with each other. Even if multiple transactions are running at the same time, each transaction
works as if it's the only one happening. For example, if two people are updating the same record, one transaction waits until the other is complete.
Durability: once a transaction is committed, it's permanent. Even if there's a power outage or system failure, the data is saved and won't be lost; for example, after you complete an online purchase, the transaction is stored securely even if the server crashes immediately afterwards. So that covers the ACID properties.
Now we'll move on to our next question, which is: how do you implement error handling in SQL? Error handling in SQL is the process of managing and responding to errors that occur during query execution, and different database systems have their own ways of handling errors. In SQL Server, the TRY...CATCH block is commonly used: the TRY block contains the main operation, while the CATCH block handles errors if they occur. For instance, in a transaction you can issue a ROLLBACK in the CATCH block to undo changes if something goes wrong. Similarly, in Oracle the EXCEPTION block within PL/SQL is used to handle errors: if an error arises, the EXCEPTION block executes, rolling back the transaction and logging the error message. By implementing error handling, you ensure that operations fail gracefully without corrupting data, making database operations more reliable and secure.
The next question is: describe the data types in SQL. SQL supports various data types, which define the kind of data a column can hold. They are broadly categorized into numeric, character and binary types, along with date and time types and a few miscellaneous ones. Numeric data types include INT and FLOAT; character strings include CHAR and VARCHAR; Unicode character strings include NCHAR, NVARCHAR and NTEXT; binary types include BINARY and IMAGE; date and time types include DATE and DATETIME; and the miscellaneous types include XML and JSON.
The next question is: explain normalization and denormalization. This question is often asked this way, or it could be asked
something like: explain the difference between normalization and denormalization. To answer it, simply explain what normalization is, which I have already discussed. Once again: normalization and denormalization are both ways to organize data in a database. Normalization is all about breaking big tables into smaller ones to remove duplicate data and improve accuracy. For example, instead of repeating customer details in every order, you create one table for customers and another for orders and link them with a key. Denormalization, on the other hand, is when you combine or duplicate data to make it faster to retrieve; for instance, you might add customer details directly to the orders table so that you don't need to join tables during a query. Normalization helps you save space and maintain consistency, while denormalization makes data retrieval quicker; which one to use depends on what the database needs.
Let's move on to the next question, which is: what is a clustered index? It's very easy; simply answer that a clustered index in SQL determines the physical order of the data rows in a table. Each table can have only one clustered index, which shapes the table's storage structure: the rows are physically stored in the same order as the clustered index key.
Next we have the question: how do you prevent SQL injection? SQL injection is a security risk where attackers insert harmful code into SQL queries, potentially accessing or tampering with your database. To prevent it, you can use parameterized queries or prepared statements to handle user input safely, validate inputs to allow only expected values, use stored procedures to separate logic from data, limit database permissions, and escape special characters. These steps help protect your database from SQL injection attacks.
The next question on the list is: explain the concept of a database schema in SQL. A database schema functions as a conceptual
container for database elements such as tables, views, indexes and procedures. Its primary purpose is to organize and separate these elements while specifying their structure and interconnections.
The next question is: how is data integrity ensured in SQL? Simply answer that data integrity in SQL is ensured through various means, including constraints (for example primary keys, foreign keys and CHECK constraints), normalization, transactions and referential integrity constraints. These mechanisms prevent invalid or inconsistent data from being stored in the database.
Question number 42 is: what is SQL injection? We have already discussed how to protect our data from SQL injection, so now let's discuss what SQL injection actually is. SQL injection is a cybersecurity attack that involves inserting malicious SQL code into an application's input fields or parameters. This unauthorized action enables attackers to illicitly access a database, extract confidential information or manipulate data.
The next question is: how do you create a stored procedure? You use the CREATE PROCEDURE statement to create a stored procedure in SQL. A stored procedure can contain SQL statements, parameters and variables. Here's a very simple example in SQL Server syntax: CREATE PROCEDURE GetEmployeeByID @EmployeeID INT AS BEGIN SELECT * FROM employees WHERE employee_id = @EmployeeID; END. That's it.
The next question is: what is a deadlock in SQL, and how can it be prevented? This is one more important question often asked in interviews. You can answer that a deadlock in SQL happens when two or more transactions are stuck because each is waiting for the other to release resources. It's like two people trying to go through a narrow door at the same time, each refusing to step back and let the other pass. For example, transaction A locks table one
and waits to access table two, while transaction B locks table two and waits to access table one. Both transactions are waiting for each other, so neither can proceed, creating a deadlock. You can prevent deadlocks in several ways. Lock ordering: always access resources in the same order, so that transactions don't block each other. Timeouts: set a time limit for how long a transaction waits for resources. Detection and resolution: use the database's deadlock detection to find deadlocks, cancel one transaction and let the other proceed.
Now let's move on to the last question on the list, which is: what is the difference between IN and EXISTS? IN works on a list or result set; it doesn't work with subqueries that produce a virtual table with multiple columns. It compares every value in the result list, and its performance can be comparatively slow for a large subquery result set. EXISTS, by contrast, works on virtual tables and is used with correlated queries; its comparison stops as soon as a match is found, and its performance is comparatively fast for a large subquery result set.
So guys, that's it for this video on the top 45 questions asked in SQL interviews.
Ever wondered how Amazon seems to know exactly what you want before you do? That's the magic of data analytics. Imagine you're shopping for a camera, and Amazon suggests the perfect lens, tripod and memory card before you even think of them. It's not magic, but the power of analyzing massive data sets to track what millions of shoppers like you search for and buy together. This helps Amazon create a personalized shopping experience that boosts sales and keeps you coming back. From predicting trends to fine-tuning their stock, data analytics is the secret sauce behind their seamless shopping experience. Hey everyone, welcome back to Simplilearn's YouTube channel! Today we have an exciting topic lined up: the top 10 data analytics certifications. I will walk you through the expanding
scope and financial growth of data analytics worldwide, why pursuing a data analytics certification is essential, and finally the top 10 data analytics certifications that can supercharge your career and open doors to exciting opportunities. Let's dive in and explore the world of data analytics together.
Now let us explore the expanding scope and financial growth of data analytics. The scope of data analytics is vast, promising financial growth and rising salaries for data analysts, scientists and engineers. As industries digitalize, demand is surging in finance for fraud detection, healthcare for predictive diagnosis, retail for personalized marketing and manufacturing for predictive maintenance. Innovations like augmented analytics and real-time processing further increase its importance, and companies like Google, Amazon, Microsoft and IBM consistently hire analytics experts. In India, entry-level salaries range from 4 to 6 lakh per annum, with experienced professionals earning 10 to 20 lakh per annum. In the USA, entry-level salaries are $60,000 to $80,000, with experienced roles at $100,000 and above. The future promises greater advancements, making data analytics a lucrative field with vast potential.
Now let us see why pursuing a data analytics certification is essential. A data analytics certification validates your expertise, boosts your credibility and makes your resume stand out in a competitive job market. Certifications give you in-demand skills like data visualization, statistical analysis and machine learning, keeping you current with industry trends. They can lead to high-paying job roles and career growth, as employers favor certified professionals for data-driven positions. Whether you're starting or advancing your career, a certification showcases your commitment and skills, enhancing your job prospects in fields like finance, healthcare, retail and tech. All right guys, the moment you have been waiting for is here: it's time to
reveal the top data analytics certifications by Simplilearn. Buckle up, and let's dive into these career-boosting programs that will set you on the path to success.
Coming to number one, that is the Post Graduate Program in Data Analytics. Boost your career with Simplilearn's Post Graduate Program in Data Analytics, offered in partnership with Purdue University and in collaboration with IBM. This comprehensive 8-month live online course is suitable for professionals from any background and covers crucial skills like data analysis, visualization and supervised learning using Python, R, SQL and Power BI. The program features master classes by Purdue faculty and IBM experts, hands-on projects with real-world data sets from Google Play Store, Lyft and more, and exclusive hackathons and AMA sessions. You receive joint certifications from Purdue and Simplilearn as well as IBM-recognized certificates, and you benefit from career support services like resume building and job assistance through Simplilearn's JobAssist. No prior experience is required; you only need a bachelor's degree with at least 50% marks. Enroll now to gain industry-relevant experience and stand out to top employers like Google and Amazon. Do check for the course link in the description box and pinned comments below.
Now moving on to number two, that is the Caltech Post Graduate Program in Data Science. Advance your career with Simplilearn's Post Graduate Program in Data Science, in collaboration with Caltech CTME and IBM. This comprehensive 11-month live online course covers essential skills and tools including Python, machine learning, data visualization, generative AI, prompt engineering, ChatGPT and more. With master classes by Caltech instructors and IBM experts, you will gain hands-on experience through 25+ industry-relevant projects, capstone projects across three domains, and seamless access to integrated labs. Earn a program completion certificate and up to 14 continuing education units from Caltech CTME, along with industry-recognized IBM certificates. Enhance your career with
job assistance, master classes and exclusive hackathons. With no prior work experience required, this program is suitable for professionals from any background who hold a bachelor's degree. Enroll now to become a data science expert and stand out to top employers. Do check for the course link in the description box and pinned comments below.
Now moving on to number three, that is the Professional Certificate Program in Data Analytics and Generative AI. Advance your career with the Professional Certificate Program in Data Analytics and Generative AI by Simplilearn, in collaboration with E&ICT Academy, IIT Guwahati and IBM. This comprehensive 11-month live online program is designed to equip you with cutting-edge skills in data analytics and generative AI, covering essential tools like SQL, Excel, Python, Power BI and more. Learn from distinguished IIT faculty and IBM experts through interactive master classes, hands-on projects and capstone experiences. Gain practical expertise with exposure to GenAI tools such as ChatGPT and Gemini, and earn industry-recognized certifications from IBM along with executive alumni status from IIT Guwahati. Enhance your professional profile with Simplilearn's JobAssist, resume building and job placement support to get noticed by top hiring companies. Enroll now to elevate your career and join a network of industry leaders. Do check for the course link in the description box and pinned comments below.
Moving on to number four, that is the Professional Certificate Course in Data Science. Master data science with the Professional Certificate Course in Data Science by Simplilearn, in collaboration with E&ICT Academy, IIT Kanpur. This comprehensive 11-month live online program equips you with essential skills and tools such as Python, Power BI, ChatGPT and more. Benefit from master classes delivered by distinguished IIT Kanpur faculty, gain practical experience with 25+ hands-on projects, and access integrated labs for real-world training. With dedicated modules on generative AI, prompt engineering and
explainable AI, you will stay ahead in the rapidly evolving AI landscape. Earn a prestigious program completion certificate from E&ICT Academy, IIT Kanpur, and take advantage of Simplilearn's JobAssist to enhance your professional profile and stand out to recruiters. Apply now to advance your career in data science and AI. Do check for the course link in the description box and pinned comments below.
Now moving on to number five, that is the Post Graduate Program in Data Science. Supercharge your career with the Post Graduate Program in Data Science by Simplilearn, in collaboration with Purdue University and IBM, ranked as the number one data science program by the Economic Times. This 11-month live online program equips you with in-demand skills including Python, machine learning, deep learning, NLP, data visualization, generative AI and ChatGPT. Benefit from master classes led by Purdue faculty and IBM experts, engage in hands-on training with 25+ projects and three capstone projects, and gain access to industry-leading tools such as TensorFlow, Keras, Power BI and more. Earn dual certificates from Purdue University Online and IBM, boosting your professional profile and career prospects. With Simplilearn's JobAssist, receive guidance and resume support to stand out to top employers. Applications close on November 8, 2024, so enroll now to transform your career in data science and AI. Do check for the course link in the description box and pinned comments below.
Now moving on to number six, that is Applied AI and Data Science. Advance your career with the Applied AI and Data Science program offered by Brown University's School of Professional Studies in collaboration with Simplilearn. This 14-week program empowers you with essential skills in AI, generative AI and data science, including hands-on learning and industry-relevant projects. Learn from esteemed Brown faculty through top-notch video content and monthly live master classes covering tools and concepts such as Python, machine learning, neural
networks and GPT models. Benefit from a curriculum designed to refine your expertise, supported by integrated labs and exclusive content on generative AI, and earn a prestigious certificate of completion from Brown University and a Credly badge upon program completion. Enhance your profile with Simplilearn's JobAssist, resume-building support and an exclusive IIMJobs membership to stand out in today's competitive job market. Enroll now to gain cutting-edge knowledge and take your career in AI and data science to the next level. Do check for the course link in the description box and pinned comments below.
Now moving on to the next one, that is the Data Analyst program. Elevate your career with Simplilearn's Data Analyst certification, ranked number one by Career Karma. This comprehensive 11-month program is designed to transform you into a data analytics expert, with practical training in SQL, R, Python, data visualization and predictive analytics. Learn through live interactive classes, capstone projects and 20+ hands-on projects that ensure real-world experience. Gain industry-recognized certifications from Simplilearn and IBM, access exclusive master classes and AMA sessions by IBM experts, and receive dedicated job assistance to help you stand out to top employers like Amazon, Microsoft and Google. Start your journey to becoming a data analytics professional today with Simplilearn's trusted and robust training program. Do check for the course link in the description box and pinned comments below.
Now moving on to the next one, that is the Data Scientist program. Advance your career with Simplilearn's industry-leading Data Scientist certification program, now ranked number one by Career Karma. This 11-month course, in collaboration with IBM, equips you with essential data science skills including Python, SQL, machine learning and generative AI. Gain practical real-world experience through 25+ hands-on projects and a capstone project. Benefit from master classes by IBM experts, interactive live sessions led by industry
professionals, and lifetime access to self-paced learning content. Simplilearn's JobAssist program further boosts your career prospects, helping you stand out to top employers like Amazon, Microsoft and Google. Do check for the course link in the description box and pinned comments below.
Now moving on to the second-last one, that is the Professional Certificate Program in Data Engineering. Launch your data engineering career with Simplilearn's Professional Certificate Program in Data Engineering, offered in partnership with Purdue University Online. This 32-week program equips you with in-demand skills covering Python, SQL, NoSQL, big data, AWS, Azure and Snowflake fundamentals, aligned with industry-recognized certifications like AWS Certified Data Engineer, Microsoft DP-203 and SnowPro Core. The course ensures comprehensive learning through live online classes, practical projects and a capstone experience. Gain access to the Purdue Alumni Association, exclusive master classes and Simplilearn's JobAssist for career support. Join now to become a certified data engineer and fast-track your path to high-impact roles in the field. Do check for the course link in the description box and pinned comments below.
Now moving on to the last but not least, that is Microsoft Certified: Azure Data Engineer Associate (DP-203). Advance your career with Simplilearn's Microsoft Certified Azure Data Engineer Associate DP-203 training, aligned with the official certification. Master essential Azure skills like data integration, transformation and storage while gaining hands-on experience with key services such as Azure Synapse Analytics, Azure Data Factory and Azure Databricks. Benefit from live online classes led by Microsoft Certified Trainers, access to official Microsoft handbooks, practice labs and comprehensive practice tests to help you excel in the DP-203 exam. Designed for real-world application, this course ensures you develop job-ready skills and earn an official course completion badge hosted on the Microsoft Learn portal. Enroll now to elevate your data
engineering expertise do check for the course Link in the description box below and pin comments so getting a data analytic certification can be a game changer for your growth however choosing the right certification is crucial it’s like finding the perfect key to unlock your potential select the one that best aligns with your career goals and S SK to maximize your journey in data analytics so that’s a WRA so that concludes our SQL full course if you have any doubts or question you can ask them in the comment section below our team of experts will reply you as soon as possible thank you and keep learning with simply staying ahead in your career requires continuous learning and upscaling whether you’re a student aiming to learn today’s top skills or a working professional looking to advance your career we’ve got you covered explore our impressive catalog of certification programs in cuttingedge domains including data science cloud computing cyber security AI machine learning or digital marketing designed in collaboration with leading universities and top corporations and delivered by industry experts choose any of our programs and set yourself on the path to Career Success click the link in the description to know more hi there if you like this video subscribe to the simply learn YouTube channel and click here to watch similar videos to nerd up and get certified click here