Author: Amjad Izhar

  • Financial Accounting Fundamentals

    Financial Accounting Fundamentals

This collection of texts summarizes a series of video lectures on financial accounting. The lectures explain fundamental accounting concepts, including the preparation of financial statements (income statement, statement of changes in equity, balance sheet) and the calculation and interpretation of financial ratios. The lectures also cover adjusting journal entries, bad debt expense, bank reconciliations, and depreciation methods. Specific accounting methods such as FIFO, LIFO, and weighted average are demonstrated, and the importance of internal controls is emphasized. Finally, the lectures discuss the statement of cash flows and its preparation under both the direct and indirect methods.

    Financial Accounting Study Guide

    Quiz

    Instructions: Answer each question in 2-3 sentences.

    1. What is the key difference between an asset and a liability?
    2. Explain the concept of “accounts payable” in the context of a company’s liabilities.
    3. How do revenues differ from expenses for a business?
    4. Define the term “dividends” and explain their relationship to a company’s profits.
    5. What distinguishes a current asset from a long-term asset?
    6. What does it mean for a company to “debit” an account?
    7. What is the significance of the accounting equation (A = L + SE)?
    8. What is the purpose of a journal entry in accounting?
    9. Explain the concept of “accumulated depreciation” and its function.
    10. Briefly describe the difference between the FIFO, LIFO, and weighted-average methods of inventory valuation.

    Quiz Answer Key

    1. An asset is something of value that a company owns or controls, whereas a liability is an obligation a company owes to someone else, requiring repayment in the future. Assets are what the company possesses, and liabilities are what the company owes.
    2. Accounts payable represents a company’s short-term debts, usually due within 30 days, often arising from unpaid bills like phone bills or supplier invoices. It is a common liability found on a balance sheet.
    3. Revenues are the money a company earns from its core business activities, such as sales or service fees, and expenses are the costs incurred in running the business, like salaries or utilities. Revenues are inflows, and expenses are outflows.
    4. Dividends are a portion of a company’s profits that shareholders receive, representing a distribution of earnings. They are a payout to owners of the company if they choose to take money out of the business.
    5. A current asset is expected to be used or converted into cash within one year, such as cash or inventory, while a long-term asset is intended for use over multiple years, such as land or equipment. The one-year mark is the distinguishing line.
    6. A debit is an accounting term that increases asset, expense, or dividend accounts while decreasing liability, shareholders’ equity, or revenue accounts. The usage of debits and credits is core to the accounting system.
    7. The accounting equation, A = L + SE, represents that a company’s total assets are equal to the sum of its liabilities and shareholders’ equity. It’s a foundational concept ensuring the balance of a company’s financial position.
    8. A journal entry is the first step in the accounting cycle and records business transactions by detailing debits and credits for at least two accounts. They create a trackable record for every transaction.
    9. Accumulated depreciation represents the total amount of an asset’s cost that has been expensed as depreciation over its life to date. It is a contra-asset account that reduces the book value of the related asset.
    10. FIFO (first-in, first-out) assumes that the oldest inventory is sold first. LIFO (last-in, first-out) assumes that the newest inventory is sold first. Weighted average uses the average cost of all inventory to determine the cost of goods sold.

    Essay Questions

    Instructions: Write an essay that thoroughly explores each of the following prompts, drawing on your understanding of the course material.

    1. Discuss the importance of understanding the differences between assets, liabilities, and shareholders’ equity for making sound business decisions. Consider how these elements interact and contribute to a company’s overall financial health.
    2. Explain the different types of journal entries covered in the source material and how the concept of debits and credits is essential for accurately recording financial transactions. Why is it so important that a journal entry balance?
    3. Compare and contrast the straight-line, units of production, and double-declining balance methods of depreciation. Under what circumstances might a business choose one method over another, and why?
    4. Describe the components of a cash flow statement and their importance to understanding a company’s overall financial performance. Discuss how the operating, investing, and financing sections are used to evaluate a company’s financial decisions.
    5. Explain the different inventory valuation methods (FIFO, LIFO, Weighted Average) and how they can affect a company’s cost of goods sold and net income. What are the implications of using one method over another?

    Glossary of Key Terms

    Accounts Payable: A short-term liability representing money owed to suppliers for goods or services purchased on credit.

    Accounts Receivable: A current asset representing money owed to a company by its customers for goods or services sold on credit.

    Accrued Expense: An expense that has been incurred but not yet paid in cash.

    Accrued Revenue: Revenue that has been earned but for which payment has not yet been received.

    Accumulated Depreciation: The total depreciation expense recorded for an asset to date; a contra-asset account that reduces the book value of an asset.

    Asset: Something of value that a company owns or controls, expected to provide future economic benefit.

    Balance Sheet: A financial statement that presents a company’s assets, liabilities, and equity at a specific point in time.

    Bond: A long-term debt instrument where a company borrows money from investors and promises to pay it back with interest over a specified period.

    Cash Flow Statement: A financial statement that summarizes the movement of cash into and out of a company over a specific period.

    Common Shares: A type of equity ownership in a company, giving shareholders voting rights and a claim on the company’s residual value.

    Contra-Asset Account: An account that reduces the value of a related asset (e.g., accumulated depreciation).

    Cost of Goods Sold (COGS): The direct costs of producing goods that a company sells.

    Credit: An accounting term that decreases asset, expense, or dividend accounts, while increasing liability, shareholders’ equity, or revenue accounts.

    Current Asset: An asset expected to be converted into cash or used within one year.

    Current Liability: A liability due within one year.

    Debit: An accounting term that increases asset, expense, or dividend accounts, while decreasing liability, shareholders’ equity, or revenue accounts.

    Depreciation: The allocation of the cost of a tangible asset over its useful life.

    Depreciable Cost: The cost of an asset minus its residual value, which is the amount to be depreciated over the asset’s useful life.

    Discount (on a bond): Occurs when a bond is sold for less than its face value. This happens when the market interest rate exceeds the bond’s stated interest rate.

    Dividend: A distribution of a company’s profits to its shareholders.

    Double-Declining Balance Depreciation: An accelerated depreciation method that applies a multiple of the straight-line rate to an asset’s declining book value.

    Equity (Shareholders’ Equity): The owners’ stake in the assets of a company after deducting liabilities.

    Expense: A cost incurred in the normal course of business to generate revenue.

    FIFO (First-In, First-Out): An inventory valuation method that assumes the first units purchased are the first units sold.

    Financial Statements: Reports that summarize a company’s financial performance and position, such as the income statement, balance sheet, and cash flow statement.

    General Ledger: A book or electronic file that contains all of the company’s accounts.

    Gross Profit (Gross Margin): Revenue minus the cost of goods sold.

    Income Statement: A financial statement that reports a company’s revenues, expenses, and profits or losses over a specific period.

    Inventory: Goods held by a company for the purpose of resale.

    Journal Entry: The recording of business transactions showing the debits and credits to accounts.

    Liability: A company’s obligation to transfer assets or provide services to others in the future.

    LIFO (Last-In, First-Out): An inventory valuation method that assumes the last units purchased are the first units sold.

    Long-Term Asset: An asset that a company expects to use for more than one year.

    Long-Term Liability: A liability due in more than one year.

    Net Income: Revenue minus expenses; the “bottom line” of the income statement.

    Premium (on a bond): Occurs when a bond is sold for more than its face value. This happens when the market interest rate is less than the bond’s stated interest rate.

    Preferred Shares: A type of equity ownership in a company, where shareholders have a preference over common shareholders in dividends and liquidation.

    Retained Earnings: The cumulative profits of a company that have been retained and not paid out as dividends.

    Revenue: Money a company earns from its core business activities.

    Residual Value (Salvage Value): The estimated value of an asset at the end of its useful life.

    Straight-Line Depreciation: A depreciation method that allocates an equal amount of an asset’s cost to depreciation expense each year of its useful life.

    T-Account: A visual representation of an account with a debit side on the left and a credit side on the right.

    Units of Production Depreciation: A depreciation method that allocates an asset’s cost based on its actual usage rather than time.

    Vertical Analysis: A type of financial statement analysis in which each item in a financial statement is expressed as a percentage of a base amount. On an income statement, it is usually expressed as a percentage of sales. On a balance sheet, it’s usually expressed as a percentage of total assets.

    Weighted-Average Method: An inventory valuation method that uses the weighted-average cost of all inventory to determine the cost of goods sold.

    Financial Accounting Concepts and Analysis

    Briefing Document: Financial Accounting Concepts and Analysis

    I. Introduction

This document provides a review of core financial accounting concepts, focusing on assets, liabilities, equity, revenues, expenses, dividends, journal entries, and financial statement analysis. The source material consists of transcribed video lectures from an accounting course, delivered with a conversational and relatable style by a professor identified later in the text as Tony Bell.

    II. Core Accounting Terms and Concepts

A. Assets:

• Defined as “something of value that a company can own or control”; its value must be “reasonably reliably measured.”
• Examples:
• Accounts Receivable: “our customer hasn’t paid the bill right we did some work for the customer they haven’t paid us yet we would expect to collect in less than a year”
• Inventory: “Walmart expects to sell through any piece of inventory in less than a year”
• Long-term investments, land, buildings, and equipment are also assets.
• Distinction between current and long-term assets:
• Current assets are expected to be liquidated or used up within one year.
• Long-term assets are those expected to be used beyond one year.

B. Liabilities:

• Defined as “anything that has to be repaid in the future”; the technical definition is “any future economic obligation.”
• Examples:
• Accounts Payable: “within typically within 30 days you’ve got to pay it back”
• Notes Payable: “bank loans, student loans, mortgages,” all categorized under “note payable,” which is a contract promising repayment.
• Distinction between current and long-term liabilities:
• Current liabilities are obligations to be repaid within one year.
• Long-term liabilities are obligations to be repaid over a period longer than a year, such as a mortgage.

C. Shareholders’ Equity:

• Represents the owners’ stake in the company: “If I were to sell them off pay off all my debts what goes into my pocket that is my equity in the company”
• Includes common shares and retained earnings.

D. Revenues:

• Defined as what a company “earns” when it “does what it does to earn money.”
• Examples: sales revenue, tuition revenue, rent revenue.
• “How is the money coming in? It’s the revenue-generating part of the business.”

E. Expenses:

• Defined as the “costs” associated with running a business.
• Examples: salary expense, utilities expense, maintenance expense.

F. Dividends:

• Represent “shareholders pulling profits from the company,” essentially taking cash out of the company’s retained earnings.
• Payable when “revenues exceed the expenses,” that is, when the company is profitable.
• Shareholders “can keep the money keep those profits in the company or the shareholders can say I’d like some of that money.”

    III. Journal Entries

A. The Concept:

• Based on Newton’s third law of motion: “for every action there is an equal and opposite reaction.”
• “There’s not just one thing happening there’s always kind of equal and opposite forces acting in a journal entry.”
• Every transaction has at least one debit and at least one credit, and the value of the debits must equal the value of the credits.

B. Debits and Credits:

• Debits (Dr) and credits (Cr) are not related to credit cards or bank accounts; they are used to increase or decrease different types of accounts.
• The basic accounting equation: Assets = Liabilities + Shareholders’ Equity (A = L + SE).
• Accounts on the left side (assets) go up with a debit and down with a credit. Accounts on the right side (liabilities and equity) go up with a credit and down with a debit.

C. Journal Entry Table:

• The presenter suggests writing the mnemonic “A = L + SE,” then a pair of arrows beneath each element (up/down under A; down/up under L; down/up under SE), and “Dr Cr” beneath each arrow pair, so that the arrow above “Dr” shows the direction a debit moves that account type.
• This is used as a visual aid to determine the correct debits and credits for a transaction.

D. Journal Entry Elements:

Each journal entry must include:

• A date.
• A debit account.
• A credit account.
• The value of the debit and credit, which must be equal.
• A description of the transaction, avoiding the use of dollar signs.

E. Examples:

• Purchase a car for cash: debit Car (asset); credit Cash (asset).
• Purchase a car with a car loan: debit Car (asset); credit Car Loan Payable (liability).
• Purchase a car with part cash and a car loan: debit Car (asset); credit Cash (asset) and credit Car Loan Payable (liability).
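
To make the debit-equals-credit rule concrete, here is a minimal Python sketch (not from the lectures; the account names, figures, and helper function are illustrative) that records the part-cash, part-loan car purchase and checks that the entry balances:

```python
# Minimal sketch of a balanced journal entry (illustrative figures).
# Convention: positive amounts are debits, negative amounts are credits.

def is_balanced(lines):
    """A journal entry balances when total debits equal total credits."""
    debits = sum(amt for _, amt in lines if amt > 0)
    credits = sum(-amt for _, amt in lines if amt < 0)
    return debits == credits

# Buy a $30,000 car: $10,000 cash down, $20,000 car loan.
entry = [
    ("Car (asset)", 30_000),                     # debit: asset increases
    ("Cash (asset)", -10_000),                   # credit: asset decreases
    ("Car loan payable (liability)", -20_000),   # credit: liability increases
]

assert is_balanced(entry)  # debits (30,000) == credits (30,000)
```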

    IV. Adjusting Journal Entries

A. Types of Adjustments:

• Prepaids: expenses paid in advance, like insurance; the prepaid asset is reduced as the expense is recognized. Example: prepaid insurance becomes insurance expense over time.
• Depreciation: a long-term asset’s value is reduced over time. Examples: vehicles, equipment.
• Accrued expenses: expenses that build up but are not yet paid, creating a liability. Example: accrued interest on a loan.
• Accrued revenues: revenues that are earned but not yet received, creating a receivable. Example: service revenue earned on account.

B. The Purpose:

• To ensure financial statements accurately reflect the company’s financial position at the end of the period.
• Adjustments are necessary because “the lender isn’t calling me saying hey it’s December 31st where’s my money no they know they’re not getting paid till July so the accountant just has to know oh I’ve got a liability here that’s building up.”
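
As a worked example of the accrued-interest case, here is a minimal Python sketch (the loan terms and figures are hypothetical, not from the lectures) of the year-end calculation behind the adjusting entry:

```python
# Sketch of an accrued-interest adjustment (hypothetical numbers).
# A $10,000 loan at 6% per year, taken out October 1; the books close December 31.

principal = 10_000
annual_rate = 0.06
months_elapsed = 3  # October, November, December

accrued_interest = principal * annual_rate * months_elapsed / 12
print(accrued_interest)  # 150.0

# Adjusting entry at December 31:
#   Debit  Interest expense   150
#   Credit Interest payable   150
```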

    V. Financial Statement Analysis

A. Trial Balance:

• An unadjusted trial balance is a list of all accounts and their balances before any adjusting entries are made.
• An adjusted trial balance is created after all adjusting entries are made.

B. Income Statement:

• Shows the company’s revenues and expenses for a period.
• Calculates net income: revenues minus expenses.

C. Balance Sheet:

• Shows the company’s assets, liabilities, and equity at a specific point in time.
• The basic accounting equation (Assets = Liabilities + Equity) must always balance.

D. Statement of Cash Flows:

• Categorizes cash flows into operating, investing, and financing activities.
• Provides a summary of how cash changed during a given period.
• Uses both changes in balance sheet accounts and information from the income statement to create the full picture.

E. Ratio Analysis:

• Liquidity ratios assess a company’s ability to meet its short-term obligations; includes the current ratio.
• Profitability ratios assess a company’s ability to generate profit; includes gross profit margin and net profit margin.
• Solvency ratios assess a company’s ability to meet its long-term obligations, such as the debt-to-equity ratio.

F. Vertical Analysis (Common-Size Statements):

• Expresses each item on a financial statement as a percentage of a base figure.
• On the income statement, each item is expressed as a percentage of total sales.
• On the balance sheet, each item is expressed as a percentage of total assets.
• Allows for comparison of companies of different sizes, or of trends across years.
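
A minimal Python sketch of vertical analysis (the income statement figures are hypothetical, chosen only to show the percentage-of-sales idea):

```python
# Sketch of vertical (common-size) analysis on an income statement.
# Each line is expressed as a percentage of sales (hypothetical figures).

income_statement = {
    "Sales": 200_000,
    "Cost of goods sold": 120_000,
    "Operating expenses": 50_000,
    "Net income": 30_000,
}

base = income_statement["Sales"]
for line, amount in income_statement.items():
    print(f"{line}: {amount / base:.1%}")
# Sales: 100.0%; Cost of goods sold: 60.0%; Operating expenses: 25.0%; Net income: 15.0%
```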

    VI. Other Concepts Covered

• Inventory Costing Methods: FIFO (First-In, First-Out), LIFO (Last-In, First-Out), and Weighted Average methods (see the sketch after this list).
    • Bank Reconciliation: Adjusting bank statements and company records to reconcile the different balances in order to identify errors and discrepancies.
    • Allowance for Bad Debts: A contra-asset account used to estimate uncollectible receivables.
    • Bonds: Accounting for bonds issued at a premium or discount, and the amortization of those premiums or discounts over the life of the bond.
    • Shareholders’ Equity: Different types of shares, such as common shares and preferred shares.
    • Closing Entries: Resetting revenue, expense, and dividend accounts to zero at the end of an accounting period.
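
Since the inventory costing methods come up repeatedly, here is a minimal Python sketch (hypothetical purchase data; a simplified periodic view rather than the lectures’ worked inventory records) showing how FIFO, LIFO, and weighted average produce different cost of goods sold from the same purchases:

```python
# Sketch comparing FIFO, LIFO, and weighted-average COGS for one sale
# (hypothetical purchases; periodic method for simplicity).

purchases = [(10, 5.00), (10, 6.00)]  # (units, unit cost), oldest first
units_sold = 12

def fifo_cogs(layers, sold):
    cogs = 0.0
    for units, cost in layers:  # consume the oldest layers first
        take = min(units, sold)
        cogs += take * cost
        sold -= take
        if sold == 0:
            break
    return cogs

def lifo_cogs(layers, sold):
    return fifo_cogs(list(reversed(layers)), sold)  # consume newest first

def weighted_average_cogs(layers, sold):
    total_units = sum(u for u, _ in layers)
    total_cost = sum(u * c for u, c in layers)
    return sold * total_cost / total_units

print(fifo_cogs(purchases, units_sold))              # 10*5 + 2*6 = 62.0
print(lifo_cogs(purchases, units_sold))              # 10*6 + 2*5 = 70.0
print(weighted_average_cogs(purchases, units_sold))  # 12 * 5.50  = 66.0
```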

    VII. Key Themes

    • The Importance of Understanding Journal Entries: “you really need to understand them… if you haven’t understood it well it’s just going to haunt you for the rest of class”
    • Financial Accounting is About Tracking Financial Events: “accounting is all about tracking Financial events.”
    • Accounting is Logical and Systematic: The goal is to keep track of transactions “in a logical way that’s not going to drive you crazy.”
    • Practical Application: Emphasis is placed on real-world examples and applications.
    • Mistakes are Opportunities to Learn: “it’s not even the end of the world if you fail a course but really it’s not the end of the world if you fail a test you can put it together you can put yourself together and you can improve.”

VIII. Conclusion

This source material provides a detailed explanation of accounting and financial analysis concepts. The speaker employs practical examples and a relatable, conversational teaching style that aims to both inform and engage students, encouraging deep understanding and retention of these core principles.

    Financial Accounting Fundamentals

    Financial Accounting FAQ

    • What is the difference between an asset and a liability in accounting? An asset is something of value that a company owns or controls. This could include tangible items like inventory, buildings, and equipment, or intangible items like patents and trademarks. The key thing to remember is that an asset has economic benefit to the company, and the value can be reasonably and reliably measured. A liability, on the other hand, is an obligation of the company, something it owes to others, that has to be repaid in the future. Examples include bank loans, mortgages, accounts payable (money owed to suppliers), and even unpaid phone bills. Essentially, assets are what a company has and liabilities are what it owes.
    • What is the difference between “current” and “long-term” when classifying assets and liabilities? The distinction between “current” and “long-term” depends on the timeframe over which the asset will be converted to cash or the liability will be paid off. A current asset is expected to be liquidated (turned into cash) or used up within one year or less. Examples of current assets include cash, inventory (for companies that expect to sell it quickly), and accounts receivable (money due from customers for short-term credit). A long-term asset, in contrast, is not expected to be liquidated within a year; it includes things like land, buildings, and machinery that are intended for long-term use by the company. A current liability is an obligation that’s expected to be paid within a year, such as short-term debt, accounts payable, or wages payable. A long-term liability is an obligation that’s not due within a year; it includes things like long-term bank loans or mortgages. The one-year line is a key point in financial accounting.
    • What are the key components of shareholders’ equity, and how do they relate to the balance sheet? Shareholders’ equity represents the owners’ stake in a company. It’s comprised primarily of two main components. Common shares reflect the original investments made by shareholders in exchange for ownership in the company. Retained earnings represent the accumulated profits that a company has not distributed as dividends to its shareholders but has kept to reinvest in the business. These amounts are listed on the balance sheet under the heading “Shareholder’s Equity” and represent the residual value of the company after all its debts are paid. The basic accounting equation that connects all of these is Assets = Liabilities + Shareholders’ Equity.
    • How do revenues, expenses, and dividends affect a company’s profitability? Revenues are the income a company earns from its normal business operations, such as sales, service fees, or rent. They are the “earn” component of the income statement. Expenses are the costs a company incurs to generate revenue. This could include salaries, utilities, rent, cost of goods sold, and so on. If revenues exceed expenses, the company is profitable; if expenses exceed revenues, the company is operating at a loss. Dividends are payments of a portion of a company’s profits that are made to the shareholders (owners) of the business. They are not an expense but are instead a distribution of profits, so while they don’t affect net income, they do affect how much profit the company can keep for reinvestment.
    • What are journal entries, and why are they so important in financial accounting? Journal entries are the initial step in recording business transactions. Every journal entry will have at least one debit and at least one credit that balance with each other. They serve to record the financial effects of business transactions (like buying a car, getting a loan, selling services etc) in a formal and organized manner. They adhere to the fundamental accounting equation and follow a consistent debit/credit format so that the effects of each financial transaction are accurately tracked. They create an audit trail and prevent mistakes. Journal entries are very important because, without them, it would be difficult to track where a company’s resources are, what the company owes, and how successful the company is in generating profits. Without a solid understanding of journal entries, it is very difficult to learn more advanced topics in accounting.
• What is a “T-account” and how is it used in accounting? A T-account is a simple visual representation of a general ledger account. It’s literally shaped like the letter T, with the account name (e.g., Cash, Accounts Payable) above the T. The left side of the T is the “debit” side, while the right side is the “credit” side. After a transaction has been recorded in a journal entry, the details are transferred to the appropriate T-accounts, a process called “posting.” This helps to track the increases and decreases in every financial account of the company. T-accounts are the basis for preparing financial statements and allow accountants to determine the ending balance of every account.
    • What are adjusting journal entries and what types are common? Adjusting journal entries are made at the end of an accounting period to correct errors, recognize transactions that have occurred over time but not yet been recognized, or to update the financial records. Common adjusting journal entries include: prepaid expenses, where a company pays for something in advance and uses it up over time, like insurance or rent; depreciation, which is where we record the wearing out of long term assets over time like equipment or buildings; accrued expenses, which are costs that have built up over time but have not yet been paid (think of interest owed or salaries earned by employees); and finally, accrued revenues which are revenues earned that have not been paid by customers yet. The core concept is that some transactions don’t happen in one single moment of time, they happen over a period of time and it is important to reflect this in a company’s financial statements.
    • What are closing entries and why are they important in the accounting cycle? Closing entries are made at the end of an accounting period to transfer the balances of temporary accounts (like revenues, expenses, and dividends) into a permanent account, which is normally the retained earnings account. Temporary accounts are used only to track an individual year’s performance. Once closed, they start fresh at zero for the next accounting period. The closing process ensures that revenue and expense information is summarized for each period, that they don’t carry forward from year to year, and that the profit generated by a company (net income) flows into retained earnings. Closing entries are a key part of closing one fiscal year and beginning another.
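
To illustrate the closing process described in the last answer, here is a minimal Python sketch (the account balances are hypothetical, not from the lectures) of temporary accounts being zeroed out into retained earnings:

```python
# Sketch of the closing process (hypothetical balances).
# Revenues are positive; expenses and dividends are negative
# because they reduce retained earnings.

retained_earnings = 40_000
temporary = {
    "Service revenue": 90_000,
    "Salaries expense": -55_000,
    "Utilities expense": -5_000,
    "Dividends": -10_000,
}

# Close each temporary account into retained earnings.
for account in list(temporary):
    retained_earnings += temporary[account]
    temporary[account] = 0  # the account starts the next period at zero

print(retained_earnings)  # 40,000 + 90,000 - 60,000 - 10,000 = 60,000
```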

    Financial Accounting Fundamentals

    Timeline of Events (as presented in the text):

• General Accounting Concepts Introduced: Discussion of Assets (things of value), Liabilities (obligations to repay), and Equity (what’s left after liabilities are paid from assets).
• Explanation of Current vs. Long-term Assets and Liabilities (one year is the cutoff).
• Explanation of Revenues (earned income), Expenses (costs incurred), and Dividends (shareholder profits taken from retained earnings).
• Example of Account Classification: Categorization of various accounts as Assets, Liabilities, Equity, Revenue, or Expense (e.g., Long-term Investments, Accounts Receivable, Accounts Payable, Common Shares, etc.).
• Classification of assets and liabilities as current or long-term.
• Personal Accounting Mistake and Encouragement: The speaker shares a story about getting a very low mark on their first accounting exam (28%) and the subsequent struggle, but ultimate success in the class and eventual career.
• The speaker encourages viewers to keep going and improve if they struggle.
• Introduction to Journal Entries: Explanation of the concept of debits and credits in journal entries, relating them to Newton’s third law (“for every action, there is an equal and opposite reaction”).
    • Example of a purchase (car for cash) to demonstrate journal entries (debit cars, credit cash).
    • Example of a purchase of a car using a loan (debit cars, credit car loan payable).
    • Example of buying a car with both cash and a loan (debit car, credit cash and credit car loan payable).
• Practice with Journal Entries: Recording of several business transactions using journal entries, including:
    • Share Issuance.
    • Payment of Rent.
    • Borrowing Money.
    • Equipment Purchase (part cash, part payable)
    • Purchase of Supplies on Account.
    • Completion of a Taxidermy Job on Account.
    • Dividend Payment.
    • Payment of Utilities Bill.
    • Payment for a past Equipment Purchase.
    • Receipt of Telephone Bill.
    • Collection of Receivable.
    • Payment for Supplies (Cash).
    • Sale of Taxidermy Services.
    • Rent Revenue.
    • Payment of Salaries.
• Posting Journal Entries to T-Accounts: Introduction of T-accounts as a way of organizing journal entries into separate accounts (assets, liabilities, equity, revenue, expense).
    • Example of transferring debits and credits to T-accounts.
• Adjusting Entries: Introduction to the concept of adjusting journal entries, which are not typically triggered by external transactions.
    • Examples of adjusting entries:
    • Prepaid Expenses: The example used was insurance, how to use up that asset over the life of the insurance.
    • Depreciation: Recording the reduction in value of an asset over time.
    • Accrued Expenses: Interest on a loan that is building up (but not yet paid).
    • Accrued Revenue: Revenue earned, but cash not received.
    • Discussion of how these adjusting entries are necessary for properly representing a company’s financial position.
• Comprehensive Problem 1: A large multi-step problem that combined several concepts:
    • Making adjusting journal entries (for supplies, prepaid insurance, unearned revenue, depreciation etc.)
    • Preparing an Adjusted Trial Balance.
    • Preparing a full set of Financial Statements (Income Statement, Statement of Changes in Equity, Balance Sheet).
• Closing Entries: Explanation of the purpose of closing entries (to reset temporary accounts).
    • Demonstration of closing entries with a focus on the income statement accounts.
    • Preparation of a Post-Closing Trial Balance.
• Bank Reconciliations: Explanation of the purpose of a bank reconciliation.
    • Walk-through of bank reconciliation example.
• Accounts Receivable and Bad Debts: Discussion of accounts receivable and the need for an allowance for uncollectible accounts.
    • Calculation and journal entry for bad debts expense and allowance for doubtful accounts.
    • Explanation of how a “write off” works to remove a bad debt.
• Inventory and Cost of Goods Sold: Example of a simple inventory purchase and sale with the related journal entries.
    • Example of inventory purchases at multiple prices, and their impact on COGS.
    • Introduction of different inventory costing methods (FIFO, LIFO, Weighted Average).
    • Discussion of the Specific Identification method.
• Inventory Methods (FIFO, LIFO, Weighted Average): Walk-through of an inventory record example using FIFO (first in, first out).
    • Walk-through of inventory record example using LIFO (last in, first out).
    • Walk-through of inventory record example using weighted average method.
• Depreciable Assets and Depreciation Methods: Discussion of depreciation for assets with an estimated residual value.
• Example and calculation of depreciation using the straight-line method, including partial-year depreciation.
• Example and calculation of depreciation using the units of production method.
• Example and calculation of depreciation using the double-declining balance method (a sketch comparing the three methods follows this timeline).
• Sale of Assets: Example of selling a depreciated asset, with calculation of gains and losses on the sale and the related journal entries.
• Bonds Payable: Discussion of bonds payable, both at a premium and at a discount, and the need for amortization of premiums and discounts.
• Examples of bond issue, interest payment, and discount amortization.
• Shareholders’ Equity: Discussion of preferred shares and their relative advantages over common shares.
• Statement of Cash Flows: Explanation of the purpose of the Statement of Cash Flows and its three categories: Operating, Investing, and Financing.
    • Example of the reconciliation of retained earnings to arrive at dividends for the cash flow statement.
    • Preparation of a simple statement of cash flows from a balance sheet and income statement.
• Financial Statement Analysis (Vertical Analysis): Introduction to vertical analysis and how it is useful for making comparisons between companies of different sizes.
    • Examples of preparing a common-sized income statement and a common-sized balance sheet.
• Financial Ratio Analysis: Introduction to the importance and use of financial ratios for analysis.
    • Calculation and discussion of several financial ratios (current ratio, acid-test ratio, debt-to-equity ratio, return on equity, gross profit margin, return on assets).
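
As referenced in the depreciation item above, here is a minimal Python sketch comparing first-year depreciation under the three methods (the asset figures are hypothetical, not Kemp Company’s numbers from the lectures):

```python
# Sketch of first-year depreciation under three methods (hypothetical asset:
# cost $50,000, residual value $5,000, 5-year life, 100,000 total units).

cost, residual, life_years, total_units = 50_000, 5_000, 5, 100_000
depreciable_cost = cost - residual  # 45,000

# Straight-line: equal expense each year.
straight_line = depreciable_cost / life_years  # 9,000

# Units of production: expense follows actual usage (say 30,000 units this year).
units_this_year = 30_000
units_of_production = depreciable_cost / total_units * units_this_year  # 13,500

# Double-declining balance: twice the straight-line rate applied to book value;
# residual value is ignored until book value approaches it.
ddb_rate = 2 / life_years
double_declining = cost * ddb_rate  # 20,000

print(straight_line, units_of_production, double_declining)
```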

    Cast of Characters (Principal People Mentioned):

    • The Instructor (Tony Bell): An accounting professor, presumably the narrator of the videos. He shares personal anecdotes about his own struggles with accounting, provides clear explanations of concepts, and guides viewers through the practice problems. He encourages viewer engagement with likes and subscribes.
    • Isaac Newton: A famous physicist whose third law is used as an analogy to explain the debit and credit relationship in journal entries.
    • Maria: The owner/shareholder of a company, implied in the journal entry example where they take a dividend.
    • W. White: The customer that wrote the bad NSF check in the bank reconciliation example.
• The Car Dealer: The entity that sells the car to the instructor in the journal entry example.
• MIT (Massachusetts Institute of Technology): The entity that issues bonds in an illustrative example.
• Harvard University: The entity used as a competitive example in the bond discussion.
• Kemp Company: Hypothetical company used in the depreciation examples.
• Bill’s Towing: The hypothetical company used in the asset sale example.
• Tinger Inc.: The hypothetical company used in the bond issuance examples.
    • Abdan Automart: The hypothetical company used in the inventory method examples.
    • Romney Inc.: Hypothetical company used in the combined purchase and sale inventory example.
    • Harre Gil & Hussein Inc.: The hypothetical entities compared using Vertical Analysis.

    Understanding the Income Statement

    An income statement, also called the statement of operations or profit and loss (P&L) statement, summarizes a company’s revenues and expenses to determine its profitability [1, 2].

    Key aspects of the income statement, according to the sources:

    • Purpose: To show whether a company was profitable, and if so, how much money it made [1]. It answers the question of whether earnings exceeded costs [2].
    • Components:
    • Revenues are what a company earns from its business activities [3]. Examples include sales revenue, tuition revenue, and rent revenue [3]. Revenues are considered “earned” [3].
    • Expenses are the costs of earning revenue [3]. Examples include salary expense, utilities expense, and maintenance expense [3].
    • Net Income or Profit is calculated by subtracting total expenses from total revenues [1].
    • Format:
    • A proper income statement title includes three lines: the company’s name, the name of the statement, and the date [4].
    • The date must specify the time period the statement covers (e.g., “for the year ended”) [4].
    • Revenues are listed first, followed by expenses [5].
    • A total for expenses is shown [5].
    • The net income is double-underlined [6].
    • Dollar signs are placed at the top of each column and beside any double-underlined number [6].
    • Gross Profit: In a retail business, the income statement includes the cost of goods sold (COGS). Sales revenue minus sales returns and allowances equals net sales. Net sales minus COGS equals gross profit [7, 8].
    • A gross profit percentage can be calculated by dividing gross profit by net sales [9].
    • Operating Income: The income statement lists operating expenses, which, when subtracted from gross profit, gives the operating income or profit [8, 9].
    • Non-operating Items: The income statement may include non-operating expenses, such as interest and income tax [10, 11].
    • Usefulness: An income statement is typically one of the first places an analyst will look to assess a company’s performance [2].

    It is important to note that the income statement should be compared to prior periods to assess whether a company’s profit is trending up or down [6]. An analyst may also compare the income statement to those of other companies [4].
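
A minimal Python sketch of the multi-step arithmetic described above, from net sales down to net income (all figures are hypothetical retail numbers, not from the sources):

```python
# Sketch of multi-step income statement arithmetic (hypothetical figures).

sales_revenue = 100_000
sales_returns_and_allowances = 5_000
cost_of_goods_sold = 57_000
operating_expenses = 20_000
interest_expense = 3_000
income_tax = 4_000

net_sales = sales_revenue - sales_returns_and_allowances       # 95,000
gross_profit = net_sales - cost_of_goods_sold                  # 38,000
gross_profit_pct = gross_profit / net_sales                    # 40.0%
operating_income = gross_profit - operating_expenses           # 18,000
net_income = operating_income - interest_expense - income_tax  # 11,000

print(f"{gross_profit_pct:.1%}", net_income)  # 40.0% 11000
```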

    Statement of Changes in Equity

    A statement of changes in equity summarizes how a company’s equity accounts changed over a period of time [1, 2]. The statement details the changes in the owner’s stake in the company [1, 3].

    Key aspects of the statement of changes in equity, according to the sources:

    • Purpose: The statement shows the changes in equity accounts over a period [2]. It summarizes what happened to the shareholders’ equity accounts during the year [1].
    • Components:
    • Beginning Balance: The statement begins with the balances of each equity account at the start of the period [2]. For example, the beginning balance of common shares and retained earnings on January 1st [2].
    • Changes During the Period: The statement then shows how each equity account changed during the period.
    • For common shares, this may include increases from issuing new shares or decreases from repurchasing shares [3, 4].
    • For retained earnings, this includes increases from net income, and decreases from dividends [3, 4].
    • Ending Balance: The statement ends with the balance of each equity account at the end of the period [4].
    • Key Accounts: The main equity accounts that are tracked are:
    • Common shares [1, 3] (also called share capital [3]) which represents the basic ownership of the company [3].
    • Retained earnings [1, 3] which represents the accumulated profits of the company that have not been distributed to shareholders [3].
    • Preferred shares, which are a class of shares that have preferential rights over common shares, such as a preference for dividends [5].
    • Dividends:
    • Dividends represent the distribution of profits to shareholders [6].
    • Cash dividends reduce retained earnings and shareholders’ equity [3].
    • A stock dividend involves issuing new shares to existing shareholders [7]. This does not affect the total value of shareholders’ equity [8].
    • Format:
    • The statement includes a three-line title: company name, the name of the statement, and the date [2].
    • The date specifies the period the statement covers (e.g., “for the year ended”) [2].
    • Each equity account is listed as a column heading [2].
    • Dollar signs are placed at the top of each column and beside any double-underlined numbers [4].
• Relationship to Other Statements: The net income from the income statement is used to calculate the change in retained earnings [4, 9].
    • The ending balances of the equity accounts are carried over to the balance sheet [10].
    • The changes in retained earnings shown on the statement of changes in equity are captured in the closing journal entries [9].

    In summary, the statement of changes in equity provides a detailed view of how the owners’ stake in the company has changed over time, linking the income statement and the balance sheet [1].
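
A minimal Python sketch of the retained-earnings rollforward that provides that link (the amounts are hypothetical):

```python
# Sketch of the retained-earnings column of the statement of changes in equity:
# beginning balance + net income - dividends = ending balance (hypothetical amounts).

beginning_retained_earnings = 25_000
net_income = 12_000  # carried over from the income statement
dividends = 4_000

ending_retained_earnings = beginning_retained_earnings + net_income - dividends
print(ending_retained_earnings)  # 33,000; carried to the balance sheet
```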

    Understanding the Balance Sheet

    A balance sheet, also called the statement of financial position, is a financial statement that presents a company’s assets, liabilities, and shareholders’ equity at a specific point in time [1, 2]. The balance sheet is based on the fundamental accounting equation: Assets = Liabilities + Shareholders’ Equity [3].

    Key aspects of the balance sheet, according to the sources:

    • Purpose: To provide a snapshot of what a company owns (assets), what it owes (liabilities), and the owners’ stake in the company (equity) at a specific date. It shows the financial position of the company at that moment in time [2].
    • Components:
    • Assets: These are things a company owns or controls that have value [4, 5]. They are resources with future economic benefits [5]. Assets are listed in order of liquidity, from most to least liquid [6].
    • Current assets are expected to be converted to cash or used up within one year [7]. Examples include cash, accounts receivable, inventory, and office supplies [5, 7, 8].
    • Long-term assets, also called property, plant, and equipment (PP&E), are assets that are not expected to be converted to cash or used up within one year. Examples include buildings, land, and equipment [5].
    • Assets are recorded at their net book value, which is the original cost minus any accumulated depreciation [9].
    • Liabilities: These are obligations of the company to others, or debts that must be repaid in the future [10]. They represent future economic obligations [10]. Liabilities are also categorized as either current or long-term.
    • Current liabilities are obligations due within one year [7]. Examples include accounts payable, wages payable, and notes payable [10].
    • Long-term liabilities are obligations due in more than one year. Examples include bank loans and mortgages [10].
    • Shareholders’ Equity: This represents the owners’ stake in the company, and is the residual interest in the assets of the company after deducting liabilities [3].
    • Key accounts include common shares (or share capital) and retained earnings [11].
    • Retained earnings are the accumulated profits that have not been distributed to shareholders [11].
    • Format:
    • The balance sheet has a three-line title: company name, the name of the statement, and the date [2].
    • Unlike the income statement or statement of changes in equity, the balance sheet is dated for a specific point in time, not for a period (e.g., “December 31, 2024,” not “for the year ended”) [2].
    • Assets are typically listed on the left side, and liabilities and shareholders’ equity are on the right side [6].
    • Assets are listed in order of liquidity, from the most current to the least [6].
    • Dollar signs are placed at the top of each column and beside any double-underlined numbers [12, 13].
    • Relationship to other Statements:
    • The ending balances of the equity accounts are taken from the statement of changes in equity [14].
    • The balance sheet provides information for the statement of cash flows, particularly for noncash assets and liabilities [15].
    • Balancing: The balance sheet must always balance, meaning that total assets must equal total liabilities plus total shareholders’ equity [1, 6].

    In summary, the balance sheet provides a fundamental overview of a company’s financial position at a specific point in time, showing the resources it controls, its obligations, and the owners’ stake in the company [2].
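
A minimal Python sketch of the balancing check, with hypothetical account balances:

```python
# Sketch of the balance sheet's balancing requirement (hypothetical balances).

assets = {"Cash": 8_000, "Accounts receivable": 12_000, "Equipment (net)": 30_000}
liabilities = {"Accounts payable": 7_000, "Bank loan": 20_000}
equity = {"Common shares": 10_000, "Retained earnings": 13_000}

total_assets = sum(assets.values())                                # 50,000
total_l_and_se = sum(liabilities.values()) + sum(equity.values())  # 50,000

# Assets = Liabilities + Shareholders' Equity must always hold.
assert total_assets == total_l_and_se
```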

    Financial Ratio Analysis

    Financial ratios are calculations that use data from financial statements to provide insights into a company’s performance and financial health [1]. They are used to analyze and compare a company’s performance over time or against its competitors [1-3].

    Here’s a breakdown of key financial ratios discussed in the sources, categorized by the aspects of a company they assess:

I. Liquidity Ratios

These ratios measure a company’s ability to meet its short-term obligations [4, 5].

    • Current Ratio: Calculated as current assets divided by current liabilities [4, 6]. It indicates whether a company has enough short-term assets to cover its short-term debts [4, 6].
    • A general rule of thumb is that a current ratio above 1.5 is considered safe [5]. However, this may not apply to all companies [5].
    • A higher ratio generally indicates better liquidity [5].
• Acid-Test Ratio (or Quick Ratio): Calculated as (cash + short-term investments + net current receivables) divided by current liabilities [7, 8]. This ratio is a stricter measure of liquidity, focusing on the most liquid assets.
• A general rule of thumb is that an acid-test ratio of 0.9 to 1 is desirable [7].
    • It excludes inventory and prepaid expenses from current assets [7, 8].

II. Turnover (Efficiency) Ratios

These ratios measure how efficiently a company is using its assets [8].

    • Inventory Turnover: Calculated as cost of goods sold (COGS) divided by average inventory [8]. It measures how many times a company sells and replaces its inventory during a period [8].
    • A higher turnover indicates better efficiency [9].
    • Receivables Turnover: Calculated as net sales divided by average net accounts receivable [9]. It measures how many times a company collects its average accounts receivable during a period [9].
    • A higher turnover indicates a company is more effective in collecting its debts [9].
    • Days to Collect Receivables: Calculated as 365 divided by receivables turnover [9]. It measures the average number of days it takes a company to collect payment from its customers [9].
    • A lower number is generally better, as it indicates a company is collecting payments more quickly [9].

III. Long-Term Debt-Paying Ability Ratios

These ratios assess a company’s ability to meet its long-term obligations and its leverage [9].

    • Debt Ratio: Calculated as total liabilities divided by total assets [9]. It indicates the proportion of a company’s assets that are financed by debt [9].
    • A lower debt ratio is generally considered safer, as it indicates less reliance on debt financing [9, 10].
    • Times Interest Earned: Calculated as operating income divided by interest expense [10]. It measures a company’s ability to cover its interest expense with its operating income [10].
    • A higher ratio indicates a greater ability to pay interest [10].

IV. Profitability Ratios

These ratios measure a company’s ability to generate profits from its operations [10].

    • Gross Profit Percentage: Calculated as gross profit divided by net sales [11]. It measures a company’s profitability after accounting for the cost of goods sold [11].
    • A higher percentage indicates a better ability to generate profit from sales [11].
    • Return on Sales: Calculated as net income divided by net sales [11]. It measures how much profit a company generates for each dollar of sales [11].
    • A higher percentage indicates better profitability [11].
    • Return on Assets (ROA): Calculated as (net income + interest expense) divided by average total assets [11]. It measures how effectively a company is using its assets to generate profit [11].
    • A higher ROA indicates better asset utilization and profitability [12].
    • Return on Equity (ROE): Calculated as (net income – preferred dividends) divided by average common shareholders’ equity [12]. It measures how much profit a company generates for each dollar of shareholders’ equity [12].
    • A higher ROE indicates better returns for shareholders [12].

V. Stock Market Performance Ratios

These ratios assess a company’s performance from the perspective of stock market investors [13].

    • Price-Earnings Ratio (P/E Ratio): Calculated as market price per share divided by earnings per share [13]. It indicates how much investors are willing to pay for each dollar of a company’s earnings [13].
    • A higher P/E ratio may indicate that a stock is overvalued [1, 13].
    • Dividend Yield: Calculated as dividends per share divided by market price per share [13]. It indicates the percentage of the stock price that is returned to shareholders as dividends [13].
    • A higher yield can be attractive to income-focused investors [13].
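
To tie a few of these formulas together, here is a minimal Python sketch computing a handful of the ratios above from hypothetical statement figures (the numbers are illustrative, not from the sources):

```python
# Sketch of selected ratio calculations (hypothetical statement figures).

current_assets, current_liabilities = 60_000, 30_000
cash, short_term_investments, net_receivables = 10_000, 5_000, 12_000
total_liabilities, total_assets = 80_000, 200_000
gross_profit, net_sales, net_income = 45_000, 150_000, 15_000

current_ratio = current_assets / current_liabilities  # 2.0: above the 1.5 rule of thumb
acid_test = (cash + short_term_investments
             + net_receivables) / current_liabilities  # 0.9: within the desirable range
debt_ratio = total_liabilities / total_assets          # 0.40 of assets financed by debt
gross_profit_pct = gross_profit / net_sales            # 30%
return_on_sales = net_income / net_sales               # 10%

print(current_ratio, acid_test, debt_ratio,
      f"{gross_profit_pct:.0%}", f"{return_on_sales:.0%}")
```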

    Additional Notes:

    • Horizontal Analysis compares financial data over different time periods (e.g. year over year) [14].
    • Vertical Analysis (or Common-Size Analysis) expresses each item in a financial statement as a percentage of a base number, such as net sales for the income statement or total assets for the balance sheet [3]. This helps compare companies of different sizes [3].
    • When analyzing ratios, it is important to compare them to industry averages or to a company’s historical performance to assess if the ratio is considered good or bad [1, 2].
    • It is important to note that a ratio may be interpreted differently depending on the company and industry [5, 10].
    • Many companies will focus on gross profit percentages, and will be especially interested if costs of goods sold are outpacing sales, impacting margins [2].
    • Analysts are typically interested in seeing positive and growing operating cash flows from the statement of cash flows [15].
    • A company’s cash flow statement and ratios are often used to determine if the company has enough cash on hand to meet its short-term obligations [16].

    Bank Reconciliation: A Comprehensive Guide

    A bank reconciliation is a process that compares a company’s cash balance as per its own records (book balance) with the corresponding cash balance reported by its bank (bank balance) [1]. The goal is to identify and explain any differences between these two balances and to correct any errors or omissions [1].

    Here are key points about bank reconciliations based on the sources:

    • Purpose:
    • To identify discrepancies between the bank’s record of cash and the company’s record of cash [1].
    • To ensure that a company’s cash records are accurate and up to date.
    • To identify errors made by either the company or the bank and make corrections to those errors [1, 2].
    • To detect fraud or theft by identifying unauthorized transactions [1, 2].
    • To provide better internal control of cash [1].
    • Timing: Bank reconciliations are typically prepared monthly [1].
• Format: A bank reconciliation typically starts with the ending balance per the bank statement and the ending balance per the company’s books [2].
    • It includes adjustments to each of these balances to arrive at an adjusted or reconciled cash balance [2].
    • The format of a bank reconciliation resembles a balance sheet, where the left side pertains to the bank’s perspective and the right side pertains to the company’s perspective [3].
• Items Causing Differences: Bank-side adjustments are items the company has already recorded but the bank has not yet processed, so they adjust the bank balance:
    • Deposits in transit: Deposits made by the company but not yet recorded by the bank [3].
    • Outstanding checks: Checks written by the company but not yet cashed by the recipients, and thus not yet deducted from the bank balance [3, 4].
• Book-side adjustments: These are items the bank has already recorded but that the company does not know about until it receives the bank statement [5].
    • Non-sufficient funds (NSF) checks: Checks received from customers that have bounced due to insufficient funds in the customer’s account [6].
    • Bank collections: Amounts collected by the bank on the company’s behalf, such as notes receivable [6].
    • Electronic funds transfers (EFT): Payments or collections made electronically that may not yet be recorded by the company [6].
    • Bank service charges: Fees charged by the bank [6].
    • Interest earned: Interest credited to the company’s account by the bank [6].
    • Errors: Mistakes in recording transactions by either the bank or the company [2].
    • For example, the company may have recorded a check for an incorrect amount [2]. If a check was recorded for too much, cash needs to be debited by the difference, and vice versa [6, 7].
• Steps in Preparing a Bank Reconciliation:
1. Start with the ending cash balance per the bank statement and the ending cash balance per the company’s books [3].
    2. Identify and list all the deposits in transit and outstanding checks, and make the necessary additions to or subtractions from the bank balance [3, 4].
    3. Identify and list all items that need to be adjusted on the book side, such as NSF checks, bank collections, electronic funds transfers, bank service charges, and errors [5-7].
    4. Make the necessary additions to or subtractions from the book balance [5-7].
    5. Calculate the adjusted or reconciled cash balance on both the bank and book sides [5, 7]. These adjusted balances should be the same if the reconciliation is done correctly.
    • Journal Entries:
    • Journal entries are required for the adjustments made to the company’s book balance [7].
    • These entries are made to correct the company’s cash account for items that the company did not know about, as well as any errors discovered during the bank reconciliation process.
    • All of these entries will involve the cash account [7, 8].
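
A minimal Python sketch of the two-sided reconciliation described above (all amounts are hypothetical; both sides must arrive at the same adjusted cash balance):

```python
# Sketch of a bank reconciliation (hypothetical amounts).

# Bank side: start with the bank statement balance.
bank_balance = 4_500
deposits_in_transit = 800    # recorded by the company, not yet by the bank
outstanding_checks = 600     # written by the company, not yet cashed
adjusted_bank = bank_balance + deposits_in_transit - outstanding_checks  # 4,700

# Book side: start with the company's cash ledger balance.
book_balance = 4_430
nsf_check = -200             # customer's check bounced
bank_collection = 500        # note collected by the bank on the company's behalf
service_charge = -30         # bank fee
adjusted_book = book_balance + nsf_check + bank_collection + service_charge  # 4,700

assert adjusted_bank == adjusted_book  # reconciled
# Journal entries are then recorded for each book-side adjustment.
```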

    In summary, a bank reconciliation is a critical control activity that ensures the accuracy of a company’s cash records. It involves comparing the bank’s records to the company’s records, identifying any discrepancies, and making necessary adjustments to both sets of records. The process helps maintain accurate financial statements and protect the company from errors and fraud [1].

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • Al Riyadh Newspaper: March 04, 2025 Charitable Campaigns, Diverse Insights, Efforts to Mediate

    Al Riyadh Newspaper: March 04, 2025 Charitable Campaigns, Diverse Insights, Efforts to Mediate

    The provided texts offer diverse insights into recent events and ongoing initiatives in Saudi Arabia and the wider world. Several articles focus on Saudi Arabia’s domestic affairs, highlighting cultural events, economic developments, and charitable campaigns, including those associated with Ramadan. A key focus is on Saudi Arabia’s foreign relations and its role in regional diplomacy, including the Lebanese president’s visit and efforts to mediate in the Russia-Ukraine conflict. Furthermore, some items discuss economic issues such as oil prices, stock market performance, and trade relations, both within the Kingdom and on a global scale. There is also coverage on events surrounding the Israeli-Palestinian conflict, and the impacts of geopolitical issues on oil prices. Finally, one article focuses on how brands such as Bata incorporate sustainable practices in order to foster a good public image.

    Study Guide: Saudi Arabia in 2025 – A Deep Dive

    Quiz (Short Answer)

    1. According to the text, what are the main benefits of the agreement between the US and Ukraine on rare minerals?
    2. What role does the Crown Prince of Saudi Arabia play in supporting charitable work, as mentioned in the text?
    3. How does the Saudi Vision 2030 relate to the economic cooperation between Saudi Arabia and Lebanon, according to the text?
    4. What are some of the consequences of Netanyahu’s actions in Gaza, according to the Israeli analyst Harel?
    5. What is the significance of the “Year of Handicrafts 2025” initiative in Saudi Arabia, as presented in the text?
    6. What is the “Markaz,” and what is its significance during Ramadan in Jeddah?
    7. What is the Saudi Forum for Media intended to do for the Kingdom?
    8. What are the two sides hoping to get out of the Saudi Pro League?
    9. What is the main impact of the Russian-Ukrainian war for European allies?
    10. According to the source, how does Bata integrate sustainable practices in its company culture?

    Quiz Answer Key

    1. The agreement allows the US to invest in rare minerals needed for industries like AI and renewable energy and provides Ukraine with security assurances against the war. It would give the US investment rights to 50% of Ukraine’s rare minerals.
    2. The Crown Prince actively supports charitable initiatives with significant funding, demonstrating the Kingdom’s commitment to improving citizens’ lives and promoting social solidarity. He donated 150 million riyals.
    3. Saudi Vision 2030 aligns with efforts to boost Saudi-Lebanese economic cooperation, particularly in sectors like technology and energy, fostering investment and development in Lebanon.
    4. Harel suggests that Netanyahu’s policies lead to resumed conflict, increased Israeli casualties, and the potential sacrifice of hostages due to Hamas’ strengthened defensive capabilities.
    5. It aims to boost the cultural tourism sector by promoting traditional crafts, supporting local artisans, and raising awareness of Saudi Arabia’s heritage, while integrating them into the new digital economy.
    6. The “Markaz” is a traditional gathering place, especially in historic Jeddah neighborhoods, where people gather to socialize, share Ramadan drinks, and enjoy traditional sweets, strengthening social bonds during the holy month.
    7. The Saudi Forum for Media intends to promote Saudi Arabia’s interests, enhance its positive image, and foster communication with global media organizations, highlighting the Kingdom’s progress and development in various sectors.
    8. The teams hope to overcome their challenges and push for victory and the Asian Championship as they work towards the elimination stage.
    9. The Russian-Ukrainian war has pushed European allies to increase their defense spending in light of the dangers posed by the war.
    10. Bata strategically integrates sustainability by launching internal awareness campaigns, publishing reports on sustainability initiatives, participating in dialogues with stakeholders, and engaging with the media to promote its sustainability initiatives.

    Essay Format Questions

    1. Analyze the evolving relationship between Saudi Arabia and Lebanon as depicted in the article, considering historical ties and future economic prospects.
    2. Discuss the implications of the Israeli-Palestinian conflict, focusing on the role of international actors and the humanitarian consequences, as presented in the provided document.
    3. Evaluate the significance of Saudi Arabia’s Vision 2030 in the context of its domestic social and economic reforms and its role in regional and international affairs.
    4. Examine the impact of the Russian-Ukrainian war on global energy markets and the diplomatic efforts to resolve the conflict, as presented in the provided news excerpts.
    5. Assess the role of media and cultural initiatives in promoting Saudi Arabia’s national identity and its engagement with the international community, using examples from the document.

    Glossary of Key Terms

    • ولي العهد (Wali al-Ahd): Crown Prince. The designated successor to the throne in a monarchy.
    • رؤية 2030 (Ru’yah 2030): Vision 2030. Saudi Arabia’s strategic framework to reduce its dependence on oil, diversify its economy, and develop public service sectors.
    • مبادرات (Mubadarat): Initiatives. New programs or plans, often related to development or reform.
    • التحديات (Al-Tahadiyat): Challenges. Difficulties or obstacles that need to be overcome.
    • السلام (As-Salam): Peace. The condition marked by the absence of war or conflict.
    • القضايا المستدامة (Al-Qadaya al-Mustadama): Sustainability issues. Matters concerning responsible resource management and environmental consciousness.
    • المنظمة الصحية (Al-Munazzama as-Sihiyya): Health Organization. Typically refers to an entity focused on health, like a public health ministry.
    • المسجد النبوي (Al-Masjid an-Nabawi): The Prophet’s Mosque. One of the holiest sites in Islam, located in Medina.
    • القطاع الصحي (Al-Qita’ as-Sihi): The Health Sector. Encompasses all health-related services and institutions.
    • الإعلام (Al-I’lam): Media. Various means of communication, such as newspapers, television, and the internet.
    • الحرف اليدوية (Al-Hiraf al-Yadawiyya): Handicrafts. Objects made by hand, often representing traditional culture.
    • دبلوماسية (Diblomasiya): Diplomacy. The art and practice of conducting negotiations between representatives of states.
    • الإصلاحات (Al-Islahat): Reforms. Changes to improve a system or institution.

    Al Riyadh Newspaper Analysis: Themes and Key Ideas

    Briefing Document: Analysis of Themes and Key Ideas

    Source: Excerpts from “20705.pdf,” a daily newspaper published by Al Yamamah Press Foundation (Issue: 20705, dated March 4, 2025)

    Executive Summary:

    This document summarizes the main themes and key ideas found in the provided excerpts from the Al Riyadh newspaper. The articles cover a range of topics, including Saudi Arabia’s charitable initiatives, diplomatic relations with Lebanon and Bulgaria, internal matters such as the appointment of new governors, global economics (US-China Trade), and the Russian-Ukraine war, among others. The texts highlight Saudi Arabia’s commitment to domestic and international development, humanitarian aid, and regional stability, while also addressing pressing global issues and their impact on the Kingdom.

    Key Themes and Ideas:

    1. Saudi Arabia’s Commitment to Charity and Humanitarian Aid:
    • The excerpts emphasize the Kingdom’s dedication to charitable work, both domestically and internationally. This is portrayed as a long-standing tradition, particularly amplified during Ramadan.
    • Quote (translated): “The Kingdom is accustomed to quality initiatives, upgrading its programs, and activating charity, creating an impact with a systematic approach so that its projects deliver their fruits to the largest number of beneficiaries. These programs and projects continue throughout the year and are intensified during the blessed month of Ramadan, when individuals and state institutions compete in doing good, seeking reward from God and encouraged by the rulers, who spare no effort in supporting all initiatives.”
    • Specific initiatives, such as the King Salman Center for Relief and Humanitarian Aid, are highlighted as models for effective aid delivery.
    • Quote (translated): “If the ‘Joud’ campaign is a model for charitable work within the Kingdom’s regions, the King Salman Center remains the Kingdom’s model for relief, with its diverse programs and projects the most prominent example of charitable work beyond its borders, confirming to all the Kingdom’s leadership in devising and implementing noble humanitarian initiatives.”
    • The “Joud” campaign is mentioned as a model, along with King Salman Center which also operates beyond the Kingdom’s borders.
    • Prince Faisal bin Salman launched the “Ajer Ghair Mamnoon” (“Unfailing Reward”) campaign to support charitable giving, along with the “Al Shefa” health waqf fund.
    2. Saudi-Lebanese Relations:
    • The visit of the Lebanese President to Saudi Arabia is framed as a turning point in relations, marking a new phase of cooperation after a period of challenges.
    • Quote (translated): “In a step that reflects the depth of the historical ties between Lebanon and the Kingdom of Saudi Arabia, Lebanese President Joseph Aoun arrived in Riyadh in response to a generous invitation from His Royal Highness Prince Mohammed bin Salman, Crown Prince and Prime Minister. This visit represents a turning point in the course of relations between the two countries, closing an era that witnessed many challenges and marking the beginning of a new phase of cooperation.”
    • Saudi Arabia’s “Vision 2030” is mentioned as an opportunity for Lebanon, and the Crown Prince’s efforts to support Lebanese stability are highlighted.
    • The historical support of Saudi Arabia for Lebanon, dating back to the reign of King Abdulaziz Al Saud and its role in the Taif Agreement, are also referenced.
    • The visit renewed hopes for economic ties, particularly in agricultural, industrial, and electronic products.
    • Investment opportunities are opening in technology and energy, aligning with Vision 2030.
    3. Saudi-Bulgarian Relations:
    • The Saudi leadership sent congratulatory messages to the President of Bulgaria on the occasion of the country’s National Day, indicating positive diplomatic engagement.
    4. Local Events:
    • Princess Fahda bint Falah Al Hithlain sponsored the awarding of the King Salman Prize for memorizing the Quran
    • Prince Faisal bin Mishaal visits judges and sheikhs
    5. Israeli-Palestinian Conflict:
    • The article cites an Israeli analyst who criticizes Netanyahu’s government for taking “adventurous” steps in the Palestinian territories, Syria, and Gaza.
    • Quote (translated): “Israeli analysts warned of the consequences of the recent steps taken by Israeli Prime Minister Benjamin Netanyahu, with the support of the administration of US President Donald Trump, and pointed out that Trump’s current positions may change according to his interests.”
    • The analyst alleges violations of agreements, expansion of settlements, and restrictions on humanitarian aid to Gaza.
    • The article touches on potential international criticism and accusations of “deliberate starvation” against Israel.
    • The detention of 187 Palestinians is reported.
    6. Global Economic Issues:
    • The article addresses the potential impact of a trade war between the US and China, as well as between the US and Canada and Mexico.
    • Quote (translated): “Given the American president’s insistence, the scale of the difficulties Ukraine faces is clear; the American administration sees Kyiv as indebted to it through this agreement, in return for American financial support.”
    • The potential for a “minerals agreement” with Ukraine is discussed in the context of the Russia-Ukraine war and US support.
    • The article touches on the need for international partnerships and cooperation in the face of complex economic challenges.
    7. Russia-Ukraine War:
    • The article discusses French President Macron’s proposal for a truce in Ukraine during the Olympics, focusing on energy infrastructure.
    • Concerns about the war’s impact on global oil supplies and prices are mentioned.
    • There is an analysis of the conflict’s impact on the world economy.
    • It was reported that Zelensky offered to resign in exchange for NATO membership for Ukraine.
    8. Oil Market Dynamics:
    • The impact of the Trump-Zelensky “shouting match” on the global oil market.
    • Reports on attacks on Russian refineries impacting exports.
    9. Financial Market Activity:
    • Gold prices rose as a safe haven
    • The Saudi stock market index (TASI) rose
    10. Cultural Initiatives:
    • Three films supported by the Red Sea Film Foundation win awards at the Berlin Film Festival
    • The “Abu Samel” family returns in “Jack Al-Alam 2”
    • Highlight on the “Year of Handicrafts 2025”
    • Importance of traditional medicine
    11. Other Domestic Events:
    • Arar’s Ramadan traditions
    • The inauguration of three free bus stations
    12. Sports:
    • Al-Nassr draws with Esteghlal
    • Al-Ahli faces Al-Rayyan
    • Al-Hilal faces Pakhtakor
    • Al-Ittihad fails to beat Al-Akhdoud
    • The Saudi national weightlifting team travels to Turkey for preparations

    Quotes and specific article titles:

    • “The Kingdom and Lebanon Close the Page on ‘Challenges’”
    • “Saudi Arabia and Lebanon are turning a new page on economic relations. 100,000 Lebanese residents welcome the return of momentum to relations between the two countries”
    • “Israeli Analyst: Netanyahu’s Government Behaves Adventurously on All Fronts”
    • “Clash between Trump and Zelensky… Disrupts Oil Markets”
    • “Russian-Ukrainian War and the Expected Riyadh Summit”
    • “Metals of Ukraine… and the American-Chinese Separation”
    • “‘Deaf with Health’… Awareness Efforts for Quality of Life”
    • “90% of residents of the Jenin camp displaced”

    Potential Implications:

    • The Saudi focus on charity and aid reinforces its image as a responsible global actor and leader in the Islamic world.
    • Improved Saudi-Lebanese relations could lead to increased economic cooperation and regional stability.
    • The concerns raised about Israeli policies may reflect a desire for a more balanced approach to the Israeli-Palestinian conflict.
    • Economic analysis suggests a cautious approach to global trade tensions, with a focus on diversification and partnerships.
    • Coverage of the Russia-Ukraine war highlights the need for diplomatic solutions and mitigation of economic consequences.

    Further Research:

    • Investigate the specific details of the “Joud” campaign and other Saudi charitable initiatives.
    • Analyze the economic impact of renewed Saudi-Lebanese cooperation.
    • Examine Saudi Arabia’s position on the Israeli-Palestinian conflict in greater depth.
    • Assess the potential consequences of a US-China trade war on the Saudi economy.

    Global Affairs and Saudi Arabia’s Initiatives

    What is the significance of the Saudi campaign to promote good deeds during Ramadan?

    The campaign, supported by Saudi leadership, encourages charitable acts, highlights Islamic values, and fosters social solidarity. It provides support to citizens in need and aims to ensure adequate housing and promote unity. It emphasizes innovative initiatives, and aims to serve as an example for global relief efforts, reinforcing Saudi Arabia’s leadership in noble endeavors.

    What does the visit of the Lebanese President to Saudi Arabia signify?

    The visit signifies a renewal of economic ties and reflects the deep historical relations between Lebanon and Saudi Arabia. It marks a turning point in the relationship and the beginning of a new phase of cooperation, with the Saudi Vision 2030 offering opportunities for Lebanon’s development. The visit is also hoped to boost financial stability between the two countries.

    How is the Israeli government behaving, according to analysts, and what are the potential consequences?

    According to Israeli analysts, the Netanyahu government is acting recklessly on all fronts, with the support of the U.S. administration. This includes violating agreements, seizing Syrian territory, threatening intervention in Syria, and restricting aid to Gaza. These actions risk reigniting conflict and sacrificing the well-being of hostages, as well as potentially further destabilizing the region.

    What is the “Ajer Ghair Mamnoon” campaign and what are its goals?

    The “Ajer Ghair Mamnoon” campaign, launched by Prince Faisal bin Salman, aims to promote charitable giving during Ramadan. It encourages individuals, organizations, and donors to contribute to the Waqf Fund, which supports healthcare initiatives and provides assistance to beneficiaries in Medina and Mecca. The campaign reflects Islamic values and fosters social cohesion.

    What are the economic implications of the tension between President Trump and President Zelensky?

    The tension between Presidents Trump and Zelensky has broader economic implications, potentially disrupting oil markets and global trade. Trump’s trade policies, including tariffs on goods from China, Mexico, and Canada, could harm the American economy and lead to increased inflation. The article also mentions the importance of stable relationships with oil exporting countries like Russia and Iraq.

    How are the arts and cultural programs in Saudi Arabia being promoted?

    Saudi Arabia actively promotes arts and culture through initiatives like the Red Sea International Film Festival, which supports local filmmakers and showcases Saudi talent on the global stage. Additionally, the Ministry of Culture’s initiative to recognize 2025 as the “Year of Handicrafts” aims to preserve and promote traditional crafts as a vital part of Saudi cultural heritage and tourism.

    How has the conflict between Russia and Ukraine impacted global oil prices and what factors might contribute to price stabilization?

    The Russia-Ukraine war disrupted global oil supplies, leading to price volatility. Attacks on Russian refineries further exacerbated concerns about exports of refined products. However, potential factors that could stabilize prices include increased oil production by OPEC+, a potential peace agreement between Russia and Ukraine, and increased U.S. pressure on Iraq to resume exports from the Kurdistan region.

    What are the diplomatic efforts aiming to address the Russia-Ukraine conflict, and what are the challenges involved?

    Diplomatic efforts include proposals for ceasefires during specific periods, but challenges persist, including Ukraine’s desire to regain territory and concerns over Russia’s territorial control. Negotiations are underway, with the United States playing a key role, but reaching a resolution that satisfies all parties remains difficult. The importance of effective diplomacy to mitigate conflict and promote sustainable solutions is emphasized.

    Saudi Arabia: Culture, Diplomacy, and Humanitarian Efforts

    The sources describe Saudi Arabia’s internal and external efforts, along with some of its traditions:

    • Leadership & Philanthropy: Saudi Arabia is recognized for initiating programs, improving them, and activating charitable projects to benefit a large number of people. The state’s institutions compete in doing good, encouraged by the government.
    • Humanitarian Aid: The Kingdom has several humanitarian initiatives. The King Salman Center serves as a model for relief efforts inside the Kingdom. It is also considered a leading example for charitable work outside its borders through its diverse programs and projects.
    • Relationship with Lebanon: Saudi Arabia plays a pivotal role in supporting Lebanon, with deep-rooted historical ties dating back to the era of King Abdulaziz Al Saud. The Kingdom is portrayed not just as an economic gateway but also as a political partner to Lebanon. The Saudi market is a main destination for Lebanese exports.
    • Cultural and Religious Significance: مكة (Mecca) is recognized as a central religious hub for Muslims. المدينة (Medina) is a destination for pilgrims. The country emphasizes the values of Islamic authenticity, societal cohesion, and sustained giving.
    • Economic Development: Saudi Arabia aims to achieve sustainable development goals by fostering a conducive environment for all citizens.
    • Modernization & Vision 2030: The Kingdom’s Vision 2030 aims to boost high-quality linguistic initiatives, strengthen identity, and enrich Arabic content.
    • رمضان (Ramadan) Celebrations & Traditions:
    • Many people decorate the facades of houses in Jeddah with illuminated lanterns and Ramadan decorations that reflect the spiritual atmosphere of the month.
    • In Jazan, presenting “الماء المبخر” (mabkhar water, water perfumed with incense) is a tradition that symbolizes hospitality and generosity.
    • In the northern region, نقش الحناء (henna decoration) is used to encourage young girls to fast.
    • Global Diplomacy: Saudi Arabia is emerging as a crucial player in the changing geopolitical landscape, tackling challenges through dialogue. The upcoming summit hosted by Saudi Arabia, with the participation of the Crown Prince, is portrayed as a vital opportunity to stop losses and work towards a fair and lasting peace.

    Russia-Ukraine War: Negotiations, Global Impact, and Key Players

    Here’s what the sources say about the Russia-Ukraine war:

    • Saudi Arabia’s Role: Saudi Arabia is seen as a key regional player in resolving the Russia-Ukraine war.
    • US-Ukraine Relations: The US administration sees an agreement with Ukraine as a way to address challenges. However, conflicting reports suggest that the US president and the President of Ukraine had a heated meeting, and that the Ukrainian President left without an anticipated agreement concerning sharing rights to Ukrainian metals.
    • Negotiations & Peace Talks: European leaders are proposing a month-long truce in Ukraine.
    • There are increasing doubts about a US-brokered peace agreement between Russia and Ukraine.
    • An advisor to the Ukrainian President criticized the US approach to ending the war, which, in the advisor’s view, conflicts with the Ukrainian President’s goals.
    • Global Impact: Uncertainty over a potential peace agreement between Russia and Ukraine is causing instability.
    • The war is contributing to uncertainty and fluctuations in global markets, including oil prices.
    • The conflict is a significant concern for the United States and European countries.
    • Turkey’s Role: Turkey halted a pipeline carrying oil exports from the Kurdistan region in March 2023 and is ready to resume its operation.

    Saudi Football League: Competition, Teams, and Players

    Here’s what the sources say about football leagues:

    • Saudi League Importance: The Saudi League garners significant attention from followers and football enthusiasts.
    • Competition and Excitement: The presence of approximately eight teams vying to avoid relegation to the First Division enhances the competitiveness and excitement of the league matches. These teams strive to win in order to secure their place in what is considered one of the greatest Arab leagues.
    • Increased Competition: The current situation in the league makes the competition stronger among all competing teams.
    • Team Efforts and Fan Expectations: Team coaches are doing everything they can, but more effort is needed to realize the dreams of sports fans.
    • Potential for Upsets: The final weeks of the league could be unpredictable, with potential shifts in team positions.
    • Al-Nassr’s Position: Al-Nassr’s place near the top is threatened unless the team improves; the team is also described as being affected by arrogance and poor luck.
    • Al-Ahli’s Performance: If Al-Ahli had been in good form from the start, they might have been a strong contender for the title.
    • Al-Hilal’s Performance: Al-Hilal is described as facing pressure with potential injuries, absences and exhaustion affecting the team.
    • Saudi Teams in the AFC Champions League: Al-Nassr drew against Esteghlal of Iran in the AFC Champions League.
    • Al-Ahli is preparing to play Al Rayyan of Qatar in the AFC Champions League.
    • Al-Hilal is set to face Pakhtakor in the AFC Champions League.
    • AFC Champions League Details: The final stages of the AFC Champions League Elite will be held in Jeddah, Saudi Arabia.
    • These matches carry significant weight in the Saudi League.
    • Player Spotlight: Sami Al-Khabrani, despite being a distinguished player, has not been selected for the national team, prompting questions about the selection criteria.
    • There is hope that Al-Khabrani will get an opportunity to prove himself.
    • Salem Al-Dawsari shined in the league stage, tying for the top scorer position.
    • Riyad Mahrez is recognized as a standout player in Al-Ahli.

    Ramadan Traditions, Preparations, and Health Initiatives

    Here’s what the sources say about Ramadan events:

    • General Atmosphere: Ramadan is characterized by a spiritual atmosphere.
    • Traditions: There are several traditions associated with welcoming Ramadan:
    • Decorating homes with lights and Ramadan ornaments in areas like Jeddah.
    • Presenting “الماء المبخر” (mabkhar water, water perfumed with incense) in the Jazan region, as a symbol of hospitality and generosity.
    • نقش الحناء (henna decoration) is used to encourage young girls to fast in the northern region.
    • Efforts to help people observe Ramadan:
    • “حافلات المدينة” (Hafilat Al-Madinah) announced the development of three free auxiliary stations to facilitate access to the Prophet’s Mosque.
    • قطار الحرمين (Haramain Train) is raising its operating capacity to 1.6 million seats.
    • Health Initiatives: There are efforts to promote healthy habits during Ramadan. A campaign titled “صم بصحة” (Sum bi-Sihha, “Fast with Health”) aims to promote a healthy lifestyle through Ramadan. It includes awareness of healthy eating, hydration, physical activity, and consultation with doctors to control chronic diseases.

    Media’s Influence: Shaping Opinion, Policy, and Global Diplomacy

    The sources discuss media power in the context of diplomacy, public perception, and cultural influence:

    • Influence on Public Opinion: The media has become a powerful force: no longer merely a means of conveying news, but a tool for directing and reshaping opinion, influencing policies, and affecting entire countries.
    • Media as a Battleground: Press encounters can resemble a battle, especially when public statements are used to sow doubt.
    • Impact on Political Leaders: Media coverage can affect a political leader’s standing and influence domestic and foreign public opinion, either bolstering or damaging the leader.
    • Agenda Setting: Governments and leaders use the media to promote their agendas.
    • American Media’s Influence: The American media is a political and economic force that extends its influence beyond the United States. America uses its media as a tool to send specific messages to countries, using news channels and newspapers to shape how the global audience views events.
    • Examples of Media Influence: The meeting between President Trump and Ukrainian President Zelensky revealed the media’s role in shaping political discourse. The media can turn an event into a political tool and raise questions about the importance and danger of media on the international stage.
    • Need for Media Awareness: Because of the power of media, there is a need to be aware of its influence. The modern media is a force that can build or destroy alliances and promote or undermine leaders.
    • Sports Media: Media related to sports receives great attention from followers and those interested in the sport.
    • Communication strategies: Effective communication strategies include conveying specific messages, promoting interaction with the public, and building trust and transparency.
    • Cultural Dissemination: The “Literary Partner” initiative uses cafes to spread culture and literature, raising cultural awareness. The initiative contributes to opening new channels of communication between authors and society through the cultural sector.

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • True Crime: British Killers – A Prequel by Jason Neal

    True Crime: British Killers – A Prequel by Jason Neal

    This excerpt from True Crime: British Killers – A Prequel: Six Disturbing Stories of Some of the UK’s Most Brutal Killers explores the lives and crimes of several notorious British murderers. The book presents detailed accounts of each killer’s background, motives, and methods, and details the investigations and trials. Among those profiled are Anthony Hardy, known as the Camden Ripper; Peter Bryan, the London Cannibal; John George Haigh, the Acid Bath Killer; Dena Thompson, the Black Widow; Levi Bellfield, the Bus Stop Killer; and Steven Wright, the Suffolk Strangler. The text also examines the impact of the crimes on the victims, their families, and society, including potential healthcare failures.

    True Crime: British Killers – A Prequel Study Guide

    Key Figures and Cases

    Chapter 1: The Camden Ripper (Anthony Hardy)

    • Anthony Hardy: The “Camden Ripper,” a British serial killer who murdered three women in Camden, London. He was obsessed with Jack the Ripper and struggled with mental illness and violent tendencies.
    • Sally Rose White: A developmentally challenged prostitute murdered by Anthony Hardy.
    • Elizabeth Valad & Brigitte MacClennan: The other two women murdered by Anthony Hardy; their dismembered remains were found in garbage bags near Hardy’s flat.
    • Freddy Patel: The pathologist who initially determined that Sally Rose White died of natural causes.

    Chapter 2: The London Cannibal (Peter Bryan)

    • Peter Bryan: The “London Cannibal,” a man with a history of mental illness who killed three people and engaged in cannibalism.
    • Nisha Sheth: A shop assistant Bryan murdered after she rejected his advances.
    • Brian Cherry: A friend of Bryan’s, whom Bryan murdered and cannibalized after Bryan was transferred to low-support accommodation.
    • Richard Loudwell: An inmate at Broadmoor who was strangled by Bryan.
    • Giles Forrester: During the trial, this judge stated, “You killed on these last two occasions because it gave you a thrill and a feeling of power when you ate flesh.”

    Chapter 3: The Acid Bath Killer (John George Haigh)

    • John George Haigh: Known as the “Acid Bath Killer,” he murdered multiple people and dissolved their bodies in sulfuric acid.
    • William McSwan: Haigh’s first victim, whom he killed for financial gain.
    • Amy & Donald McSwan: William’s parents, who were also murdered by Haigh.
    • Archibald and Rose Henderson: Another wealthy couple murdered by Haigh.
    • Olive Durand-Deacon: A wealthy widow and Haigh’s final victim.
    • Dr. Keith Simpson: The forensic pathologist who found traces of Mrs. Durand-Deacon, leading to Haigh’s arrest.

    Chapter 4: The Black Widow (Dena Thompson)

    • Dena Thompson: A con artist and attempted murderer known as the “Black Widow” for her manipulative relationships and schemes.
    • Lee Wyatt: Dena’s first husband, whom she defrauded and falsely accused.
    • Julian Webb: Dena’s second husband, whom she murdered with an overdose of drugs.
    • Robert Waite: A lover of Dena Thompson who was drugged while on vacation with her.
    • Richard Thompson: Dena’s third husband, whom she attempted to murder with a baseball bat.
    • Stoyan Kostavj: A Bulgarian native who was in a relationship with Dena Thompson and has been reported missing.

    Chapter 5: The Bus Stop Killer (Levi Bellfield)

    • Levi Bellfield: The “Bus Stop Killer,” convicted of murdering Milly Dowler, Marsha McDonnell, and Amélie Delagrange, and attempting to murder Kate Sheedy.
    • Milly Dowler: A thirteen-year-old girl who was abducted and murdered by Bellfield.
    • Marsha McDonnell: A nineteen-year-old woman murdered by Bellfield.
    • Amélie Delagrange: A twenty-two-year-old French student murdered by Bellfield.
    • Kate Sheedy: A young woman who survived an attempted murder by Bellfield.
    • Johanna Collins: Bellfield’s ex-partner who provided crucial information to the police.
    • Yusuf Rahim: Levi Bellfield’s name after converting to Islam.

    Chapter 7: A Tragic December (Vincent Tabak)

    • Joanna Yeates: A landscape architect murdered by her neighbor, Vincent Tabak.
    • Greg Reardon: Joanna Yeates’ boyfriend.
    • Christopher Jefferies: Joanna Yeates’ landlord, who was initially vilified by the media.
    • Vincent Tabak: Joanna Yeates’ murderer, who initially claimed the death was accidental.

    Quiz

    1. What was Anthony Hardy’s initial job after graduating from Imperial College London, and how did his career progress before his life spiraled downwards?
    2. Describe the events that led to Anthony Hardy’s arrest for the murder of Sally Rose White, focusing on the key pieces of evidence and his initial explanations to the police.
    3. Explain the circumstances surrounding Peter Bryan’s first murder and why he was originally charged with manslaughter on the grounds of diminished responsibility.
    4. What was Vincent Tabak’s defense to the killing of Joanna Yeates?
    5. Describe John George Haigh’s method for disposing of bodies and his reasoning behind this approach.
    6. Explain how John George Haigh was ultimately caught, despite his efforts to destroy all evidence.
    7. Describe Dena Thompson’s elaborate scheme to convince her first husband, Lee Wyatt, that he needed to go into hiding.
    8. Explain how Julian Webb died.
    9. Describe the key evidence that linked Levi Bellfield to the murder of Milly Dowler.
    10. Why was Christopher Jefferies initially suspected in the death of Joanna Yeates, and what details were the media focusing on?

    Quiz Answer Key

    1. Anthony Hardy landed a high-paying job with British Sugar and quickly moved up the corporate ranks. However, a severe economic downturn in the mid-1970s led to him losing his job and suffering from depression, ultimately leading to deviant behavior.
    2. After a dispute with his upstairs neighbor, Hardy vandalized her door with graffiti and battery acid, leaving a trail of footprints that led back to his flat. When police searched his apartment, they found the naked body of Sally Rose White in a locked bedroom; Hardy claimed it was his roommate’s room, but police found the key in his coat pocket.
    3. Peter Bryan murdered Nisha Sheth after she rejected his advances and he was fired from his job due to theft. He struck her repeatedly with a claw hammer. He pleaded guilty to manslaughter on the grounds of diminished responsibility and was sentenced to a psychiatric unit.
    4. Vincent Tabak claimed that he was waving back at Joanna Yeates when she came to her kitchen window. He said that he went inside to chat with her, and when he tried to kiss her, she screamed, so he put his hands around her throat. He said it was not premeditated.
    5. John George Haigh dissolved his victims’ bodies in sulfuric acid, believing that if there was no body, there could be no murder conviction. He gained access to sulfuric acid while working in the tinsmith factory in Lincoln Prison.
    6. Despite Haigh’s efforts to destroy all evidence, police found traces of Mrs. Durand-Deacon in the sludge in the yard, including gallstones and false teeth. Additionally, bloodstains were found inside the workshop, leading to his arrest.
    7. Dena Thompson concocted an elaborate story involving a deal with Disney that supposedly went wrong and involved the mafia. She convinced Lee that the mafia was after him and would eliminate him, so he needed to go into hiding to protect himself.
    8. Julian Webb died from an overdose of dothiepin, an antidepressant, and aspirin. Dena Thompson had spiked his curry with a massive dose of the drugs. The coroner recorded an “Open Verdict” because there was insufficient evidence to rule the death a suicide.
    9. The only evidence police had in the murder was CCTV footage of a red Daewoo Nexia pulling out of Collingwood Place just ten minutes after Milly was last seen. When police realized that Bellfield’s girlfriend owned the Daewoo Nexia, it became clear that he was responsible for that murder as well.
    10. Christopher Jefferies was vilified by the tabloid press because he was the landlord of the building. Since there was no sign of forced entry, investigators believed that Joanna had been murdered by someone she knew or someone who had access to the flat; Jefferies had access to the flat.

    Essay Questions

    1. Discuss the role of mental illness in the cases of Anthony Hardy and Peter Bryan. To what extent did their mental states contribute to their crimes, and how did the legal system address this factor?
    2. Compare and contrast the methods used by John George Haigh and Anthony Hardy to attempt to evade detection. What made Haigh’s plan ultimately fail, and what similarities can be drawn between the two cases?
    3. Analyze the character of Dena Thompson. What were her primary motivations, and how did she exploit the vulnerabilities of others to achieve her goals?
    4. Examine the police investigation of Levi Bellfield. How did they eventually link him to the murders of Milly Dowler, Marsha McDonnell, and Amélie Delagrange, and what role did CCTV footage play in the investigation?
    5. Critically evaluate the media coverage of the Joanna Yeates case, focusing on the initial portrayal of Christopher Jefferies. How did the media contribute to public perception, and what were the consequences of their reporting?

    Glossary of Key Terms

    • Serial Killer: An individual who murders three or more people over a period of more than 30 days, with a “cooling off” period between each murder, and whose motives are often psychological.
    • Postmortem Examination (Autopsy): A surgical procedure consisting of a thorough examination of a corpse to determine the cause and manner of death and to evaluate any disease or injury that may be present.
    • CCTV: Closed-circuit television, a television system in which signals are not publicly distributed but are monitored, primarily for surveillance and security purposes.
    • Forensic Science: The application of scientific methods and techniques to matters of law and criminal justice.
    • Sulfuric Acid: A highly corrosive strong mineral acid with the molecular formula H2SO4; John George Haigh used this to dissolve the bodies of his victims.
    • Diminished Responsibility: A partial legal defense arguing that the defendant’s mental capacity was substantially impaired; in England and Wales it reduces a murder charge to manslaughter.
    • Red-Light District: A specific area in a city where prostitution and other sexual activities are concentrated.
    • Luminol: A chemical that exhibits chemiluminescence, with a striking blue glow, when mixed with an oxidizing agent. It is used by forensic investigators to detect traces of blood, even if it has been cleaned or removed.
    • Curfew: A regulation requiring people to remain indoors between specified hours, typically at night.
    • Parole: The release of a prisoner temporarily (for a special purpose) or permanently before the completion of their sentence, on the promise of good behavior.

    True Crime: British Killers – A Prequel: Six Disturbing Stories

    Briefing Document: “True Crime: British Killers – A Prequel”

    Overall Theme: The book appears to be a collection of true crime stories focusing on various British serial killers and other criminals, exploring their backgrounds, crimes, and the investigations that led to their capture or conviction. It also touches upon the failures and shortcomings of healthcare and justice systems.

    Chapter 1: The Camden Ripper (Anthony Hardy)

    • Background: Tony Hardy, born in 1952, grew up in a lower-middle-class family. He was driven by a desire for greatness and saw himself as intellectually superior. He attended Imperial College London and eventually became a mechanical engineer.
    • Obsession and Decline: He developed an obsession with Jack the Ripper, admiring his ability to evade police. His marriage deteriorated due to his extreme sexual desires, and a severe economic downturn cost him his job, leading to depression and violent outbursts. He was diagnosed as bipolar.
    • Criminal Behavior: He attempted to murder his wife but was only charged with domestic violence and spent time in a mental hospital. After release, he stalked his ex-wife and hired prostitutes, eventually killing one (Sally Rose White). He was also found guilty of the murders of Elizabeth Valad and Brigitte MacClennan.
    • Key Points: Hardy believed he was too intelligent to be caught, mirroring his fascination with Jack the Ripper. Despite his mental illness, he was deemed fit for release from a mental hospital, only to commit murder shortly after.
    • Quote: A friend recounted, “Anthony was obsessed with serial killers and we talked about them on several occasions. We had long discussions about Jack the Ripper, and Anthony thought he had a brilliant mind. He reckoned Jack the Ripper was a very clever bloke because he murdered all those prostitutes and never got caught.”
    • Forensic Issues: Despite the bizarre staging of Sally Rose White’s body, the initial postmortem examination ruled that she died of natural causes. This highlights potential issues with the initial investigation.
    • Outcome: Hardy received three life sentences and was given a whole life tariff in 2012, meaning he will never be released from prison.

    Chapter 2: The London Cannibal (Peter Bryan)

    • Background: Peter Bryan had a troubled upbringing.
    • Crimes: He committed manslaughter and was sent to a psychiatric unit. Eventually, he was moved to a low-security facility and allowed to leave the building unsupervised. He murdered Brian Cherry, dismembering his body and reportedly eating parts of it. He also strangled Richard Loudwell at Broadmoor.
    • Key Points: Bryan’s case exemplifies failures in the mental healthcare system. Despite a history of violence and mental health issues, he was repeatedly moved to less secure facilities and given unsupervised access to the community.
    • Quote: Bryan said, “I ate his brain with butter. It was really nice.” This shows a lack of remorse and demonstrates his disturbing actions.
    • Failures in the System: Reports from the National Health Service point to extreme failures in the healthcare system at every level.
    • Outcome: Bryan was sentenced to two life terms and is unlikely to ever be released.

    Chapter 3: The Acid Bath Killer (John George Haigh)

    • Background: John George Haigh had a strict upbringing and was drawn to crime early on.
    • Crimes: He murdered William McSwan and then Amy and Donald McSwan, dissolving their bodies completely in sulfuric acid. He went on to murder Archibald and Rose Henderson and Olive Durand-Deacon, again attempting to dissolve their bodies in acid.
    • Key Points: Haigh believed that if there was no body, there could be no murder conviction.
    • Quote: Haigh said, “Mrs. Durand-Deacon no longer exists. I have destroyed her with acid. You will find the sludge which remains at Leopold Road. Every trace of her body has gone. How can you prove a murder if there is no body?”
    • Forensic Triumph: Haigh was mistaken; the police were able to convict him using traces of the victims found in the sludge that remained.
    • Outcome: Haigh was found guilty of the murder of Mrs. Durand-Deacon and was hanged at Wandsworth prison.

    Chapter 4: The Black Widow (Dena Thompson)

    • Deception: Dena Thompson manipulated and deceived multiple men for financial gain. She defrauded her first husband Lee Wyatt, and she poisoned her second husband, Julian Webb.
    • Crimes: She was found guilty and sentenced to life in prison with a minimum sentence of sixteen years for the murder of her second husband. She attempted to murder her third husband but was acquitted of the attempted murder charges.
    • Parole: After Dena Thompson’s conviction, investigators teamed with Interpol to look at all of her past lovers. She was granted parole and subsequently released from prison.
    • Quote: Her third husband said upon news of her parole, “She definitely tried to kill me, and they proved that she murdered her second husband. She would have been a serial killer if she had been successful. God knows what else she has done.”

    Chapter 5: The Bus Stop Killer (Levi Bellfield)

    • Crimes: Levi Bellfield was convicted of the murders of Amélie Delagrange and Marsha McDonnell and of the attempted murder of Kate Sheedy. He was later found guilty of the murder of Milly Dowler.
    • Vehicle Link: A key piece of evidence was grainy CCTV footage of a red Daewoo Nexia pulling out of Collingwood Place just ten minutes after Milly Dowler was last seen. The car was owned by Bellfield’s girlfriend.
    • Motive and Patterns: Bellfield had an extreme hatred for young blonde women.

    Chapter 6: The Suffolk Strangler (Steven Wright)

    • Victims: Within a matter of six weeks, five young women had been murdered. The victims were Paula Clennell, Anneli Alderton, Gemma Adams, Tania Nicol, and Annette Nicholls.
    • CCTV and Forensic Evidence: The key to the case was the large amount of CCTV footage that showed Wright in the area of the crimes and the forensic evidence that linked Wright to the victims.
    • Quote: During the trial, the prosecutor asked Wright about the coincidences, to which Wright replied “It would seem so, yes.”
    • Outcome: Wright was sentenced to life imprisonment with a recommendation of no parole.

    Chapter 7: A Tragic December (Vincent Tabak)

    • Victim: Joanna Yeates was murdered in December.
    • Circumstantial Evidence: Vincent Tabak, the neighbor, was eventually arrested after Joanna’s body was found. Despite Tabak’s attempt to give himself an alibi, detectives found that he had searched Google Street View for the precise location on Longwood Lane where Joanna’s body was later discovered, just days before it was found there. Blood matching Joanna’s was found in the trunk of his car, and DNA found on her body matched his.
    • Confession and Conviction: Tabak confessed to a prison chaplain that he had killed Joanna. Vincent Tabak was given a life sentence with a minimum term of twenty years in prison.

    True Crime Case Studies

    • What was Tony Hardy’s early life and background?

    Tony Hardy was born in 1952 into a lower-middle-class family in Staffordshire, England. His father worked in the gypsum mines, and Tony was expected to follow in his footsteps. However, from a young age, Tony felt destined for greatness and desired a life beyond that of a laborer.

    • How did Tony Hardy’s obsession with Jack the Ripper manifest itself?

    While attending Imperial College, Tony developed a fascination with Jack the Ripper, reading every book he could find about the notorious killer. He admired the Ripper’s ability to evade police and considered him highly intelligent. He often discussed his obsession with friends and family, describing the Ripper as a “brilliant bloke”. After attempting to murder his wife in Tasmania and subsequently being deported back to the United Kingdom, he told friends that it had all been an act to avoid jail time. He believed he could outwit everyone, just like Jack the Ripper.

    • What were the circumstances surrounding the murder of Sally Rose White and how was Tony Hardy involved?

    Sally Rose White, a developmentally challenged woman who worked as a prostitute, was found dead in a locked room in Tony Hardy’s apartment, a room Hardy claimed belonged to a roommate. The scene was staged with disturbing elements like a rubber Satan mask, crucifixes, and photo equipment. Initially, a pathologist determined she died of natural causes, but investigators were suspicious due to the staged scene and blood evidence. After further investigation, Tony was arrested for the murder.

    • What was John George Haigh’s method for disposing of his victims, and why did he believe it would lead to acquittal?

    Haigh used sulfuric acid to dissolve the bodies of his victims. He believed that if there was no body, there could be no murder conviction, operating under a misunderstanding of the Latin term “corpus delicti,” which refers to the evidence that a crime has occurred, not literally to a physical body.

    • How did Dena Thompson manage to deceive her husbands and lovers, and what were her motives?

    Dena Thompson was a master manipulator who wove elaborate lies to deceive her husbands and lovers. Her motives were primarily financial, as she sought to enrich herself through insurance money, pension funds, and property. She created false narratives involving the mafia, forged documents, and even convinced one husband to go into hiding, all to maintain her deceit.

    • What were some of the key pieces of evidence that linked Levi Bellfield to his crimes?

    Key evidence included security camera footage placing his vehicles near the scenes of the crimes, his ex-partner’s testimony about his hatred of blonde women and his ownership of a white Ford cargo van, and DNA evidence linking him to the victims. Fiber analysis also connected carpet fibers from his van to the hair of one of the victims.

    • What role did CCTV play in the investigation into Levi Bellfield?

    CCTV was a critical component of the investigation into Levi Bellfield. Police used it to track Bellfield’s movements and identify vehicles of interest.

    • How was Joanna Yeates’s body discovered, and what was the cause of death?

    Joanna Yeates’s body was found on Christmas Day by a couple walking their dog, lying in a snow-covered mound; the cause of death was determined to be manual strangulation. She had been missing for eight days and was found with forty-three cuts and bruises on her body.

    UK Serial Killer Cases

    The source discusses several serial killer cases in the United Kingdom:

    • Anthony Hardy, also known as the Camden Ripper, was responsible for multiple murders of prostitutes in the Camden area of London. He had an obsession with Jack the Ripper and a history of mental illness and violent behavior. In 2012, Hardy received a whole life tariff, meaning he will never be released from prison.
    • Peter Bryan, known as the London Cannibal, was convicted of manslaughter for killing a girl with a hammer. Bryan was transferred to a low-security facility and later killed his friend. He was sentenced to two life terms and is unlikely to ever be released.
    • John George Haigh, also known as the Acid Bath Killer, murdered multiple victims and disposed of their bodies using sulfuric acid. He was found guilty and hanged in 1949.
    • Dena Thompson, known as the Black Widow, was convicted of deception and the murder of her second husband. On May 23, 2022 Dena Thompson was granted parole and subsequently released from prison.
    • Levi Bellfield, known as the Bus Stop Killer, was found guilty of the murders of Amélie Delagrange, Marsha McDonnell, and Milly Dowler. He was sentenced to a whole-life tariff.
    • Steven Wright, known as the Suffolk Strangler, was convicted of murdering five prostitutes in Ipswich. Wright was sentenced to life imprisonment with no parole.
    • Vincent Tabak was found guilty of the murder of Joanna Yeates and was given a life sentence with a minimum of twenty years in prison.

    British True Crime Cases: Notorious Killers

    The source provides details of several true crime cases involving British Killers.

    • Anthony Hardy: Also known as the Camden Ripper, Hardy murdered prostitutes in London. He was obsessed with Jack the Ripper and had mental health issues. He received a life sentence in 2012.
    • Peter Bryan: Known as the London Cannibal, Bryan was convicted of manslaughter for killing a girl with a hammer. While in a low-security facility, he killed his friend. Bryan received two life sentences.
    • John George Haigh: Also known as the Acid Bath Killer, Haigh murdered victims and disposed of their bodies with sulfuric acid. He was found guilty and hanged in 1949.
    • Dena Thompson: Known as the Black Widow, Thompson was convicted of deception and murdering her second husband. She was granted parole on May 23, 2022.
    • Levi Bellfield: Known as the Bus Stop Killer, Bellfield was found guilty of murdering Amélie Delagrange, Marsha McDonnell, and Milly Dowler. He received a whole-life tariff.
    • Steven Wright: Known as the Suffolk Strangler, Wright was convicted of murdering five prostitutes in Ipswich and received a life sentence with no parole.
    • Vincent Tabak: Tabak was found guilty of murdering Joanna Yeates and received a life sentence with a minimum of twenty years.

    British Serial Killer Investigations: Case Details

    The source provides details about the criminal investigations into several British serial killer cases:

    • Anthony Hardy: In December 2002, the police followed a trail of battery acid to Hardy’s door after he vandalized a neighbor’s property. Upon entering his apartment, they found the naked body of Sally Rose White, along with evidence suggesting a simulated rape. Later, investigators found dismembered body parts in garbage bags that Hardy had deposited, traced to him through a loyalty card from a local Sainsbury’s grocery store.
    • John George Haigh: Police became suspicious of Haigh after Mrs. Lane reported Mrs. Durand-Deacon missing. They discovered Haigh had a history of fraud and forgery. A search of his workshop in Crawley revealed tools, chemicals, a gas mask, and a rubber apron with stains. Although Haigh claimed he had destroyed Mrs. Durand-Deacon with acid, police found traces of her remains, including bloodstains, gallstones, and false teeth.
    • Levi Bellfield: Police examined security camera footage and identified a silver Vauxhall Corsa stalking Marsha McDonnell. After another attack, police realized they were looking for a serial killer. They found a white Ford cargo van that had driven the route at the time of another murder. Bellfield’s ex-partner identified him as the owner of the van. Police put Bellfield under surveillance and then arrested him.
    • Steven Wright: Police discovered that Wright had a prior offense on his record and that his DNA was in the national DNA database. Detectives examined over 10,000 hours of security camera footage to map Wright’s movements. They found footage of Wright’s car in the areas where the victims disappeared. Forensic scientists found DNA from the victims in Wright’s car and home.
    • Vincent Tabak: Security cameras showed Tabak driving to a supermarket, going inside, leaving without buying anything, and then returning to buy items. Tabak had searched Google Street View for the location where Joanna Yeates’ body was discovered. Blood was found in the trunk of Tabak’s car, and his DNA matched the DNA on Joanna’s body.

    Forensic Investigations: Hardy, Haigh, Bellfield, Wright, and Tabak

    The source details how police forensics played a role in the investigations of several cases:

    • Anthony Hardy: Police used Luminol to find traces of blood in Hardy’s apartment, even after attempts to clean it. The police were able to connect Hardy to dismembered body parts found in garbage bags using security camera footage of him depositing the bags, together with his Sainsbury’s loyalty card.
    • John George Haigh: Although Haigh tried to dissolve the bodies of his victims using acid, forensic evidence was used to convict him of murder. Despite Haigh’s belief that a murder was unprovable without a body, forensic pathologist Dr. Keith Simpson found traces of Mrs. Durand-Deacon in the sludge remaining at Leopold Road. Police found bloodstains inside the workshop, along with gallstones and Mrs. Durand-Deacon’s false teeth, which the acid had not dissolved.
    • Levi Bellfield: Pollen experts analyzed foliage from Milly Dowler’s remains.
    • Steven Wright: Forensic scientists were able to duplicate DNA samples and develop a full DNA profile. Microscopic comparison of a nylon fiber found in Tania Nicol’s hair matched the carpet in Wright’s car. Forensic tests on work gloves found in Wright’s home revealed DNA from three of the girls. Fibers from Wright’s home were found on four of the five bodies.
    • Vincent Tabak: Tabak had searched Google Street View for the precise location on Longwood Lane where Joanna’s body was found. Blood matching Joanna’s DNA was found in the trunk of his car.

    British Killers and Their Crimes

    The source and conversation history provide details on several British killers and their crimes:

    • Anthony Hardy, known as the Camden Ripper, murdered prostitutes in London and was obsessed with Jack the Ripper. In December 2002, police found the naked body of Sally Rose White in his apartment and later discovered dismembered body parts in garbage bags linked to Hardy via his Sainsbury’s loyalty card. Despite a pathologist’s initial assessment of natural causes, investigators found the scene suspicious. Hardy received a life sentence in 2012.
    • Peter Bryan, the London Cannibal, was first convicted of manslaughter after killing a young woman with a hammer. He later killed a friend while in a low-security facility. Bryan received two life sentences.
    • John George Haigh, the Acid Bath Killer, murdered victims and disposed of their bodies using sulfuric acid. Despite his attempts to destroy the evidence, traces of his victim Mrs. Durand-Deacon were found in the sludge at his workshop, including bloodstains, gallstones, and false teeth. Haigh was found guilty and hanged in 1949.
    • Dena Thompson, the Black Widow, was convicted of deception and the murder of her second husband. She was granted parole on May 23, 2022.
    • Levi Bellfield, the Bus Stop Killer, was found guilty of the murders of Amélie Delagrange, Marsha McDonnell, and Milly Dowler. Security camera footage showed a silver Vauxhall Corsa stalking Marsha McDonnell, and later, a white Ford cargo van was identified as being at the scene of another murder. Bellfield’s ex-partner identified him as the van’s owner, leading to his arrest. He received a whole-life tariff.
    • Steven Wright, the Suffolk Strangler, was convicted of murdering five prostitutes in Ipswich. His DNA was in the national DNA database due to a prior offense. Police used security camera footage to map his movements and found victim DNA in his car and home. Wright received a life sentence with no parole.
    • Vincent Tabak was found guilty of the murder of Joanna Yeates and received a life sentence with a minimum of twenty years. He searched Google Street View for the location where her body was discovered. Blood matching Joanna’s DNA was found in his car.

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • Modern SQL Data Warehouse Project: A Comprehensive Guide

    Modern SQL Data Warehouse Project: A Comprehensive Guide

    This source details the creation of a modern data warehouse project using SQL. It presents a practical guide to designing data architecture, writing code for data transformation and loading, and creating data models. The project emphasizes real-world implementation, focusing on organizing and preparing data for analysis. The resource covers the ETL process, data quality, and documentation while building bronze, silver, and gold layers. It provides a comprehensive approach to data warehousing, from understanding requirements to creating a professional portfolio project.

    Modern SQL Data Warehouse Project Study Guide

    Quiz:

    1. What is the primary purpose of data warehousing projects?
    2. Briefly explain the ETL/ELT process in SQL data warehousing.
    3. According to Bill Inmon’s definition, what are the four key characteristics of a data warehouse?
    4. Why is creating a project plan crucial for data warehouse projects, according to the source?
    5. What is the “separation of concerns” principle in data architecture, and why is it important?
    6. Explain the purpose of the bronze, silver, and gold layers in a data warehouse architecture.
    7. What are metadata columns, and why are they useful in a data warehouse?
    8. What is a surrogate key, and why is it used in data modeling?
    9. Describe the star schema data model, including the roles of fact and dimension tables.
    10. Explain the importance of clear documentation for end users of a data warehouse, as highlighted in the source.

    Quiz Answer Key:

    1. Data warehousing projects focus on organizing, structuring, and preparing data for data analysis, forming the foundation for any data analytics initiatives.
    2. ETL/ELT in SQL involves extracting data from various sources, transforming it to fit the data warehouse schema (cleaning, standardizing), and loading it into the data warehouse for analysis and reporting.
    3. According to Bill Inmon’s definition, the four key characteristics of a data warehouse are subject-oriented, integrated, time-variant, and non-volatile.
    4. Creating a project plan is crucial for data warehouse projects because they are complex, and a clear plan improves the chances of success by providing organization and direction, reducing the risk of failure.
    5. The “separation of concerns” principle involves breaking down a complex system into smaller, independent parts, each responsible for a specific task, to avoid mixing everything and to maintain a clear and efficient architecture.
    6. The bronze layer stores raw, unprocessed data directly from the source systems, the silver layer contains cleaned and standardized data, and the gold layer holds business-ready data transformed and aggregated for reporting and analysis.
    7. Metadata columns are additional columns added to tables by data engineers to provide extra information about each record, such as create date or source system, aiding in data tracking and troubleshooting.
    8. A surrogate key is a system-generated unique identifier assigned to each record to make the record unique. It provides more control over the data model without dependence on source system keys.
    9. The star schema is a data modeling approach with a central fact table surrounded by dimension tables. Fact tables contain events or transactions, while dimension tables hold descriptive attributes, related via foreign keys.
    10. Clear documentation is essential for end users to understand the data model and use the data warehouse effectively.

    Essay Questions:

    1. Discuss the importance of data quality in a modern SQL data warehouse project. Explain the role of the bronze and silver layers in ensuring high data quality, and provide examples of data transformations that might be performed in the silver layer.
    2. Describe the Medallion architecture and how it’s implemented using bronze, silver, and gold layers. Discuss the advantages of this architecture, including separation of concerns and data quality management, and explain how data flows through each layer.
    3. Explain the process of creating a detailed project plan for a data warehouse project using a tool like Notion. Describe the key phases and stages involved, the importance of defining epics and tasks, and how this plan contributes to project success.
    4. Explain the importance of source system analysis in a data warehouse project, and describe the key questions that should be asked when connecting to a new source system.
    5. Compare and contrast the star schema with other data modeling approaches, such as snowflake and data vault. Discuss the advantages and disadvantages of the star schema for reporting and analytics, and explain the roles of fact and dimension tables in this model.

    Glossary of Key Terms:

    • Data Warehouse: A subject-oriented, integrated, time-variant, and non-volatile collection of data designed to support management’s decision-making process.
    • ETL (Extract, Transform, Load): A process in data warehousing where data is extracted from various sources, transformed into a suitable format, and loaded into the data warehouse.
    • ELT (Extract, Load, Transform): A process similar to ETL, but the transformation step occurs after the data has been loaded into the data warehouse.
    • Data Architecture: The overall structure and design of data systems, including databases, data warehouses, and data lakes.
    • Data Integration: The process of combining data from different sources into a unified view.
    • Data Modeling: The process of creating a visual representation of data structures and relationships.
    • Bronze Layer: The first layer in a data warehouse architecture, containing raw, unprocessed data from source systems.
    • Silver Layer: The second layer in a data warehouse architecture, containing cleaned and standardized data ready for transformation.
    • Gold Layer: The third layer in a data warehouse architecture, containing business-ready data transformed and aggregated for reporting and analysis.
    • Subject-Oriented: Focused on a specific business area, such as sales, customers, or finance.
    • Integrated: Combines data from multiple source systems into a unified view.
    • Time-Variant: Keeps historical data for analysis over time.
    • Non-Volatile: Data is not deleted or modified once it enters the data warehouse.
    • Project Epic: A large task or stage in a project that requires significant effort to complete.
    • Separation of Concerns: A design principle that breaks down complex systems into smaller, independent parts, each responsible for a specific task.
    • Data Cleansing: The process of correcting or removing inaccurate, incomplete, or irrelevant data.
    • Data Standardization: The process of converting data into a consistent format or standard.
    • Metadata Columns: Additional columns added to tables to provide extra information about each record, such as creation date or source system.
    • Surrogate Key: A system-generated unique identifier assigned to each record, used to connect data models and avoid dependence on source system keys.
    • Star Schema: A data modeling approach with a central fact table surrounded by dimension tables.
    • Fact Table: A table in a data warehouse that contains events or transactions, along with foreign keys to dimension tables.
    • Dimension Table: A table in a data warehouse that contains descriptive attributes or categories related to the data in fact tables.
    • Data Lineage: Tracking the origin and movement of data from its source to its final destination.
    • Stored Procedure: A precompiled collection of SQL statements stored under a name and executed as a single unit.
    • Data Normalization: The process of organizing data to reduce redundancy and improve data integrity.
    • Data Lookup: Joining tables to retrieve specific data, such as surrogate keys, from related dimensions.
    • Data Flow Diagram: A visual representation of how data moves through a system.

    Modern SQL Data Warehouse Project Guide

    Briefing Document: Modern SQL Data Warehouse Project

    Overview:

    This document summarizes the key concepts and practical steps outlined in a guide for building a modern SQL data warehouse. The guide, presented by Bar Zini, aims to equip data architects, data engineers, and data modelers with real-world skills by walking them through the creation of a data warehouse project using SQL Server (though adaptable to other SQL databases). The project emphasizes best practices and provides a professional portfolio piece upon completion.

    Main Themes and Key Ideas:

    1. Data Warehousing Fundamentals:
    • Definition: The project begins by defining a data warehouse using Bill Inmon’s classic definition: “A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data designed to support management’s decision-making process.”
    • Subject Oriented: Focused on business areas (e.g., sales, customers, finance).
    • Integrated: Combines data from multiple source systems.
    • Time Variant: Stores historical data.
    • Nonvolatile: Data is not deleted or modified once entered.
    • Purpose: To address the inefficiencies of data analysts extracting and transforming data directly from operational systems, replacing it with an organized and structured data system as a foundation for data analytics projects.
    • Relation to Other Data Analytics Projects: SQL data warehousing is the foundation of any data analytics project and the first step before exploratory data analysis (EDA) and advanced analytics projects.
    2. Project Structure and Skills Developed:
    • Roles: The project is designed to provide experience in three key roles: data architect, data engineer, and data modeler.
    • Skills: Participants will learn:
    • ETL/ELT processing using SQL.
    • Data architecture design.
    • Data integration (merging multiple sources).
    • Data loading and data modeling.
    • Portfolio Building: The guide emphasizes the project’s value as a portfolio piece for demonstrating skills on platforms like LinkedIn.
    3. Project Setup and Planning (Using Notion):
    • Importance of Planning: The guide stresses that “creating a project plan is the key to success.” This is particularly important for data warehouse projects, where a high failure rate (over 50%, according to Gartner reports) is attributed to complexity.
    • Iterative Planning: The planning process is described as iterative. An initial “rough project plan” is created, which is then refined as understanding of the data architecture evolves.
    • Project Epics (Main Phases): The initial project phases identified are:
    • Requirements analysis.
    • Designing the data architecture.
    • Project initialization.
    • Task Breakdown: The project uses Notion (a free tool) to organize the project into epics and subtasks, enabling a structured approach.
    • The guide also notes that icons add a personal style to the project plan and help keep it organized.
    • Project success: Closing small chunks of work and individual tasks keeps the whole picture visible and provides a steady sense of motivation and accomplishment.
    4. Data Architecture Design (Using Draw.io):
    • Medallion Architecture: The guide advocates for a “Medallion architecture” (Bronze, Silver, Gold layers) within the data warehouse.
    • Separation of Concerns: A core architectural principle is “separation of concerns”: breaking the complex system into independent parts, each responsible for a specific task, with no duplication of components. As the guide puts it, a good data architect follows this principle.
    • Layer Responsibilities:
    • Bronze Layer (Raw Data): Contains raw data, with no transformations applied.
    • Silver Layer (Cleaned and Standardized Data): Focuses on data cleansing and standardization.
    • Gold Layer (Business-Ready Data): Contains business-transformed data ready for analysis.
    • Data Flow Diagram: The project utilizes Draw.io (a free diagramming tool) to visualize the data architecture and data lineage.
    • Naming Conventions: A naming convention is created to ensure clarity and consistency, with specific naming rules for tables and columns; examples include fact_sales for a fact table and dim_customers for a dimension (see the sketch below). The guide recommends documenting each rule clearly, with examples, so there is a general consensus about how to proceed.
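
    To make the convention concrete, here is a minimal sketch of table definitions following those rules. The column lists are assumptions for illustration, and note that the guide itself builds gold-layer objects as views; tables are used here only to show the naming pattern.

    ```sql
    -- Naming-convention sketch (hypothetical columns, T-SQL).
    -- Dimension tables use the dim_ prefix; surrogate keys end with _key;
    -- technical columns added by the warehouse start with dw_.
    CREATE TABLE gold.dim_customers (
        customer_key   INT IDENTITY(1,1) PRIMARY KEY, -- system-generated surrogate key
        customer_id    INT,                           -- key from the source system
        first_name     NVARCHAR(50),
        country        NVARCHAR(50),
        dw_create_date DATETIME2 DEFAULT GETDATE()    -- technical (metadata) column
    );

    -- Fact tables use the fact_ prefix and reference dimension surrogate keys.
    CREATE TABLE gold.fact_sales (
        order_number NVARCHAR(20),
        customer_key INT,                             -- FK to gold.dim_customers
        order_date   DATE,
        sales_amount DECIMAL(10, 2)
    );
    ```
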
    5. Project Initialization and Tools:
    • Software: The project uses SQL Server Express (database server) and SQL Server Management Studio (client for interacting with the database). Other tools include GitHub and Draw.io. Notion is used for project management.
    • Initial Database Setup: The guide outlines the creation of a new database and schemas (bronze, silver, gold) within SQL Server (sketched after this list).
    • Git Repository: The project emphasizes the importance of using Git for version control and collaboration. A repository structure is established with folders for data sets, documents, scripts, and tests.
    • README: It is important to create a README file at the root of the repository specifying the project’s goal and main characteristics, so that other developers can understand the project when collaborating.
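
    A minimal sketch of that initial database setup in T-SQL might look like this; the database name is an assumption:

    ```sql
    -- Initial-setup sketch, assuming SQL Server and a database named DataWarehouse.
    CREATE DATABASE DataWarehouse;
    GO
    USE DataWarehouse;
    GO
    -- One schema per Medallion layer.
    CREATE SCHEMA bronze;
    GO
    CREATE SCHEMA silver;
    GO
    CREATE SCHEMA gold;
    GO
    ```
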
    6. Building the Bronze Layer
    • Building the bronze layer begins with an analysis of the source systems: interviewing source-system experts to identify where the data comes from, how large it is, how extraction might affect source-system performance, and what authentication and authorization (access tokens, keys, passwords) are required.
    • The guide then takes a step-by-step approach, from creating the required queries and stored procedures to loading the data efficiently, including tests that the tables contain no unexpected nulls and that the file separator matches the data (a minimal load-procedure sketch follows below).
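
    A simplified version of such a load procedure might look like the sketch below; the table name, file path, and CSV options are assumptions for illustration.

    ```sql
    -- Bronze-layer full-load sketch (truncate + BULK INSERT), T-SQL.
    -- Table name and file path are hypothetical.
    CREATE OR ALTER PROCEDURE bronze.load_bronze AS
    BEGIN
        BEGIN TRY
            TRUNCATE TABLE bronze.crm_cust_info;        -- full load: empty the table first
            BULK INSERT bronze.crm_cust_info
            FROM 'C:\datasets\source_crm\cust_info.csv'
            WITH (
                FIRSTROW = 2,                           -- skip the header row
                FIELDTERMINATOR = ',',                  -- must match the file's separator
                TABLOCK
            );
        END TRY
        BEGIN CATCH
            PRINT 'Error loading bronze layer: ' + ERROR_MESSAGE();
        END CATCH
    END;
    ```
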
    7. Building the Silver Layer
    • The silver layer holds clean, standardized data. Tables are built inside the silver layer and loaded from the bronze layer using a full load (truncate, then insert), after which numerous data transformations are applied.
    • The silver layer also adds metadata columns: extra information that does not come directly from the source system, such as create and update dates, the source system, and the file location the data came from. These columns help track down corrupted data and reveal gaps in the imported data (see the sketch below).
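
    For example, a silver-layer table definition might carry a metadata column like this; the table and column names are illustrative, following the dw_ prefix convention mentioned earlier:

    ```sql
    -- Silver-layer table sketch with a metadata column (hypothetical names).
    CREATE TABLE silver.crm_cust_info (
        cst_id         INT,
        cst_firstname  NVARCHAR(50),
        cst_lastname   NVARCHAR(50),
        -- metadata column added by the data engineer, not present in the source:
        dw_create_date DATETIME2 DEFAULT GETDATE()  -- when the record entered the warehouse
    );
    ```
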
    8. Building the Gold Layer

    The gold layer is focused on business goals and should be easy to consume for business reports, which is why a data model is created for the business area. A data model contains two types of tables: fact tables and dimension tables. Dimension tables are descriptive and give context to the data; for example, a product dimension might carry the product name, category, and subcategory. Fact tables record events such as transactions and contain IDs from the dimensions. A simple rule of thumb for choosing between them (the fact side is illustrated in the sketch below):
    • “How much” and “how many” questions: fact table.
    • “Who,” “what,” and “where” questions: dimension table.
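
    A gold-layer fact object following this pattern can be sketched as a view that looks up surrogate keys from a dimension; all names here are assumptions:

    ```sql
    -- Gold-layer fact view sketch with a dimension lookup (hypothetical names).
    CREATE VIEW gold.fact_sales AS
    SELECT
        s.sls_ord_num   AS order_number,
        c.customer_key,                    -- surrogate key looked up from the dimension
        s.sls_order_dt  AS order_date,
        s.sls_sales     AS sales_amount    -- a measure answering "how much"
    FROM silver.crm_sales_details AS s
    LEFT JOIN gold.dim_customers  AS c
        ON s.sls_cust_id = c.customer_id;  -- join on the source-system key
    ```
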

    9. General Data Cleaning
    • In the project, data transformations and cleansing are built as insert statements with functions that transform and clean the data. This includes checks on primary keys, handling unwanted spaces, identifying cardinality inconsistencies (the number of distinct values in a column) and replacing null values, and fixing the dates and values of sales orders.
    • One tool for checking data quality is the quality check: select the data that is incorrect, then apply a quick fix. Any numerical column is best validated against negative numbers and null values, and checked against its data type so it can be converted into the right format.
    • In the silver layer, some techniques must be applied to data that is out of date: it is either removed or flagged. Birthdates can be validated by filtering out dates in the future.
    • To find errors in SQL, TRY...CATCH blocks can be placed around code and used to print error messages, numbers, and states, making errors easier to locate and handle.
    • A lot of information may have missing values; the code includes techniques to fill missing values and to normalize the data. A few of these checks are sketched in SQL below.
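
    The following quality-check sketches illustrate these ideas; table and column names are hypothetical:

    ```sql
    -- Quality-check sketches (hypothetical names, T-SQL); each query should return no rows.

    -- 1. Duplicate or NULL primary keys.
    SELECT cst_id, COUNT(*) AS cnt
    FROM silver.crm_cust_info
    GROUP BY cst_id
    HAVING COUNT(*) > 1 OR cst_id IS NULL;

    -- 2. Unwanted leading/trailing spaces.
    SELECT cst_firstname
    FROM silver.crm_cust_info
    WHERE cst_firstname != TRIM(cst_firstname);

    -- 3. Negative, zero, or NULL sales values.
    SELECT sls_sales
    FROM silver.crm_sales_details
    WHERE sls_sales <= 0 OR sls_sales IS NULL;

    -- 4. Birthdates in the future.
    SELECT cst_birthdate
    FROM silver.crm_cust_info
    WHERE cst_birthdate > GETDATE();
    ```
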

    In summary, this guide provides a comprehensive, practical approach to building a modern SQL data warehouse, emphasizing structured planning, sound architectural principles, and hands-on coding experience. The emphasis on building a portfolio project makes it particularly valuable for those seeking to demonstrate their data warehousing skills.

    SQL Data Warehouse Fundamentals

    What is a modern SQL data warehouse?

    A modern SQL data warehouse, according to the source, is a subject-oriented, integrated, time-variant, and non-volatile collection of data designed to support management’s decision-making process. It consolidates data from multiple source systems, organizes it around business subjects (like sales, customers, or finance), retains historical data, and ensures that the data is not deleted or modified once loaded.

    What are the key roles involved in building a data warehouse project?

    According to the source, building a data warehouse involves several roles:

    • Data Architect: Designs the overall data architecture following best practices.
    • Data Engineer: Writes code to clean, transform, load, and prepare data.
    • Data Modeler: Creates the data model for analysis.

    What are the three types of data analytics projects that can be done using SQL?

    The three types of data analytics projects, according to the source, are:

    • Data Warehousing: Focuses on organizing, structuring, and preparing data for analysis, which is foundational for other analytics projects.
    • Exploratory Data Analysis (EDA): Involves understanding and uncovering insights from datasets by asking the right questions and finding answers using basic SQL skills.
    • Advanced Analytics Projects: Uses advanced SQL techniques to answer business questions, such as identifying trends, comparing performance, segmenting data, and generating reports.

    What is the Medallion architecture and why is it relevant to designing a data warehouse?

    The Medallion architecture is a layered approach to data warehousing composed of:

    • Bronze Layer: Raw data “as is” from source systems.
    • Silver Layer: Cleaned and standardized data.
    • Gold Layer: Business-ready data with transformed and aggregated information.

    The Medallion architecture enables separation of concerns, allowing a unique set of tasks for each layer, and helps organize and manage the complexity of data warehousing. It provides a structured approach to data processing, ensuring data quality and consistency.

    What tools are commonly used in data warehouse projects, and why is creating a project plan important?

    Common tools used in data warehouse projects include:

    • SQL Server Express: A local server for the database.
    • SQL Server Management Studio (SSMS): A client to interact with the database and run queries.
    • GitHub: For version control and collaboration.
    • draw.io: A tool for creating diagrams, data models, data architectures, and data lineage.
    • Notion: A tool for project management, planning, and organizing resources.

    Creating a project plan is essential for success due to the complexity of data warehouse projects. A clear plan helps organize tasks, manage resources, and track progress.

    What is data lineage, and why is it important in a data warehouse environment?

    Data lineage refers to the data’s journey from its origin in source systems, through various transformations, to its final destination in the data warehouse. It provides visibility into the data’s history, transformations, and dependencies. Data lineage is crucial for troubleshooting data quality issues, understanding data flows, ensuring compliance, and auditing data processes.

    What are surrogate keys, and why are they used in data modeling?

    Surrogate keys are system-generated unique identifiers assigned to each record in a dimension table. They are used to ensure uniqueness, simplify data relationships, and insulate the data warehouse from changes in source system keys. Surrogate keys provide control over the data model and facilitate efficient data integration and querying.

    What are some essential naming conventions for data warehouse projects, and why are they important?

    Essential naming conventions help ensure consistency and clarity across the data warehouse. Examples include:

    • Using prefixes to indicate the type of table (e.g., dim_ for dimension, fact_ for fact).
    • Consistent naming of columns (e.g., surrogate keys ending with _key, technical columns starting with dw_).
    • Standardized naming for stored procedures (e.g., load_bronze for bronze layer loading).

    These conventions improve collaboration, code readability, and maintenance, enabling efficient data management and analysis.

    Data Warehousing: Architectures, Models, and Key Concepts

    Data warehousing involves organizing, structuring, and preparing data for analysis and is the foundation for any data analytics project. It focuses on how to consolidate data from various sources into a centralized repository for reporting and analysis.

    Key aspects of data warehousing:

    • A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data designed to support management’s decision-making process.
    • Subject-oriented: Focuses on specific business areas like sales, customers, or finance.
    • Integrated: Integrates data from multiple source systems.
    • Time-variant: Keeps historical data.
    • Nonvolatile: Data is not deleted or modified once it’s in the warehouse.
    • ETL (Extract, Transform, Load): A process to extract data from sources, transform it, and load it into the data warehouse, which then becomes the single source of truth for analysis and reporting.
    • Benefits of a data warehouse:
    • Organized data: A data warehouse keeps data organized so that the data team is not wasting time fighting with raw data.
    • Single point of truth: Serves as a single point of truth for analyses and reporting.
    • Automation: Automates the data collection and transformation process, reducing manual errors and processing time.
    • Historical data: Enables access to historical data for trend analysis.
    • Data integration: Integrates data from various sources, making it easier to create integrated reports.
    • Improved decision-making: Provides fresh and reliable reports for making informed decisions.
    • Data management: Sound data management is the prerequisite for making good, well-founded decisions.
    • Data modeling: A new, analysis-friendly data model is created on top of the warehouse for analyses and reporting.

    Different Approaches to Data Warehouse Architecture:

    • Inmon Model: Uses a three-layer approach (staging, enterprise data warehouse, and data marts) to organize and model data.
    • Kimball Model: Focuses on quickly building data marts, which may lead to inconsistencies over time.
    • Data Vault: Adds more standards and rules to the central data warehouse layer by splitting it into raw and business vaults.
    • Medallion Architecture: Uses three layers: bronze (raw data), silver (cleaned and standardized data), and gold (business-ready data).

    The Medallion architecture consists of the following:

    • Bronze Layer: Stores raw, unprocessed data directly from the sources for traceability and debugging.
    • Data is not transformed in this layer.
    • Typically uses tables as object types.
    • Full load method is applied.
    • Access restricted to data engineers only.
    • Silver Layer: Stores clean and standardized data with basic transformations.
    • Focuses on data cleansing, standardization, and normalization.
    • Uses tables as object types.
    • Full load method is applied.
    • Accessible to data engineers, data analysts, and data scientists.
    • Gold Layer: Contains business-ready data for consumption by business users and analysts.
    • Applies business rules, data integration, and aggregation.
    • Uses views as object types for dynamic access.
    • Suitable for data analysts and business users.

    The ETL Process: Extract, Transform, and Load

    The ETL (Extract, Transform, Load) process is a critical component of data warehousing used to extract data from various sources, transform it into a usable format, and load it into a data warehouse. The data warehouse then becomes the single point of truth for analyses and reporting.

    The ETL process consists of three key stages:

    • Extract: Involves identifying and extracting data from source systems without changing it. The goal is to pull out a subset of data from the source in order to prepare it and load it to the target. This step focuses solely on data retrieval, maintaining a one-to-one correspondence with the source system.
    • Transform: Manipulates and transforms the extracted data into a format suitable for analysis and reporting. This stage may include data cleansing, integration, formatting, and normalization to reshape the data into the required format.
    • Load: Inserts the transformed data into the target data warehouse. The prepared data from the transformation step is moved into its final destination, such as a data warehouse.

    In real-world projects, the data architecture may have multiple layers, and the ETL process can vary between these layers. Depending on the data architecture’s design, it is not always necessary to use the complete ETL process to move data from a source to a target. For example, data can be loaded directly to a layer without transformations or undergo only transformation or loading steps between layers.

    Different techniques and methods exist within each stage of the ETL process:

    Extraction:

    • Methods:
    • Pull: The data warehouse pulls data from the source system.
    • Push: The source system pushes data to the data warehouse.
    • Types:
    • Full Extraction: All records from the source tables are extracted.
    • Incremental Extraction: Only new or changed data is extracted.
    • Techniques:
    • Manual extraction
    • Querying a database
    • Parsing a file
    • Connecting to an API
    • Event-based streaming
    • Change data capture (CDC)
    • Web scraping

    Transformation:

    • Data enrichment
    • Data integration
    • Deriving new columns
    • Data normalization
    • Applying business rules and logic
    • Data aggregation
    • Data cleansing:
    • Removing duplicates
    • Data filtering
    • Handling missing data
    • Handling invalid values
    • Removing unwanted spaces
    • Casting data types
    • Detecting outliers

    Load:

    • Processing Types:
    • Batch Processing: Loading the data warehouse in one large batch of data.
    • Stream Processing: Processing changes as soon as they occur in the source system.
    • Methods:
    • Full Load:
    • Truncate and insert
    • Upsert (update and insert)
    • Drop, create, and insert
    • Incremental Load:
    • Upsert
    • Insert (append data)
    • Merge (update, insert, delete); a MERGE-based upsert is sketched after this list
    • Slowly Changing Dimensions (SCD):
    • SCD0: No historization; no changes are tracked.
    • SCD1: Overwrite; records are updated with new information, losing history.
    • SCD2: Add historization by inserting new records for each change and inactivating old records.
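
    As an illustration, an upsert-style load can be expressed as a single MERGE statement in T-SQL; the staging and target table names below are hypothetical:

    ```sql
    -- Upsert (update + insert) sketch using MERGE; names are hypothetical.
    MERGE silver.crm_cust_info AS tgt
    USING bronze.crm_cust_info AS src
        ON tgt.cst_id = src.cst_id
    WHEN MATCHED THEN                    -- existing record: overwrite (SCD1-style, history is lost)
        UPDATE SET tgt.cst_firstname = src.cst_firstname,
                   tgt.cst_lastname  = src.cst_lastname
    WHEN NOT MATCHED BY TARGET THEN      -- new record: insert
        INSERT (cst_id, cst_firstname, cst_lastname)
        VALUES (src.cst_id, src.cst_firstname, src.cst_lastname);
    ```
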

    Data Modeling for Warehousing and Business Intelligence

    Data modeling is the process of organizing and structuring raw data in a meaningful way that is easy to understand. In data modeling, data is put into new, friendly, easy-to-understand entities such as customers, orders, and products. Each entity focuses on specific information, and the relationships between those entities are described. The goal is to create a logical data model.

    For analytics, especially in data warehousing and business intelligence, data models should be optimized for reporting, flexible, scalable, and easy to understand.

    Different Stages of Data Modeling:

    • Conceptual Data Model: Focuses on identifying the main entities (e.g., customers, orders, products) and their relationships without specifying details like columns or attributes.
    • Logical Data Model: Specifies columns, attributes, and primary keys for each entity and defines the relationships between entities.
    • Physical Data Model: Includes technical details like data types, lengths, and database-specific configurations for implementing the data model in a database.

    Data Models for Data Warehousing and Business Intelligence:

    • Star Schema: Features a central fact table surrounded by dimension tables. The fact table contains events or transactions, while dimensions contain descriptive information. The relationship between fact and dimension tables forms a star shape.
    • Snowflake Schema: Similar to the star schema but breaks down dimensions into smaller sub-dimensions, creating a more complex, snowflake-like structure.

    Comparison of Star and Snowflake Schemas:

    • Star Schema:
    • Easier to understand and query.
    • Suitable for reporting and analytics.
    • May contain duplicate data in dimensions.
    • Snowflake Schema:
    • More complex and requires more knowledge to query.
    • Optimizes storage by reducing data redundancy through normalization.

    Overall, the star schema is the more common choice and is well suited for reporting.

    Types of Tables:

    • Fact Tables: Contain events or transactions and include IDs from multiple dimensions, dates, and measures. They answer questions about “how much” or “how many”.
    • Dimension Tables: Provide descriptive information and context about the data, answering questions about “who,” “what,” and “where”.

    In the gold layer, data modeling involves creating new structures that are easy to consume for business reporting and analyses.

    Data Transformation: ETL Process and Techniques

    Data transformation is a key stage in the ETL (Extract, Transform, Load) process where extracted data is manipulated and converted into a format that is suitable for analysis and reporting. It occurs after data has been extracted from its source and before it is loaded into the target data warehouse. This process is essential for ensuring data quality, consistency, and relevance in the data warehouse.

    Here’s a detailed breakdown of data transformation, drawing from the sources:

    Purpose and Importance

    • Data transformation changes the shape of the original data.
    • It is a heavy working process that can include data cleansing, data integration, and various formatting and normalization techniques.
    • The goal is to reshape and reformat original data to meet specific analytical and reporting needs.

    Types of Transformations

    There are various types of transformations that can be performed (a few are sketched in SQL after this list):

    • Data Cleansing:
    • Removing duplicates to ensure each primary key has only one record.
    • Filtering data to retain relevant information.
    • Handling missing data by filling in blanks with default values.
    • Handling invalid values to ensure data accuracy.
    • Removing unwanted spaces or characters to ensure consistency.
    • Casting data types to ensure compatibility and correctness.
    • Detecting outliers to identify and manage anomalous data points.
    • Data Enrichment: Adding value to data sets by including relevant information.
    • Data Integration: Bringing multiple sources together into a unified data model.
    • Deriving New Columns: Creating new columns based on calculations or transformations of existing ones.
    • Data Normalization: Mapping coded values to user-friendly descriptions.
    • Applying Business Rules and Logic: Implementing criteria to build new columns based on business requirements.
    • Data Aggregation: Aggregating data to different granularities.
    • Data Type Casting: Converting data from one data type to another.
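
    The sketch below illustrates a few of these transformations in SQL; the table and column names are hypothetical:

    ```sql
    -- Transformation sketches (hypothetical names, T-SQL).

    -- Cleansing and normalization while moving data from bronze to silver:
    SELECT
        cst_id,
        TRIM(cst_firstname) AS cst_firstname,  -- remove unwanted spaces
        CASE UPPER(TRIM(cst_gndr))             -- normalization: map codes to friendly values
            WHEN 'M' THEN 'Male'
            WHEN 'F' THEN 'Female'
            ELSE 'n/a'                         -- handle missing or invalid values
        END AS cst_gndr
    FROM bronze.crm_cust_info;

    -- Casting a data type and deriving a new column:
    SELECT
        sls_ord_num,
        CAST(sls_order_dt AS DATE) AS sls_order_dt,  -- cast data type
        sls_quantity * sls_price   AS sls_sales      -- derived column
    FROM bronze.crm_sales_details;
    ```
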

    Data Transformation in the Medallion Architecture

    In the Medallion architecture, data transformation is strategically applied across different layers:

    • Bronze Layer: No transformations are applied. The data remains in its raw, unprocessed state.
    • Silver Layer: Focuses on basic transformations to clean and standardize data. This includes data cleansing, standardization, and normalization.
    • Gold Layer: Focuses on business-related transformations needed for the consumers, such as data integration, data aggregation, and the application of business logic and rules. The goal is to provide business-ready data that can be used for reporting and analytics.

    SQL Server for Data Warehousing

    The sources mention SQL Server as a tool for building data warehouses: a database platform that can run locally on a PC and host the warehouse database.

    Here’s what the sources indicate about using SQL Server in the context of data warehousing:

    • Building a data warehouse: SQL Server can be used to develop a modern data warehouse.
    • Project platform: In at least one of the projects described in the sources, the data warehouse was built completely in SQL Server.
    • Data loading: SQL Server is used to load data from source files, such as CSV files, into database tables. The BULK INSERT command is used to load data quickly from a file into a table.
    • Database and schema creation: SQL scripts are used to create a database and schemas within SQL Server to organize data.
    • SQL Server Management Studio: SQL Server Management Studio is a client tool used to interact with the database and run queries.
    • Three-layer architecture: The SQL Server database is organized into three schemas corresponding to the bronze, silver, and gold layers of a data warehouse.
    • DDL scripts: DDL (Data Definition Language) scripts are created and executed in SQL Server to define the structure of tables in each layer of the data warehouse.
    • Stored procedures: Stored procedures are created in SQL Server to encapsulate ETL processes, such as loading data from CSV files into the bronze layer.
    • Data quality checks: SQL queries are written and executed in SQL Server to validate data quality, such as checking for duplicates or null values.
    • Views in the gold layer: Views are created in the gold layer of the data warehouse within SQL Server to provide a business-ready, integrated view of the data.

    SQL Data Warehouse from Scratch | Full Hands-On Data Engineering Project

    The Original Text

    hey friends so today we are diving into something very exciting Building Together modern SQL data warehouse projects but this one is not any project this one is a special one not only you will learn how to build a modern Data Warehouse from the scratch but also you will learn how I implement this kind of projects in Real World Companies I’m bar zini and I have built more than five successful data warehouse projects in different companies and right now I’m leading big data and Pi Projects at Mercedes-Benz so that’s me I’m sharing with you real skills real Knowledge from complex projects and here’s what you will get out of this project as a data architect we will be designing a modern data architecture following the best practices and as a data engineer you will be writing your codes to clean transform load and prepare the data for analyzis and as a data Modell you will learn the basics of data moding and we will be creating from the scratch a new data model for analyzes and my friends by the end of this project you will have a professional portfolio project to Showcase your new skills for example on LinkedIn so feel free to take the project modify it and as well share it with others but it going to mean the work for me if you share my content and guess what everything is for free so there are no hidden costs at all and in this project we will be using SQL server but if you prefer other databases like my SQL or bis don’t worry you can follow along just fine all right my friends so now if you want to do data analytics projects using SQL we have three different types the first type of projects you can do data warehousing it’s all about how to organize structure and prepare your data for data analysis it is the foundations of any data analytics projects and in The Next Step you can do exploratory data analyzes Eda and all what you have to do is to understand and cover insights about our data sets in this kind of project you can learn how to ask the right questions and how to find the answer using SQL by just using basic SQL skills now moving on to the last stage where you can do Advanced analytics projects where you going to use Advanced SQL techniques in order to answer business questions like finding Trends over time comparing the performance segmenting your data into different sections and as well generate reports for your stack holders so here you will be solving real business questions using Advanced SQL techniques now what we’re going to do we’re going to start with the first type of projects SQL data warehousing where you will gain the following skills so first you will learn how to do ETL elt processing using SQL in order to prepare the data you will learn as well how to build data architecture how to do data Integrations where we can merge multiple sources together and as well how to do data load and data modeling so if I got you interested grab your coffee and let’s jump to the projects all right my friends so now before we Deep dive into the tools and the cool stuff we have first to have good understanding about what is exactly a data warehouse why the companies try to build such a data management system so now the question is what is a data warehouse I will just use the definition of the father of the data warehouse Bill Inon a data warehouse is subject oriented integrated time variance and nonvolatile collection of data designed to support the Management’s decision-making process okay I I know that might be confusing subject oriented it means thata Warehouse is always focused 
on a business area like the sales customers finance and so on integrated because it goes and integrate multiple Source systems usually you build a warehouse not only for one source but for multiple sources time variance it means you can keep historical data inside the data warehouse nonvolatile it means once the data enter the data warehouse it is not deleted or modified so this is how build and mod defined data warehouse okay so now I’m going to show you the scenario where your company don’t have a real data management so now let’s say that you have one system and you have like one data analyst has to go to this system and start collecting and extracting the data and then he going to spend days and sometimes weeks transforming the row data into something meaningful then once they have the report they’re going to go and share it and this data analyst is sharing the report using an Excel and then you have like another source of data and you have another data analyst that she is doing maybe the same steps collecting the data spending a lot of time transforming the data and then share at the end like a report and this time she is sharing the data using PowerPoint and a third system and the same story but this time he is sharing the data using maybe powerbi so now if the company works like this then there is a lot of issues first this process it take too way long I saw a lot of scenarios where sometimes it takes weeks and even months until the employee manually generating those reports and of course what going to happen for the users they are consuming multiple reports with multiple state of the data one report is 40 days old another one 10 days and a third one is like 5 days so it’s going to be really hard to make a real decision based on this structure a manual process is always slow and stressful and the more employees you involved in the process the more you open the door for human errors and errors of course in reports leads to bad decisions and another issue of course is handling the Big Data if one of your sources generating like massive amount of data then the data analyst going to struggle collecting the data and maybe in some scenarios it will not be any more possible to get the data so the whole process can breaks and you cannot generate any more fresh data for specific reports and one last very big issue with that if one of your stack holders asks for an integrated report from multiple sources well good luck with that because merging all those data manually is very chaotic timec consuming and full of risk so this is just a picture if a company is working without a proper data management without a data leak data warehouse data leak houses so in order to make real and good decisions you need data management so now let’s talk about the scenario of a data warehouse so the first thing that can happen is that you will not have your data team collecting manually the data you’re going to have a very important component called ETL ETL stands for extract transform and load it is a process that you do in order to extract the data from the sources and then apply multiple Transformations on those sources and at the end it loads the data to the data warehouse and this one going to be the single point of Truth for analyzes and Reporting and it is called Data Warehouse so now what can happen all your reports going to be consuming this single point of Truth so with that you create your multiple reports and as well you can create integrated reports from multiple sources not only from one single 
source so now by looking to the right side it looks already organized right and the whole process is completely automated there is no more manual steps which of course it ru uses the human error and as well it is pretty fast so usually you can load the data from the sources until the reports in matter of hours or sometimes in minutes so there is no need to wait like weeks and months in order to refresh anything and of course the big Advantage is that the data warehouse itself it is completely integrated so that means it goes and bring all those sources together in one place which makes it really easier for reporting and not only integrate you can build in the data warehouse as well history so we have now the possibility to access historical data and what is also amazing that all those reports having the same data status so all those reports can have the same status maybe sometimes one day old or something and of course if you have a modern Data Warehouse in Cloud platforms you can really easily handle any big data sources so no need to panic if one of your sources is delivering massive amount of data and of course in order to build the data warehouse you need different types of Developers so usually the one that builds the ATL component and the data warehouse is the data engineer so they are the one that is accessing the sources scripting the atls and building the database for the data warehouse and now for the other part the one that is responsible for that is the data analyst they are the one that is consuming the data warehouse building different data models and reports and sharing it with the stack holders so they are usually contacting the stack holders understanding the requirements and building multiple reports based on the data warehouse so now if you have a look to those two scenarios this is exactly why we need data management your data team is not wasting time and fighting with the data they are now more organized and more focused and with like data warehouse and you are delivering professional and fresh reports that your company can count on in order to make good and fast decisions so this is why you need a data management like a data warehouse think about data warehouse as a busy restaurant every day different suppliers bring in fresh ingredients vegetables spices meat you name it they don’t just use it immediately and throw everything in one pot right they clean it shop it and organize everything and store each ingredients in the right place fridge or freezer so this is the preparing face and when the order comes in they quickly grab the prepared ingredients and create a perfect dish and then serve it to the customers of the restaurant and this process is exactly like the data warehouse process it is like the kitchen where the raw ingredients your data are cleaned sorted and stored and when you need a report or analyzes it is ready to serve up exactly like what you need okay so now we’re going to zoom in and focus on the component ETL if you are building such a project you’re going to spend almost 90% just building this component the ATL so it is the core element of the data warehouse and I want you to have a clear understanding what is exactly an ETL so our data exist in a source system and now what we want to do is is to get our data from the source and move it to the Target source and Target could be like database tables so now the first step that we have to do is to specify which data we have to load from the source of course we can say that we want to load everything but 
let’s say that we are doing incremental loads so we’re going to go and specify a subset of the data from The Source in order to prepare it and load it later to the Target so this step in the ATL process we call it extract we are just identifying the data that we need we pull it out and we don’t change anything it’s going to be like one to one like the source system so the extract has only one task to identify the data that you have to pull out from the source and to not change anything so we will not manipulate the data at all it can stay as it is so this is the first step in the ETL process the extracts now moving on to the stage number two we’re going to take this extract data and we will do some manipulations Transformations and we’re going to change the shape of those data and this process is really heavy working we can do a lot of stuff like data cleansing data integration and a lot of formatting and data normalizations so a lot of stuff we can do in this step so this is the second step in the ETL process the transformation we’re going to take the original data and reshape it transformat into exactly the format that we need into a new format and shapes that we need for anal and Reporting now finally we get to the last step in the ATL process we have the load so in this step we’re going to take this new data and we’re going to insert it into the targets so it is very simple we’re going to take this prepared data from the transformation step and we’re going to move it into its final destination the target like for example data warehouse so that’s ETL in the nutshell first extract the row data then transform it into something meaningful and finally load it to a Target where it’s going to make a difference so that’s that’s it this is what we mean with the ETL process now in real projects we don’t have like only source and targets our thata architecture going to have like multiple layers depend on your design whether you are building a warehouse or a data lake or a data warehouse and usually there are like different ways on how to load the data between all those layers and in order now to load the data from one layer to another one there are like multiple ways on how to use the ATL process so usually if you are loading the data from the source to the layer number one like only the data from the source and load it directly to the layer number one without doing any Transformations because I want to see the data as it is in the first layer and now between the layer number one and the layer number two you might go and use the full ETL so we’re going to extract from the layer one transform it and then load it to the layer number two so with that we are using the whole process the ATL and now between Layer Two and layer three we can do only transformation and then load so we don’t have to deal with how to extract the data because it is maybe using the same technology and we are taking all data from Layer Two to layer three so we transform the whole layer two and then load it to layer three and now between three and four you can use only the L so maybe it’s something like duplicating and replicating the data and then you are doing the transformation so you load to the new layer and then transform it of course this is not a real scenario I’m just showing you that in order to move from source to a Target you don’t have always to use a complete ETL depend on the design of your data architecture you might use only few components from the ETL okay so this is how ETL looks like in real projects okay so 
now I would like to show you an overview of the different techniques and methods in the etls we have wide range of possibilities where you have to make decisions on which one you want to apply to your projects so let’s start first with the extraction the first thing that I want to show you is we have different methods of extraction either you are going to The Source system and pulling the data from the source or the source system is pushing the data to the data warehouse so those are the two main methods on how to extract data and then we have in the extraction two types we have a full extraction everything all the records from tables and every day we load all the data to the data warehouse or we make more smarter one where we say we’re going to do an incremental extraction where every day we’re going to identify only the new changing data so we don’t have to load the whole thing only the new data we go extract it and then load it to the data warehouse and in data extraction we have different techniques the first one is like manually where someone has to access a source system and extract the data manually or we connect ourself to a database and we have then a query in order to extract the data or we have a file that we have to pass it to the data warehouse or another technique is to connect ourself to API and do their cods in order to extract the data or if the data is available in streaming like in kfka we can do event based streaming in order to extract the data another way is to use the change data capture CDC is as well something very similar to streaming or another way is by using web scrapping where you have a code that going to run and extract all the informations from the web so those are the different techniques and types that we have in the extraction now if you are talking on the transformation there are wide range of different Transformations that we can do on our data like for example doing data enrichment where we add values to our data sets or we do a data integration where we have multiple sources and we bring everything to one data model or we derive a new of columns based on already existing one another type of data Transformations we have the data normalization so the sources has values that are like a code and you go and map it to more friendly values for the analyzers which is more easier to understand and to use another Transformations we have the business rules and logic depend on the business you can Define different criterias in order to build like new columns and what belongs to Transformations is the data aggregation so here we aggregate the data to a different granularity and then we have type of transformation called Data cleansing there are many different ways on how to clean our data for example removing the duplicates doing data filtering handling the missing data handling invalid values or removing unwanted spaces casting the data types and detecting the outliers and many more so we have different types of data cleansing that we can do in our data warehouse and this is very important transformation so as you can see we have different types of Transformations that we can do in our data warehouse now moving on to the load so what do we have over here we have different processing types so either we are doing patch processing or stream processing patch processing means we are loading the data warehouse in one big patch of data that’s going to run and load the data warehouse so it is only one time job in order to refresh the content of the data warehouse and as 
Now, moving on to the load. What do we have here? We have different processing types: either batch processing or stream processing. Batch processing means we are loading the data warehouse in one big batch of data; it is a one-time job that runs and refreshes the content of the data warehouse, and the reports as well. That means we schedule the data warehouse to be loaded, say, once or twice a day. The other type is stream processing. This means that if there is a change in the source system, we process that change as soon as possible, pushing it through all the layers of the data warehouse the moment something changes in the source. So we are streaming the data in order to have a real-time data warehouse, which is a very challenging thing to do in data warehousing.

If we are talking about the loads themselves, we have two methods: either a full load or an incremental load, the same idea as in the extraction. For the full load in databases there are different ways to do it. For example, truncate and insert: we make the table completely empty and then insert everything from scratch. Another one is update plus insert, which we call upsert: we update the existing records and then insert the new ones. And another way is drop, create, and insert: we drop the whole table, create it from scratch, and then insert the data. It is very similar to the truncate, but here we are also removing, dropping, the whole table. Those are the different methods for full loads. For the incremental load, we can also use upserts, so update and insert statements against our tables. Or, if the source is something like a log, we can do inserts only, always appending the data to the table without having to update anything. Another way to do an incremental load is a merge, which is very similar to the upsert but with a delete as well: update, insert, delete. So those are the different methods for loading the data into your tables.

One more thing: in data warehousing we have something called slowly changing dimensions. This is all about the historization of your table, and there are many different ways to handle the history in your table. The first type is SCD 0: no historization, nothing should be changed at all, so you are not going to update anything. The second one, which is more common, is SCD 1: you do an overwrite, meaning you update the records with the new information from the source system, overwriting the old values. So we are doing something like the upsert, update and insert, but of course you lose history. Then we have SCD 2, where you want to add historization to your table. For each change we get from the source system, we insert a new record; we are not going to overwrite or delete the old data, we just mark it as inactive, and the new record becomes the active one. So there are also different methods of historization while you are loading the data into the data warehouse.
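As a sketch of how an upsert looks in practice, here is a minimal T-SQL MERGE doing update plus insert, which is also the SCD 1-style overwrite described above. Again, the table and column names are illustrative, not from the course.

```sql
-- Upsert (update + insert) from a staging table into a target table.
MERGE dwh.customers AS tgt
USING staging.customers AS src
    ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    -- SCD 1: overwrite the old values, so history is lost
    UPDATE SET
        tgt.customer_name = src.customer_name,
        tgt.country       = src.country
WHEN NOT MATCHED BY TARGET THEN
    -- brand-new records are inserted
    INSERT (customer_id, customer_name, country)
    VALUES (src.customer_id, src.customer_name, src.country);
```

Adding a WHEN NOT MATCHED BY SOURCE THEN DELETE clause would turn this into the full merge (update, insert, delete) mentioned above.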
All right, so those are the different types and techniques that you might encounter in data management projects. Now let me quickly show you which of them we will be using in our project. For the extraction, we will be doing a pull extraction; regarding full or incremental, it's going to be a full extraction; and as for the technique, we will be parsing files into the data warehouse. For the data transformations, we will cover everything: all the types of transformations I just showed you will be part of the project, because I believe you will face them in every data project. Looking at the load, our project is going to use batch processing, and for the load method we will be doing a full load, since we have a full extraction, using truncate and insert. As for the historization, we will be doing SCD 1, which means we will be updating the content of the data warehouse. So those are the different techniques and types that we will be using in our ETL process for this project.

All right, with that we now have a clear understanding of what a data warehouse is, and we are done with the theory part. So the next step is to start with the project, and the first thing you have to do is prepare your environment to develop the project. Let's start with that.

Now we go to the link in the description, and from there we go to the downloads. Here you can find the materials for all courses and projects, but the one we need now is the SQL data warehouse project. Let's follow the link. Here we have a bunch of links that we need for the project, but the most important one, to get all the data and files, is "download all project files". After you do that, you get a zip file with a lot of stuff inside, so let's extract it. Inside, you will find the repository structure from Git, and the most important part is the data sets: you have two sources, the CRM and the ERP, and each of them contains three CSV files. Those are the data sets for the project. Don't worry about the other files; we will explain them during the project. So go get the data and put it somewhere on your PC where you won't lose it.

Okay, what else do we have? We have a link to the Git repository. This is the link to my repository that I created throughout the project; you can go and access it, but don't worry about it, we are going to explain the whole structure during the project, and you will be creating your own repository. We also have the link to Notion, where we do the project management; there you will find the main steps, the main phases of the SQL project that we will do, as well as all the tasks we will be doing together during the project. Then we have links to the project tools. If you don't have it already, go and download SQL Server Express; it is a server that runs locally on your PC, where your database is going to live. Another one you have to download is SQL Server Management Studio; it is just a client for interacting with the database, and there we will run all our queries. Then there is a link to GitHub, and also a link to draw.io. If you don't have it already, go and download it; it is a free and amazing tool for drawing diagrams. Throughout the project we will be drawing data models, the data architecture, a data lineage, so a lot of things, using this tool. And the last thing, which is nice to have: a link to Notion, where you can create a free account if you want to build the project plan and follow along with me by creating the project steps and tasks. Okay, those are all the links for the project. So go and download all those tools, create the accounts, and once you are ready, we continue with the project.
All right, so now I hope you have downloaded all the tools and created the accounts. It's time to move to a very important step that almost everyone skips when doing projects, and that is creating the project plan. For that we will be using the tool Notion. Notion is, of course, a free tool, and it can help you organize your ideas, your plans, and your resources all in one place. I use it very intensively for my private projects, for example for creating this course, and I can tell you: creating a project plan is the key to success. A data warehouse project is usually very complex, and according to Gartner reports, over 50% of data warehouse projects fail. In my opinion, for any complex project the key to success is to have a clear project plan. At this phase of the project we are going to create a rough project plan, because at the moment we don't yet have a clear understanding of the data architecture. So let's go.

Okay, let's create a new page and call it "data warehouse project". The first thing is to create the main phases and stages of the project, and for that we need a table. To do that, hit slash, type "database inline", and let's call it something like "data warehouse epics". We'll hide that title, because I don't like it, and then rename the table itself to something like "project epics". Now we are going to list all the big tasks of the project. An epic is usually a large task that needs a lot of effort to solve; you can call them epics, stages, phases of the project, whatever you want. So let's list our project steps: it starts with the requirements analysis, then designing the data architecture, and then we have the project initialization. Those are the first three big tasks in the project.

Now what do we need? We need another table for the small chunks of work, the subtasks, and we do the same thing: hit slash, search for "table inline", call it "data warehouse tasks", hide the title, and rename the table to "project tasks". Next, go to the plus icon, search for "relation", the one with the arrow, and then search for the name of the first table, "data warehouse epics". Click it, choose a two-way relation, and add the relation. With that, we get a field in the new table called "data warehouse epics" that comes from the first table, and the first table gets a field called "data warehouse tasks" that comes from the table below, so as you can see, we have linked them together. Now I'll move this to the left side. Then we select one of those epics, for example "design the data architecture", and break this epic down into multiple tasks, like "choose data management approach", and then another task in the same epic, maybe "brainstorm and design the layers". Then let's go to another epic, for example the project initialization, and add "create Git repo, prepare the structure", and we can make another one in the same epic, let's say "create the database and the schemas".
So as you can see, I'm just defining the subtasks of those epics. Next we are going to add a checkbox so we can track whether a task is done: go to the plus, search for "checkbox", and make it really small, like this; each time we finish a task, we click it to mark it as done. Now, there is one more thing that doesn't work nicely yet: we are going to have a long list of tasks here, and it's really annoying. So go to the plus again, search for "rollup", and select it. Now we have to select the relationship, which is "data warehouse tasks", and after that go to the property and pick the checkbox. As you can see, the first table now shows how many tasks are closed, but I don't want to show it like this: go to the calculation, choose percent, then "percent checked", and with that we can see the progress of our project. And instead of the numbers, we can have a really nice bar. We can also give it a name, like "progress", and we can hide the "data warehouse tasks" column. With that, we have a really nice progress bar for each epic, and if we close all the tasks of an epic, we can see it reach 100%.

So this is the main structure. Now we can add some cosmetics and rename things to make everything look nicer. For example, I can rename the tasks table to "tasks" and change its icon to something like this. And if you'd like an icon for each of the epics, go to the epic, for example "design data architecture", hover over the title, click "add an icon", and pick any icon you want, for example this one. It now shows at the top, and the icon will also appear in the table below. Okay, one more thing we can do for the project tasks is to group them by epic: go to the three dots, then groups, and group by the epics. As you can see, we now have a section for each epic, and you can sort the epics if you want: go to sort, choose manual, and start arranging the epics as you like. With that, you can expand and collapse each group if you don't want to see all the tasks at once. This is a really nice way to build project management for your projects. Of course, in companies we use professional tools for projects, for example Jira, but for the private projects that I do, I always do it like this, and I really recommend it, not only for this project but for any project you are doing. Because if you see the whole project in one view, you can see the big picture, and closing tasks like this, these small things, can make you really satisfied, keep you motivated to finish the whole project, and make you proud.

Okay friends, I just went and added a few icons, renamed some things, and added more tasks for each epic. This is going to be our starting point in the project, and once we have more information, we will go and add more details about how exactly we are going to build the data warehouse.
So at the start we are going to analyze and understand the requirements, and only after that do we start designing the data architecture. Here we have three tasks: first we have to choose the data management approach, after that we do the brainstorming and designing of the layers of the data warehouse, and at the end we draw the data architecture. With that, we have a clear understanding of how the data architecture looks. After that we go to the next epic, where we start preparing our project. Once we have a clear understanding of the data architecture, the first task is to create detailed project tasks, so we will add more epics and more tasks. Once we are done with that, we create the naming conventions for the project, just to make sure we have rules and standards across the whole project. Next, we create a repository in Git and prepare the structure of the repository, so that we always commit our work there, and then we can start with the first script, where we create a database and schemas. So, my friends, this is the initial plan for the project.

Now let's start with the first epic: the requirements analysis. Analyzing the requirements is very important for understanding which type of data warehouse you are going to build, because there is not just one standard way to build it, and if you go implementing the data warehouse blindly, you might do a lot of things that are totally unnecessary and burn a lot of time. That's why you have to sit with the stakeholders, with the departments, and understand what exactly we have to build; depending on the requirements, you design the shape of the data warehouse. So now let's analyze the requirements of this project.

The whole project is split into two main sections. In the first section we have to build a data warehouse: this is a data engineering task, where we will develop ETLs and the data warehouse. Once we have done that, we have to build analytics and reporting, business intelligence, so we will do data analysis. But first we will be focusing on the first part, building the data warehouse. The statement is very simple. It says: develop a modern data warehouse using SQL Server to consolidate sales data, enabling analytical reporting and informed decision-making. That is the main statement, and then we have the specifications. The first one is about the data sources: import data from two source systems, ERP and CRM, provided as CSV files. The second one is about data quality: we have to clean and fix data quality issues before we do the data analysis, because let's be real, there is no raw data that is perfect; something is always missing, and we have to clean that up. The next one is about the integration: it says we have to combine both sources into one single, user-friendly data model that is designed for analytics and reporting, which means we have to merge those two sources into one data model. Then we have another specification: focus on the latest data set, so there is no need for historization, which means we don't have to build histories in the database. And the final requirement is about the documentation: provide clear documentation of the data model, the last product of the data warehouse, to support the business users and the analytics teams.
That means we have to produce a manual that helps the users, that makes life easier for the consumers of our data. As you can see, these may be very generic requirements, but they already carry a lot of information for you: we have to use the platform SQL Server; we have two source systems delivering CSV files; it sounds like we really have bad data quality in the sources; we are asked to focus on building a completely new data model that is designed for reporting; it says we don't have to do historization; and we are expected to produce documentation of the system. These are the requirements for the data engineering part, where we will build a data warehouse that fulfills them.

All right, with that we have analyzed the requirements, and we have also closed the first and easiest epic. We are done with this one, so let's mark it as done and open the next one: here we have to design the data architecture, and the first task is to choose the data management approach. So let's go. Designing a data architecture is exactly like building a house. Before construction starts, an architect designs a plan, a blueprint for the house: how the rooms will be connected, how to make the house functional, safe, and wonderful. Without this blueprint from the architect, the builders might create something unstable, inefficient, or maybe unlivable. The same goes for data projects: a data architect is like a house architect. They design how your data will flow, integrate, and be accessed. As data architects, we make sure that the data warehouse is not only functioning, but also scalable and easy to maintain, and this is exactly what we will do now: we will play the role of the data architect and start brainstorming and designing the architecture of the data warehouse. Now I'm going to show you a sketch to explain the different approaches to designing a data architecture. This phase of a project is usually very exciting for me, because this is my main role in data projects: I am a data architect, and I discuss a lot of different projects where we try to find the best design. All right, let's go.

The first step of building a data architecture is to make a very important decision: to choose between four major types. The first approach is to build a data warehouse. It is very suitable if you have only structured data and your business wants to build a solid foundation for reporting and business intelligence. Another approach is to build a data lake. This one is far more flexible than a data warehouse, because you can store not only structured data, but also semi-structured and unstructured data. We usually use this approach if you have mixed types of data, like database tables, logs, images, videos, and your business wants to focus not only on reporting but also on advanced analytics or machine learning. But it is not as organized as a data warehouse, and a data lake that is too unorganized can turn into a data swamp. And this is where the next approach comes in: the data lakehouse. It is like a mix between a data warehouse and a data lake: you get the flexibility of having different types of data from the data lake, but you still structure and organize your data the way we do in a data warehouse. So you mix those two worlds into one.
This is a very modern way of building data architectures, and it is currently my favorite way of building a data management system. Now, the last and most recent approach is to build a data mesh. This one is a little bit different: instead of having a centralized data management system, the idea of the data mesh is to make it decentralized. You don't have one central data management system, because centralized always means a bottleneck; instead, you have multiple departments, multiple domains, where each one builds a data product and shares it with the others. So now you have to pick one of those approaches, and in this project we will be focusing on the data warehouse.

Now the question is how to build the data warehouse, and there are likewise four different approaches. The first one is the Inmon approach. Again, you have your sources, and in the first layer you start with the staging, where the raw data lands. In the next layer you organize your data into something called the enterprise data warehouse, where you model the data using the third normal form, which is about how to structure and normalize your tables; you are building a new, integrated data model from the multiple sources. Then we go to the third layer, called the data marts, where you take a small subset of the data warehouse and design it so it is ready to be consumed for reporting; each data mart focuses on only one topic, for example customers, sales, or products. After that, you connect your BI tool, like Power BI or Tableau, to the data marts. So you have three layers to prepare the data before reporting.

Moving on to the next one, we have the Kimball approach. Kimball says: building this enterprise data warehouse wastes a lot of time, so what we can do is jump immediately from the stage layer to the final data marts, because building the enterprise data warehouse is a big struggle and usually wastes a lot of time. He wants you to focus on building the data marts as quickly as possible. It is a faster approach than Inmon, but with time you might get chaos in the data marts, because you are not always focusing on the big picture, and you might end up repeating the same transformations and integrations in different data marts. So there is a trade-off between speed and a consistent data warehouse.

Moving on to the third approach, we have the Data Vault. We still have the stage and the data marts, and it says we still need this central data warehouse in the middle, but it brings more standards and rules to that middle layer: it tells you to split it into two layers, the raw vault and the business vault. In the raw vault you have the original data, and in the business vault you have all the business rules and transformations that prepare the data for the data marts. So Data Vault is very similar to Inmon, but it brings more standards and rules to the middle layer.

And now I'm going to add a fourth one, which I'll call the medallion architecture. This one is my favorite, because it is very easy to understand and to build. It says you build three layers: bronze, silver, and gold. The bronze layer is very similar to the stage, but we have learned over time that the stage layer is very important, because having the original data as it is helps a lot with traceability and finding issues.
The next layer, the silver layer, is where we do transformations and data cleansing, but we don't apply any business rules yet. And the last layer, the gold layer, is again very similar to the data marts, but there we can build different types of objects, not only for reporting but also for machine learning, for AI, and for many other purposes. They are business-ready objects that you want to share as data products. So those are the four approaches you can use to build a data warehouse. Again, if you are building a data architecture, you first have to specify which approach you want to follow: at the start we said we want to build a data warehouse, and then we have to decide between those four approaches for how to build it. In this project we will be using the medallion architecture. So this is a very important question you have to answer as the first step of building a data architecture.

All right, with that we have decided on the approach, so we can mark this task as done. The next step is to design the layers of the data warehouse. There is no 100% standard set of rules for each layer; what you have to do as a data architect is define exactly what the purpose of each layer is. We start with the bronze layer: it is going to store raw, unprocessed data, as-is from the sources. And why are we doing that? For traceability and debugging. If you have a layer where you keep the raw data exactly as it comes from the sources, you can always go back to the bronze layer and investigate the data of a specific source if something goes wrong. The main objective is to have raw, untouched data that helps you as a data engineer analyze the root cause of issues. Moving on to the silver layer: it is the layer where we store clean and standardized data, and this is the place where we do the basic transformations that prepare the data for the final layer. And the gold layer is going to contain business-ready data; the main goal here is to provide data that can be consumed by business users and analysts in order to build reporting and analytics. So with that we have defined the main goal of each layer.

Next, I'd like to define the object types. Since we are talking about a data warehouse in a database, we generally have two types here: a table or a view. For the bronze layer and the silver layer we are going with tables, but for the gold layer we are going with views. Best practice says: make the last layer of your data warehouse virtual, using views. That gives you a lot of flexibility and, of course, speed in building it, since we don't need a load process for it. The next step is to define the load method. In this project I have decided to go with a full load, using the truncate-and-insert method; it is simply faster and much easier. So for the bronze layer we go with a full load, and you have to specify it for the silver layer as well: again a full load. And of course for the views we don't need any load process. Every time you decide to go with tables, you have to define the load method: full load, incremental load, and so on.
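As a sketch of what the truncate-and-insert full load we just chose could look like for one bronze table loaded from a CSV file, consider the following; the table name and the file path are placeholders, since we haven't defined the real ones yet.

```sql
-- Full load, truncate-and-insert style: empty the table, then reload everything.
TRUNCATE TABLE bronze.crm_customers;          -- placeholder table name

BULK INSERT bronze.crm_customers
FROM 'C:\datasets\source_crm\customers.csv'   -- placeholder path to the source CSV
WITH (
    FIRSTROW = 2,           -- skip the CSV header row
    FIELDTERMINATOR = ',',  -- comma-separated values
    TABLOCK                 -- table lock for a faster bulk load
);
```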
Now we come to the very interesting part: the data transformations. For the bronze layer this topic is the easiest one, because we do no transformations at all. We have to commit ourselves to not touching the data: do not manipulate it, don't change anything. It stays as it is; if it comes in bad, it stays bad in the bronze layer. Then we come to the silver layer, where we have the heavy lifting. As we committed in the objective, we have to produce clean and standardized data, and for that we have different types of transformations: we have to do data cleansing, data standardization, data normalization, deriving new columns, and data enrichment. So there is a whole bunch of transformations we have to do in order to prepare the data. Our focus here is to transform the data to make it clean and consistent with standards, and to push all business transformations to the next layer. That means in the gold layer we will be focusing on the business transformations that are needed by the consumers, for the use cases: here we do data integration between the source systems, we do data aggregations, we apply a lot of business logic and rules, and we build a data model that is ready for, for example, business intelligence. So in the gold layer we do a lot of business transformations, and in the silver layer we do basic data transformations. It is really important here to make clear decisions about which types of transformations are done in each layer, and to make sure you commit to those rules.

The next aspect is the data modeling. In the bronze layer and the silver layer we will not break the data model that comes from the source system: if the source system delivers five tables, we will have five tables here, and in the silver layer as well we will not denormalize or normalize or create something new; we leave it exactly as it comes from the source system. That's because we will build the data model in the gold layer, and here you have to define which data model you want to follow: are you following the star schema, the snowflake, or are you just making aggregated objects? You should make a list of all the data model types you are going to follow in the gold layer.

And finally, what you can specify for each layer is the target audience, and this is of course a very important decision. In the bronze layer you don't want to give access to any end user; it is really important to make sure that only data engineers access the bronze layer. It makes no sense for data analysts or data scientists to go to the bad data, because you have a better version of it in the silver layer. In the silver layer, of course, the data engineers need access, and also the data analysts, the data scientists, and so on, but you still don't give it to business users, who can't deal with the raw data model from the sources. For the business users, you have a better layer, and that is the gold layer. The gold layer is suitable for the data analysts and also the business users, because business users usually don't have deep knowledge of the technicalities of the silver layer. So if you are designing multiple layers, you have to discuss all these topics and make a clear decision for each layer.
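To illustrate that split, here is a small sketch of a gold-layer object built as a view: it integrates the two sources and aggregates to a monthly grain, both business transformations that we deliberately keep out of the silver layer. The silver table and column names are invented for the example.

```sql
-- Gold layer: a business-ready, aggregated view built on top of silver tables.
CREATE VIEW gold.report_monthly_sales AS
SELECT
    DATEFROMPARTS(YEAR(s.order_date), MONTH(s.order_date), 1) AS order_month,  -- aggregate to monthly granularity
    c.country,
    SUM(s.sales_amount) AS total_sales
FROM silver.crm_sales AS s
LEFT JOIN silver.erp_customers AS c
    ON s.customer_id = c.customer_id   -- data integration: combining the two sources
GROUP BY
    DATEFROMPARTS(YEAR(s.order_date), MONTH(s.order_date), 1),
    c.country;
```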
All right my friends, before we proceed with the design, I want to tell you about a secret principle, a concept that every data architect must know, and that is the separation of concerns. What is that? As you are designing an architecture, you have to make sure to break the complex system down into smaller, independent parts, where each part is responsible for a specific task. And here comes the magic: the components of your architecture must not be duplicated; you cannot have two parts doing the same thing. The idea here is to not mix everything, and mixing everything is one of the biggest mistakes in any big project; I have seen it almost everywhere. A good data architect follows this principle. For example, if you look at our data architecture, we have already done this: we have defined a unique set of tasks for each layer. We said that in the silver layer we do data cleansing, and in the gold layer we do business transformations; with that, you are not allowed to do any business transformations in the silver layer, and the same goes the other way: you don't do any data cleansing in the gold layer. Each layer has its own unique tasks. And the same goes for the bronze layer and the silver layer: you are not allowed to load data from the source systems directly into the silver layer, because we have decided that the landing layer, the first layer, is the bronze layer. Otherwise you would have one set of source systems that is loaded first into the bronze layer and another set that skips it and goes straight to the silver, and with that you have an overlap: you are doing data ingestion in two different layers. So my friends, if you have this mindset, separation of concerns, I promise you, you are going to be a data architect. Think about it.

All right my friends, with that we have designed the layers of the data warehouse, and we can close this task. Next, we go to draw.io and start drawing the data architecture. There is no single standard for how to draw a data architecture; you can add your own style, whatever way you want. The first thing we have to show in the data architecture is the different layers that we have. The first layer is the source system layer, so let's take a box like this and make it a little bigger; I'll just do the design: remove the fill, make the line a dotted one, and then change the color to something like this gray. Now we have a container for the first layer, and then we add a label on top of it: take another box, type "sources" inside it, and style it; set the text to maybe 24, remove the lines, make it a little smaller, and put it on top. So this is the first layer; this is where the data comes from. The data then goes into a data warehouse, so I'll just duplicate this container: this one is the data warehouse. Now, what is the third layer going to be? It's going to be the consumers who will be consuming this data warehouse, so I put in another box and say this is the consume layer. Okay, those are the three containers. Inside the data warehouse, we have decided to build it using the medallion architecture, so we're going to have three layers inside the warehouse. I take another box and call this one the bronze layer, and then we give it a design: I'll go with this color here, set the text to something like 20, make it a little smaller, and just put it here; beneath it we will have the components. This is just a title for a container, so I'll keep it like this, remove the text from inside it, and remove the fill. So this container is for the bronze layer.
Let's duplicate it for the next one: this one is going to be the silver layer, and of course we can change the color to gray, because it is silver, and the lines as well, and remove the fill. Great. And maybe I'll make the font bold. The third one is going to be the gold layer, and we have to pick a color for that: in the style options there is something like yellow. Same thing for the container: I remove the fill. With that, we are now showing the different layers inside our data warehouse.

Now, those containers are empty, so we go inside each one of them and start adding content. In the sources, it is very important to make clear which types of source systems you are connecting to the data warehouse, because in a real project there are multiple types: you might have a database, an API, files, Kafka, and it's important to show those different types. In our project we have folders, and inside those folders we have CSV files. So we have to make it clear in this layer that the input for our project is CSV files, and it really depends on how you want to show that. I'll go here and search for "folder", take the folder shape and put it inside, and then maybe search for "file", more results, and pick one of those icons, for example this one. I'll make it smaller and add it on top of the folder. With that, we make it clear for everyone viewing the architecture that the source is not a database and not an API; it is a file inside a folder. Now, it is also very important to show the source systems, which sources are involved in the project. So we give them names: we have one source called CRM, like this, maybe with the icon, and we have another source called ERP, so we duplicate it, put it over here, and rename it ERP. Now it is clear to everyone that we have two sources for this project, and the technology used is simply files. We can also add some descriptions inside this box to make it clearer. I'll take a line, because I want to separate the description from the icons, something like this, and make it gray, and below it we add some text: we say "CSV files", and as the next point, "interface: files in folders". Of course, you can add any specifications and explanations about the sources; if it is a database, you can state the type of the database, and so on. So with that, the data architecture makes clear what the sources of our data warehouse are.

The next step is to design the content of the bronze, silver, and gold. I'll start by adding an icon to each container to show that we are talking about a database: search for "database", then more results, and I'll go with this icon over here. Let's make it bigger, something like this, and maybe change its color. So we have one for the bronze, and likewise here for the silver and the gold. Next, we add some arrows between those layers: we can search for "arrow", pick one of those, and let's put it here.
We can pick a color for it, maybe something like this, and adjust it. Now we have this nice arrow between all the layers, just to show the direction of our architecture, so we can read it from left to right; and the same goes between the gold layer and the consume layer. Okay, next we add one statement about each layer, the main objective. Let's grab a text element and put it beneath the database icon. For the bronze layer it's going to be "raw data"; maybe make the text bigger. Then the next one, in the silver: "clean, standardized data". And the last one, for the gold: "business-ready data". With that, we make the objective of each layer clear.

Below all those icons we add a separator again, like this, make it colored, and beneath it we add the most important specifications of the layer; let's add those separators in each layer. Now we need a text below it. So, what is the object type of the bronze layer? It's going to be a table. We can add the load method: we say this is batch processing, since we are not doing streaming; we say it is a full load, since we are not doing an incremental load, so we write "truncate & insert". Then we add one more section, about the transformations, where we say "no transformations", and one more about the data model, where we say "none (as-is)". Now I'll add those specifications for the silver and gold as well: what we discussed about the object type, the load process, the transformations, and whether we are reshaping the data model or not, and the same for the gold layer. I can say that with that we have a really nice layering of the data warehouse.

What we are left with is the consumers over here. You can add the different use cases and tools that may access your data warehouse. For example, I'm adding here business intelligence and reporting, maybe using Power BI or Tableau; or you can say the data warehouse can be accessed for ad-hoc analysis using SQL queries, which is what we will focus on in the project after we build the data warehouse; and you can also offer it for machine learning purposes. And of course it is really nice to add some icons to your architecture; I usually use this nice website called Flaticon, which has really amazing icons you can use in your architecture. Now, of course, we could keep adding icons and annotations to explain the data architecture and the system. For example, it is very important to state which tools you are using to build this data warehouse: is it in the cloud, are you using Azure, Databricks, or maybe Snowflake? For our project we add the icon of SQL Server, since we are building this data warehouse entirely in SQL Server. For now I'm really happy with it; as you can see, we now have a plan.

All right guys, with that we have designed the data architecture using draw.io, and we have finished the last step in this epic: we now have a design for the data architecture, and we can say we have closed this epic. Let's go to the next one, where we start preparing our project, and the first task here is to create a detailed project plan. All right my friends, it's now clear that we have three layers and we have to build them, which means our big epics are going to follow the layers.
So here I have added three more epics: build the bronze layer, build the silver layer, and build the gold layer. After that, I went and defined all the different tasks we have to follow in the project: at the start we will be analyzing, then coding, and after that we do the testing, and once everything is ready we document things, and at the end we commit our work to the Git repo. All those epics follow the same pattern of tasks. As you can see, we now have a very detailed project structure, and things are much clearer about how we are going to build the data warehouse. With that we are done with this task, and for the next task we have to define the naming conventions of the project.

All right, at this phase of a project we usually define the naming conventions. What is that? It is a set of rules you define for naming everything in the project, whether it is a database, schema, tables, stored procedures, folders, anything. And if you don't do it at an early phase of the project, I promise you, chaos will happen. What happens is that you will have different developers in your project, and each of those developers of course has their own style. One developer might name a table dim_customers, where everything is lowercase with underscores between the words, and another developer creates a table called DimensionProducts using camel case, with no separation between the words and the first character of each word capitalized, and maybe another one uses some prefix, like dim_categories, where "dim" is a shortcut for the dimension. So as you can see, there are different designs and styles, and if you leave the door open, in the middle of the project you will notice that everything looks inconsistent, and you can end up defining a big task to go and rename everything following a specific rule. Instead of wasting all that time, at this phase you go and define the naming conventions. So let's do that.

We start with a very important decision: which naming convention we will follow in the whole project. There are different cases, like camel case, Pascal case, kebab case, and snake case, and for this project we go with snake case, where all the letters of a word are lowercase and the separation between words is an underscore. For example, a table named customer_info: "customer" is lowercase, "info" is lowercase as well, and between them an underscore. This is always the first thing you have to decide for your data project. The second thing is to decide the language. For example, I work in Germany, and there is always a decision to make whether we use German or English, so we have to decide which language we use for our project. And a very important general rule: avoid reserved words. Don't use a SQL reserved word as an object name; for example, don't name a table "table". Those are the general principles, the general rules that you have to follow in the whole project; they apply to everything: tables, columns, stored procedures, any names you use in your scripts. Moving on, we have specifications for the table names, and here we have a different set of rules for each layer.
For the bronze layer the rule is sourcesystem_entity: all table names must start with the source system name, like crm or erp, then an underscore, and at the end the entity name, the table name from the source. For example, take the table name crm_customer_info: this table comes from the source system CRM, and then we have the entity name, the customer info. This is the rule we will follow for naming all tables in the bronze layer. Moving on to the silver layer, it is exactly like the bronze, because we are not going to rename anything and we are not going to build any new data model; the naming is one-to-one with the bronze, exactly the same rules. But in the gold layer, since we are building a new data model, we have to rename things, and since we are also integrating multiple sources together, we will not use the source system name in the table names, because inside one table you could have multiple sources. The rule says: all names must be meaningful, business-aligned names for the tables, starting with a category prefix. So the pattern is category_entity. Now, what is the category? In the gold layer we have different types of tables: one table could be a fact table, another one could be a dimension, a third type could be an aggregation or a report. We have different types of tables, and we can specify the type as a prefix at the start. For example, here we see fact_sales: the category is "fact" and the table name is "sales". I made a small table with the different patterns: a dimension starts with dim_, for example dim_customers or dim_products; a fact table starts with fact_; and an aggregated table has the first characters agg_, like agg_customers or agg_sales_monthly. So as you can see, when you create a naming convention, you first make the rule clear, describe each part of the rule, and give examples, so that it is clear for the whole team which names to follow.

We talked here about the table naming conventions; you can also make naming conventions for the columns. For example, in the gold layer we will have surrogate keys, and we can define the rule like this: a surrogate key must consist of the table name, then an underscore, then "key". For example, customer_key is the surrogate key in the dimension customers. The same goes for the technical columns. As data engineers, we might add our own columns to the tables, columns that don't come from the source system; those are the technical columns, or sometimes we call them metadata columns. In order to separate them from the original columns that come from the source system, we can use a prefix. The rule says: any technical or metadata column must start with dwh_ and then the column name. For example, for the load-date metadata we would have dwh_load_date. With that, whenever anyone sees a column starting with dwh_, they understand that this data comes from a data engineer. And we can keep adding rules, for example for the stored procedures: if you are writing an ETL script, it must start with the prefix load_ followed by the layer. For example, the stored procedure responsible for loading the bronze layer is going to be called load_bronze, and for the silver, load_silver. Those are the current rules for the stored procedures.
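Put together, illustrative DDL following these rules might look like the sketch below. The column names are invented for the example, and note that in this project the gold layer will actually be built as views, but the naming rules are the same.

```sql
-- Bronze/silver: <sourcesystem>_<entity>, everything in snake_case
CREATE TABLE bronze.crm_customer_info (
    customer_id   INT,
    customer_name NVARCHAR(50),
    dwh_load_date DATETIME2 DEFAULT GETDATE()  -- technical column: dwh_ prefix
);

-- Gold: <category>_<entity>, with a surrogate key named <table>_key
CREATE TABLE gold.dim_customers (
    customer_key  INT IDENTITY(1, 1),  -- surrogate key rule: <table>_key
    customer_id   INT,
    customer_name NVARCHAR(50)
);
```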
So this is how I usually do it in my projects. All right my friends, with that we have a solid naming convention for our project, so this task is done, and for the next one we will go to Git, create a brand-new repository, and prepare its structure. So let's go.

Now we come to another important step in any project, and that is creating the Git repository. If you are new to Git, don't worry about it; it is simpler than it sounds. It's all about having a safe place where you can put the code you are developing: you get the possibility to track everything that happens to the code, you can use it to collaborate with your team, and if something goes wrong, you can always roll back. And the best part: once you are done with the project, you can share your repository as part of your portfolio, and that is a really amazing thing if you are applying for a job, showcasing your skills by having built a data warehouse with a well-documented Git repository. So now let's create the repository for the project.

We are at the overview of our account. The first thing to do is go to the repositories over here, then to the green button, and click "New". First we have to give the repo a name, so let's call it sql-data-warehouse-project, and then we can give it a description; for example, I'm writing "Building a modern data warehouse with SQL Server". The next option is whether you want to make it public or private; I'm going to leave it public. Then let's add a README file, and for the license we can go here and select the MIT license; the MIT license gives everyone the freedom to use and modify your code. Okay, I think I'm happy with the setup, so let's create the repository, and with that we have our brand-new repo.

The next step I usually take is to create the structure of the repository, and I always follow the same pattern in my projects: we need a few folders to put our files in. So what I usually do is go to "Add file", "Create new file", and start creating the structure. The first thing we need is datasets, then a slash, so that the repo understands this is a folder, not a file, and then you can add something like "placeholder", just an empty file; this simply helps me create the folder. Let's commit the changes, and now if you go back to the main project, you can see we have a folder called datasets. I'll keep creating things: I create docs with a placeholder and commit the changes, then I create scripts with a placeholder, and the final one I usually add is tests. So with that, as you can see, we now have the main folders of our repository.

The next thing I usually do is edit the main README; you can see it over here as well. Go inside the README, then to the edit button, and start writing the main information about the project. This really depends on your style; you can add whatever you want, as this is the main page of your repository. And as you can see, the file extension here is .md, which stands for Markdown, an easy and friendly format for writing text.
If you have documentation to write, if you are writing text, it is a really nice format for organizing and structuring it, and it is very friendly. So, at the start I give a short description of the project: we have the main title, then a welcome message explaining what this repository is about; in the next section maybe we start with the project requirements; and at the end you can say a few words about the licensing and a few words about yourself. As you can see, it's like the homepage of the project and the repository. Once you are done, commit the changes, and if you go to the main page of the repository, you always see the folders and files at the top and, below them, the information from the README: the welcome statement, then the project requirements, and at the end the licensing and the about-me section. So my friends, that's it: we now have a repository and the main structure of the project, and throughout the project, as we build the data warehouse, we will commit all our work to this repository. Nice, right? All right, with that your repository is ready, and as we go through the project we will keep adding things to it. This step is done, and now the last step: finally we go to SQL Server, where we write our first script to create a database and schemas.

All right, the first step is to create a brand-new database. To do that, we first have to switch to the database master: you write "USE master;" with a semicolon, and if you execute it, we are now switched to the master database. It is a system database in SQL Server from which you can create other databases, and you can see in the toolbar that we are now logged into the master database. The next step is to create our new database: we write "CREATE DATABASE", and you can call it whatever you want; I'll go with DataWarehouse, then a semicolon. Let's execute it, and with that we have created our database. Let's check it in the Object Explorer: refresh, and you can see our new DataWarehouse; this is our new database. Awesome, right? Now, the next step: we switch to the new database by writing "USE DataWarehouse;" and a semicolon. Let's execute it, and you can see we are now logged into the DataWarehouse database and can start building things inside it.

The first step I usually take is to create the schemas. So what is a schema? Think of it like a folder, a container that helps you keep things organized. As we decided in the architecture, we have three layers, bronze, silver, and gold, and we are going to create a schema for each layer. Let's do that, starting with the first one: "CREATE SCHEMA bronze;", with a semicolon, and execute it. Nice, we have a new schema. To check the schemas, go to our database, then to Security, then to Schemas over here, and as you can see, we have the bronze; if you don't find it, refresh the schemas and you will see the new one. Great, we now have the first schema, and next we create the other two, so I'll just duplicate the statement: the next one is going to be the silver, and the third one the gold. If we execute those two together, we will get an error, and that's because we don't have a GO in between; after each command, add a GO, and if I now highlight the silver and gold and execute, it works. GO in SQL is a batch separator: it tells SQL Server to completely execute the first command before going on to the next one.
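Put together, the commands from this walkthrough form a short script like the following; GO separates the batches, which is needed here because CREATE SCHEMA must be the first statement in its batch.

```sql
-- Switch to the system database so we can create a new database
USE master;
GO

-- Create the project database
CREATE DATABASE DataWarehouse;
GO

-- Switch into the new database
USE DataWarehouse;
GO

-- One schema per layer of the medallion architecture
CREATE SCHEMA bronze;
GO
CREATE SCHEMA silver;
GO
CREATE SCHEMA gold;
GO
```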
Now let's go to our schemas, refresh, and we can see the gold and the silver as well. With this we have a database and the three layers, and we can start developing each layer individually.

Okay, now let's commit our work to Git. Since this is a script, it goes into the scripts folder over here: we add a new file, call it init_database.sql, and paste our code into it. Now, I have made a few modifications. For example, before we create the database, we check whether the database already exists. This is an important step if you are recreating the database; if you don't do it, you will get an error saying that the database already exists. So first the script checks whether the database exists, and then it drops it. I have also added a few comments, like "create the data warehouse" and "create the schemas". And now we have a very important step: we add a header comment at the top of the script. To be honest, three months from now you will not remember all the details of this script, and a comment like this is like a sticky note for you when you revisit it later. It is also very important for the other developers in the team, because every time you or anyone else opens a script, the first question is: what is the purpose of this script, why are we doing these things? As you can see, here we have a comment saying: this script creates a new data warehouse after checking whether it already exists; if the database exists, it is dropped and recreated; additionally, it creates three schemas: bronze, silver, and gold. It gives clarity about what the script does and makes everyone's life easier. The second reason this is very important is that you can add warnings, and especially for this script the notes matter, because if you run this script, it is going to destroy the whole database. Imagine someone opens the script and runs it, imagine an admin runs it against your database: everything will be destroyed and all the data will be lost, and that is a disaster if you don't have any backup. So with that we have a nice header comment, and we have added a few comments in our code, and now we are ready to commit. Let's commit it, and now our script is in Git as well. And of course, whenever you make modifications, make sure to push the changes to Git.
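The enhanced script described here could start like the sketch below. The header comment mirrors the notes from the walkthrough; the SET SINGLE_USER statement is one common way to force open connections off before the drop, so treat that exact syntax as an assumption rather than necessarily the course's version.

```sql
/*
=============================================================
Create Database and Schemas
=============================================================
Script purpose:
    Creates a new database named 'DataWarehouse' after checking
    whether it already exists. If the database exists, it is
    dropped and recreated. Additionally, the script creates
    three schemas: bronze, silver, and gold.

WARNING:
    Running this script drops the entire 'DataWarehouse'
    database if it exists. All data in it will be permanently
    deleted. Make sure you have backups before running it.
=============================================================
*/
USE master;
GO

-- Check whether the database already exists, and drop it if so
IF EXISTS (SELECT 1 FROM sys.databases WHERE name = 'DataWarehouse')
BEGIN
    ALTER DATABASE DataWarehouse SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE DataWarehouse;
END;
GO

-- Recreate the database; schema creation then follows as before
CREATE DATABASE DataWarehouse;
GO
```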
So let's go. All right, so now the big question is: how do we build the bronze layer? First things first, we analyze. As with anything you develop, you don't immediately start writing code — before coding the bronze layer, we have to understand the source system. What I usually do is interview the source system experts and ask them many questions in order to understand the nature of the source system I'm connecting to the data warehouse. Once you know the source systems, you can start coding, and the main focus here is data ingestion: we have to find a way to load the data from the source into the data warehouse — it's like building a bridge between the source and our target system. Once the code is ready, the next step is data validation, and here comes the quality control. In the bronze layer it is very important to check data completeness: we compare the number of records between the source system and the bronze layer to make sure we aren't losing any data in between. Another check is the schema check, which makes sure the data lands in the right position. And finally, we must not forget documentation and committing our work in Git. This is the process we'll follow to build the bronze layer.

All right my friends, before connecting any source system to our data warehouse, there is a very important step: understanding the sources. How I usually do it: I set up a meeting with the source system experts and interview them. Gaining this knowledge matters because asking the right questions helps you design the correct extraction scripts and avoid a lot of mistakes and challenges. These are the most common questions I ask before connecting anything.

We start with the business context and ownership. I want to understand the story behind the data: who is responsible for it, which IT department, and so on. It's also good to know which business process it supports — customer transactions, supply chain logistics, or maybe finance reporting — because that tells you how important your data is. Then I ask about system and data documentation: documentation from the source is your learning material about the data, and it will save you a lot of time later when you design new data models. I also always want to understand the source system's data model, and if they have descriptions of the columns and tables, it's great to have that data catalog — it helps me a lot in the data warehouse when deciding how to join the tables together. With that, you get a solid foundation about the business context, the processes, and the ownership of the data.

In the next step we talk about the technical side: the architecture and the technology stack. The first question I usually ask is how the source system stores the data: is it on-premises, like a SQL Server or Oracle, or in the cloud, like Azure or AWS? Then we can discuss the integration capabilities — how am I going to get the data? Does the source system offer APIs, maybe Kafka, or do they only have file extractions, or will they give you a direct connection to the database? Once you understand the technology you'll use to extract the data, you can dive into the more technical questions about how to extract the data and load it into the data warehouse. The first thing to discuss with the experts: can we do an incremental load, or a full load? After that we discuss the data scope and historization: do we need all the data, or maybe only ten years of it? Is history already kept in the source system, or should we build it in the data warehouse? Then we discuss the expected size of the extracts: are we talking megabytes, gigabytes, terabytes? This tells us whether we have the right tools and platform to connect the source system. I also try to find out whether there are any data volume limitations: old source systems can struggle with performance, and an ETL that extracts large amounts of data might drag the source system's performance down — so you have to understand any limitations on your extracts, and anything else that might impact the performance of the source system. If they give you access to the database, you are responsible for not bringing its performance down. And of course, a very important question is about authentication and authorization: how are you going to access the data? Do you need tokens, keys, passwords, and so on? Those are the questions to ask when connecting a new source system, and once you have the answers, you can proceed with connecting the sources to the data warehouse.

All right my friends, with that you've learned how to analyze a new source system that you want to connect to your data warehouse — this step is done. Now we go back to coding, where we write the scripts that do the data ingestion from the CSV files to the bronze layer. Let's have a quick look again at our bronze layer specifications: we just load the data from the sources into the data warehouse; we build tables in the bronze layer; we do a full load, meaning we truncate and then insert; there are no data transformations at all in the bronze layer; and we don't build any data model. Those are the specifications of the bronze layer.

All right, now in order to create the DDL script for the bronze layer — creating the bronze tables — we have to understand the metadata: the structure, the schema of the incoming data. Either you ask the technical experts from the source system for this information, or you explore the incoming data and define the structure of your tables from it. We'll start with the first source system, the CRM; let's go inside it and begin with the first table, the customer info. If you open the file and check the data inside it, you'll see we have a header row,
which is very good, because now we have the names of the columns coming from the source, and from the content you can of course derive the data types. So let's do that. First we say CREATE TABLE, then we define the layer — it's the bronze schema — and now, very importantly, we follow the naming convention: we start with the name of the source system, crm, an underscore, and then the table name from the source system, cust_info. That is the name of our first table in the bronze layer. Next we define the columns, and here again the column names in the bronze layer are one-to-one exactly like the source system. The first one is the ID, and I'll go with the data type INT; the next one is the key, NVARCHAR, and for the length I'll go with 50; and the last one is the create date, which will be DATE. With that we've covered all the columns available from the source system — let's check, and yes, the last one is the create date. That's it for the first table; a semicolon at the end, of course. Execute it, then go to the Object Explorer, refresh, and we can see the first table inside our data warehouse. Amazing, right?

Now what you have to do next is create a DDL statement for each file from both systems: for the CRM we need three DDLs, and for the other system, the ERP, we also need three DDLs for its three files — so in the end the bronze layer will have six tables and six DDLs. Pause the video and go create those DDLs; I'll do the same, and see you soon. All right, I hope you've created them all — here's what I just wrote. The second table in the CRM source is the product information, and the third is the sales details. Then we go to the second system, making sure we follow the naming convention: first the source system, erp, then the table name. The second system was really easy: one table has only two columns, the customers only three, and the categories only four. After defining all of this, we of course execute it, then go to the Object Explorer and refresh the tables, and you can see we have six empty tables in the bronze layer. With that, all the tables from the two source systems are inside our database — but still no data. And you can see our naming convention is really nice: the first three tables come from the CRM source system and the other three from the ERP, so things are split cleanly in the bronze layer and you can quickly identify which table belongs to which source system.
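As a sketch, that first DDL might look like this — the ID, key, and create-date columns and types are the ones named above, while the middle columns are my assumption based on the file's header row:

```sql
-- First bronze table: source system prefix (crm_) + source table name
CREATE TABLE bronze.crm_cust_info (
    cst_id             INT,            -- technical customer ID
    cst_key            NVARCHAR(50),   -- customer key / number
    cst_firstname      NVARCHAR(50),
    cst_lastname       NVARCHAR(50),
    cst_marital_status NVARCHAR(50),
    cst_gndr           NVARCHAR(50),
    cst_create_date    DATE            -- record creation date in the source
);
```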
Now, there is something else I usually add to the DDL script: a check whether the table exists before creating it. For example, say you are renaming a column or want to change the data type of a specific field — if you just run the CREATE again, you'll get an error, because the database says the table already exists. In other databases you can say CREATE OR REPLACE TABLE, but in SQL Server you have to build a little T-SQL logic. It's very simple: first we check whether the object exists in the database. We say IF OBJECT_ID, then specify the table name — copy the whole name over and make sure it matches the table name exactly (there's a stray space here, so I'll remove it) — and then we define the object type: 'U', which stands for user-defined table. If this is not null, it means the database found the object, so we drop the table: DROP TABLE, the full name again, and a semicolon. So again: if the table exists in the database — the OBJECT_ID is not null — drop the table, and after that create it. Now if you highlight the whole thing and execute, it works: first drop the table if it exists, then create it from scratch. What you have to do now is add this check before creating any table in our database — the same thing for the next table and so on. I went and added those checks for each table, and if I execute the whole thing, it works: all the tables in the bronze layer are recreated from scratch.
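The existence check in front of each CREATE TABLE could look like this — SQL Server has no CREATE OR REPLACE TABLE, so we test the system catalog first:

```sql
-- 'U' = user-defined table; drop it if it already exists,
-- then run the CREATE TABLE statement from above
IF OBJECT_ID('bronze.crm_cust_info', 'U') IS NOT NULL
    DROP TABLE bronze.crm_cust_info;
GO
```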
Now, the method we're going to use to load the data from the source into the data warehouse is BULK INSERT. Bulk insert is a method of loading massive amounts of data very quickly from files — like CSV or text files — directly into a database. It is not like the classic insert, which loads the data row by row; a bulk insert is one operation that loads all the data into the database in one go, and that's what makes it so fast. So let's use this method.

Okay, let's start writing the script to load the first table from the CRM source: we're loading the table customer info from the CSV file into the database table. The syntax is very simple. We start by saying BULK INSERT — with that, SQL understands we're not doing a normal insert — and then we specify the table name, bronze.crm_cust_info. Next we specify the full location of the file we're loading into this table: go to where the file is stored, copy the whole path, and add it to the BULK INSERT exactly as it is. For me it's in the SQL data warehouse project, in the datasets folder, under source CRM, and then the file name, cust_info.csv. You have to get the path to your file exactly right, otherwise it won't work.

After the path comes the WITH clause, where we tell SQL Server how to handle our file — this is where the specifications go, and there's a lot we can define. Let's start with a very important one: the header row. If you check the content of our files, the first row always contains the header information. Those aren't data, just the column names; the actual data starts from the second row, and we have to tell the database about this. So we say FIRSTROW = 2, which tells SQL to skip the first row of the file — we don't need to load it, because we've already defined the structure of our table. The next specification, also very important when loading any CSV file, is the separator between fields: the delimiter. It really depends on the file structure you're getting from the source. As you can see, all these values are split by a comma — we call that the field separator or delimiter — and I've seen a lot of different CSVs: sometimes they use a semicolon, a pipe, or a special character like a hash. You have to understand how the values are split; in this file it's a comma, and we tell SQL with FIELDTERMINATOR = ','. Those two pieces of information are essential for SQL to be able to read your CSV file. There are many other options you can add — for example TABLOCK, an option to improve performance by locking the entire table during the load, so while SQL is loading the data, the whole table is locked. That's it for now; I'll add the semicolon, and let's insert the data from the file into our bronze table. Execute it, and you can see SQL inserted around 18,000 rows into our table — it's working, we just loaded the file into our database.

But it's not enough to just write the script; you have to test the quality of your bronze table, especially when working with files. Let's do a simple SELECT from our new table and run it. The first thing I check: do we have data in each column? Yes, we do. The second thing: is the data in the correct column? This is very critical when loading data from a file into a database. For example, here we have the first name, which of course makes sense, and here the last name. But what can happen — and this mistake happens a lot — is that you find the first-name information inside the key, the last name inside the first name, and the status inside the last name: the data is shifted. This data engineering mistake is very common with CSV files, and there are different reasons for it: maybe the definition of your table is wrong, maybe the field separator is wrong (it's not a comma but something else), or the separator is a poor choice, because sometimes the keys or the first names themselves contain a comma and SQL cannot split the data correctly — so the quality of the CSV file is not good. There are many reasons why data ends up in the wrong column, but for now everything looks fine for us.
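Here's the whole BULK INSERT as one sketch — the file path is illustrative, so point it at wherever your datasets folder lives:

```sql
BULK INSERT bronze.crm_cust_info
FROM 'C:\sql\dwh_project\datasets\source_crm\cust_info.csv'  -- adjust to your path
WITH (
    FIRSTROW = 2,           -- the first file row is the header; data starts at row 2
    FIELDTERMINATOR = ',',  -- this file is comma-separated
    TABLOCK                 -- lock the whole table during the load for speed
);
```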
The next step is to count the rows in this table, so let's select that: we have 18,490. Now we can open the CSV file and check how many rows it has, and the file has exactly one extra row — that's the header, which is not loaded into our table, and it's why our tables will always have one row less than the original files. So everything looks good, and we've done this step correctly.

Now, if I run the load again, what happens? We get duplicates in the bronze layer — we've loaded the file twice into the same table, which is not correct. The method we discussed is to first make the table empty and then load: truncate, then insert. To do that, before the BULK INSERT we say TRUNCATE TABLE, then our table name, and that's it, with a semicolon. Now we first empty the table and then load the whole content of the file into it from scratch — this is what we call a full load. Let's mark everything together and execute, and if you check the content of the table again, we have only the 18,000 rows; run the count again and we still have 18,000. Each run of this script now refreshes the table customer info from the file into the database, so if there are any changes in the file, they are loaded into the table. That is how you do a full load in the bronze layer: truncate the table, then insert. And now, of course, pause the video and write the same script for all six files — let's go and do that.

Okay, I'm back, and I hope you've written all those scripts as well. I have three statements to load the first source system and three sections to load the second. As you write them, make sure to use the correct path — for the second source system you have to point at the other folder — and don't forget that the table name in the bronze layer differs from the file name, because we always prefix with the source system name, which the files don't have. I think everything is ready, so let's execute the whole thing — perfect, awesome, everything works. Let me check the messages: we can see how many rows were inserted into each table, and of course the task now is to go through each table and check the content. So we now have a really nice script to load the bronze layer, and we'll use it on a daily basis — every day we run it to get fresh content into the data warehouse. The truncate-then-insert pattern for each table looks like the sketch below.
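A sketch of the full-load pair for one table; the same block is repeated for each of the six files (path illustrative):

```sql
-- Full load: empty the table first so reruns refresh instead of duplicating
TRUNCATE TABLE bronze.crm_cust_info;

BULK INSERT bronze.crm_cust_info
FROM 'C:\sql\dwh_project\datasets\source_crm\cust_info.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', TABLOCK);
```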
As you learned before, if you have a SQL script that is used frequently, you can create a stored procedure from it — so let's do that. It's very simple: we say CREATE OR ALTER PROCEDURE, then define the name. I'll put it in the bronze schema, because it belongs to the bronze layer, and follow the naming convention: the stored procedure starts with load_ and then the layer, so bronze.load_bronze. Then, very importantly, we define the BEGIN and the END of our SQL statements — here's the beginning, and down at the bottom the end — then highlight everything in between and give it one push with Tab so it's easier to read. Next we execute it to create the stored procedure. If you want to check it, go to the database, then the folder called Programmability, and inside it Stored Procedures; refresh and you'll see our new stored procedure. Let's test it: open a new query and say EXECUTE bronze.load_bronze. Execute it, and with that we've loaded the complete bronze layer — SQL inserted all the data from the files into the bronze layer. Much easier than running those scripts one by one each time.

All right, the next step: as you can see, the output messages don't carry much information. The messages of an ETL run through a stored procedure will not be very clear by default, so if you're writing an ETL script, always take care of the messaging of your code. Let me show you a nice design. Back in our stored procedure, we can divide the messages based on our code. We start with a message at the top: PRINT what this stored procedure is doing — we are loading the bronze layer. This is the main message, the most important one, and we can play with separators: another PRINT with a row of equals signs before and after, just to create a section. Looking at our code, it's split into two parts: the first loads all the tables from the CRM source system, and the second loads the tables from the ERP — so we can split the prints by source system. We say PRINT 'Loading CRM Tables' for the first section, add some separators — let's take the minus sign — and don't forget the semicolon after each PRINT, like me. Copy the whole thing for the second section, the ERP, and call it 'Loading ERP Tables'. With that, the output shows a nice separation between loading each source system. Next, we add a print for each action: here we are truncating the table, so PRINT, two arrows, what we are doing — truncating the table — plus the table name; and another print for inserting the data, 'Inserting Data Into' plus the table name. With that, the output tells us what SQL is doing. Let's repeat this for all the other tables — the resulting structure is sketched below.
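A sketch of the procedure's skeleton with that messaging design; the remaining five tables follow the same truncate/insert/print pattern:

```sql
CREATE OR ALTER PROCEDURE bronze.load_bronze AS
BEGIN
    PRINT '================================================';
    PRINT 'Loading Bronze Layer';
    PRINT '================================================';

    PRINT '------------------------------------------------';
    PRINT 'Loading CRM Tables';
    PRINT '------------------------------------------------';

    PRINT '>> Truncating Table: bronze.crm_cust_info';
    TRUNCATE TABLE bronze.crm_cust_info;
    PRINT '>> Inserting Data Into: bronze.crm_cust_info';
    BULK INSERT bronze.crm_cust_info
    FROM 'C:\sql\dwh_project\datasets\source_crm\cust_info.csv'
    WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', TABLOCK);

    -- ...same pattern for the remaining CRM tables...

    PRINT '------------------------------------------------';
    PRINT 'Loading ERP Tables';
    PRINT '------------------------------------------------';
    -- ...same pattern for the three ERP tables...
END;
```

After creating it, the whole layer refreshes with a single call: EXEC bronze.load_bronze;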
Okay, so I've added all those prints — don't forget the semicolon at the end — so let's execute it and check the output; and maybe at the start, just for a quick result, execute our stored procedure again. Now if you check the output, things are much more organized than before: at the start we read that we're loading the bronze layer, then the first section loads the CRM source system and the second loads the ERP, and we can see the actions — truncating, inserting, truncating, inserting — for each table, and the same for the second source. It may look cosmetic, but it's very important when you're debugging errors.

And speaking of errors, we have to handle them in our stored procedure, so let's do that. It's the first thing inside the procedure: we say BEGIN TRY; then we go to the end of our script and, before the last END, say END TRY; then we add the catch: BEGIN CATCH and END CATCH. First let's organize the code — I'll take the whole block and give it one more push, together with the BEGIN TRY, so it's tidier. As you know, TRY and CATCH work like this: SQL executes the TRY, and if any error occurs while executing that script, the CATCH section runs — the CATCH executes only if SQL failed to run the TRY. So now we define what SQL should do if there's an error in the code. We can do several things here, like creating a logging table and writing the messages into it, or adding some clear messaging to the output. For example, we can add a section again — some equals signs before and after — with content in between: we start with something like 'error occurred during loading bronze layer', then add the error message by calling the function ERROR_MESSAGE(), and also, for example, the error number with ERROR_NUMBER(). The output of that one is a number, while our message is text, so we have to change the data type: we do a CAST(... AS NVARCHAR). There are more functions you can add to the output, like ERROR_STATE() and so on — you can design exactly what happens if there's an error in the ETL.
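The error-handling frame could look like this — ERROR_MESSAGE(), ERROR_NUMBER(), and ERROR_STATE() are built-in functions, and the numeric ones must be cast to text before concatenating into a PRINT:

```sql
BEGIN TRY
    -- the whole load (all truncates and bulk inserts) goes here
    PRINT 'Loading Bronze Layer';
END TRY
BEGIN CATCH
    -- runs only if something inside the TRY block failed
    PRINT '==========================================';
    PRINT 'ERROR OCCURRED DURING LOADING BRONZE LAYER';
    PRINT 'Error Message: ' + ERROR_MESSAGE();
    PRINT 'Error Number : ' + CAST(ERROR_NUMBER() AS NVARCHAR);
    PRINT 'Error State  : ' + CAST(ERROR_STATE() AS NVARCHAR);
    PRINT '==========================================';
END CATCH
```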
Now, something else that is very important in every ETL process is to add the duration of each step. For example, I'd like to know how long it takes to load this table here, but looking at the output I have no information about that. It matters because, as you build a big data warehouse, the ETL process will take a long time, and you want to understand where the bottleneck is — which table is consuming the most time to load. So we have to add this information to the output, or even log it to a table. To calculate a duration you need the start time and the end time: we have to know when we started loading and when we finished loading the table.

Let's go to the top and declare the variables: DECLARE one called @start_time, with the data type DATETIME — I need the exact second when it started — and another variable, @end_time, also DATETIME. With the variables declared, the next step is to use them. Go to the first table, the customer info, and at the start say SET @start_time = GETDATE(), so we capture the exact time when we begin loading this table; then copy that, go to the end of the load, and say SET @end_time = GETDATE(). Now we have the values of when we started and when we completed loading the table, and the next step is to print this duration information. So here we say PRINT, with the same design as before — two arrows — and very simply 'Load Duration: ', a colon and a space; then a plus, and we calculate the duration using the date-and-time function DATEDIFF, which finds the interval between two dates. It takes three arguments: first the unit — second, minute, hour, and so on; we'll go with second — then the start of the interval, @start_time, and the last argument is the end of the boundary, @end_time. The output of this is a number, so we have to cast it: CAST(... AS NVARCHAR), close it, and maybe at the end add + ' seconds' to make a nice message. So again, what have we done? We declared two variables, we capture the current date and time at the start and at the end of loading the table, and then we take the difference between them to get the load duration — in this case we simply print this information. We can also add a small separator between each table — a few minus signs, nothing fancy. Now we add this mechanism to each table in order to measure the speed of the ETL for each of them.

Okay, I've added this configuration for each table, so let's run the whole thing: alter the stored procedure, then execute it. As you can see, we now have one more piece of information, the load duration — and everywhere I see zero seconds. That's because the loading is super fast: we're doing everything locally on one PC, so loading data from files into the database is mega fast. In real projects, of course, you have different servers with networking between them and millions of rows in the tables, so the durations won't be zero seconds — things will be slower — and then you can easily see how long each of your tables takes to load. And what's also very interesting is understanding how long it takes to load the whole batch. So your task now is to also print, at the end, information about how long it took to load the bronze layer — the sketch below covers both the per-table and the batch-level timing.
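A sketch of the timing mechanism, covering one table plus the batch total — the variable names are mine; wrap every table the same way:

```sql
DECLARE @start_time       DATETIME,
        @end_time         DATETIME,
        @batch_start_time DATETIME,
        @batch_end_time   DATETIME;

SET @batch_start_time = GETDATE();   -- first statement in the procedure

SET @start_time = GETDATE();
PRINT '>> Truncating Table: bronze.crm_cust_info';
TRUNCATE TABLE bronze.crm_cust_info;
PRINT '>> Inserting Data Into: bronze.crm_cust_info';
BULK INSERT bronze.crm_cust_info
FROM 'C:\sql\dwh_project\datasets\source_crm\cust_info.csv'
WITH (FIRSTROW = 2, FIELDTERMINATOR = ',', TABLOCK);
SET @end_time = GETDATE();
PRINT '>> Load Duration: '
      + CAST(DATEDIFF(SECOND, @start_time, @end_time) AS NVARCHAR) + ' seconds';
PRINT '>> -------------';

-- ...the same @start_time/@end_time wrapper around each remaining table...

SET @batch_end_time = GETDATE();     -- last statement in the procedure
PRINT 'Loading Bronze Layer is Completed';
PRINT '   - Total Load Duration: '
      + CAST(DATEDIFF(SECOND, @batch_start_time, @batch_end_time) AS NVARCHAR) + ' seconds';
```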
Okay, I hope you're done. I did it like this: we define two new variables, the batch start time and the batch end time. The first step in the stored procedure is to capture the date and time for the first variable, and the very last thing we do in the procedure is capture the date and time for the end — again SET ... = GETDATE() for the batch end time. Then all you have to do is print a message: we say 'Loading Bronze Layer is Completed', then print the total load duration, with the same DATEDIFF between the batch start time and end time, calculating the seconds and so on. Now execute the whole thing: refresh the definition of the stored procedure, then run it. In the output, go to the last message, and we can see that loading the bronze layer is completed and the total load duration is also 0 seconds, because the execution takes less than one second.

With that, you're getting a feel for how to build an ETL process. As you can see, data engineering is not only about how to load the data — it's about engineering the whole pipeline: measuring the speed of the loads, handling what happens if there's an error, printing each step of the ETL process, and keeping everything organized and clear in the output, and maybe in a log, to make debugging and performance optimization much easier. There's a lot more we could add — quality measures and so on — to make our data warehouse professional. All right my friends, with that we have developed and tested the code that loads the bronze layer.

In the next step, we go back to draw.io, because we want to draw a diagram about the data flow — so let's go. What is a data flow diagram? It's a simple visual that maps the flow of your data — where it comes from and where it ends up — to make clear how the data flows through the different layers of your project. It helps us create something called data lineage, and it's really nice, especially when you're analyzing an issue: if you have multiple layers and no real data lineage or flow documented, it's going to be really hard to dig through the scripts to understand the origin of the data, and having this diagram improves the process of finding issues. So let's create one.

Back in draw.io, we build the flow diagram, starting with the source systems. I'll build the layer — remove the fill, make it dotted — then add a box saying 'Sources', put it at the top, increase the size to 24, and remove the lines. What do we have inside the sources? Folders and files, so let's search for a folder icon — I'll take this one and label it CRM, increase its size — and we have another source, the ERP. That's the first layer. Now the bronze layer: grab another box, set the coloring, and instead of auto fill maybe take the hatch style — whatever you like —
rounded, and then we put a title on top of it: 'Bronze Layer', and increase the font size as well. Next we add boxes for each table we have in the bronze layer: for example the sales details — make it a little smaller, maybe 16, and not bold — and the other two tables from the CRM, the customer info and the product info. Those are the three tables that come from the CRM, so now we connect the source CRM with all three tables: go to the folder and draw arrows from the folder to the bronze layer, like this, and then do the same for the ERP source. As you can see, the data flow diagram shows the data lineage between the two layers in one picture: we can easily see that these three tables come from the CRM and those three tables in the bronze layer come from the ERP. I understand that with a lot of tables this becomes a huge mess, but for a small or medium data warehouse, building these diagrams makes it really easy to understand how everything flows from the sources into the different layers of your data warehouse. All right, with that we have the first version of the data flow, so this step is done, and the final step is to commit our code to the Git repo.

Okay, so let's commit our work. Since these are scripts, we go to the scripts folder, and since we'll have scripts for bronze, silver, and gold, it makes sense to create a folder per layer. Let's start with the bronze folder: I create a new file and type bronze/ followed by the DDL script of the bronze layer, ddl_bronze.sql. Now I paste the DDL code we created — those six tables — and as usual, at the start we have a comment explaining the purpose of the script: this script creates tables in the bronze schema, and by running it you are re-defining the DDL structure of the bronze tables. Let's leave it like that, and I'll commit the changes. Now, as you can see, inside scripts we have a folder called bronze, and inside it the DDL script for the bronze layer. The bronze folder is also where we put our stored procedure: create a new file, call it proc_load_bronze.sql, and paste our script. As usual, I've put an explanation at the start: this stored procedure loads data from the CSV files into the bronze schema — it first truncates the tables and then does bulk inserts — and, about the parameters, this stored procedure does not accept any parameters or return any values; plus a quick example of how to execute it. I'm happy with that, so let's commit it.

All right my friends, with that we've committed our code to Git, and we are done building the bronze layer — the whole epic is done. Now we move to the next one, which will be more advanced than the bronze layer, because there will be a lot of wrestling with cleaning the data and so on. We start with the first task, where we analyze and explore the data in the source systems — so let's go.

Okay, so now the big question: how do we build the silver layer, and what is the process? As usual, first things first, we analyze, and the task before building anything in the silver layer is to explore the data in order to understand the content of our sources. Once we have that, we start coding, and the transformation we'll be doing is data cleansing. This is usually a process that takes a really long time, and I do it in three steps: first, check the data quality issues we have in the bronze layer — before writing any data transformations, we have to understand what the issues are; only then do I start writing the transformations that fix those quality issues; and the last step, once we have clean results, is to insert them into the silver layer. Those are the three phases we go through while writing the code for the silver layer. The next step, once all the data is in the silver layer, is to make sure the data is now correct and that we don't have any quality issues anymore — and if you do find issues, you go back to coding, do the cleansing, and check again; it's a cycle between validating and coding. Once the quality of the silver layer is good, we cannot skip the last phase, where we document and commit our work in Git. Here we'll produce two new pieces of documentation — the data flow diagram and the data integration diagram — after we've understood the relationships between the sources in the first step. This is the process, and this is how we'll build the silver layer.

So now: exploring the data in the bronze layer. Why is this so important? Because understanding the data is the key to making smart decisions in the silver layer. In the bronze layer, understanding the content was not the focus at all — we only focused on getting the data into the data warehouse. That's why we now take a moment to explore and understand the tables, how to connect them, and what the relationships between them are. And as you learn about a new source system, it is very important to create some kind of documentation. So let's explore the sources one by one, starting with the first one from the CRM, the customer info: right-click on it and choose
'Select Top 1000 Rows'. This matters if you have a lot of data: don't go and explore millions of rows — always limit your queries. Here we use the top 1000 just to make sure we're not impacting the system with our queries. Now let's look at the content of this table: we have customer information — an ID, a key for the customer, first name, last name, marital status, gender, and the creation date of the customer. So this is simply a table of customer details, and we have two identifiers: one is a technical ID and the other is more like the customer number, so we can probably use either the ID or the key to join it with other tables.

What I usually do now is draw a data model — or let's say an integration model — just to document and visualize what I'm learning, because if you don't, you'll forget it after a while. Search for a table shape; I'll pick this one. We can change the style — rounded, or sketch, and so on — and the color; I'll make it blue, then select the text and make it bigger, 26, and for the rows, select them, go to Arrange, and make them 40 or so. Then we put in the table name — this is the one we're learning about — and I'll only note the primary key, the ID; I won't list all the columns, so I remove the rest. The table name isn't very friendly, so I add a text label on top saying 'Customer Information', just to keep it friendly and memorable, and increase its size to maybe 20.

With that we have our first table, so let's keep exploring and move to the second one: the product information. Right-click, select the top 1000 rows — I'll put it below the previous query — and run it. Looking at this table, we have product information: a primary key for the product, then a key — let's say the product number — then the full name of the product, the product cost, the product line, and then a start and an end date. Now this is interesting: why do we have a start and an end? Look at these three rows, for example: all three have the same key but different IDs — it's the same product with different costs. For 2011 the cost is 12, for 2012 it's 14, and for the last year, 2013, it's 13. So it's a history of changes: this table holds not only the current product information but also historical product information, and that's why we have those two dates, start and end. Let's go back and capture this: I'll duplicate the shape, name the table prd_info, and give it a short description — 'current and historical product information' — just so we don't forget that this table contains history. Here we have the prd_id, and notice there is nothing we can use to join these two tables: there's no customer ID here, and the other table has no product ID.

That's it for this table; let's jump to the third and last table in the CRM. I've shortened the other queries as well, so let's execute. What do we have here? A lot of information about the orders and sales, and a lot of measures: an order number; the product key — something we can use to join with the product table; and the customer ID — not the customer key, so one table joins by ID and the other by key, two different ways of joining tables. Then we have dates — the order date, the shipping date, the due date — and then the sales amount, the quantity, and the price. So this is an event table, a transactional table about orders and sales, and it's a great table for connecting the customers with the products and the orders. Let's document this new information: the table name is sales_details, described as 'transactional records about sales and orders', and now we describe how to connect it to the other two. We're not using the product ID, we're using the product key, so we need a new column in the shape — you can hold Ctrl and press Enter, or add a new row — and the other row is the customer ID. For the customer ID it's easy: we grab an arrow and connect the two tables. For the product, we're not using the ID, so I'll remove it and write product key — and check again: this is a product key, not a product ID; if you look at the products table, we're joining on the key, not on the primary key. So we link them like this, and maybe swap the two tables — I'll put the customer below. Perfect, looks nice.

Let's keep moving to the other source system, the ERP. The first table is the ERP customer table, with its cryptic name; let's select the data. It's a small table with only three pieces of information: something called CID, then what I think is the birthday, and the gender (we see male, female, and so on). So it again looks like customer information, but with extra data about the birthday. If you compare it to the customer table we have from the other source system — let's query it — you can see the new ERP table doesn't have technical IDs; it actually has the customer number, the key. So we can join these two tables using the customer key. Let's document that: I'll copy-paste the shape and put it on the right side, change the color since we're talking about a different source system, set the table name, and the key is called cid. To join this table with the customer info we cannot use the customer ID; we need the customer key — so we add a new row, Ctrl+Enter, and say customer key, then draw a nice arrow between those two keys, and give it a description:
customer information, with the birth dates. Okay, let's keep going to the next one, the ERP location table; let's query it. What do we have here? The CID again — and as you can see, country information. This is again the customer number, plus only this one attribute, the country. Let's document it: this is the customer location; the table name goes in, we still have the same identifier, so we can join it using the customer key, and we give it the description 'location of customers', with the country.

Now let's go to the last table and explore it: the ERP PX category catalog. Query it, and what do we have? An ID, a category, a subcategory, and maintenance, which is either yes or no. So this table holds all the categories and subcategories of the products, with a special identifier for them. The question is how to join it. I'd like to join it with the product information, so let's check the two tables together. In the products we don't have any ID for the categories — but we actually do have this information inside the product key: the first five characters of the product key are the category ID. So we can use that to join with the categories. Let's describe this, give the table a name, and note that its ID can be joined via the product key. That also means that for the product information we don't need the product ID — the primary key — at all; all we need is the product key, the product number.

What I'd like to do now is group these tables into boxes. Grab a box on the left side, make it bigger, make the corners a little smaller, remove the fill and the line, make it a dotted line; then grab another label saying 'CRM', increase the size — maybe 40, or smaller, 35 — bold, change the color to blue, and place it on top of the box. With that, we can see that all those tables belong to the CRM source system, and we do the same for the right side with the ERP. Of course, we also add the missing description here: the product categories.

All right, with that we now have a clear understanding of how the tables connect to each other and what each table contains — and of course this will help us clean up the data in the silver layer and prepare it. As you can see, it is very important to take the time to understand the structure of the tables and the relationships between them before writing any code. With that, we have a clear understanding of the sources, and we've created a data integration diagram in draw.io. In the next two tasks, we'll go back to SQL, where we start checking the quality and doing a lot of data transformations. So let's go.
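Before we leave the exploration, here's that category join as a sketch — the table and column names are my assumptions based on what was shown on screen, and the exact key format may need adjusting for your data:

```sql
-- The first five characters of the product key match the category id
SELECT
    p.prd_key,
    c.cat,
    c.subcat
FROM bronze.crm_prd_info AS p
LEFT JOIN bronze.erp_px_cat AS c
    ON SUBSTRING(p.prd_key, 1, 5) = c.id;
```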
Okay, first a quick look at the specifications of the silver layer. The main objective is clean and standardized data — we have to prepare the data before it goes to the gold layer. We'll be building tables inside the silver layer, and the way we load the data from the bronze to the silver is a full load: truncate, then insert. Here we'll do a lot of data transformations: we'll clean the data, apply normalization and standardization, derive new columns, and do data enrichment — a lot of work in the data transformation — but we will not build any new data model. Those are the specifications, and we have to commit ourselves to this scope.

Now, building the DDL script for this layer is much easier than for the bronze, because the definition and structure of each table in the silver will be identical to the bronze layer — we're not doing anything new. All you have to do is take the DDL script from the bronze layer and search-and-replace the schema. I'm using Notepad++ for the scripts, so I go to Replace, replace 'bronze.' with 'silver.', and replace all. Now the whole DDL targets the silver schema, which is exactly what we need.

Before we execute our new DDL script for the silver, we have to talk about something called metadata columns. These are additional columns or fields that data engineers add to each table; they don't come directly from the source systems, but the data engineers use them to attach extra information to each record. For example, we can add a create date column for when the record was loaded, or an update date for when the record was updated, or the source system so we understand the origin of the data, or the file location so we know the lineage — which file the data came from. These are a great tool when you have a data issue in your warehouse — corrupt data and so on — because they help you track exactly where and when the issue happened, and they're great for spotting gaps in your data, especially if you're doing incremental loads. It's like putting labels on everything, and you'll thank yourself later when you need them in hard times, when there's an issue in your data warehouse.

So, back to our DDL script — all you have to do is the following. For the first table, I add one extra column at the end. It starts with the prefix dwh_, as we defined in the naming convention, then an underscore and the name — let's have the create date — and the data type will be DATETIME2. Now we add a default value for it: I want the database to generate this information automatically, so we don't have to handle it in any ETL script. The value is GETDATE(), so every record inserted into this table automatically gets the current date and time. And here you see why the naming convention matters: all the other columns come from the source system, and only this one column comes from the data engineer of the data warehouse. That's it — repeat the same for all the other tables; I'll add this piece of information to each DDL. All right, now just execute the whole DDL script for the silver layer — perfect, no errors. Refresh the tables in the Object Explorer, and we have six tables for the silver layer, identical to the bronze but each with one extra metadata column.
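As a sketch, one silver DDL with the metadata column — identical to the bronze definition plus the dwh_-prefixed column at the end (the exact column name is my assumption from the naming convention described):

```sql
CREATE TABLE silver.crm_cust_info (
    cst_id             INT,
    cst_key            NVARCHAR(50),
    cst_firstname      NVARCHAR(50),
    cst_lastname       NVARCHAR(50),
    cst_marital_status NVARCHAR(50),
    cst_gndr           NVARCHAR(50),
    cst_create_date    DATE,
    -- metadata column: filled automatically by the database, not by the ETL
    dwh_create_date    DATETIME2 DEFAULT GETDATE()
);
```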
All right. Now, in the silver layer, before we start writing any data transformations and cleansing, we first have to detect the quality issues in the bronze. Without knowing the issues, we cannot find solutions, right? We explore the quality issues first, and only then do we start writing the transformation scripts. So let's go.

Okay, so here's what we're going to do: go through all the tables in the bronze layer, clean up the data, and then insert it into the silver layer. Let's start with the first bronze table from the CRM source, bronze.crm_cust_info, and query the data. Before writing any transformations, we have to detect and identify the quality issues of this table. I usually start by checking the primary key: are there NULLs in the primary key, and are there duplicates? To detect duplicates in the primary key, we aggregate on it — if any value in the primary key exists more than once, the key is not unique and we have duplicates in the table. Let's write the query: we select the customer ID and a COUNT, then group the data — GROUP BY the primary key — and since we only care about problems, we say HAVING COUNT(*) > 1, because we're only interested in values where the count is higher than one. Execute it, and as you can see we have an issue in this table: we have duplicates, because all these IDs exist more than once, which is completely wrong — the primary key should be unique — and we can also see three records where the primary key is empty, which is just as bad. There is one subtlety here: if we had only a single NULL, it wouldn't show up in this result, so I add OR the primary key IS NULL, just in case — I'm still interested in seeing it. Run it again and we get the same results. So this is a quality check you can run on the table, and as you can see, it does not meet the expectation — which means we have to do something about it.

Let's open a new query and start writing the transformation, the data cleansing. We begin by selecting the data and executing it. What I usually do is focus on one concrete issue first: let's take one of those duplicate values and filter on it — WHERE customer ID equals this value — before writing the transformation. As you can see, this ID exists three times, but we're only interested in one of them. So how do we pick one? Usually we look for a timestamp or date value to help us. If you check the creation date, you can see that this record here is the newest and the previous two are older — so if I have to pick one, I want the latest, because it holds the freshest information. That means we have to rank the duplicates per key by the create date and pick only the highest one — we need a ranking function, and for that, SQL has the amazing window functions.
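As a recap, here is that primary-key quality check as a single query (expectation: no rows returned):

```sql
-- Duplicates AND NULLs in the primary key in one check
SELECT
    cst_id,
    COUNT(*) AS occurrences
FROM bronze.crm_cust_info
GROUP BY cst_id
HAVING COUNT(*) > 1 OR cst_id IS NULL;
```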
Now let's create a new query and start writing the transformation and cleansing logic. What I usually do is focus on the issue first: take one of the duplicated values and filter on it, say WHERE the customer ID equals that value. We can see the same ID exists three times, but we only want one of them. How do we pick? Usually we look for a timestamp or date column to help us, and here the creation date tells us which record is the newest; the other two are older. If I have to pick one, I want the latest record, because it holds the freshest information. So the plan is to rank all the rows for each key by the create date and keep only the highest one. That calls for a ranking function, and SQL gives us the window functions for exactly this: ROW_NUMBER() OVER, then PARTITION BY to divide the table by the customer ID, and, to rank the rows, ORDER BY the create date descending, so the newest comes first. We give it the alias flag_last and execute. The data is now sorted by creation date: the newest record gets rank 1, the older one rank 2, and the oldest rank 3, and we are interested in rank number one. Removing the single-ID filter and checking the whole table, the flag is 1 almost everywhere, because most keys exist only once; the duplicated keys get 2, 3, and so on. As a double check we can wrap the query, SELECT * from it, and filter WHERE flag_last is not equal to one: that shows exactly the rows we do not need, the ones causing duplicates in the primary key and carrying stale data. Flipping the filter to equal one guarantees the primary key is unique and each value exists exactly once; querying like this, no duplicates remain, and if we re-check one of the duplicated customer IDs it now appears only once, with the freshest data for that key. With that, we have defined the transformation that removes the duplicates.
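A sketch of that deduplication, using the same assumed names as above (cst_create_date is likewise a stand-in for the create-date column):

```sql
-- Keep only the most recent record per customer id
SELECT *
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY cst_id
            ORDER BY cst_create_date DESC
        ) AS flag_last
    FROM bronze.crm_cust_info
) t
WHERE flag_last = 1;   -- switch to flag_last != 1 to inspect the discarded rows
```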
Next, the table has a lot of string columns, and for string values we have to check for unwanted spaces. Let's write a query to detect them, selecting the first name from the bronze customer table. Just by looking at the data it is really hard to spot those spaces, especially if they are at the end of the word, but there is a very easy trick: filter WHERE the first name is not equal to the first name after trimming. The TRIM function removes all leading and trailing spaces, so if the raw value differs from its trimmed version, we have an issue. The expectation is no results, but executing it returns a list of first names with spaces at the start or the end. The same check on the last name also finds customers with spaces, which is not good. We can keep checking every string column in the table: the gender, for example, returns no results, so its quality is better and there are no unwanted spaces there.

So we have to write a transformation to clean up those two columns. I will list all the columns explicitly in the query instead of using the star, and then apply TRIM to the first name and the last name, keeping the same column names. It is very simple, and with that, those two columns are cleaned of unwanted spaces.

Moving on, we have two more pieces of information: the marital status and the gender. Looking at the values, both columns have low cardinality, a limited number of possible values, so what we usually do is check the data consistency with a SELECT DISTINCT. The gender has only three possible values: NULL, 'F', or 'M'. We could leave it like this, of course, but let's make a rule for the whole project: no abbreviations, only friendly full names. So instead of 'F' we store 'Female' and instead of 'M' we store 'Male', and every time we meet gender information in the project we map it the same way. We build a CASE WHEN: when the gender equals 'F', then 'Female'; when it equals 'M', then 'Male'. Now we have to make a decision about the NULLs: do we leave them as NULL, or always use a default? Replacing missing values with a standard default is an option, so let's say our project does exactly that, and the ELSE branch returns 'n/a' for not available (you could use 'Unknown' just as well). One more thing I usually do: currently we receive the capital 'F' and 'M', but over time something might change and we could get lowercase values, so I wrap the column in the UPPER function to make sure we still catch them. And one more addition: since we already saw unwanted spaces in the first and last name, you might not trust this column either, so you can add a TRIM as well, just to make sure you are catching all those cases.
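Put together, the cleansing of the name and gender columns looks roughly like this (same assumed column names; cst_gndr stands in for the gender column):

```sql
SELECT
    cst_id,
    TRIM(cst_firstname) AS cst_firstname,    -- remove leading/trailing spaces
    TRIM(cst_lastname)  AS cst_lastname,
    CASE WHEN UPPER(TRIM(cst_gndr)) = 'F' THEN 'Female'
         WHEN UPPER(TRIM(cst_gndr)) = 'M' THEN 'Male'
         ELSE 'n/a'                           -- standard default for missing values
    END AS cst_gndr
FROM bronze.crm_cust_info;
```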
That's it for now, so let's execute: no more 'M' and 'F', we have the full words 'Male' and 'Female', and where there is no value we get 'n/a' instead of NULL. Now we do the same for the marital status, which also has only three possibilities: 'S', NULL, and 'M'. I copy the same pattern and adjust it: 'S' becomes 'Single', 'M' becomes 'Married', and the NULL again gets 'n/a'. Executing it, the short codes are gone; both columns now hold full, friendly values, and the NULLs are handled, so this is data standardization for these two columns as well. The last column is the create date. For this kind of information we make sure the column is a real date, not a string or varchar, and since we defined it as DATE in the data types, it is completely correct; nothing to do here.

The next step is the INSERT statement. We go to the top of the query and say INSERT INTO silver.crm_cust_info, specify all the columns that should be inserted, and execute: the clean data is now inside the silver table. Then we take all the queries we used to check the quality of the bronze table, copy them into another query window, and switch bronze to silver. The primary key check: no results, no duplicates, perfect. The first-name trim check: no results, no issues. The last name: no results either. The distinct check on the low-cardinality columns, such as the gender, shows only 'n/a', 'Male', and 'Female'. Finally we take a last look at the silver customer info: all the columns look right, and you can see the metadata column we added to the table definition is working; it records when we inserted those records, which is really useful information for tracking and auditing.
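For reference, the whole cleanse-and-load step for this table, put together, looks roughly like this sketch (the column list is assumed; the metadata column is filled by a default in the silver DDL):

```sql
INSERT INTO silver.crm_cust_info (
    cst_id, cst_key, cst_firstname, cst_lastname,
    cst_marital_status, cst_gndr, cst_create_date
)
SELECT
    cst_id,
    cst_key,
    TRIM(cst_firstname),
    TRIM(cst_lastname),
    CASE WHEN UPPER(TRIM(cst_marital_status)) = 'S' THEN 'Single'
         WHEN UPPER(TRIM(cst_marital_status)) = 'M' THEN 'Married'
         ELSE 'n/a' END,
    CASE WHEN UPPER(TRIM(cst_gndr)) = 'F' THEN 'Female'
         WHEN UPPER(TRIM(cst_gndr)) = 'M' THEN 'Male'
         ELSE 'n/a' END,
    cst_create_date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY cst_id
               ORDER BY cst_create_date DESC
           ) AS flag_last
    FROM bronze.crm_cust_info
) t
WHERE flag_last = 1;   -- deduplicated, keeping the freshest record per key
```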
Now, looking back at the script, we have done several types of data transformations. For the first name and the last name we did trimming: removing unnecessary spaces and unwanted characters to ensure data consistency, which is one type of data cleansing. The CASE WHEN logic is data normalization, sometimes called data standardization: a type of data cleansing where we map coded values to meaningful, user-friendly descriptions, and we did the same transformation for the gender. Within the same CASE WHEN we also handled missing values: instead of NULLs we now have 'n/a'. Handling missing data is also a type of data cleansing, where we fill the blanks with a default value, so instead of an empty string or a NULL we get something like 'n/a' or 'Unknown'. Another transformation in this script is removing duplicates: also data cleansing, where we ensure exactly one record per primary key by identifying and retaining only the most relevant row. And since removing duplicates discards rows, we are doing data filtering at the same time. Those are the different types of data transformations in this script.

Moving on to the second table in the bronze layer from the CRM: the product info. As usual, before writing any transformations we search for data quality issues, starting with the primary key: do we have duplicates or NULLs? We group the data by the primary key and check; this time everything is safe, with no duplicates and no NULLs.

Next is the product key. This column packs a lot of information into one string, so we are going to split it and derive two new columns. The first five characters are actually the category ID, and we can extract them with the SUBSTRING function. It takes three arguments: the column to extract from, the position where to start extracting (position 1, since this part sits on the left), and the length, how many characters we want, which is five. Executing it gives us a new category ID column containing the first part of the string. Now, in our database we also have a category ID coming from the other source system, the ERP category table, and in the silver layer we will have to join these two tables. But there is still an issue: the ERP table separates the category and subcategory with an underscore, while our extracted value uses a minus. We have to replace that, otherwise the values will not match and we will not be able to join the tables. So we use the REPLACE function to swap the '-' for an '_', and executing it now gives an underscore exactly like the other table. Of course we should check whether everything matches, with a very simple query: take the new category ID and keep only the values that are NOT IN a subquery selecting the IDs from the ERP category table, so we are trying to find any category ID that is not available on the other side.
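A sketch of the derivation and the integrity check; bronze.crm_prd_info, prd_key, and the ERP table name bronze.erp_px_cat are assumed stand-ins:

```sql
-- Derive the category id: first 5 characters, with '-' mapped to '_'
SELECT
    prd_id,
    REPLACE(SUBSTRING(prd_key, 1, 5), '-', '_') AS cat_id,
    prd_key
FROM bronze.crm_prd_info;

-- Integrity check: categories with no match in the ERP category table
SELECT DISTINCT REPLACE(SUBSTRING(prd_key, 1, 5), '-', '_') AS cat_id
FROM bronze.crm_prd_info
WHERE REPLACE(SUBSTRING(prd_key, 1, 5), '-', '_') NOT IN
      (SELECT id FROM bronze.erp_px_cat);
```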
Executing it, only one category has no match, and if we look in the ERP table (let me make it a little bigger), that category really does not exist there, which may simply be correct. So our check is fine, and we have the first derived column.

Now the second part of the string: the product key itself. Again we use SUBSTRING with its three arguments, but this time we do not start cutting at the first position; counting 1, 2, 3, 4, 5, 6, 7, we start at position seven. For the length there is a problem: unlike the category ID, the product keys have different lengths, so we cannot use a fixed number; we need something dynamic. The trick is to pass the LEN of the whole column as the length argument. That way we always get enough characters to reach the end of the string and never lose any information. Executing it, we are now extracting the second part of the string. Why do we need this product key? To join the products with another table, the sales details. Checking that table, the column there is the sales product key, and the data looks fine, so we should be able to join them. Of course we verify it: filter our new column with NOT IN a subquery over the sales product keys, just to make sure we are not missing anything. The result shows a lot of products that have no orders at all, and at first I do not have a good feeling about it. So let's investigate: searching the sales table with a LIKE for a few of those key patterns, cutting off the last characters to broaden the search, confirms that keys with those prefixes genuinely never appear in the sales. And if we flip the filter from NOT IN to IN, every product matches. So everything is actually fine: these are simply products that never received an order, and I am happy with this transformation.
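A sketch of the dynamic extraction and the join check (sls_prd_key in bronze.crm_sales_details is an assumed name):

```sql
-- Extract the product key: from position 7 to the end of the string
SELECT
    prd_id,
    SUBSTRING(prd_key, 7, LEN(prd_key)) AS prd_key
FROM bronze.crm_prd_info
-- Products that never appear in the sales details (no orders; not an error)
WHERE SUBSTRING(prd_key, 7, LEN(prd_key)) NOT IN
      (SELECT sls_prd_key FROM bronze.crm_sales_details);
```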
Moving on, the next column is the product name. We run the usual trim check against this table, and it comes back clean, so there is nothing to fix; this column is safe. Next we have the cost, a numeric column, so we check the quality of the numbers: do we have NULLs or negative values? Negative costs or negative prices are not really realistic, depending on the business of course, so let's say our business does not allow them. Filtering WHERE the cost is less than zero or the cost IS NULL shows no negative values, but we do have NULLs. We can handle that by replacing the NULL with a zero, if the business allows it, and SQL Server has a very nice function for exactly this: ISNULL. We say ISNULL(cost, 0), if it is NULL then replace it with zero, give it the same name, and execute; the NULLs are gone. Having a zero is also better for later calculations, for example aggregate functions like the average.

Next is the product line. This is again an abbreviation of something, and the cardinality is low, so let's check all possible values with a SELECT DISTINCT: NULL, 'M', 'R', 'S', and 'T'. In our data warehouse we decided to use full, friendly names, so we have to replace those codes, and to learn what they mean I usually go and ask an expert on the source system or on the business process. We build the CASE WHEN, again with UPPER and TRIM to make sure we cover all the cases: when the product line equals 'M', then 'Mountain'; when 'R', then 'Road'; the 'S' stands for 'Other Sales'; and the 'T' stands for 'Touring'. At the end we add an ELSE for 'n/a', because we do not want any NULLs, name it product line as before, remove the old column, and execute: no more shortcuts and abbreviations, just nice friendly values (I will capitalize the 'O' in 'Other Sales', it looks nicer).

Now, looking at this CASE WHEN, it is always a simple one-to-one mapping, and we keep repeating UPPER and TRIM in every branch. For simple mappings, CASE has a quick form: we write CASE followed by the expression once, so we evaluate that value a single time, and then just say WHEN 'M' THEN 'Mountain' without the equals sign, and so on. The functions appear only once and we do not have to keep repeating them over and over. This form only works when you are mapping values; for complex conditions you still need the full searched form. For now I will stay with the quick form of the CASE, it looks nicer and shorter, and executing it gives the same results.
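The quick (simple) form of the CASE for the product line, together with the cost fix, as a sketch (prd_cost and prd_line are assumed names):

```sql
SELECT
    prd_id,
    ISNULL(prd_cost, 0) AS prd_cost,      -- missing costs become 0
    CASE UPPER(TRIM(prd_line))            -- the expression is evaluated once
        WHEN 'M' THEN 'Mountain'
        WHEN 'R' THEN 'Road'
        WHEN 'S' THEN 'Other Sales'
        WHEN 'T' THEN 'Touring'
        ELSE 'n/a'
    END AS prd_line
FROM bronze.crm_prd_info;
```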
Now to the last two columns of the table: the start and end dates, which together define a validity interval. Let's check their quality: select from the bronze table WHERE the end date is smaller than the start date. The result shows the start is always after the end, which makes no sense at all, so we have a real data issue with these two dates.

For this kind of transformation, what I usually do is grab a few examples, put them in Excel, and think about how to fix them. Here I took two products, three rows each, all showing this situation. So how are we going to fix it? A first candidate solution is very simple: switch the start date with the end date. If I grab the end dates and put them at the start, things look much nicer, with the start always before the end. But, my friends, the data now makes no sense, because the histories overlap: one record says the price was 12 from 2007 to 2011, while the next says it was 14 from 2008 to 2012, so if you take the year 2010, the price was 12 and 14 at the same time. It is really bad to have an overlap between those intervals; one record should run from 2007 to 2011, and the next should start in 2012. So it is not enough that each start is smaller than its end; the end of one history record must also come before the start of the next record. That is the rule for having no overlaps. There is another problem with the swap: one record ends up with an end date but no start date, which is not acceptable, because every record in a historization must have a start. (The opposite, a start without an end, is fine in this scenario: it marks the current record, the information valid right now.) So again, the swap solution does not work at all.

The better solution: ignore the source end date completely and take only the start dates, then rebuild the end date from scratch following the rules we defined. The rule says the end date of the current record comes from the start date of the next record: we take the next record's start date and use it as the end date of the previous record. With that, as you can see in the example, the end date is higher than the start date and there is no overlap with the next record. To make it even nicer we subtract one day, taking the previous day, so the end date is strictly smaller than the next start date. Comparing the rows, each end is still after its own start and still before the next record's start, so there is no overlap. And the last record, which has no next record, simply gets a NULL end date, which is totally fine.
As you can see, I am really happy with this scenario. Of course you should validate it with an expert from the source system; let's say I have done that and they approved it, so now I can clean up the data using this new logic. This is how I usually brainstorm a fix: if it is something complex, I build the examples in Excel and discuss them with the expert. It is much easier to explain and to discuss than showing database queries. And the way I usually work: I focus only on the columns I need and only one or two scenarios while building the logic, and once everything is ready, I integrate it into the full query.

So let's build the logic. In SQL, if you are on a specific record and want to access information from another record, we have two amazing window functions: LEAD and LAG. In this scenario we want the next record, so we go with LEAD. We build it up: LEAD of the start date, so we want the start date of the next record, then OVER, then PARTITION BY the product key, not the product ID, so the window is focused on a single product, and then ORDER BY the start date ascending, from lowest to highest. Give it an alias, say 'test' just to test the data, and execute (I missed the PARTITION BY keyword the first time, so fix that and run again). Checking the first partition: the start is 2011 and the new end is 2012, and that information came from the next record, moved up to the previous one; the same holds for the following record. Our logic is working. The last record in each window is NULL, because we are at the end of the window and there is no next row, which is exactly what we want. It looks really good; the only thing missing is the previous day, which we get very simply by subtracting one day, so there is no overlap between consecutive intervals. We have just built a correct end date, far better than the original data from the source system, so we take the expression, put it into our query, drop the old end date, remove the 'test' alias, and execute. It looks perfect.

One more thing about these two dates: we keep calling them dates, but they are stored as datetimes, and the time portion is always zero; the source provides no time information, so it makes no sense to keep it. A very simple CAST turns both columns into DATE instead of DATETIME, and trying it out, the time information is gone, which is nicer. (We can still tell the source system about all these issues, of course.) It was a long run, but we now have clean product information.
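A sketch of the rebuilt end date; DATEADD(DAY, -1, ...) is the "minus one day" from the walkthrough, and the names are assumed as before:

```sql
SELECT
    prd_id,
    prd_key,
    CAST(prd_start_dt AS DATE) AS prd_start_dt,   -- drop the empty time part
    CAST(
        DATEADD(DAY, -1,
            LEAD(prd_start_dt) OVER (
                PARTITION BY prd_key               -- one window per product
                ORDER BY prd_start_dt
            )
        ) AS DATE
    ) AS prd_end_dt   -- next start minus one day; NULL for the current record
FROM bronze.crm_prd_info;
```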
And this is much nicer than the original product information we got from the source CRM. But if you grab the DDL of the silver table, you will see it has no category ID column, and the two date columns are still DATETIME although we changed them to DATE. So we have to make a few modifications to the DDL: add the category ID, using the same data type as the other keys, and change the start and end dates from DATETIME to DATE, then re-run the script to rebuild the table. This is what happens in the silver layer: sometimes we have to adjust the table definition, when the original data types are poor or when we build new derived columns for later integration. It stays very close to the bronze layer, but with a few modifications, so make sure to update your DDL scripts.

The next step is to insert the result of this cleansing query into the silver table. As we did before: INSERT INTO silver product info, then list all the columns (I have prepared the list), and run the query. SQL inserts the data, and the very important step now is to check the quality of the silver table. We go back to our data quality checks and switch them to silver: the primary key, no issues; the trims, no issues; the costs, nothing negative or NULL, perfect; the data standardizations, friendly values and no NULLs; and, very interesting, the order of the dates, no issues either. Finally, as always, I take a last look at the silver table: everything was inserted correctly into the correct columns, all of them coming from the source system except the last one, generated automatically by the DDL default to indicate when we loaded the table.

Now let's sit back and look at the script: what types of data transformations did we do here? With the category ID and the product key we derived new columns: creating a new column based on calculations or transformations of an existing one. Sometimes we need columns purely for analytics, and we cannot keep going back to the source system asking them to create columns for us, so instead we derive our own. With ISNULL we handled missing information: instead of NULL we have a zero. For the product line we did data normalization, replacing a code value with a friendly one, and at the same time handled the missing data, with 'n/a' instead of NULL. We also did data type casting, converting one data type to another, which counts as a data transformation in its own right.
And in the last transformation, the end date, we did type casting as well, but more importantly data enrichment: this type of transformation is all about adding value to your data, adding new, relevant data to the data set. Those are the different types of data transformations for this table.

Let's keep going: the sales details, the last table from the CRM. First the order number, a string, so we check for unwanted spaces with the TRIM comparison; the result is empty, so this column needs no transformation and we leave it as it is. The next two columns are keys and IDs used to connect this table with the others: as we learned before, the product key connects to the product info, and the customer ID connects to the customer ID in the customer info. So we check the referential integrity of those columns: filter WHERE the product key is NOT IN a subquery, and this time we can point the subquery at the silver layer, selecting the product key from silver product info. No issues come back, so every product key in the sales details can be connected to the product info. The same check for the customer ID, against the customer info and its customer ID column, also returns nothing, so the sales can be joined to the customers without any transformation. Things look really good for these three columns.

Now comes the challenging part: the dates. These are not actual dates; they are integers, and we do not want to keep them like that. We have to change the data type from integer to date, and converting an integer to a date means being careful about the values inside each column. Let's check the quality of the order date. Anything negative, less than zero? No, which is good. Any zeros? Unfortunately yes, a lot of them, which is bad. We can replace those with NULL using the NULLIF function: NULLIF of the column and zero, so if it is zero, make it NULL; executing it, all those values become NULL. Checking the data again: the integer encodes the year first, then the month, then the day, so a valid value must have exactly eight digits. If the length is less than eight or more than eight, we have an issue, so we extend the check with OR LEN(order date) is not equal to 8. Checking the results, a couple of values do not look like dates at all; we cannot turn them into real dates, they are simply bad data.
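The detection query, as a sketch (sls_order_dt in bronze.crm_sales_details is an assumed name; LEN on an integer works here via SQL Server's implicit conversion to a string):

```sql
-- Find integer 'dates' that cannot be real dates: zeros and wrong lengths
SELECT NULLIF(sls_order_dt, 0) AS sls_order_dt
FROM bronze.crm_sales_details
WHERE sls_order_dt <= 0
   OR LEN(sls_order_dt) != 8;
```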
Of course you can also check the boundaries of a date: for example, nothing higher than, say, 20500101, and nothing earlier than some lower bound depending on when your business started, say 19000101. The zero values show up in these checks too because they fall below the lower bound, but if there were values just outside the boundaries, these queries would catch them as well, so we add these checks to the rest. All of these checks together validate a column that holds date information in an integer data type. To summarize the issues: zeros, and strange numbers that cannot be converted to dates.

Let's fix that in the query. We say CASE WHEN the order date equals zero OR the length of the order date is not equal to 8, THEN NULL; we do not want to deal with those values, they are simply wrong and not real dates. Otherwise, in the ELSE, we take the order date and convert it. How do we do that? In SQL Server we cannot cast directly from integer to date: first you cast to VARCHAR, and then you cast the VARCHAR to DATE. That's it; we close with END and keep the same column name. This is how you transform an integer into a date. Querying it, the order date is now a real date, not a number, and we can get rid of the old column.

Now the same treatment for the shipping date: replace the column name everywhere and query. As it turns out, the shipping date is perfect, with no issues at all, but I still do not like that we found so many problems in the order date, so, just in case the same thing happens to the shipping date in the future, I will apply the same rules to it anyway. (If you prefer not to apply them now, you should at least build quality checks that run every day to detect such issues, and add the transformations once you spot something; for now I will apply the rules right away.) The due date passes the same test as well, and again I apply the same rules, making sure nothing in the query is missed. Executing it: perfect. The order date, shipping date, and due date are all real dates with no wrong data inside.
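A sketch of the conversion for one of the three columns; the other two follow the identical pattern:

```sql
SELECT
    CASE WHEN sls_order_dt = 0 OR LEN(sls_order_dt) != 8 THEN NULL
         ELSE CAST(CAST(sls_order_dt AS VARCHAR) AS DATE)  -- int -> varchar -> date
    END AS sls_order_dt
FROM bronze.crm_sales_details;
```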
There is still one more check we can do: the order date should always be smaller than the shipping date and the due date, because it makes no sense to deliver an item before it was ordered; first the order happens, then we ship the items, so there is a natural order to these dates. We search for invalid date orders: WHERE the order date is higher than the shipping date, OR the order date is higher than the due date. The result is empty, which is really good: the dates are in the right order, the quality looks fine, and we need no transformation or cleanup here.

Okay friends, now to the last three columns: the sales, the quantity, and the price. These are all connected to each other by a business rule: the sales must equal the quantity multiplied by the price, and all sales, quantity, and price values must be positive numbers; negative, zero, or NULL are not allowed. Those are the business rules, and we have to check whether the data in our table is consistent with them. We start with the rule itself: filter WHERE the sales is not equal to quantity multiplied by price, so we find rows that do not match the expectation; then OR the sales IS NULL, OR the quantity IS NULL, and the same for the price; and finally OR each of them is less than or equal to zero. With that we check the calculation and the NULL/zero/negative cases at once. Adding a DISTINCT and sorting by sales, quantity, and price makes the result easier to read. And of course there is bad data. Looking at the sales, we have NULLs, negative numbers, and zeros, all the bad combinations, plus broken calculations: for example a price of 50 with a quantity of 1 but a sales of 2, which cannot be right, and other rows where the figures just do not multiply out (or maybe the price is the wrong part). The quantity, on the other hand, has no NULLs, no zeros, and no negatives, so it is in better shape than the sales. The price has NULLs and negatives, though no zeros. So the sales and the price are unreliable and the calculation does not hold.

In this situation I do not just start transforming everything on my own. I go and talk to an expert, maybe someone from the business or from the source system, show them these scenarios, and discuss. Usually you get one of two answers. Either: "I will fix it in my source", in which case you live with it; bad data keeps arriving and can be visible in the warehouse until the source system cleans up the issues. Or: "we have no budget, the data is really old, we are not going to do anything", and then you decide whether to leave it as it is or to improve the quality in the warehouse. In the second case you need the experts to support you in defining the fixes, because everything depends on their rules: different rules lead to different transformations.
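For reference, the consistency check described above, as a sketch (column names assumed):

```sql
-- Rows violating: sales = quantity * price, all three positive and non-NULL
SELECT DISTINCT sls_sales, sls_quantity, sls_price
FROM bronze.crm_sales_details
WHERE sls_sales != sls_quantity * sls_price
   OR sls_sales IS NULL OR sls_quantity IS NULL OR sls_price IS NULL
   OR sls_sales <= 0 OR sls_quantity <= 0 OR sls_price <= 0
ORDER BY sls_sales, sls_quantity, sls_price;
```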
So let's say we are given the following rules: if the sales value is NULL, negative, or zero, derive it from the formula, multiplying the quantity by the price; if the price is wrong, for example NULL or zero, calculate it from the sales and the quantity; and if the price is negative, like -21, simply convert it to positive, 21, with no calculation. Those are the rules, and now we build the transformations based on them, step by step.

First the new sales. CASE WHEN, as usual: if the sales is NULL, or the sales is a negative number or equal to zero, or, one more scenario, we have a sales value but it does not follow the calculation, so the sales is not equal to the quantity multiplied by the price. In that comparison we do not use the raw price: we wrap it in the ABS function, the absolute value, which converts every negative into a positive. THEN we apply the calculation, the quantity multiplied by the (absolute) price, meaning we do not use the value from the source system but recalculate it. ELSE, if the sales is correct and none of those scenarios apply, we keep the sales exactly as it comes from the source, because it is fine. We say END, give it the same name, and keep the old column aliased as the old value for comparison; the quantity we do not touch, because it is correct.

Now the price, again with a CASE WHEN. The scenarios: the price is NULL, or less than or equal to zero; THEN we do the calculation, the sales divided by the quantity. But here we must make sure we never divide by zero: currently there are no zeros in the quantity, but you never know, in the future you might get one and the whole load would break. So we guard it: NULLIF of the quantity and zero, so if the quantity is ever zero, it becomes NULL and the division returns NULL instead of failing. ELSE, if the price is not NULL and not negative or zero, everything is fine and we keep the price as it comes from the source. END AS price. I am totally happy with that; let's execute and check.

Comparing the old values with the new, transformed ones: where the old sales was NULL, the new one is 2, and 2 times 1 is indeed 2, so it is correct. The next row had sales of 40 with a price of 2 and a quantity of 1; 2 times 1 should give 2, and the new sales is 2, not 40. Another row had sales of zero, but the price of 4 multiplied by the quantity gives 4, so the new sales is correctly 4. A row with a negative price: the absolute price times the quantity gives 9, and the sales turns out to be correct. Where the price was NULL, we calculated it from the sales and the quantity, 10 divided by 2 giving 5, so the new price is filled in. And the negative price of -21 comes out as 21, which is correct.
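The repair logic, as a sketch (same assumed column names as before):

```sql
SELECT
    CASE WHEN sls_sales IS NULL OR sls_sales <= 0
              OR sls_sales != sls_quantity * ABS(sls_price)
         THEN sls_quantity * ABS(sls_price)         -- recalculate bad sales
         ELSE sls_sales
    END AS sls_sales,
    sls_quantity,                                    -- already clean, keep as-is
    CASE WHEN sls_price IS NULL OR sls_price <= 0
         THEN sls_sales / NULLIF(sls_quantity, 0)    -- derive, guarding div-by-zero
         ELSE sls_price
    END AS sls_price
FROM bronze.crm_sales_details;
```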
For now I do not see any remaining scenario where the data is wrong; everything looks better than before. With that we have applied the business rules from the experts and cleaned up the data in the warehouse, which is far better than before, because we are now presenting better data for analysis and reporting. It is challenging, though: you have to understand the business exactly.

Now we copy those expressions and integrate them into our main query: instead of the raw sales we use our new calculation, and instead of the raw price our corrected one (I was missing an END here, so add that), and run the whole thing again. With that, the sales, quantity, and price are clean and follow our business rules, and we are done cleansing the sales details.

The next step is to insert it, but first we have to check the DDL again, comparing our results with the table definition. The order number is fine, the product key and the customer ID as well, but there is an issue: the three date columns are now DATE, not INTEGER, so we have to change the data types, which gives us a better schema than before. The sales, quantity, and price are correct. So let's drop the table, create it from scratch, and do not forget to update the DDL script. Then we insert the results into the silver sales details table, listing all the columns; I have already prepared the list, so just make sure the column order is correct. Running it, SQL inserts the data into our sales details.

Now, very important, we check the health of the silver table: switch the quality checks from bronze to silver. The order date is always smaller than the shipping and due dates, which is really nice, and, the part I am most interested in, the calculations: switching that check to silver and getting rid of the helper calculation columns we no longer need, the result is empty. Our data follows the business rules, with no NULLs, negative values, or zeros. As usual, the last step is a final look at the table: the order number, the product key, the customer ID, the three dates, the sales, quantity, and price, and of course our metadata column. Everything is perfect.

Looking at the code, what are the different types of data transformations here? In the three date columns we handled invalid data, which is a type of transformation in itself, and at the same time did data type casting, moving to a more correct type. For the sales we handled missing and invalid data by deriving the column from existing ones, and it is very similar for the price: we handled invalid data by deriving it from a specific calculation. Those are the different types of data transformations in this script.
Alright, let's keep moving to the next source system: the ERP customer table (AZ12). Here we have only three columns, and we start with the ID. This table again holds customer information, and if we check our integration model, we can connect it to the CRM customer info using the customer key, so we have to make sure those two tables can actually be joined. Let's check the other table, using the silver layer for the CRM side, and query both. We can see there are extra characters in the ERP IDs that are not part of the customer key in the CRM. Searching for one of these customers with a LIKE filter, we do find the match, but the ERP value carries three extra characters, 'NAS', and there is no specification or explanation for them. Checking the data again, it looks like the old records have the 'NAS' prefix at the start, while the newer data comes without those three characters. So we have to clean up these IDs to be able to connect them with the other tables.

We do it with a CASE WHEN, since there are two scenarios in the data: if the ID starts with those three characters, 'NAS', we apply a transformation; otherwise it stays as it is. For the transformation we use SUBSTRING: the string is the ID, the position where we start extracting is 4 (counting 1, 2, 3, and then 4, right after the prefix), and for the length I make it dynamic again, using LEN of the ID instead of counting characters. So: if the ID is LIKE 'NAS%', extract from position four to the end. Executing it (after adding a comma I had missed), the prefixed IDs are cleaned, and scrolling down, the rows without the prefix are not affected. Now we have a proper ID to join with the other table, and of course we test it: take the whole transformed expression, remove the alias since we do not need it there, and filter it with NOT IN a simple subquery selecting DISTINCT customer keys from the silver CRM customer info. The check comes back empty, so we cannot find any unmatched data between the ERP customer info and the CRM. And if we remove the transformation, we find a lot of unmatched rows, which means our transformation is working perfectly; we can drop the original value. That is it for the first column.
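A sketch of the prefix cleanup and its join test (bronze.erp_cust_az12 and its cid column are assumed names):

```sql
SELECT
    CASE WHEN cid LIKE 'NAS%'
         THEN SUBSTRING(cid, 4, LEN(cid))   -- drop the 'NAS' prefix on old records
         ELSE cid
    END AS cid
FROM bronze.erp_cust_az12
-- Join test: no rows expected once the prefix is removed
WHERE CASE WHEN cid LIKE 'NAS%'
           THEN SUBSTRING(cid, 4, LEN(cid))
           ELSE cid
      END NOT IN (SELECT DISTINCT cst_key FROM silver.crm_cust_info);
```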
Now, moving on to the next field, the birthdate of the customers. The first thing to check is the data type: it is a DATE, not an integer or a string, so there is nothing to convert. Still, there are things to check with birthdates, namely values out of range. First, really old dates: filtering for birthdates before, say, 1924-01-01 (taking the first day of the month) shows customers who would be over a hundred years old. Maybe that is correct, but it sounds strange, and it is something to confirm with the business. Then the other boundary: it is practically impossible for a customer's birthdate to lie in the future, so we also filter WHERE the birthdate is higher than the current date (it did not work at first because the two conditions need an OR between them). Now the list shows birthdates that are in the future, which is totally unacceptable and an indicator of bad data quality; you can report it to the source system for correction. Here it is up to you what to do with those dates: leave them as bad data, replace all of them with NULL, or replace only the extreme ones that are 100% incorrect. Let's write the transformation for the clear-cut case, as usual starting with CASE WHEN: if the birthdate is larger than the current date and time, THEN NULL; otherwise, in the ELSE, the birthdate as it is; then END AS birthdate. Executing it, we should no longer get any customer with a birthday in the future. That is it for the birthdates.

Now the next column: the gender. Again a low-cardinality column, so we have to check all the possible values inside it, using SELECT DISTINCT on the gender from our table. Executing it, the data does not look good at all: we have a NULL, an 'F', an empty string, 'Male', 'Female', and again an 'M'. We will clean all of this up to just three values: 'Male', 'Female', and 'n/a'. We say CASE WHEN, and again TRIM the values to make sure there are no stray spaces, and use the UPPER function so that any future lowercase values are covered too: if the value is 'F' or 'FEMALE' (in capital letters, since we are uppercasing), then 'Female'; the same way for the male, if it is 'M' or 'MALE', then 'Male'; otherwise, for all other scenarios, whether empty string or NULL, it should be 'n/a'. And we close with END AS gender, of course. Testing it, every case is covered: the 'M' is now 'Male', the NULL is 'n/a', the 'F' is 'Female', the empty string (or spaces) is 'n/a', and 'Female' and 'Male' stay as they are. With that we cover all the scenarios and follow our project standards, so I cut this expression and put it into our original query.
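Both cleanups together, as a sketch (bdate and gen are assumed column names):

```sql
SELECT
    CASE WHEN bdate > GETDATE() THEN NULL   -- future birthdates are clearly invalid
         ELSE bdate
    END AS bdate,
    CASE WHEN UPPER(TRIM(gen)) IN ('F', 'FEMALE') THEN 'Female'
         WHEN UPPER(TRIM(gen)) IN ('M', 'MALE')   THEN 'Male'
         ELSE 'n/a'                          -- NULLs, empty strings, anything else
    END AS gen
FROM bronze.erp_cust_az12;
```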
Let's execute the whole thing; with that, all three columns are cleaned. Now, did we change anything relevant to the DDL? No: we introduced no new column and changed no data type, so the next step is simply to load the silver layer. As usual: INSERT INTO the silver ERP customer table, list the column names (the ID, the birthdate, and the gender), and execute; all the data is inserted. Then, as the very important next step, we check the data quality: go back to the queries and switch them from bronze to silver. We still get the very old customers, because we chose not to change those, we only changed the birthdates in the future, and those no longer appear in the results, so that is clean. Next, the distinct genders: only our three standard values. And of course a final look at the table: the ID, the birthdate, the gender, and then our metadata column; everything looks great.

So what are the different types of data transformations we did here? First, for the ID, we handled invalid values, removing the part that was not needed. The same goes for the birthdates: we handled invalid values there too. And for the gender we did data normalization, mapping the codes to friendlier values, and handled the missing values as well. Those are the types used in this code.

Okay, moving on to the second table: the location information, ERP location A101. The task here is easy because we have only two columns. Checking the integration model, we find our table and see that we can connect it to the customer info from the other system, joining the location ID to the customer key, so those two values must match for the join to work. Let's check the data: select the customer key from the silver customer info and compare it with the location IDs. Looking at the result, there is an issue with the ID: it has a minus between the characters and the numbers, while the customer key has nothing splitting the characters from the numbers. If you tried to join these two columns, it would not work, so we have to get rid of that minus; it is totally unnecessary. The fix is very simple: REPLACE the '-' in the ID with nothing, an empty string. Querying again, the values now look alike, and we verify it the usual way: WHERE our transformed expression is NOT IN a subquery over the customer keys. No unmatched data comes back, so the transformation is working and we can connect the two tables; take the transformation away and you find a lot of unmatched data. So the transformation is good, and we keep it.
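A sketch of the fix and its verification (bronze.erp_loc_a101 with a cid column is an assumed name):

```sql
SELECT
    REPLACE(cid, '-', '') AS cid   -- drop the separator so it matches cst_key
FROM bronze.erp_loc_a101
-- Verification: no rows expected once the separator is removed
WHERE REPLACE(cid, '-', '') NOT IN
      (SELECT cst_key FROM silver.crm_cust_info);
```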
Now let's talk about the countries. We have multiple values here, and this is low cardinality, so we have to check all the possible values inside this column; that is, we check whether the data is consistent. We do it with SELECT DISTINCT cntry FROM our table, and I'll also sort the data by the country. Checking the result, we have a NULL, an empty string (which is really bad), full country names, and abbreviations of countries as well. This mix is not good: sometimes we have 'DE' and sometimes 'Germany'; we have 'United Kingdom'; and for the United States we have three versions of the same information, which is also not good. The quality of this column is poor, so let's work on the transformation. As usual we start with CASE WHEN: if TRIM(cntry) equals 'DE', we transform it to 'Germany'. The next one is about the USA: if TRIM(cntry) is IN the two values 'US' and 'USA', it becomes 'United States'; with that we have covered those cases. Now the NULL and the empty string: WHEN TRIM(cntry) equals the empty string OR cntry IS NULL, it becomes not available. Otherwise I'd like to keep the country as it is, but as TRIM(cntry), to make sure there are no leading or trailing spaces. We alias the result as the country; it works, and the country information is transformed. Next, I take the whole new transformation and compare it to the old one (let me call the original column old_cntry) and query both. Checking the values: 'DE' is now 'Germany', the empty string is not available, the NULL the same, 'United Kingdom' stays as before, and we now have one single value for all the United States variants. It looks perfect, and with that we have cleaned the second column as well, so we now have clean results. Did we change anything in the DDL? We haven't; both columns are still VARCHAR, so we can immediately insert into our table: INSERT INTO silver.erp_loc_a101, specifying the columns, which is very simple: the cid and the country. Executing it, all the values were inserted. As a next step we double-check: strip the comparison scaffolding and switch from bronze to silver. All the country values look good, and a final look at the table shows the IDs without the separator, the countries, and our metadata information. With that we have cleaned up the data for the location table.
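Here is the full country transformation as a sketch, again assuming the column name cntry and 'n/a' as the project's default for missing values:

```sql
-- Data normalization + handling missing values for the country column
SELECT DISTINCT
    CASE WHEN TRIM(cntry) = 'DE'           THEN 'Germany'
         WHEN TRIM(cntry) IN ('US', 'USA') THEN 'United States'
         WHEN TRIM(cntry) = '' OR cntry IS NULL THEN 'n/a'
         ELSE TRIM(cntry)  -- keep other countries, minus unwanted spaces
    END AS cntry
FROM bronze.erp_loc_a101
ORDER BY 1;
```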
we have removed the unwanted spaces. Those are the different types of transformation we have done for this table.

Okay guys, keep the energy up, keep the spirit up: we have one last table to clean in the bronze layer, and of course we cannot skip anything; we have to check the quality and detect all the errors. This table holds the categories for the products, and it has four columns. Let's start with the first one, the ID. As you can see in our integration model, this table connects to the product info from the CRM using the product key, and as you remember, in the silver layer we created an extra column for that in the product info. If you select that data, you can see a column called category ID, which exactly matches the ID in this table, and we already tested it, so this ID is ready to be used together with the other table; there is nothing to do here. The next columns are strings, so we can check whether there are any unwanted spaces. So: SELECT * FROM the table WHERE the category is not equal to the category after trimming the unwanted spaces. Executing it, we get no results, so there are no unwanted spaces. Let's check the next column, the subcategory, with the same kind of query: again nothing, so no unwanted spaces there either. And the last column, the maintenance: I'll just copy and paste, execute, and again no results. Perfect, no unwanted spaces anywhere in this table. The next step is to check the data standardization, because all those columns have low cardinality. So: SELECT DISTINCT the category from our table and check all values. We have Accessories, Bikes, Clothing, and Components; everything looks perfect and nothing needs to change in this column. Checking the subcategory: scrolling down, all values are friendly and nice, nothing to change here either. And the last column, the maintenance: perfect, only two values, Yes and No, and no NULLs. So, my friends, this table has really nice data quality and we don't have to clean anything, but we still have to follow our process and load it from bronze to silver, even though we didn't transform anything. Our job here is really easy: INSERT INTO silver.erp_px… (the category table), defining the columns: the ID, the category, the subcategory, the maintenance. Let's insert the data, and then, as usual, check it in the silver table: the IDs are here, the categories, the subcategories, the maintenance, and our metadata column. Everything was inserted correctly.
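A sketch of the checks used for this table, assuming it is bronze.erp_px_cat_g1v2 with the columns id, cat, subcat, and maintenance:

```sql
-- Unwanted-spaces check: returns rows only if a value differs from its trimmed self
SELECT *
FROM bronze.erp_px_cat_g1v2
WHERE cat != TRIM(cat)
   OR subcat != TRIM(subcat)
   OR maintenance != TRIM(maintenance);

-- Data standardization check: list all distinct values of a low-cardinality column
SELECT DISTINCT maintenance
FROM bronze.erp_px_cat_g1v2;
```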
twice what’s going to happen you will be inserting duplicates so first truncate the data and then do a full load insert all data so we’re going to have one step before it’s like the bronze layer we’re going to say trate table and then we will be trating the silver customer info and only after that we have to go and insert the data and of course we can go and give this nice information at the start so first we are truncating the table and then inserting so if I go and run the whole thing so let’s go and do it it will be working so if I can run it again we will not have any duplicates so we have to go and add this tip before each insert so let’s go and do that all right so I’m done with all tables so now let’s go and run everything so let’s go and execute it and we can see in the messaging everything working perfectly so with that we made all tables empty and then we inserted the data so perfect with that we have a nice script that loads the silver layer but of course like the bronze layer we’re going to put everything in one stored procedure so let’s go and do that we’ll go to the beginning over here and say create or alter procedure and we’re going to put it in the schema silver and using the naming convention load silver and we’re going to go over here and say begin and take the whole code end it is long one and give it one push with a tab and then at the end we’re going to say and perfect so we have our s procedure but we forgot here the US with that we will not have any error let’s go and execute it so the thir procedure is created if you go to the programmability and you will find two procedures load bronze and load silver so now let’s go and try it out all what you have to do is now only to execute the Silver Load silver so let’s execute the start procedure and with that we will get the same results this thir procedure now is responsible of loading the whole silver layer now of course the messaging here is not really good because we have learned in the bronze layer we can go and add many stuff like handling the error doing nce messaging catching the duration time so now your task is to pause the video take this thir procedure and go and transform it to be very similar to the bronze layer with the same messaging and all the add-ons that we have added so pause the video now I will do it as well offline and I will see you soon okay so I hope you are done and I can show you the results it’s like the bronze layer we have defined at the star few variables in order to catch the duration so we have the start time the end time patch start time and Patch end time and then we are printing a lot of stuff in order to have like nice messaging in the outut so at the start we are saying loading the server layer and then we start splitting by The Source system so loading the CRM tables and I’m going to show you only one table for now so we are setting the timer so we are saying start time get the dat date and time informations to it then we are doing the usual we are truncating the table and then we are inserting the new informations after cleaning it up and we have this nice message where we say load duration where we are finding the differences between the start time and the end time using the function dat diff and we want to show the result in the seconds so we are just printing how long it took to load this table and we’re going to go and repeat this process for all the tables and of course we are putting everything in try and Cat so the SQL going to go and try to execute the tri part and if there 
So let's execute the whole thing; with that we have updated the definition of the stored procedure. Now let's run it: EXEC silver.load_silver. It went very fast, less than a second, again because we are working on a local machine. We see "Loading the silver layer", "Loading CRM tables", and this nice messaging: it starts with truncating the table, then inserting the data, and we get the load duration for each table. Everything is below one second here, but in a real project you will of course see more than that. At the end we get the load duration of the whole silver layer.

Now I have one more thing for you. Say you change the design of this stored procedure for the silver layer: you add different types of messaging, or maybe you create logs and so on. All those new ideas and redesigns you apply to the silver layer, you should always think about bringing into the other stored procedure for the bronze layer as well. Always try to keep your code following the same standards; don't have one idea in one stored procedure and an old idea in another. Maintain those scripts and keep them all up to date following the same standards; otherwise it can be really hard for other developers to understand the code. I know that takes a lot of work and commitment, but this is your job: to make everything follow best practices and the naming conventions and standards you set for your project.

So, guys, we now have two very nice ETL scripts: one that loads the bronze layer and another for the silver layer. Running our data warehouse is very simple: first run the bronze layer procedure, which takes all the data from the source CSV files and puts it into the bronze layer of the data warehouse, refreshing the whole layer. Once that's done, the next step is to run the stored procedure of the silver layer: it takes all the data from the bronze layer, transforms it, cleans it up, and loads it into the silver layer. As you can see, the concept is very simple: we are just moving data from one layer to another, each time with different tasks.
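To recap the refresh order just described, a minimal sketch, assuming the procedure names used in this project:

```sql
EXEC bronze.load_bronze;  -- CSV source files -> bronze (full reload)
EXEC silver.load_silver;  -- bronze -> silver (cleaned + transformed, full reload)
```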
All right, so in the silver layer we have done a lot of data transformations and covered all the types of data cleansing: removing duplicates, data filtering, handling missing data, invalid data, unwanted spaces, casting data types, and so on. We have also derived new columns, done data enrichment, and normalized a lot of data. What we have not done yet (business rules and logic, data aggregations, and data integration) is for the next layer. So, my friends, we are finally done cleaning the data and checking its quality; we can close those two steps, and the next step is to extend the data flow diagram.

Okay, let's extend our data flow for the silver layer. I'll simply copy the whole bronze part and put it side by side with the bronze layer, call it the silver layer, and keep the table names as they are, because silver is one-to-one with bronze. What we do change is the coloring: I'll select everything and make it gray, like silver. And what is very important is the lineage: I draw an arrow from each bronze table to its silver table. With that we have lineage across three layers: looking at the customer info table, you can understand that it comes from the bronze layer's customer info, which in turn comes from the source system CRM. You can see the lineage between the different layers without opening any scripts; in one picture you understand the whole project. I don't have to explain much; just by looking at this picture you can see how the data flows from the sources to the bronze layer, to the silver layer, and of course later to the gold layer. It looks really nice and clean. So with that we have updated the data flow; next, let's commit our work to the git repo.

Okay, let's commit our scripts. We go to the scripts folder, where we have a silver-layer folder; if you don't have it, of course you can create it. First we put in the DDL script for the silver layer: I'll paste the code, and as usual we have a comment header explaining the purpose of the script. Commit that, and do the same for the stored procedure that loads the silver layer; I already have a file for it, so paste it in. At the start it likewise explains itself: this script performs the ETL process of loading data from bronze into silver; the action is to truncate each table first and then insert the transformed, cleansed data from bronze into silver; there are no parameters at all; and this is how you use the stored procedure. Commit that as well. There is one more thing to commit to the project: all those queries you built to check the quality of the silver layer. This time we don't put them under scripts; we go to the tests folder and create a new file called quality checks (silver), and inside it we paste all the queries we built; I've reorganized them by table. Here you can see all the checks we did during the course, and at the header we again have nice comments saying that this script checks the quality of the silver layer: we are checking for NULLs, duplicates, unwanted spaces, invalid date ranges, and so on. Each time you come up with a new quality check, I recommend sharing it with the project and the rest of the team, so it becomes part of the standard checks you run after each ETL. So I'll put those checks in our repo, and whenever I come up with a new check, I'll update the file.
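A few representative checks from such a quality-checks script (each query should return zero rows on a clean silver layer; table and column names as assumed earlier):

```sql
-- Duplicates or NULLs in a primary key
SELECT cst_id, COUNT(*) AS cnt
FROM silver.crm_cust_info
GROUP BY cst_id
HAVING COUNT(*) > 1 OR cst_id IS NULL;

-- Unwanted spaces
SELECT cst_key FROM silver.crm_cust_info
WHERE cst_key != TRIM(cst_key);

-- Invalid date ranges: birthdates in the future
SELECT bdate FROM silver.erp_cust_az12
WHERE bdate > GETDATE();
```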
Perfect. Now our code is in the repository and it is safe, and we are done with the whole epic: we have built the silver layer. Let's minimize it, because now we come to my favorite layer, the gold layer. The first step, as usual, is to analyze, and this time we are going to explore the business objects.

So now we come to the big question: how are we going to build the gold layer? As usual, we start with analysis. What we do here is explore and understand the main business objects that are hidden inside our source systems: we have two sources and six files, and we have to identify what the business objects are. Once we have that understanding, we can start coding, and the main transformation we do here is data integration. I usually split it into three steps. First, we build the business objects we have identified. Second, once we have a business object, we look at it and decide what type of table it is: is it a dimension, a fact, or maybe a flat table? And the last step is to rename all the columns into something friendly and easy to understand, so that our consumers don't struggle with technical names. Once all those steps are done, it's time to validate what we have created: the new data model should be connectable, and we have to check that the data integration was done correctly. And once everything is fine, we cannot skip the last step: document and commit our work in git. Here we will introduce new types of documentation: a diagram of the data model, a data dictionary describing the data model, and of course an extension of the data flow diagram. This is our process; those are the main steps we will follow to build the gold layer.

Okay, so what exactly is data modeling? Usually the source system delivers raw data: unorganized, messy, not very useful in its current state. Data modeling is the process of taking this raw data and organizing and structuring it in a meaningful way: we put the data into new, friendly, easy-to-understand objects, like customers, orders, and products, each focused on specific information, and, very importantly, we describe the relationships between those objects by connecting them with lines. What we build this way is called a logical data model, and compared to the raw side, it makes it really easy to understand our data, the relationships, and the processes behind them.

Now, in data modeling we have three different stages, or let's say three different ways of drawing a data model. The first stage is the conceptual data model. Here the focus is only on the entities: we have customers, orders, products, and we don't go into detail at all. We don't specify any columns or attributes inside those boxes; we just want to capture the entities and the relationships between them. The conceptual data model gives the big picture. The second data model we can build is the logical data model. Here we start specifying the columns we can find in each entity, like the customer ID, the first name, the last name, and so on; we still draw the relationships between the entities, and we make clear which columns are primary keys. So there is more detail, but we still don't describe every attribute exhaustively, and we are not yet worried about exactly how those tables will be stored in the database.
The third and last stage is the physical data model. This is where everything gets ready to be created in the database: you add all the technical details, like the data type and length of each column, and many other database specifics. So again: the conceptual data model gives us the big picture, the logical data model dives into the details of what data we need, and the physical data model prepares everything for implementation in the database. To be honest, in my projects I only draw the conceptual and the logical data models, because drawing and building the physical data model takes a lot of effort and time, and many tools (Databricks, for example) can generate those models automatically. In this project we are going to draw the logical data model for the gold layer.

All right. Now, for analytics, and especially for data warehousing and business intelligence, we need a special data model that is optimized for reporting and analytics; it should be flexible, scalable, and easy to understand. For that we have two special data models. The first is the star schema: a central fact table in the middle, surrounded by dimensions. The fact table contains transactions and events, and the dimensions contain descriptive information; the relationships between the fact table in the middle and the dimensions around it form a star shape, which is why we call it a star schema. The second is the snowflake schema. It looks very similar to the star schema; again we have the fact in the middle surrounded by dimensions, but the big difference is that we break the dimensions into smaller sub-dimensions, and as you extend the dimensions, the shape starts to look like a snowflake. Comparing them side by side, you can see the star schema looks easier, right? It is usually easy to understand and easy to query, really perfect for analysis, but it has one issue: the dimensions may contain redundancy, so your dimensions get bigger over time. The snowflake schema, by comparison, is more complex; you need more knowledge and effort to query it, but its main advantage comes with normalization: by breaking those redundancies into small tables, you can optimize the storage. But to be honest, who worries about storage these days? For this project I have chosen the star schema, because it is very commonly used, perfect for reporting (for example with Power BI), and we don't have to worry about storage. That's why we are going to adopt this model to build our gold layer.

Okay, one more thing about these data models: they contain two types of tables, facts and dimensions. So what do I mean when I say a table is a fact or a dimension? A dimension contains descriptive information, categories that give context to your data. Take product info: product name, category, subcategory, and so on; this table describes the product, so we call it a dimension. Facts, on the other hand, are events, like transactions, and they contain three important kinds of information: first, multiple IDs coming from multiple dimensions; second, information about when the transaction or event happened; and third, measures and numbers.
information you’re going to have like measures and numbers so if you see those three types of data in one table then this is a fact so if you have a table that answers how much or how many then this is a fact but if you have a table that answers who what where then this is a dimension table so this is what dimension and fact tables all right my friends so so far in the bronze layer and in the silver layer we didn’t discuss anything about the business so the bronze and silver were very technical we are focusing on data Eng gestion we are focusing on cleaning up the data quality of the data but still the tables are very oriented to the source system now comes the fun part in the god layer where we’re going to go and break the whole data model of the sources so we’re going to create something completely new to our business that is easy to consume for business reporting and analyzes and here it is very very important to have a clear understanding of the business and the processes and if you don’t know it already at this phase you have really to invest time by meeting maybe process experts the domain experts in order to have clear understanding what we are talking about in the data so now what we’re going to do we’re going to try to detect what are the business objects that are hidden in the source systems so now let’s go and explore that all right now in order to build a new data model I have to understand first the original data model what are the main business objects that we have how things are related to each others and this is very important process in building a new model so now what I usually do I start giving labels to all those tables so if you go to the shapes over here let’s go and search for label and if you go to more icons I’m going to go and take this label over here so drag and drop it and then I’m going to go and increase maybe the size of the font so let’s go with 20 and bold just make it a little bit bigger so now by looking to this data model we can see that we have a bradu for informations in the CRM and as well in the ARP and then we have like customer informations and transactional table so now let’s focus on the product so the product information is over here we have here the current and the history product informations and here we have the categories that’s belong to the products so in our data model we have something called products so let’s go and create this label it’s going to be the products and so let’s go and give it a color to the style let’s Pi for example the red one now let’s go and move this label and put it beneath this table over here that I have like a label saying this table belongs to the objects called products now I’m going to do the same thing for the other table over here so I’m going to go and tag this table to the product as well so that I can see easily which tables from the sources does has informations about the product business object all right now moving on we have here a table called customer information so we have a lot of information about the customer we have as well in the ARB customer information where we have the birthday and the country so those three tables has to do with the object customer so that means we’re going to go and label it like that so let’s call it customer and I’m going to go and pick different color for that let’s go with the green so I will tag this table like this and the same thing for the other tables so copy tag the second table and the third table now it is very easily for me to see which table to belong to 
Finally we have one last table, about the sales and orders; in the ERP we don't have any information like that, so this one is easy. Let's call it "sales", move the label over here, and maybe change its color as well. This step is very important when building any data model in the gold layer: it gives you the big picture of the things you are going to model. The next step is to build those objects step by step, starting with the first object, our customers: here we have three tables, and we begin with the CRM one. So with that we know what our business objects are, this task is done, and in the next step we go back to SQL and start doing data integration, building a completely new data model.

First, a quick look at the gold layer specifications. This is the final stage: we provide data to be consumed by reporting and analytics. This time we will not be building tables; we will be using views, which means there is no stored procedure and no load process for the gold layer. All you do is data transformation, and the focus of the transformations is data integration, aggregation, business logic, and so on. We also introduce a new data model here: we will be building a star schema. Those are the specifications for the gold layer, and that is our scope. This time we make sure we select data from the silver layer, not from the bronze, because the bronze has bad data quality, while in the silver everything is prepared and cleaned up; to build the gold layer we target the silver layer.

Let's start: SELECT * FROM silver.crm_cust_info, and hit execute. Now we select the columns we want to present in the gold layer: the ID, the key, the first name, and so on. I will not take the metadata information; that belongs only to the silver layer. Next, I give this table an alias, ci, and make sure we select from that alias, because later we are going to join this table with other tables.

Now to the second table: let's get the birthday information. We jump to the other system and join the data on the cid against the customer key. When joining with another table, I try to avoid using an INNER JOIN: if the other table doesn't have information about every customer, I might lose customers. So always start from the master table, and when you join it with any other table to pick up extra information, try to avoid the inner join, because the other source might not have all the customers, and with an inner join you might lose some. I tend to start from the master table and make everything else a LEFT JOIN. So: LEFT JOIN silver.erp_cust_az12, with the alias ca, joining on the customer key from the first table: ci.cst_key = ca.cid.
Of course we are going to get matching data now, because we checked it in the silver layer; if we hadn't prepared the data in silver, we would have to do a preparation step here before joining the tables. But we don't have to, because that was a prep step in the silver layer; you can see the systematics we get from this bronze-silver-gold approach. After joining the tables, we pick the information we need from the second table, which is the birthdate, bdate, and there is one more nice piece of information in this table: the gender. That's all we need from the second table.

Now the third table, the location information: the countries. Again we connect the tables by the cid against the key: LEFT JOIN silver.erp_loc_a101, with the alias la, joining on ci.cst_key = la.cid. Again, we prepared those IDs and keys in the silver layer, so the join should just work. From this table we have the ID, the country, and the metadata information, and we only take the country. Perfect: with that we have joined all three tables and picked all the columns we want in this object. Looking at the model, we joined this table with this one and this one, collecting all the customer information we have from the two source systems.

Okay, now let's query it to make sure everything is correct. To judge whether your joins are right, keep an eye on the three joined columns: if you are seeing data, you are doing the joins correctly; if you are seeing a lot of NULLs, or no data at all, your joins are incorrect. Here it looks like it is working. Another check I always do: even if your first table has no duplicates, multiple joins can still introduce them, because the relationship between the tables may not be a clean one-to-one; it could be one-to-many or many-to-many. So at this stage I make sure the result has no duplicates, no multiple rows for the same customer. To do that, we wrap the query as a subquery, GROUP BY the customer ID, and say HAVING COUNT(*) > 1; this query tries to find any duplicates in the primary key. Executing it, we get no duplicates, which means joining all those tables to the customer info didn't cause any issues and didn't duplicate my data. This is a very important check to make sure you are on the right track. So everything is fine regarding duplicates; we don't have to worry about them.
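A sketch of the joined customer object and the duplicate check, using the aliases from above (the column names are the ones assumed for this project):

```sql
WITH customer_obj AS (
    SELECT
        ci.cst_id,
        ci.cst_key,
        ci.cst_firstname,
        ci.cst_lastname,
        ca.bdate,
        ca.gen,
        la.cntry
    FROM silver.crm_cust_info ci
    LEFT JOIN silver.erp_cust_az12 ca ON ci.cst_key = ca.cid
    LEFT JOIN silver.erp_loc_a101  la ON ci.cst_key = la.cid
)
SELECT cst_id, COUNT(*) AS cnt
FROM customer_obj
GROUP BY cst_id
HAVING COUNT(*) > 1;  -- zero rows = the joins did not fan out the data
```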
But now we do have an integration issue. Let's execute again and look at the data: we have two sources for the gender information, one coming from the CRM and another coming from the ERP. So the question is: what are we going to do with this? We have to do data integration, and let me show you how I do it. First I open a new query, remove everything else, and keep only those two columns, with a DISTINCT, just to focus on the integration; maybe an ORDER BY 1, 2 as well. Now we have all the scenarios in front of us. Sometimes there is a match: the first table says Female and the other table says Female as well. But sometimes we have an issue: the two tables give different information, and the same thing over here, also a conflict. Another scenario: the first table has Female but the other says not available; this is not a problem, we can take it from the first table. And we have the exact opposite scenario, where the value is not available in the first table but available in the second. Now you might wonder why I'm getting a NULL here: didn't we handle all the missing data in the silver layer and replace everything with not available? This NULL doesn't come directly from the tables; it appears because of joining them. There are customers in the CRM table that are not available in the ERP table, and if there is no match, SQL gives us a NULL. So this NULL means "there was no match"; it is not coming from the content of the tables, but it is of course still an issue. The bigger issue is the two conflict scenarios, where both sources have data but they disagree. Here, again, we have to ask the experts: what is the master for this information, the CRM system or the ERP? Let's say their answer is that the master data for customer information is the CRM, meaning the CRM information is more accurate than the ERP information (for the customers only, of course). So for the scenario with Female versus Male, the correct value is the Female from the first source system; the same goes here, and with Male versus Female the correct one is Male, because that source system is the master.

Okay, now let's build this business rule, starting as usual with CASE WHEN. The first, very important rule: if we have a value in the gender column from the CRM system, the master, then use it. So we check that the CRM gender is not equal to not available (meaning we have Male or Female), and in that case we use the value from the master; CRM is the master for gender info. Otherwise, meaning it is not available in the CRM table, we grab the information from the second table: ca.gen. But now we have to be careful with that NULL: we have to convert it to not available as well, so we use COALESCE; if the value is NULL, use not available instead. We finish with END, and let's call the column new_gen for now and execute it.
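Here is the integration rule as a standalone sketch, again assuming the column names cst_gndr and gen, and 'n/a' as the project's default for missing values:

```sql
SELECT DISTINCT
    ci.cst_gndr,
    ca.gen,
    CASE WHEN ci.cst_gndr != 'n/a' THEN ci.cst_gndr  -- CRM is the master for gender info
         ELSE COALESCE(ca.gen, 'n/a')                -- NULL here means: no match in ERP
    END AS new_gen
FROM silver.crm_cust_info ci
LEFT JOIN silver.erp_cust_az12 ca ON ci.cst_key = ca.cid
ORDER BY 1, 2;
```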
Let's check the different scenarios. For all the values where we have data from the CRM system, that value is also represented in the new column. For the second part we don't have data from the first system, so we try to get it from the second. For the first row it is not available there either: the ELSE kicks in, the value is NULL, COALESCE activates, and we replace the NULL with not available. In the second scenario the first system also has no gender information, so we grab it from the second and get Female; the third is the same and we get Male; and in the last one it is not available in both source systems, so we get not available. As you can see, we now have a perfect new column where we integrate two different source systems into one, and this is exactly what we call data integration. This piece of information is better than the CRM source alone and better than the ERP source alone; it is richer. And that is exactly why we bring data from different source systems into the data warehouse: to get richer information. So we have nice logic, and as you can see, it is much easier to build the logic in a separate query first and then take it over to the original one. So I copy everything from here, go back to our main query, delete the old gender column, put our new logic in its place with a comma, and execute. With that we have our nice new column.

Now we have a really good object: no duplicates, and the data is integrated. We took three tables and put them into one object. The next step is to give everything nice, friendly names. The rule in the gold layer is to use friendly names rather than following the names from the source system, and to make sure we follow our naming conventions; we are following snake_case. Step by step: the first one becomes customer_id; the next one (I'll stop using "keys" and so on) becomes customer_number, because those are customer numbers; the next is first_name, without any prefixes; then last_name; marital_status keeps the exact name but loses the prefix; this one simply becomes gender; this one create_date; this one birthdate; and the last one is the country. Execute, and the names are now really friendly: customer_id, customer_number, first_name, last_name, marital_status, gender; really nice and easy to understand. Next, I think about the order of the columns. The first two belong together, first name and last name; the country is a very important piece of information, so I'll move it right after the last name, it's just nicer; it's always good to group related columns together. Then we have the statuses, the gender and so on, and then the create date and the birthdate. I'm going to switch the birthdate with the create date, since it's the more important of the two, and not forget a comma. Execute again: it looks wonderful.
Now comes a very important decision about this object: is it a fact table or a dimension? As we learned, dimensions hold descriptive information about an object, and what we have here are descriptions of the customers: all those columns describe customer information. We don't have transactions or events here, and we don't have measures, so we cannot call this object a fact; it is clearly a dimension. That's why we are going to call this object the customer dimension. Now, there is one more thing: if you are creating a new dimension, you always need a primary key for it. Of course, we could rely on the primary key we get from the source system, but sometimes you have dimensions where there is no primary key you can count on. What we do then is generate a new primary key inside the data warehouse, and we call such keys surrogate keys. A surrogate key is a system-generated unique identifier assigned to each record to make it unique. It is not a business key; it has no meaning, and no one in the business knows about it. We use it only to connect our data model; this way we have more control over how the model connects, and we don't have to depend on the source system all the time. There are different ways to generate surrogate keys, such as defining them in the DDL or using the window function ROW_NUMBER; in this data warehouse I'm going with the simple solution and using the window function. Generating a surrogate key for this dimension is very simple: ROW_NUMBER() OVER, and since we have to order by something, you could order by the create date, the customer ID, or the customer number, whatever you want; in this example I'm going to order by the customer ID. Per our naming convention, all surrogate keys end with "key" as a suffix, so we call it the customer key. Querying the result, we now have a customer key at the start: a clean sequence, with no duplicates of course. This surrogate key is generated in the data warehouse, and we are going to use it to connect the data model. With that our query is ready, and the last step is to create the object.
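For reference, here is the complete customer object as a sketch, already wrapped in the view definition we are about to create (all table and column names are the ones assumed throughout this walkthrough):

```sql
CREATE VIEW gold.dim_customers AS
SELECT
    ROW_NUMBER() OVER (ORDER BY ci.cst_id) AS customer_key,  -- surrogate key
    ci.cst_id             AS customer_id,
    ci.cst_key            AS customer_number,
    ci.cst_firstname      AS first_name,
    ci.cst_lastname       AS last_name,
    la.cntry              AS country,
    ci.cst_marital_status AS marital_status,
    CASE WHEN ci.cst_gndr != 'n/a' THEN ci.cst_gndr  -- CRM is the master for gender
         ELSE COALESCE(ca.gen, 'n/a')
    END                   AS gender,
    ca.bdate              AS birthdate,
    ci.cst_create_date    AS create_date
FROM silver.crm_cust_info ci
LEFT JOIN silver.erp_cust_az12 ca ON ci.cst_key = ca.cid
LEFT JOIN silver.erp_loc_a101  la ON ci.cst_key = la.cid;
```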
As we decided, all the objects in the gold layer are going to be virtual ones, which means we create a view: CREATE VIEW gold.dim_customers (following the naming convention, "dim" stands for dimension), and after that comes the AS. With that everything is ready; let's execute it. It was successful: go to the Views and you can see our first object, the customers dimension in the gold layer. Now, as you know me, the next step is to check the quality of this new object. Open a new query: SELECT * FROM gold.dim_customers, and make sure everything is in the right position. We could run different checks, such as uniqueness and so on, but I'm mostly worried about the gender information, so let's get a DISTINCT of all its values. As you can see, it works perfectly: we have only Female, Male, and not available. That's it: we have our first new dimension.

Okay friends, now let's build the second object: the products. As you can see, product information is available in both source systems; as usual we start with the CRM information and then join it with the other table to get the category information. Those are the columns we want from this table. Now we come to a big decision about this object: it contains both historical and current information. It depends on the requirements whether you have to analyze the historical information, but if you don't have such a requirement, we can keep only the current product information; we don't have to include all the history in the object. And anyway, as we learned from the model, we are not using the primary key here; we are using the product key. So what we have to do is filter out the historical data and keep only the current data. We'll add a WHERE condition, and to select the current data we target the end date: if the end date is NULL, the record is current. Take this example: we have three records for the same product key, and the first two have a value in the end date, because they are historical, while the last record has NULL, because it is the current information; it is still open and not yet closed. So in order to select only the current information, it is very simple: WHERE prd_end_dt IS NULL. Execute, and you get only the current products, without any history. We can add a comment to it, "filter out all historical data", and of course we no longer need the end date in our selection, because it is always NULL. So with that we have only the current data. The next step is to join it with the product categories from the ERP, using the ID. As usual, the CRM is the master information and everything else is secondary, which is why I use the LEFT JOIN: to make sure I'm not losing or filtering out any data if there is no match. So: LEFT JOIN silver.erp_px_cat_g1v2, with the alias pc, joining on the key: the category ID from the CRM side equal to pc.id. Then we pick the columns from the second table: from pc we take the category (very important) and the subcategory.
We can also take the maintenance, so something like this. Querying it, all these columns come from the first table and those three from the second; with that we have collected all the product information from the two source systems. The next step is to check the quality of this result, and what is very important is the uniqueness: I want to make sure the product key is unique, because we are going to use it later to join this table with the sales. So: GROUP BY product key, HAVING COUNT(*) > 1. Let's check: perfect, we don't have any duplicates. The second table didn't cause any duplicates for our join, and this also confirms we have no historical data: each product is exactly one record. I'm really happy about that, so let's query again. Next question: do we have anything to integrate, the same information twice? We don't. Then the next step is to group the related information together: the product ID, the product key, and the product name belong together; after those we put all the category information together, so the category ID, the category itself, the subcategory, and let's also put the maintenance right after the subcategory. The product cost, the line, and the start date can stay at the end. Let me just check: the product ID, key, and name; then the four pieces of category information; and then the cost, line, and start date. I'm really happy with that.

The next step: we give the columns nice, friendly names. The first one is product_id; the next becomes product_number (we need the name "key" for the surrogate key later); then product_name; after that category_id, the category, and the subcategory; the next one stays as it is, no rename needed; then the cost and the line; and the last one will be start_date. Execute, and we can see all those friendly column names very nicely in the output; it looks much better than before. I don't even have to describe the columns; the names describe themselves.

Now the next big decision: do we have a fact or a dimension here? What do you think? Well, as you can see, again we have a lot of descriptions of the products: all this information describes the business object "product". We don't have transactions, events, or a lot of different keys and IDs here, so this is not really a fact; we have a dimension, where each row describes exactly one product. And since this is a dimension, we have to create a primary key for it, well, actually a surrogate key. As we did for the customers, we use the window function ROW_NUMBER to generate it: OVER, and then we have to sort the data. I will go with the start date, plus the product key, and we'll give it the name product_key.
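Here is the complete products object as a sketch, already wrapped in the view definition we will create next (table and column names assumed; note the filter that keeps only current records):

```sql
CREATE VIEW gold.dim_products AS
SELECT
    ROW_NUMBER() OVER (ORDER BY pn.prd_start_dt, pn.prd_key) AS product_key, -- surrogate key
    pn.prd_id       AS product_id,
    pn.prd_key      AS product_number,
    pn.prd_nm       AS product_name,
    pn.cat_id       AS category_id,
    pc.cat          AS category,
    pc.subcat       AS subcategory,
    pc.maintenance,
    pn.prd_cost     AS cost,
    pn.prd_line     AS product_line,
    pn.prd_start_dt AS start_date
FROM silver.crm_prd_info pn
LEFT JOIN silver.erp_px_cat_g1v2 pc ON pn.cat_id = pc.id
WHERE pn.prd_end_dt IS NULL;  -- filter out all historical data
```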
like this so let’s go and execute it with that we have now generated a primary key for each product and we’re going to be using it in order to connect our data model all right now the next step we does we’re going to go and build the view so we’re going to say create view we’re going to say go and dimension products and then ask so let’s go and create our objects and now if you go and refresh the views you will see our second object the second dimension so we have here in the gold layer the dimension products and as usual we’re going to go and have a look to this view just to make sure that everything is fine so them products so let’s execute it and by looking to the data everything looks nice so with that we have now two dimensions all right friends so with that we have covered a lot of stuff so we have covered the customers and the products and we are left with only one table where we have the transactions the sales and for the sales information we have only data from the CRM we don’t have anything from the Erp so let’s go and build it okay so now I have all those informations and now of course we have only one table we don’t have to do any Integrations and so on and now we have to answer the big question do we have here a dimension or a fact well by looking to those details we can see transactions we can see events we have a lot of dates informations we have as well a lot of measures and metrics and as well we have a lot of IDs so it is connecting multiple dimensions and this is exactly a perfect setup for effects so we’re going to go and use those informations as effects and of course as we learned effect is connecting multiple Dimensions we have to present in this fact the surrogate keys that comes from the dimensions so those two informations the product key and the customer ID those informations comes from the searce system and as we learned we want to connect our data model using the surate keys so what we’re going to do we’re going to replace those two informations with the surate keys that we have generated and in order to do that we have to go and join now the two dimensions in order to get the surate key and we call this process of course data lookup so we are joining the tables in order only to get one information so let’s go and do that we will go with the lift joint of course not to lose any transaction so first we’re going to go and join it with the product key now of course in the silver layer we don’t have any ciruit Keys we have it in the good layer so that means for the fact table we’re going to be joining the server layer together with the gold layer so gold dots and then the dimension products and I’m going to just call it PR and we’re going to join the SD using the product key together with the product number [Music] from the dimension and now the only information that we need from the dimension is the key the sget key so we’re going to go over here and say product key and what I’m going to do I’m going to go and remove this information from here because we don’t need it we don’t need the original product key from The Source system we need the circuit key that we have generated in our own in this data warehouse so the same thing going to happen as well for the customer so gold Dimension customer again again we are doing here a look up in order to get the information on SD so we are joining using this ID over here equal to the customer ID because this is a customer ID and what we’re going to do the same thing we need the circuit key the customer key and we’re going to 
We delete the original ID, because we don't need it now that we have the surrogate key. Let's execute it: our fact table now carries the two keys from the dimensions, and this is what lets us connect the data model, the fact with the dimensions. This is an essential step when building a fact table: you have to put the surrogate keys from the dimensions into the fact. That was actually the hardest part of building the fact. The next step is simply to give friendly names: order_number; the surrogate keys are already friendly; then order_date, shipping_date, and due_date; and for the measures, sales_amount, quantity, and finally price. Execute and look at the results: the columns look very friendly. As for the column order, we use the following schema in a fact table: first all the surrogate keys from the dimensions, then all the dates, and at the end you group all the measures and metrics. That's it for the fact query; now we can build it: CREATE VIEW, in the gold layer, this time using the fact_ prefix, so gold.fact_sales, and don't forget the AS. Create it; perfect, now we can see the fact. With that we have three objects in the gold layer: two dimensions and one fact.

And of course the next step is to check the quality of the view: a simple SELECT from fact_sales, execute, and the result is exactly like the result of the query; everything looks nice. Okay, now one more trick I usually apply after building a fact: try to connect the whole data model, in order to find any issues. We do a simple LEFT JOIN with the dimensions: gold.dim_customers, using the keys, and then we say WHERE customer_key IS NULL, meaning there is no match. Executing it, we get nothing back in the results, which means everything matches perfectly. We can do the same thing with the products: LEFT JOIN gold.dim_products p on the fact's product key, and check the product key from the dimension for NULLs; again we get nothing, and that is exactly right. So with that we now have SQL code that is tested and that creates the gold layer. In the next step, as you know from our requirements, we have to make clear documentation for the end users so they can use our data model; so let's draw the data model of the star schema.
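Before moving to the diagram, here is the finished fact as a sketch, together with the integrity check that connects the whole model (names assumed as before):

```sql
CREATE VIEW gold.fact_sales AS
SELECT
    sd.sls_ord_num  AS order_number,
    pr.product_key,                    -- surrogate keys looked up from the dimensions
    cu.customer_key,
    sd.sls_order_dt AS order_date,
    sd.sls_ship_dt  AS shipping_date,
    sd.sls_due_dt   AS due_date,
    sd.sls_sales    AS sales_amount,
    sd.sls_quantity AS quantity,
    sd.sls_price    AS price
FROM silver.crm_sales_details sd
LEFT JOIN gold.dim_products  pr ON sd.sls_prd_key = pr.product_number
LEFT JOIN gold.dim_customers cu ON sd.sls_cust_id = cu.customer_id;
GO

-- Integrity check: every fact row should find both of its dimensions
SELECT f.*
FROM gold.fact_sales f
LEFT JOIN gold.dim_customers c ON f.customer_key = c.customer_key
LEFT JOIN gold.dim_products  p ON f.product_key  = p.product_key
WHERE c.customer_key IS NULL OR p.product_key IS NULL;  -- expect zero rows
```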
Now, in the next step, as you know from our requirements, we have to provide clear documentation for the end users so they can work with our data model. So let's go and draw the data model of the star schema. Let's search for a table shape; I'm going to take this one, where I can mark what is the primary key and what is the foreign key, and I'll change the design a little: make it rounded, change to this color, set the font size to 16, then select all the columns and make them 16 as well, just to increase the size; then, under Arrange, we can increase it to 39. So now let's zoom in a little. The first table we'll call gold dimension customers, and make it a little bigger, like this. Now we define the primary key here: it is the customer key. What else do we do? We list all the columns in the dimension. It is a little annoying, but the result is going to be awesome. So what do we have: the customer ID, the customer number, and then the first name. In case you want a new row, you can hold Ctrl and press Enter, and then add the other columns. So now pause the video, go and create the two dimensions, the customers and the products, and add all the columns that you built in the views. Welcome back. So now I have those two dimensions; the third one is going to be the fact table. For the fact table I'm going with a different color, for example blue, and I'll put it in the middle, something like this. We call it gold fact sales, and here we don't have a primary key, so we delete that marker, and I add all the columns of the fact: order number, product key, customer key, and so on. All right, perfect. Now we can add the foreign key information: the product key is a foreign key to the products, so we mark it FK1, and the customer key is the foreign key to the customers, so FK2; and of course you can increase the spacing for that. Okay, now that we have the tables, the next step in data modeling is to describe the relationships between these tables. This is of course very important for reporting and analytics, in order to understand how to use the data model. We have different types of relationships, one-to-one, one-to-many, and in a star schema the relationship between a dimension and the fact is one-to-many. That's because in the customers table we have, for a specific customer, only one record describing that customer, but in the fact table the customer might exist in multiple records, because customers can order multiple times. That's why on the fact side it is 'many' and on the dimension side it is 'one'. Now, to see all those relationship notations, go to the menu on the left side; there you find the entity relation shapes, with different types of arrows: zero-to-many, one-to-many, one-to-one, and many other types of relations. So which one are we going to take? We pick this one; it says 'one, mandatory', which means the customer must exist in the dimension table, 'to many, optional'. Here we have three scenarios: the customer didn't order anything, the customer ordered only once, or the customer ordered many things; that's why on the fact table side it is optional. So we take this connector and place it over here, connecting this end to the customer dimension and the 'many' end to the fact (actually, we have to attach it on the customers side). With that we are describing the relationship between the dimension and the fact as one-to-many: 'one' is mandatory on the customer dimension and 'many' is optional on the fact. The same story applies to the products: the 'many' end goes to the fact and the 'one' end goes to the products, so it's going to look like this. Each time you connect a new dimension to the fact table, it is usually a one-to-many relationship.
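
If you prefer to verify the 'one' and 'many' sides with SQL rather than by eye, here is a quick sketch against the views built earlier (names assumed):

    -- 'One' side: the surrogate key must be unique in the dimension.
    -- Expect zero rows.
    SELECT customer_key, COUNT(*) AS occurrences
    FROM gold.dim_customers
    GROUP BY customer_key
    HAVING COUNT(*) > 1;

    -- 'Many' side: the same key may repeat in the fact,
    -- e.g. customers who placed more than one order.
    SELECT customer_key, COUNT(*) AS order_rows
    FROM gold.fact_sales
    GROUP BY customer_key
    HAVING COUNT(*) > 1;
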
You can also add anything you want to this model, for example a text note explaining something: if you have some complicated calculations and so on, you can write that information here. For example, we can add a note saying 'sales calculation', make it a little smaller, let's go with 18, and write the formula underneath: sales equals quantity multiplied by price, and make this part a little bigger. It is really nice info to add to the data model, and we can even link it to the column: take this arrow, for example, like this, and link it to the column, and with that you also have a nice explanation of the business rule or the calculation. You can add any descriptions you want to the data model, just to make it clear for anyone who is using it. With that you don't only have three tables in the database; you also have a kind of documentation and explanation where, at a glance, anyone can see how the data model is built and how to connect the tables together. It is really amazing for all users of your data model. All right, so now we have a really nice data model, and in the next step we're going to quickly create a data catalog. All right, great: we have a data model, and we can say we have something called a data product, and we will be sharing this data product with different types of users. There is something that every data product absolutely needs, and that is the data catalog. It is a document that describes everything about your data model: the columns, the tables, and maybe the relationships between the tables as well. With it, you make your data product clear for everyone, and it becomes much easier for them to derive insights and reports from your data product. And what is most important: it is time-saving, because if you don't do it, each consumer of your data product will keep asking you the same questions: what do you mean with this column, what is this table, how do I connect table A with table B, and you will keep repeating yourself and explaining stuff. Instead, you prepare a data catalog and a data model and deliver everything together to the users, and with that you save a lot of time and stress. I know it is annoying to create a data catalog, but it is an investment and a best practice. So now let's go and create one. In order to do that, I have created a new file called data catalog in the documents folder, and what we do here is very straightforward: we make a section for each table in the gold layer. For example, we have here the table dimension customers; the first thing to do is describe the table, so we say it stores details about the customers, with demographic and geographic data. You give a short description of the table, and after that you list all the columns inside the table, maybe with the data type as well, but what is far more important is the description of each column. You give a very short description, like here, for example: the gender of the customer. Now, one of the best practices when describing a column is to give examples, because you can understand the purpose of a column quickly by just seeing an example. So here we note that inside it we can find 'male', 'female', and 'not available'.
So with that, the consumer of your table can immediately understand: aha, it will not be an 'M' or an 'F', it's going to be a full, friendly value, and they understand the purpose of the column quickly, without having to go and query the content of the table. So with that we have a full description of all the columns of our dimension. We do the same thing for the products, so again a description of the table and a description of each column, and the same thing for the fact. So that's it: with that you have a data catalog for your data product at the gold layer, and the business user or the data analyst has a better and clearer understanding of the content of your gold layer. All right my friends, that's all for the data catalog. In the next step we're going back to draw.io, where we're going to finalize the data flow diagram. So let's go. Okay, now we're going to extend our data flow diagram, but this time for the gold layer. Let's copy the whole thing from the silver layer, put it over here side by side, and of course change the coloring to gold, and then rename things: this is the gold layer. But of course we cannot leave those tables like this; we have a completely new data model. So what do we have over here? We have the fact sales, we have the dimension customers, and as well we have the dimension products. So what I'm going to do is remove all the old boxes; we have only three tables, and let's put those three tables somewhere here in the center. Now what you have to do is start connecting things. I'm going to go with this arrow over here, a direct connection, and start connecting: the sales details go to the fact table (maybe put the fact table over here); then we have the dimension customers, which comes from the CRM customer info and from two tables from the ERP, this one and the location table. The same goes for the products: it comes from the product info in the CRM and from the categories in the ERP. Now, as you can see, we have crossing arrows here, so we can select everything and say 'line jumps' with a gap, which makes the crossing arrows a little easier to tell apart. So now, for example, if someone asks you where the data for the dimension products comes from, you can open this diagram and tell them: okay, this comes from the silver layer, from two tables, the product info from the CRM and the categories from the ERP; and those silver tables come from the bronze layer, where you can see the product info comes from the CRM and the categories come from the ERP. It is very simple: we have just created a full data lineage for our data warehouse, from the sources into the different layers of the data warehouse, and data lineage is really amazing documentation that is going to help not only your users but also the developers. All right, so with that we have a very nice data flow diagram and a data lineage. We have completed the data flow, and it really feels like progress, like achievement, as we click through all those tasks. And now we come to the last task in building the data warehouse, where we're going to commit our work to the git repo. Okay, so now let's put our scripts in the project. If we go to the scripts folder, over here we have bronze and silver, but we don't have a gold, so let's create a new file under gold/ and name it
ddl_gold.sql. Now let's paste our views into it; we have our three views here, and as usual we start the script by describing its purpose: create gold views. This script creates the views for the gold layer, and the gold layer represents the final dimension and fact tables of the star schema; each view performs transformations and combines data from the silver layer to produce business-ready datasets, and those views can be used for analytics and reporting. That's it, let's go and commit it. Okay, with that, as you can see, we have the bronze and the silver, so all our ETL scripts are in the repository. Now, for the gold layer as well, we're going to add all the quality checks that we used to validate the dimensions and the fact. So we go to the tests folder over here and create a new file; it's going to be quality checks gold, and the file type is SQL. Now let's paste our quality checks: we have the check for the fact, the two dimensions, and as well an explanation of the script. We are validating the integrity and the accuracy of the gold layer: we are checking the uniqueness of the surrogate keys and whether we are able to connect the data model. Let's put that in our git as well and commit the changes, and in case we come up with new quality checks, we'll add them to this script. Those checks are really important: if you are modifying the ETLs, these scripts should run after each ETL run and so on; it is like a quality gate to make sure that everything in the gold layer is fine.
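
As a condensed sketch, such a checks script might open with a commented header and bundle the kinds of queries shown earlier, plus business-rule checks like the sales = quantity x price rule noted in the data model (all names are assumptions):

    /*
    =============================================================
    quality_checks_gold.sql
    Purpose: validate the integrity and accuracy of the gold
    layer: surrogate keys must be unique, the fact must connect
    to its dimensions, and business rules must hold.
    Usage: run after each ETL load, as a quality gate.
    =============================================================
    */

    -- Uniqueness of a surrogate key: expect zero rows.
    SELECT product_key, COUNT(*) AS duplicates
    FROM gold.dim_products
    GROUP BY product_key
    HAVING COUNT(*) > 1;

    -- Business rule: sales amount = quantity * price.
    -- Expect zero rows.
    SELECT *
    FROM gold.fact_sales
    WHERE sales_amount <> quantity * price;
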
Perfect, so now we have our code in our repository. Okay friends, the last thing to do is to finalize the git repo. For example, all the documentation that we created during the project can be uploaded into the docs folder; you can see here the data architecture, the data flow, the data integration, the data model, and so on. With that, each time you edit those pages you can commit your work, so you have a versioned history of them. Another thing you can do is go to the README: for example, over here I have added the project overview, some important links, and the data architecture with a short description of it, and of course don't forget to add a few words about yourself and your profiles on the different social media. All right my friends, with that we have completed our work and closed the last epic, building the gold layer, and with it we have completed all the phases of building a data warehouse. Everything is 100% done, and this feels really nice. If you're still here and you have built the data warehouse with me, then I can say I'm really proud of you. You have built something really complex and amazing, because building a data warehouse is usually a very complex data project, and with that you have not only learned SQL but also how a complex data project is done in the real world. So you now have real knowledge as well as an amazing portfolio project that you can share with others, whether you are applying for a job or showcasing that you have learned something new, and you have experienced the different roles in the project: what the data architects and the data engineers do in complex data projects. That was really an amazing journey, even for me while creating this project. And with that you have done the first type of data analytics project using SQL: data warehousing. In the next step we're going to do another type of project, exploratory data analysis (EDA), where we're going to understand and explore our datasets. If you liked this video and you want me to create more content like this, I would really appreciate it if you support the channel by subscribing, liking, sharing, and commenting; all of that helps the channel with the YouTube algorithm and helps my content reach others. Thank you so much for watching, and I will see you in the next tutorial. Bye!


  • Under Your Skin: The O’Malley Family, Book 1 by SHANNYN SCHROEDER

    Under Your Skin: The O’Malley Family, Book 1 by SHANNYN SCHROEDER

    This source presents excerpts from “Under Your Skin (The O’Malley Family Book 1).” It centers around the lives and relationships of the O’Malley family, specifically focusing on themes of pregnancy, family dynamics, and personal struggles. The narrative appears to follow multiple characters, such as Norah and Kai, as they navigate complex situations involving family, work, and unexpected pregnancies. There also seems to be an overarching narrative, though not specifically stated in the book description, involving criminal behavior. The characters’ interactions are portrayed with a focus on their emotions and internal conflicts as they negotiate their individual challenges. The story seems to take place in the Boston and Chicago areas.

    Under Your Skin: The O’Malley Family Book 1 – Study Guide

    Key Themes

    • Family Dynamics: The complex relationships between the O’Malley siblings, their parents, and extended family members, marked by both love and conflict.
    • Responsibility and Burden: The weight of responsibility each character carries, particularly in relation to caring for family members.
    • Second Chances: Opportunities for redemption and self-improvement are themes woven into the story, especially for characters like Kai and Tommy.
    • Personal Growth: Characters evolve as they confront their pasts and make choices about their futures.
    • Love and Relationships: The various forms love takes, including familial love, romantic love, and friendship, and how these relationships affect the characters.

    Chapter Summaries

    • Chapter One: Introduces the O’Malley family, specifically focusing on Tommy and his return to Chicago after time in rehab. Kai is shown to be running the tattoo parlor.
    • Chapter Two: Introduces Norah’s pregnancy and Jimmy’s reaction. We see Norah’s strained relationship with her father and interactions with Moira.
    • Chapter Five: Focuses on Kai taking care of his mother and his internal conflict. It also introduces Jaleesa’s physical therapy.
    • Chapter Six: Explores Kai and Norah’s interactions and their respective burdens. Norah’s conversation with Kevin reveals family tensions.
    • Chapter Seven: Touches upon Norah’s cravings and discomfort during pregnancy. Kai is shown taking care of his mother.
    • Chapter Eight: Norah navigates her pregnancy and book club responsibilities.
    • Chapter Nine: Kai takes care of Norah when she goes to the hospital, demonstrating their growing bond.
    • Chapter Ten: Centers on Norah’s reaction to Kai’s poker game and their evolving relationship.
    • Chapter Twelve: Features tension between Kai and Norah.
    • Chapter Thirteen: Features a sexual encounter between Kai and Norah.
    • Chapter Fifteen: Norah and Kai’s intimacy is revisited.
    • Chapter Sixteen: Kai continues his tattoo work while struggling with his feelings for Norah.
    • Chapter Seventeen: Explores Kai’s complicated situation and tension with Rooster.
    • Chapter Eighteen: Focuses on Sean’s birthday party and the family gathering, which reveals underlying tensions.

    Character Relationships

    • Norah & Kai: An evolving relationship marked by attraction, shared burdens, and emotional vulnerability. They seem to support one another.
    • Tommy & Kai: Brotherly relationship, shaped by shared history and a need for support. Kai keeps Tommy on a relatively straight path after rehab.
    • Norah & Jimmy: Siblings who clearly care for one another, even if Jimmy struggles with the circumstances of Norah’s pregnancy.
    • Kai & His Mother: Kai is very dedicated to his mother.

    Quiz

    1. Describe Kai O’Malley’s profession and a key aspect of his personality.
    2. What major life change is Norah experiencing in the novel, and how is she handling it?
    3. What is Tommy’s recent history and how does his brother Kai play a role in Tommy’s life?
    4. What are some of the main issues or concerns that Kai’s mother deals with?
    5. Describe the dynamic between Norah and her brother Jimmy.
    6. What significant decision does Norah make about the baby, and what are her motivations?
    7. What activity does Kai participate in during his leisure time, and what do we learn about his past from it?
    8. How does the novel portray the themes of family loyalty and obligation within the O’Malley family?
    9. What services do Kai and Norah separately provide for family members?
    10. Describe the nature of Kai and Norah’s eventual relationship.

    Quiz – Answer Key

    1. Kai is a tattoo artist who runs his own shop, Ink Envy. He is portrayed as someone trying to do what’s best for his family, and has had struggles that he is trying to overcome.
    2. Norah is pregnant and facing the challenges of unplanned pregnancy as a single woman. She demonstrates bravery in the face of her unplanned situation.
    3. Tommy has recently been through rehab, is still struggling with past mistakes and trying to find his place. Kai provides guidance and support, to help him stay on the right track.
    4. Kai’s mother is a single woman who appears to have limited mobility. He takes care of her in the mornings and makes sure she is safe while she is in the house all day.
    5. Norah and Jimmy seem to have a strong sibling bond and have one another’s best interests at heart. Jimmy seems to want to do what is best for his sister.
    6. Norah makes the decision to put her baby up for adoption in the hopes of a better life for her. Her decision is hard for her, but she stands by it.
    7. Kai plays poker, often in the basement of his house. It’s revealed that the game provides an escape, in the presence of old friends, but that the presence of an ex-gangbanger is disruptive.
    8. The O’Malley family shows strong loyalty and obligation, evident in their willingness to support one another through thick and thin. The family always seems to pull together, although their methods of support vary.
    9. Kai is dedicated to being a tattoo artist in his shop and providing for his mother. Norah provides care to her, but also acts as a resource of advice and assistance to her brothers, in times of need.
    10. Kai and Norah develop an intimate relationship. The novel explores the complicated nature of their romance.

    Essay Questions

    1. Discuss the role of responsibility and burden in shaping the lives and choices of the O’Malley siblings.
    2. Analyze how the setting of Chicago contributes to the overall mood and themes of Under Your Skin.
    3. Explore the significance of art, particularly tattooing, in the novel and its connection to character development.
    4. Compare and contrast the different types of love depicted in the novel and their impact on the characters’ lives.
    5. Examine the theme of second chances and how characters like Tommy and Kai seek redemption and personal growth.

    Glossary of Key Terms

    • Ink Envy: The name of Kai’s tattoo parlor, representing his profession and artistic expression.
    • Rehab: A facility or program designed to help individuals recover from addiction, as experienced by Tommy.
    • Adoption: The legal process by which a child is permanently placed with a family other than their biological parents, which is part of Norah’s storyline.
    • Poker: A card game that serves as a leisure activity and social outlet for Kai and his friends, also tied to aspects of his past.
    • Home Healthcare: The provision of medical and personal care services in a patient’s home, relevant to Kai’s mother’s needs.
    • Single Motherhood: The experience of raising a child without a partner, a central aspect of Norah’s initial situation.
    • Family Dynamics: The patterns of interaction and relationships between family members, a key focus of the novel.
    • Responsibility: The obligation or duty to care for or be accountable for someone or something, a recurring theme for the characters.
    • Redemption: The act of making amends for past mistakes or wrongdoings, sought by characters like Tommy and Kai.
    • Personal Growth: The process of improving oneself through learning, experience, and self-reflection, evident in the characters’ journeys.

    Under Your Skin: The O’Malley Family, Book 1

    Briefing Document: “Under Your Skin (The O’Malley Family Book 1)”

    Overview:

    The provided excerpts introduce the O’Malley family, focusing on the complex relationships between siblings, particularly Norah, Kai, Tommy, and Jimmy, as well as their mother, Lani. The story revolves around themes of family loyalty, responsibility, unexpected pregnancy, and the challenges of navigating adulthood while carrying the weight of past experiences. The characters grapple with difficult decisions, family secrets, personal growth, and attempts to forge their own paths.

    Main Themes and Ideas:

    • Family Dynamics and Loyalty: The O’Malley siblings exhibit a strong, albeit often turbulent, bond. They support each other but also clash frequently, revealing a deep-seated history of shared experiences and expectations.
    • Example: Several interactions highlight the siblings’ willingness to intervene in each other’s lives, even when unwanted. Norah often finds herself helping her brothers.
    • Unexpected Pregnancy and Adoption: Norah’s unexpected pregnancy and subsequent exploration of adoption are central to the excerpts. The excerpts show how everyone around Norah is affected by her decision. The family members start talking to adoption agencies and are trying to find a suitable family for the baby.
    • Personal Growth and Responsibility: Characters, especially Kai and Norah, are shown grappling with their individual responsibilities. Kai is dealing with the financial strains of running his own business. Norah confronts the need to make significant life choices related to her pregnancy.
    • Example: Norah makes decisions about the adoption based on the best family for the baby.
    • Past Trauma and its Lingering Effects: There is a sense of a shared family history that continues to impact the characters’ present lives. The family seem to have gone through hardship and tragedy.
    • Complex Relationships: There are several complex relationships discussed in this excerpt. Norah is navigating her relationships with multiple men who are trying to help her, including Kai, Tommy, and Jimmy.

    Key Characters and Plot Points:

    • Norah: Pregnant, independent, and grappling with the decision of whether to keep the baby or pursue adoption. She is a central figure around whom much of the plot revolves. “Maybe she was a chicken because she not only asked him to do it, but she would actually let him.”
    • Kai: A tattoo artist, seems to be the most responsible sibling and is helping Norah with her choices.
    • Tommy: Involved in hockey and seems to be an emotional support for the family. He seems to be closest to Kai.
    • Jimmy: Seems supportive, but he also faces his own personal issues.
    • Lani: The mother. “It was hard to believe that barely two months ago she’d gotten out of the hospital with her knee replacement.”

    Quotes Illustrating Key Themes:

    • On Family: “Besides, the fact that maybe kill me if I didn’t?”
    • On unexpected situations: “You’re talking like a crazy woman. She’s pregnant with another man’s child.” “Is that man around? I think not or she wouldn’t have spent her day with me.”
    • On adoption decisions: “I want someone who wants my little girl. I want them to be from this area so I can see her. I want two parents. Definitely.”

    Possible Conflicts and Questions Raised:

    • How will Norah’s adoption decision impact her relationships with her family and potential adoptive parents?
    • Can Kai manage to overcome his past and present challenges and find a stable path forward?

    Overall Tone:

    The excerpts convey a tone that blends humor, tenderness, and underlying tension. The characters’ interactions are often laced with wit and sarcasm, but there’s also a sense of vulnerability and genuine care beneath the surface.

    This briefing document summarizes the core themes, characters, and potential conflicts presented in the provided excerpts from “Under Your Skin (The O’Malley Family Book 1)”.

    Under Your Skin: O’Malley Family FAQs

    FAQ: Under Your Skin (The O’Malley Family Book 1)

    • What is the central conflict or challenge that Norah faces?
    • Norah is dealing with an unplanned pregnancy and is struggling to figure out her next steps. She is also figuring out if adoption may be the best path forward for herself and the child. She also has to deal with a variety of strong opinions from her family.
    • How is Kai’s artistic ability presented in the story?
    • Kai is depicted as a talented tattoo artist. His work at Ink Envy is sought after, and the narrative highlights his skill in both designing and executing tattoos.
    • What role does family play in the characters’ lives?
    • Family is a dominant theme, with close-knit sibling relationships and strong familial expectations influencing the characters’ decisions and behaviors. The O’Malley family is very involved in each others’ lives, even when it may not be wanted.
    • How does the story portray the challenges of adulthood?
    • The characters grapple with issues like unplanned pregnancy, career aspirations, financial struggles, and complicated romantic relationships, reflecting the complexities and uncertainties of early adulthood.
    • What is Ink Envy, and why is it significant?
    • Ink Envy is the tattoo shop where Kai works. It serves as both his workplace and a space where the characters interact and their stories unfold. The tattoo shop is where Kai is able to be artistically productive, as well as support himself financially.
    • What are the key traits of the O’Malley brothers, and how do they differ?
    • The O’Malley brothers—Tommy, Jimmy, and Kai—each possess distinct personalities. Tommy seems to be the responsible caretaker, Jimmy provides support and commentary, and Kai is focused on his art and working.
    • How are themes of independence and dependence explored in the story?
    • The characters navigate a balance between asserting their independence and relying on family for support, demonstrating the tension between self-reliance and interconnectedness. Norah has moments of being highly independent, and then other moments when she seeks the love and support of her family.
    • What is the significance of the book’s title, “Under Your Skin”?
    • The title, “Under Your Skin,” could have multiple meanings. It refers literally to the art of tattooing, but it also symbolizes the way that family history, relationships, and emotions permeate and shape the characters’ identities and experiences. It is a reference to how close the O’Malley family is to each other.

    Pregnancy Anxieties in “Under Your Skin”

    Some characters in “Under Your Skin (The O’Malley Family Book 1)” experience anxieties related to pregnancy.

    Examples of pregnancy anxieties:

    • Avery is worried about how her body will change.
    • Avery is concerned that her hormones are affecting her negatively.
    • Norah is scared about the possibility of having twins.
    • Norah expresses concern about the changes in her life as a result of the pregnancy.
    • Norah worries about how her family will adjust and whether she will have the support she needs.
    • Norah reflects on whether she is ready to be a mother.
    • Moira reflects on the beginning of her pregnancy.

    O’Malley Family Dynamics: Pregnancy, Relationships, and Conflict

    The source text reveals a complex web of family dynamics, including those influenced by pregnancy and its related anxieties. Various relationships and interactions within the O’Malley family are depicted:

    • Sibling Relationships: The source text illustrates sibling relationships. For example, Norah has brothers, and their interactions range from supportive to overprotective. There are tensions and caring moments between siblings.
    • Parent-Child Dynamics: The text refers to parent-child dynamics, showing the complexities and potential for conflict. Characters reflect on their relationships with their parents, and the impact their parents had on their lives.
    • Extended Family: Interactions with extended family members, like aunts and cousins, also shape the family dynamic. The O’Malley family appears very involved in each others’ lives.
    • Impact of Pregnancy on Family Dynamics: Pregnancy is a central theme that influences family relationships. The characters discuss and debate the impact of unplanned pregnancies. The family members respond differently to the pregnancies. Some family members are supportive, while others are judgmental or concerned. The impending arrival of a new baby also stirs up anxieties and prompts reflections on family history and future.
    • Family Support and Conflict: The source text also highlights instances of family members supporting each other through difficult times. However, there are conflicts and disagreements within the family. These conflicts sometimes stem from differing opinions about how to handle pregnancies or other life challenges.
    • Loyalty and Protection: Despite the conflicts, there is a strong sense of family loyalty and a willingness to protect one another. Siblings rally to support each other, and parents want the best for their children.
    • Changing Roles: The source shows the changing roles within the family as members navigate new relationships, pregnancies, and personal growth. Characters grapple with their identities as parents, siblings, and individuals within the family.

    Personal Growth and Relationships: Navigating Life’s Challenges

    The characters in the source text experience personal growth as they navigate complex relationships, pregnancies, and various life challenges.

    Aspects of personal growth depicted in the source text:

    • Overcoming Past Trauma: Characters grapple with past traumas and work towards healing and moving forward.
    • Identity and Self-Discovery: Characters reflect on their identities and strive toward self-discovery. They consider their roles within the family and as individuals.
    • Changing Relationships: Characters navigate changing relationships and the evolving roles of family members.
    • Taking Responsibility: Some characters make an effort to take responsibility for their actions and decisions.
    • Emotional Maturity: The source text shows characters developing emotional maturity through introspection and self-reflection. They learn to understand their feelings and motivations.
    • Letting Go: Characters learn to let go of past grievances, forgive others, and move forward.
    • Confronting difficult situations: Characters confront difficult situations and make tough choices. They show resilience in the face of adversity.

    Under Your Skin: Relationship Struggles in the O’Malley Family

    In “Under Your Skin (The O’Malley Family Book 1),” characters experience relationship struggles stemming from various sources such as family dynamics, personal growth challenges, and pregnancy anxieties.

    Relationship struggles include:

    • Impact of Family Dynamics: Characters experience relationship struggles stemming from family dynamics. For example, siblings’ involvement in each other’s lives can lead to tension. Differing opinions on handling pregnancies can also cause conflict among family members.
    • Personal Growth Challenges: Characters’ individual journeys of self-discovery and healing from past traumas can create friction in relationships. Differing levels of emotional maturity or commitment to taking responsibility can also lead to misunderstandings and disagreements.
    • Pregnancy-Related Anxieties: The anxieties surrounding pregnancy, such as concerns about body image and the future, can strain relationships. The source text shows characters grappling with unplanned pregnancies and the adjustments required.

    Under Your Skin: Purpose, Relationships, and Growth

    The characters in “Under Your Skin (The O’Malley Family Book 1)” grapple with finding purpose while navigating relationship struggles, family dynamics, personal growth, and pregnancy anxieties.

    Examples of characters finding purpose:

    • Taking Responsibility: Characters strive to take responsibility for their actions and decisions, suggesting a search for purpose through accountability and maturity.
    • Confronting Difficult Situations: Characters confront difficult situations and make tough choices, indicating they find purpose by facing adversity and demonstrating resilience.
    • Personal Growth and Self-Discovery: Characters reflect on their identities and consider their roles within their families and as individuals, indicating a journey toward finding purpose through understanding themselves.
    • Supporting Family: Despite conflicts, a strong sense of family loyalty and a willingness to protect one another is present, suggesting that characters find purpose in supporting and caring for their families.
    • Defining Relationships: Characters navigate changing relationships and evolving roles within their families, showing they seek purpose by adapting to new dynamics and defining their place within them.
    • Healing from Trauma: Characters grapple with past traumas and work toward healing and moving forward, implying they find purpose in overcoming adversity and seeking a better future.


  • Power BI Data Transformation, Visualization, and Drill Down

    Power BI Data Transformation, Visualization, and Drill Down

    The text is a series of excerpts describing the process of creating data visualizations and dashboards using Power BI. It guides users on how to import, transform, and model data from sources like Excel using Power Query. The text covers topics such as cleaning data, creating relationships between tables, and using DAX functions to perform calculations. Various chart types are explored, along with techniques like drill-downs, conditional formatting, and grouping data using bins and lists. The final portion focuses on building a comprehensive dashboard from survey data, including considerations for layout and theme customization.

    Power BI Mastery: A Comprehensive Study Guide

    Quiz: Short Answer Questions

    1. What is Power BI and what is its primary function?
    2. Where can you download Power BI Desktop, and is it free?
    3. Name at least three different data sources that Power BI can connect to.
    4. What is Power Query and why is it important in Power BI?
    5. How do you rename a column in Power Query, and where can you view the steps you have taken to edit data?
    6. How do you remove a filter that you have applied to data in Power Query?
    7. What are the three main tabs in Power BI Desktop, and what is the primary function of each?
    8. Give a brief overview of the importance of the “Model” tab in Power BI.
    9. Describe what drill down is. What are the three drill down effects that Power BI offers?
    10. What is conditional formatting, and what are some options for displaying conditional formatting?

    Quiz: Answer Key

    1. Power BI is a data visualization tool within the Microsoft ecosystem used for creating interactive dashboards and reports. Its primary function is to transform raw data into insightful visuals for better decision-making.
    2. Power BI Desktop can be downloaded from the Microsoft Store, or from a direct download link. It is available for free.
    3. Power BI can connect to various data sources including Excel workbooks, SQL databases, and online services like Google Analytics.
    4. Power Query is a data transformation tool within Power BI. It is important because it allows users to clean, reshape, and transform data before creating visualizations.
    5. In Power Query, a column can be renamed by double-clicking on its header and typing in the new name. The applied steps appear in the “Applied Steps” pane on the right side of the Power Query Editor window.
    6. To remove a filter, locate the “Filtered Rows” step in the “Applied Steps” pane and click the “X” icon next to it.
    7. The three main tabs are: Report (for creating visualizations), Data (for viewing and managing data), and Model (for defining relationships between tables).
    8. The Model tab is important for defining relationships between different tables or data sources. These relationships allow you to create more complex and accurate visualizations by combining data from multiple sources.
    9. Drill down lets you explore data at increasingly granular levels within a single visualization. The three drill down features are: “turn on drill down,” “go to next level in the hierarchy,” and “expand all down one level in the hierarchy.”
    10. Conditional formatting highlights data points based on specific criteria, making it easier to identify patterns and outliers. Some options for displaying conditional formatting include background color, font color, data bars, and icons.

    Essay Format Questions

    1. Discuss the importance of data transformation using Power Query in the Power BI workflow. Provide examples of common data transformation tasks and explain how they contribute to creating accurate and meaningful visualizations.
    2. Explain the significance of the “Model” tab in Power BI, focusing on how relationships between tables are created and managed. Discuss the different types of cardinalities and cross-filter directions, and how they impact data analysis.
    3. Compare and contrast aggregator functions and iterator functions (like SUM vs. SUMX) in DAX. Provide specific examples of when each type of function would be most appropriate and explain how they differ in their evaluation context.
    4. Describe the various types of visualizations available in Power BI and provide examples of scenarios where each would be most effective. Consider the strengths and weaknesses of each visualization type and how they can be used to convey different types of information.
    5. Explain the purpose and application of conditional formatting in Power BI reports. Discuss the different conditional formatting options available and provide examples of how they can be used to highlight key trends and outliers in the data.

    Glossary of Key Terms

    • Power BI: A Microsoft business analytics service that provides interactive visualizations and business intelligence capabilities.
    • Data Visualization: The graphical representation of information and data.
    • Dashboard: A visual display of the most important information needed to achieve one or more objectives; consolidated and arranged on a single screen.
    • KPI (Key Performance Indicator): A measurable value that demonstrates how effectively a company is achieving key business objectives.
    • Power BI Desktop: A free Windows application for creating interactive dashboards and reports.
    • Data Source: The location where the data being used originates (e.g., Excel file, SQL database).
    • Power Query: A data transformation and preparation engine used within Power BI to clean, reshape, and enrich data.
    • Data Transformation: The process of converting data from one format or structure into another.
    • Applied Steps: A record of each data transformation step performed in Power Query.
    • Report Tab: The area in Power BI Desktop where visualizations are created and arranged.
    • Data Tab: The area in Power BI Desktop where the underlying data can be viewed and managed.
    • Model Tab: The area in Power BI Desktop where relationships between tables are defined.
    • Cardinality: Defines the relationship between two tables (e.g., one-to-many, one-to-one).
    • Cross-filter Direction: Determines how filters applied to one table affect related tables.
    • DAX (Data Analysis Expressions): A formula language used in Power BI for calculations and data analysis.
    • Measure: A calculation performed on data, typically aggregated (e.g., sum, average).
    • Column: A field in a table containing a specific attribute of the data.
    • Aggregator Function: A DAX function that calculates a single value from a column of data (e.g., SUM, AVERAGE, MIN, MAX).
    • Iterator Function: A DAX function that evaluates an expression for each row in a table (e.g., SUMX, AVERAGEX).
    • Drill Down: A feature that allows users to explore data at increasingly granular levels within a visualization.
    • Bin: A grouping of continuous data into discrete intervals or categories.
    • List: An ordered collection of values or items.
    • Conditional Formatting: Highlighting data points based on specific criteria, making it easier to identify patterns and outliers.

    Power BI Tutorial: Data Visualization and Analysis Guide

    Briefing Document: Power BI Tutorial Series

    Overview:

    This document summarizes a comprehensive Power BI tutorial series focused on data visualization and analysis. The series aims to take viewers from complete beginners to proficient Power BI users, covering essential skills from data acquisition and transformation to creating interactive dashboards. The content emphasizes practical application and encourages viewers to follow along using provided sample datasets.

    Main Themes & Key Ideas:

    • Power BI as a Leading Data Visualization Tool: The tutorial positions Power BI as a prominent tool within the Microsoft ecosystem.
    • “powerbi is one of the most popular data visualization tools in the world of course it’s within the Microsoft ecosystem”
    • Hands-on Learning Approach: The tutorial emphasizes a practical, hands-on approach, encouraging viewers to download datasets and actively participate in the exercises.
    • “I’m going to leave the Excel that I’m going to be using in the description you can go and download it and walk through this with me”
    • Data Acquisition and Connectivity: A significant portion focuses on connecting to various data sources, highlighting the flexibility of Power BI.
    • “it’s going to give us a lot of different options for where we can get data from… you have a ton of options there’s databases… SQL databases… Google analytics”
    • Power Query for Data Transformation: The tutorial introduces Power Query as a crucial tool for cleaning, shaping, and transforming data before visualization.
    • “it’s going to take us to powerbi power query which is going to allow us to transform our data”
    • “this is the window to basically transform your data and get it ready for your visualizations”
    • Applied Steps in Power Query: Emphasis on the importance of the “Applied Steps” feature in Power Query for tracking and modifying data transformations.
    • “everything that you do every single step that you apply to transform this data is going to be right over here and if I want to … I can just click X and it is going to get rid of that”
    • Data Modeling and Relationships: Connecting multiple data tables and defining relationships between them is covered.
    • “this is especially useful when you have multiple tables or multiple excels and you need to join them to kind of connect them together”
    • DAX (Data Analysis Expressions): Introduction to DAX functions for creating calculated columns and measures.
    • “What we’re going to be using are these new measures and new columns to create our DAX functions”
    • Aggregator vs. Iterator functions (SUM vs. SUMX): Explains the difference between aggregate functions (operate on an entire column) and iterator functions (operate row by row).
    • Conditional Formatting: Applying visual cues (colors, icons, data bars) to highlight trends and patterns in data.
    • Drill-Down Functionality: Creating hierarchical visualizations that allow users to explore data at different levels of detail.
    • Lists and Bins: Grouping data using Lists and Bins to aid in visualization and create cohorts.
    • Visualization Techniques: Stacked Bar Chart, 100% Stacked Column Chart, Line Chart, Clustered Column Chart, Scatter Chart, Donut Chart, Cards and tables are covered and used throughout the tutorial.
    • “the very first one that we’re going to start with probably the easiest one and the one that you’ll recognize the most is a stacked bar chart”
    • Project-Based Learning: The series culminates in a final project using real-world survey data from data professionals.
    • Customizing Dashboards: Demonstrates how to improve the look and feel of dashboards using themes and color schemes.

    Important Ideas and Facts:

    • Power BI Desktop can be downloaded for free.
    • “we’re going to click this download free button”
    • Power Query Editor is used to transform data.
    • “this is the window to basically transform your data and get it ready for your visualizations”
    • Data relationships are crucial for combining data from multiple sources.
    • DAX functions are essential for creating calculations and performing advanced analysis.
    • Drill-down functionality allows for interactive exploration of data.
    • “what happens is some stakeholder in our company is saying hey Alex we want this and we want to know we want to drill down on this IP address”
    • Bins can be used to create groups of data.
    • Conditional formatting enhances the readability and impact of visualizations.
    • “it’s just better to have these simple visualizations on this table rather than just having the numbers themselves makes it a little bit more easy to read and understand”
    • The choice of visualization depends on the data and the insights you want to convey.

    Quotes:

    • “By the end of this video you’re going to be an expert in powerbi you’re going to be creating all sorts of dashboards and kpis and reports and you’re going to be sending all of them to me so I can be really impressed.” (Illustrates the goal of the tutorial.)
    • “Everything that you do every single step that you apply to transform this data is going to be right over here and if I want to if I go back and I say you know I really didn’t want to rename that column I can just click X and it is going to get rid of that and take it back to its original state” (Highlights the flexibility and ease of data transformation in Power Query.)

    Conclusion:

    The Power BI tutorial provides a comprehensive guide for users of all levels. By focusing on practical skills, real-world examples, and hands-on exercises, the series equips viewers with the knowledge and confidence to effectively use Power BI for data analysis and visualization.

    Power BI: Quick Answers and Usage Guide

    Power BI FAQ

    Here’s an 8-question FAQ based on the provided source material:

    1. What is Power BI and why is it useful?

    Power BI is a data visualization tool from Microsoft. It allows users to create interactive dashboards, KPIs (Key Performance Indicators), and reports from various data sources. Its usefulness stems from its ability to transform raw data into understandable visual formats, providing actionable insights for decision-making. If your organization uses Microsoft products there is a good chance you already have access to it.

    2. How do I download Power BI Desktop?

    You can download Power BI Desktop for free from the Microsoft Store. The source material recommends a download link in its description for direct access. Once on the store page, click the “Download” button to begin the installation process.

    3. What types of data sources can Power BI connect to?

    Power BI offers a wide array of data connection options, including Excel workbooks, SQL databases, cloud services (like Azure Blob Storage), and online platforms (like Google Analytics). Some connectors are free, while others may require an upgrade.

    4. What is Power Query Editor and why is it important?

    Power Query Editor is a data transformation interface within Power BI. It allows you to clean, reshape, and transform data before creating visualizations. You can rename columns, filter rows, change data types, and perform other data manipulation tasks. Every transformation step is recorded, allowing you to easily modify or undo changes.

    5. How do I create a basic visualization in Power BI?

    To create a visualization:

    1. Import your data into Power BI Desktop.
    2. Navigate to the Report tab.
    3. Select the type of chart you want to create from the Visualizations pane.
    4. Drag and drop the desired fields (columns) from your data into the appropriate areas of the chart (e.g., axis, values, legend).

    6. What are relationships in Power BI and how do I create them?

    Relationships define how tables in your data model are connected. They’re crucial when working with multiple tables, as they enable Power BI to combine data from different sources.

    To create a relationship:

    1. Go to the Model tab.
    2. Drag a field from one table onto the corresponding field in another table. Power BI will automatically attempt to create the relationship. You can double click it to edit the relationship.

    Cardinality (one-to-one, one-to-many, many-to-many) and cross-filter direction (single, both) are important properties to configure. With the single setting, filters flow in only one direction across the relationship; with both, filters propagate in both directions, so each table can filter the other.

    7. What are DAX measures and columns?

    DAX (Data Analysis Expressions) is a formula language used in Power BI.

    • Measures: Calculations performed on the fly, typically used to aggregate data (e.g., sum, average, count). They are not stored in the data model.
    • Columns: New columns created in your data model that contain calculated values for each row.

    Examples of DAX functions include SUM, AVERAGE, COUNT, and IF. Aggregator functions (like SUM) and iterator functions (like SUMX) behave differently, with iterator functions performing calculations on each row of a table. Date functions such as DAY can also be used in DAX expressions.

    8. What is Drill Down and how can I use it in my visuals?

    Drill Down allows users to explore data at different levels of detail within a visualization. To use it, add multiple fields to a hierarchy in a chart’s axis. When Drill Down is activated, clicking on a data point will “drill down” to the next level in the hierarchy, showing more granular data. Useful for presenting data with layered levels of information.

    Power BI Data Transformation with Power Query

    Data transformation within Power BI involves using Power Query to prepare data for visualization. Power Query Editor allows for a variety of transformations.

    Key aspects of data transformation include:

    • Accessing the Power Query Editor: Open the editor by selecting ‘Transform Data’.
    • Applied Steps: Every transformation step is documented in the ‘Applied Steps’ section, enabling review or removal of changes.
    • Common transformations include:
    • Changing data type: Data types can be changed by clicking the icon in the column header.
    • Filtering rows: Filter rows to remove null values or specific values.
    • Removing columns or rows: Columns or top rows can be removed.
    • Renaming columns: Columns can be renamed.
    • Unpivoting columns: Convert columns into rows by selecting the columns and using the Unpivot Columns option on the Transform tab. The first row can also be promoted to headers.

    Salary Data Analysis and Visualization in Power BI

    The sources provide information on how salary data can be analyzed and visualized in Power BI, including transforming salary data and analyzing average salaries based on various factors.

    Key aspects of salary analysis discussed in the sources:

    • Data Transformation for Salary Analysis: Salary data often needs transformation to be usable, especially when provided as a range.
    • Splitting Columns: Salary ranges can be split into separate columns representing the lower and upper bounds of the range.
    • Data Type Conversion: Convert text data to numeric data types to enable calculations.
    • Calculating Average Salary: Create a new column that averages the range by summing the lower and upper bounds and dividing by two (see the sketch after this list).
    • Salary Analysis and Visualization:
    • Average Salary by Job Title: Calculate and visualize the average salary for different job titles using a clustered bar chart.
    • Average Salary by Sex: Visualize the average salary for males and females using a donut chart.
    • Impact of Country on Salary: A tree map can be used to filter salary data by country, acknowledging that the average salary varies significantly depending on the country.
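
    As a rough sketch, the average-salary step could be expressed as a DAX calculated column like the one below; the Survey table and the SalaryLower/SalaryUpper column names are assumptions standing in for the split range bounds (in the source the same calculation is done as a new column in Power Query):

        // Midpoint of the reported salary range
        Average Salary = ( Survey[SalaryLower] + Survey[SalaryUpper] ) / 2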

    Programming Language Popularity Among Data Professionals

    The sources discuss how to analyze the popularity of programming languages among data professionals using Power BI.

    Key aspects include:

    • Identifying Favorite Programming Languages: Survey data can be used to determine the preferred programming languages of data professionals.
    • Data Transformation: The survey may include an “Other” option where respondents can enter their preferred language. This necessitates splitting the column to separate the pre-selected languages from the write-in languages.
    • Visualization: A clustered column chart can effectively display the count of votes for each programming language (a sketch of the underlying measure follows this list).
    • The visualization can be enhanced by including job titles, allowing for a breakdown of language preferences by profession. For example, it can show which languages are favored by data analysts versus data scientists.
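
    A hypothetical measure behind such a chart can be as simple as counting non-blank responses; the Survey table and its column are assumed names:

        // Counts respondents who named a language; the language field on the
        // chart axis supplies the filter context, giving votes per language
        Language Votes = COUNTA ( Survey[Favorite Programming Language] )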

    Power BI Analysis of Survey Demographics

    The sources contain information regarding the collection, transformation, and visualization of survey demographics using Power BI.

    Key aspects of survey demographics discussed in the sources:

    • Data Collection: The data was collected via a survey of data professionals. The survey collected information such as job titles, salary, industry, programming language preferences, and demographic information including age, sex, and country of residence.
    • Data Transformation: Several transformations were performed on the raw survey data within Power BI’s Power Query Editor to prepare it for analysis. These transformations included:
    • Splitting columns: The ‘Job Title’ and ‘Favorite Programming Language’ columns were split to separate pre-defined options from free-text entries, simplifying analysis.
    • Calculating average salary: Salary ranges were split into lower and upper bounds, and a new column was created to calculate the average salary.
    • Demographic Visualizations: The transformed data was used to create visualizations to analyze survey demographics (a sketch of two supporting measures follows this list):
    • Average Age: A card visualization was used to display the average age of survey respondents.
    • Country of Residence: A tree map was used to visualize the distribution of survey respondents by country. This allows users to filter the data and examine other variables by country.
    • Sex: A donut chart was considered to visualize the distribution of male and female respondents and their average salaries.
    • Difficulty Breaking into the Field: A pie chart was used to visualize how easy or difficult respondents found it to break into the data field.
    • Interactivity: Visualizations such as the tree map showing the “Country of Survey Takers” allow users to click on a country and see how the other visualizations change based on that selection.
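
    Hypothetical measures behind the card visuals might look like this, again assuming a Survey table with these columns:

        // Mean respondent age for the card visual
        Average Age = AVERAGE ( Survey[Age] )

        // One row per respondent, so the row count equals survey takers
        Total Survey Takers = COUNTROWS ( Survey )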

    Power BI: Data Visualization Techniques and Best Practices

    The sources cover various aspects of data visualization using Power BI, from basic chart creation to more advanced techniques and considerations.

    Data Visualization Options and Usage

    • Basic Chart Creation:
    • Stacked Bar/Column Charts: Useful for comparing different categories and their composition. These can represent customer purchase breakdowns, showing what percentage of purchases come from specific products.
    • Clustered Bar/Column Charts: Useful for comparing values across different categories.
    • Line Charts: Effective for visualizing trends over time, especially with date-related data.
    • Pie/Donut Charts: While sometimes discouraged due to difficulty in comparing slice sizes, they can be used to show proportions.
    • Cards: Display single values, like total survey takers or average age, for quick insights.
    • Tables: Display data in a tabular format.
    • Scatter Charts: Useful for identifying outliers and trends in data.
    • Advanced Visualization Techniques:
    • Combination Charts: Combine different chart types (e.g., line and clustered column) to display multiple aspects of the data in one visualization.
    • Conditional Formatting: Use rules, color gradients, and icons to highlight data within tables or charts.
    • Data Bars: Data bars can visually represent values within a table, making it easier to compare magnitudes.
    • Drill Down: Allows users to explore data at different levels of granularity within a visualization.
    • Gauges: Visualize survey data, showing average scores and satisfaction levels.
    • Tree Maps: Visualize hierarchical data, allowing users to click through different levels for more details.

    Key Considerations for Effective Data Visualization:

    • Choosing the Right Visual: Different chart types are suited for different data types and analytical goals.
    • Customization: Visual elements like titles, labels, colors, and data presentation should be customized to enhance clarity and readability.
    • Data Transformation: Data often needs to be transformed and cleaned before visualization to ensure accurate and meaningful representations.
    • Interactivity: Incorporate interactive elements like drill-down to allow users to explore the data.
    • Color Coordination: Choosing appropriate color schemes and themes can significantly improve the visual appeal and effectiveness of a dashboard.
    • Clear Titles and Labels: Use clear and descriptive titles and labels to ensure the audience understands the visualization.
    • Summarization: Instead of “Don’t Summarize,” choose Sum, Average, Minimum, or Maximum to derive insights from numeric fields.
    • Conditional Formatting: Add background colors based on gradients or rules, as well as data bars and icons.
    • Drill Down: Can be enabled to present data at different levels.
    • Bins and Lists: Numeric and date data can be grouped using bins, while lists can group categorical values such as customer names.

    Specific Examples and Applications

    • Survey Data: Visualizing survey responses, such as satisfaction levels, is facilitated through gauge charts.
    • Sales Data: Analyzing sales data and identifying top-performing products and customer segments.
    • Geographic Data: Visualizing data by country using tree maps, enabling comparisons and filtering based on location.
    • Salary Data: Presenting salary distributions and averages, broken down by job title, gender, and country.
    • Programming Language Preferences: A clustered column chart is used to display the count of votes for each programming language.

    Learn Power BI in Under 3 Hours | Formatting, Visualizations, Dashboards + Full Project

    The Original Text

    What’s going on, everybody? Welcome back to another video. Today we are going to learn Power BI in under three hours. Power BI is one of the most popular data visualization tools in the world. It sits within the Microsoft ecosystem, so if your company uses any Microsoft products, you most likely have access to the Microsoft suite, which includes Power BI. I used Power BI for many years as a data analyst, and when I became a manager of analytics I actually switched our entire team over to Power BI, so I know how amazing it can be. We’re going to look at several things in this long, long video. We’ll start with the basics of creating some visualizations, but we’ll dive into a ton of other things as well. By the end of this video you’re going to be an expert in Power BI, creating all sorts of dashboards and KPIs and reports, and you’re going to send all of them to me so I can be really impressed. So without further ado, let’s jump onto my screen and get started.

    The first thing I’m going to do is download Power BI Desktop. I’ll leave the link in the description so you can click on it and download it. Click the free download button and it takes you to the Microsoft Store. I already have it downloaded, so it will say so for me, but you can click Download and it will install for you. I’m on Microsoft, but it may look a little bit different if you’re on a different system. Once that’s done, open Power BI: go down to the search bar, type “Power BI,” and it opens up.

    Right away, this is what it looks like when you open it. Go over to Get Data and click on it. It opens a window with a lot of different options for where to get data from; some of these are free and some require an upgrade. Taking a quick glance through, there are databases, blob storage, PostgreSQL and other SQL databases, Google Analytics, and many more, and you can go through the process of connecting to those data sources and pulling data in. For what we’re doing, we’re just going to use an Excel file. I’ll leave the Excel I’m using in the description so you can download it and walk through this with me. Click Excel Workbook, then Connect. Go to the Power BI tutorials folder and click on “apocalypse food prep”; it connects and pulls that data in. Now the Navigator appears, and if you had a lot of different sheets you could choose which ones to pull in. I clicked on the one sheet and we can preview the data, but I can’t load or transform it until I select which sheets to bring in. We only have one, so select it, then either load the data or click Transform Data, which takes us to Power BI’s Power Query editor to transform our data. I’ll have an entire video on how to transform data, but here’s a really quick glance to show you what it is.
    The window that opens is the Power Query Editor, and it’s basically the place to transform your data and get it ready for your visualizations. You can do this in Excel beforehand if you want, or you can do it here, and as you can see at the top, there is a lot you can do. Again, I’ll have an entire video dedicated to Power Query, but let’s take a quick look at the data and see if there’s anything we want to transform before we start building visualizations.

    We have the store where we purchased, the product we purchased, the price we paid, and the date we bought it. The first thing that jumps out at me is that this column just says “date” on it — we might want to call it “date_purchased” — so rename it and hit Enter. Notice over here in Applied Steps it now says “Renamed Columns.” Everything you do, every single step you apply to transform this data, is recorded here, and if I decide I really didn’t want to rename that column, I can just click the X and it goes back to its original state. So again, I’ll rename it to “date_purchased” and enter that.

    This is our apocalypse food prep — food we’re buying for the apocalypse, for this example. Looking at our products, we have bottled water, canned vegetables, dried beans, milk, and rice. All of that makes sense except the milk; milk will not last long in the apocalypse. So we’re going to filter that out really quickly and click OK, and over here it now says “Filtered Rows” — and if we scroll down, there’s no milk. Now go to Close & Apply, and it loads the data into Power BI Desktop.

    On the left-hand side it immediately takes us to the Report tab. Go to the Data tab and take a look at the data: there’s our date_purchased, and as you can see, the milk is not in there. The Report tab is where we actually build our visualizations; the Data tab is where we can see the data and change small things about it, like sorting the columns or even creating a new column. We also have the Model tab, which is especially useful when you have multiple tables or multiple Excel files and you need to join them together. We don’t have that here, but in a future video I’ll walk through that entire tab.

    Back on the Data tab, let’s look at the data before we build our first visualization. I’ve been buying these products in different months — this rice I’ve been purchasing in January, February, March, and April — and I’ve been buying from three different locations, because I wanted to see whether I was spending less money at one location on all of the products, so I could just shop there in the future and save a lot of money, or whether specific products were really cheap at one location but cheaper elsewhere for others — maybe I should buy the dried beans at Costco but everything else at Walmart. That’s what we’re going to look at in just a little bit.
    Let’s go to the Report tab. Right up at the top there’s the data section, where you can add more data; now that we’re here, we can also write queries or transform the data like we did in the Power Query editor window. In the Insert section we can add a new visualization or a text box, and in the Calculations section we can create a new measure or a quick measure. Over here we have Share, where you can actually publish your report or dashboard online. The Visualizations pane on the far right is a very important area — this is where a lot of the actual dashboard building happens — and we’ll get into it as we build, because we’re not just going to sit here looking and talking; we’re going to actually build.

    Click the drop-down on Sheet1 and it shows all of our columns. Two of the things we wanted to answer: first, where are we spending the least amount of money buying the exact same products, which will tell us where to shop; and second, should we buy all our products at the same place, or are certain products cheaper at a specific store? Let’s start with the first one, using just the store and the price. At a quick glance we can see we’re spending the least amount of money at Costco at $210, versus Target at $219 and Walmart at $225. That really answers our question, but we want to visualize it in an easier way, so let’s pick a visual — the one that probably makes the most sense is the stacked column chart — and it shows Walmart, Target, and Costco. They’re all the same color, so let’s add a legend by dragging store down into the Legend well, and let’s make the visual larger while we’re working on it. Now we can see we’re spending the most at Walmart, Target is right in between, and Costco is the lowest — so Costco is the place to go for our apocalypse food prep. But is it going to be that way for every product? I don’t know — let’s take a look.

    Put that chart up in the corner and start a new one. We’ll need the product for sure, the price, and probably the store as well, and for this we need a clustered column chart. Bring it over, expand it quite a bit, and at a glance this gives us everything we need: each product, and how much we’re paying per store. For rice, it looks like we’re paying a lot more at Walmart, while Target is actually where we pay the least. Looking across all of them, the only thing we’re really paying a lot more for at Costco is the rice; for our dried beans and bottled water we’re paying quite a bit less, and for the canned vegetables it’s pretty negligible — maybe 50 or 60 cents more per can. For the big-ticket items we’re really spending a lot less at Costco, and if we wanted to save just a little bit more money, we could go to Target for our rice.
    Now, to make this more like a dashboard — we’re only keeping these two visuals — I’m going to size them next to each other. With that looking good, we want to change the titles of both. Go over to Visualizations, then Format your visual, then General, then Title, and name it anything you want; for this one we’ll say “best store for product.” While we’re in here, one other thing I want to do is go to Visual and down to Data labels. We haven’t added data labels yet, so I’ll turn them on, and you’ll see exactly what they do: they put the numbers above the columns so you don’t have to hover to see the values. It is actually rounding those numbers, though, so go down to Values, then Display units — it’s on Auto, which auto-rounds — and set it to None so we can see the actual values. We can do the exact same thing on the other chart; it’s probably a good idea. Then set its title to “total by store.”

    Take a look: in a matter of minutes we took our data from Excel, put it into Power BI, transformed it a little, and created visualizations that gave us concrete answers to some very important questions. We now know Costco is the place to go for basically every single product, except that if we’re buying rice and want to save a few dollars, we’ll head over to Target. That’s genuinely going to change my shopping habits for the next several years, until the apocalypse happens.

    All right — before we jump over to Power BI and start using Power Query, I want to take a look at the data. This is the Excel from our last video, called “apocalypse food prep.” In that video we bought some rice, beans, water, vegetables, and milk, all to get prepared for the apocalypse. Since then we decided to buy some additional things: rope, flashlights, duct tape, and several water filters. After we purchased those, our boss — or whoever we’re working with — went and made a pivot table. In this pivot table they broke it out by Costco, Target, and Walmart, with all the items, some subtotals, and grand totals, and then they copied and pasted it into a formatted sheet. You’ll see this a lot when you work with people who use Excel — they like to make things look like this, maybe turning it into a table or formatting it a little differently. This is what we’re actually going to pull into Power Query and work with. We’re going to imagine this is all we have — I’ll reference the pivot table a little bit, but we’ll pretend this formatted sheet is our only source — and we want to transform it into something much more usable so we can make visualizations with it. So let’s hop over to Power BI and pull this Excel in: click Import data from Excel, select “apocalypse food prep,” and click Open.
    That brings up the window where we choose what data to bring in. We can preview each item: there’s the pivot table we were looking at — so we are able to pull in just a pivot table — and there’s the purchase overview, the formatted sheet with all the colors. We’re going to pull in both. We could just load them, but we’ll click Transform Data, which brings us to Power Query.

    Before we jump into transforming, let me show you what the Power Query Editor looks like. On the left we have our Queries — the tables we actually pulled in — and we can click back and forth between them. Up top we have the ribbon, which offers a lot of functionality: things like remove columns, keep rows, remove rows, and split columns, all things we’re likely to use in this editor. There’s another tab called Transform with a lot of functionality as well — unpivoting a column, transposing columns and rows, using the first row as headers, some of the things we’ll look at today. There’s also a tab called Add Column, which is pretty self-explanatory: you can add additional columns, like an index column or a conditional column. There are also View, Tools, and Help tabs, but we won’t really look at those today. On the far right we have Query Settings, where you can do things like change the name — we’ll call this one “pivot table 2022,” and it updates over on the query side — and we have our Applied Steps. Applied Steps are extremely important and very useful: any change we make to transform this data is documented right here, and we can go back and look at it, or even delete a step in the future and return to a previous version of what we did.

    When we loaded the data into Power BI, it did a few things for us: it recorded the Source and the Navigation, promoted the headers, and changed the data types. We can inspect or change those steps. For the Source step, click the little icon and it brings up the actual path where the file came from; if we wanted to change that, or if it changes in the future, we can come here and update the file path — but not right now, so click Cancel. As for the promoted headers: they’re obviously not correct here, since we’re looking at the pivot table and not the purchase overview, but it made its attempt, and we could easily change them in the future. It changed the types as well: if you look at the column type icons, “ABC” means the column will only be text, while “ABC 123” means it could be basically anything, text or numeric.

    Now let’s go over to the purchase overview, which is the one we’re actually going to work on the most.
    We might reference the pivot table a little to see some of the differences, but before we do anything, let’s look at how Power BI decided to take this data in. It shows “apocalypse food prep overview” — which was the header or title of the sheet — as the first column, and all the other columns are just Column1, Column2, Column3, and so on. That’s something we’ll want to change in a little bit. There are also blank columns right at the top and null values as we go along, and we’ll want to get rid of some of that and clean this up to make it more usable for our Power BI visualizations. This layout may be perfectly fine and acceptable in Excel, but the real reason you pull data into Power BI is to create visualizations, not to look good in a spreadsheet — so we need to clean this up quite a bit.

    First, let’s get rid of the top rows. On the ribbon, click Remove Rows, select Remove Top Rows, and enter 2, because we have two rows of all nulls that are completely useless. Click OK and they’re removed. Next, the row with location, product, and all these dates holds the column headers we actually want, so go to Transform and select Use First Row as Headers — and just like that, location, products, and the dates are our headers, exactly how we wanted them. If for whatever reason we’d made a mistake and needed to go back, we would just remove that step, and that would be perfectly fine.

    Notice that when it promoted the headers, it also changed the data types. Before we promoted the headers, these columns were all “ABC 123” — a generic type, because each held several kinds of values — but on promoting the headers, Power Query gave its best guess at the data type and chose Decimal. We’re going to change that: click the type icon in the column header, choose Fixed Decimal Number, and select Replace Current. Now it shows 2.70, 2.50 — which is normally how we’d read values like this, since it’s money, to the second decimal. And if one column is on two decimals, they all should be, so I’ll quickly go through and change the rest; hang with me for just a second. All right, that’s perfect.

    Now, for the purposes of what we’re about to do, we don’t actually need these subtotals — the Costco total, Target total, and Walmart total — or the grand total rows; we want to get rid of those. Click the drop-down on the column and filter the data before we actually load it into Power BI: select Remove Empty.
    That takes out all of the null rows. If we instead wanted to filter out rows like “Costco Total” or “Target Total” by name, we could click the drop-down on products, go to Text Filters, choose Does Not Contain, insert the step, type “total,” and click OK — it filters out those rows the same way. So there are a few different options for removing rows that contain null values or specific values.

    Next we’ll actually get rid of a column: the grand total column. Click its header, go back to Home, and click Remove Columns. It asks to insert the step — that’s because we’re currently on the Filtered Rows step — so insert it; it lands in the middle of the Applied Steps, which is totally fine, and we can just move it to the bottom. Now that column is gone entirely.

    This looks really good visually — I like how everything is set up — but the big problem is that when you actually want to use this for visualizations, having these dates as columns doesn’t work too well. We want to unpivot so the dates become rows. Select the first date column, January 1st, hold Shift, and click April 1st to select them all at the same time, then go to the Transform tab and click Unpivot Columns. What we’ve done is basically recreate the original Excel we had — if we go back and compare, it looks almost identical — and this shape is extremely usable and much, much better for visualizations. Remember, we were pretending this formatted sheet was all we were given at the start; you have to imagine somebody handing you data like this and you having to make it usable for visualizations, which happens a lot.

    A few last cleanup steps: set the attribute column’s data type to Date, then double-click the value column and rename it “product_cost,” and rename location to “store.” This looks really good now, but I want to show you one thing really quickly on the “pivot table 2022” query. It looks very similar to how the purchase overview started, so let’s make the first row the column headers and then try to unpivot the January, February, March, and April columns. Go to Transform, Use First Row as Headers — and now notice that January, February, March, and April are not dates; they’re actually text.
    If we click Unpivot, the attribute values created are January, February, March, and April as text, so we cannot go and change that column to a Date type — it would error out, because it’s actually text. That’s something to look out for and be aware of; you can change it in the pivot table itself, but you want to know how the data actually sits in Excel, or whatever data source you’re pulling from, before you bring it into Power Query to transform. The very last thing to do is go to Close & Apply, and everything we’ve worked on is applied to the actual data and loaded into Power BI to create our visualizations. The data has now been pulled in — go down to the Data tab and we can see it — and if we need to transform it again, clicking the Transform Data button brings us right back to the Power Query editor window.

    All right — before we jump over to Power BI and start creating our relationships and our model, I want to take a look at the data in Excel. We realized we were buying so many products for the apocalypse that we decided to start our own store, and we now have several customers and some client information. Let’s look at the columns in the tables we’ll be using. First is the apocalypse store — these are the things we’re selling; I know it’s a very limited inventory, but these are the really high sellers — with product ID, product name, price, and production cost. Then we have apocalypse sales — the sales we’ve actually made to our customers — with customer ID, customer name, product ID, order ID, units sold, and the date purchased. And then our customer information, with all of our clients: customer ID, customer address, city, state, and zip code.

    Now that we’ve seen the data, let’s load it into Power BI: Import data from Excel, choose the model file, click Open, select all three tables, and just Load — we’re not going to transform the data at all. Once it’s loaded, go to the Model tab on the left-hand side and move the tables up to where they’re a little easier to see. Right off the bat you can see lines between these tables: these are relationships that Power BI has automatically detected and created. In my experience Power BI does a really good job at creating these relationships automatically, but we’re going to examine them, see what everything means, and then go back and create them from scratch just to make sure we know how to do every single part. To get us started, double-click the line connecting the customer information table to the apocalypse sales table; it brings up the Edit Relationship page. The line connecting these two tables actually gives us quite a bit of information without even opening this page.
    It shows that we have a one-to-many relationship with a single cross-filter direction — you can find both of those settings at the bottom of the Edit Relationship page, and I’ll walk through what they mean in just a little bit. On this page you can also see the columns Power BI chose in order to tie the two tables together. In our example it chose the customer name columns from customer information and apocalypse sales — probably because they have the exact same name and really the same information — but I don’t want to use those, because on the sales table I might remove the customer name and just keep the customer ID. So I’ll click the customer ID column in each table, click OK, and save it; now, hovering over the line shows what the two tables are joined on.

    Opening it back up, let’s go down to cardinality and cross-filter direction. Cardinality has several options to choose from: many-to-one, one-to-one, one-to-many, and many-to-many. For this example we’re going from apocalypse sales down to customer information. There are a lot of rows in apocalypse sales but very few in customer information: there’s only one row per customer in customer information, whereas in apocalypse sales a customer can have several rows for several different orders — that’s why the cardinality is many-to-one. If we flip the tables, putting customer information on top and apocalypse sales below, it flips to one-to-many.

    Now the cross-filter direction, which has only two options: single or both. If we choose both and click OK, the line goes from a single arrow pointing in one direction to two arrows pointing in both directions. But what does this really mean? To demonstrate, I’ll put it back to single, and we’ll try to connect the customer information columns to the columns in the apocalypse store. In the visualization pane, take state from customer information and make it a table visual. The customer information table is only tied to the sales table right now, but we want to see how many product IDs are being bought in these different states from the apocalypse store. So create a new measure that is simply the count of the apocalypse store’s product ID column, and select it so it’s added to the table. It now shows 10 product IDs — there are 10 products — for every state, but that’s not technically correct, because not every state purchased all 10 different items.
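
    For reference, that demo measure is just a column count in DAX — a minimal sketch, assuming the table and column names from the video’s files (the exact identifiers may differ):

        // Counts product IDs visible in the current filter context
        Count of Product IDs = COUNT ( 'Apocalypse Store'[Product ID] )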
    Go back to the model, change the relationships to both directions, and return to the visualization: now Minnesota actually only ordered seven different product IDs, Missouri eight, New York nine, and Texas ten — much more accurate than before. When you use the both option, Power BI treats the tables as if they were a single table; the single option does not, which is why the filter didn’t reach the store table before. One of the last things I want to show you is the option that says “Make this relationship active.” If there are other candidate relationships between these tables — like the customer name columns — one of those may be the active relationship, but checking this box makes this the default relationship between the two tables.

    Now let’s click Cancel, zoom in a little, bring these tables closer together, and — just for demonstration purposes — delete the relationships so we can build them from scratch. Take the customer ID from the customer information table and drag it onto the customer ID in apocalypse sales; it automatically creates the relationship. Open it up and you can see it joined customer ID to customer ID, defaulted the cardinality to many-to-one and the cross-filter direction to single; change that to both and click OK. Then drag the product ID in apocalypse store onto the product ID in apocalypse sales — again it creates the relationship and sets the cardinality automatically — and change the cross-filter direction to both. On a really small scale, that is how it works; of course it becomes more complex the more tables you add and the more relationships that are created, but this is how you create relationships in the Model tab within Power BI.

    All right — let’s take a look at our tables and data before we get started with DAX. We have two tables: apocalypse sales, with customer, product ID, order ID, units sold, and the date purchased; and apocalypse store, with product ID, product name, price, and production cost. They’re joined together — they have a relationship — via the product ID. What we’ll be using are new measures and new columns to create our DAX functions. Go over to the Report tab and drop down the Fields pane so we can see everything. To get us started, right-click on apocalypse sales and click New Measure; it opens the formula bar where we create our functions. It automatically names the measure “Measure,” but we can change that: we’ll call it “Count of Sales” — that’s just the name that will show up once we hit Enter. Then start typing COUNT.
    As you type, Power BI gives you options — it has IntelliSense, Microsoft’s autocompletion, which you may know from other Microsoft products; it helps you look through the options very quickly. Click COUNT, and it prompts for a column name; we can select one from the list or type it out and let it predict which column we want. We’ll use the order ID: start typing “order ID,” click on it, close the parenthesis, and hit Enter (or click the check mark). On the right side the measure is finalized and saved, and we can look at it by clicking the box next to it to show it in a table: there are 74 sales. Now we want to see who’s buying our products — what our client names are — so choose customer and put it on top of the sales count. Our number one customer is Uncle Joe’s Prep Shop, with 22 orders. They have the most orders with us, but that doesn’t necessarily mean they’re spending the most money — we can look at that later.

    The next thing I want to see is how many products we’re actually selling, and what our big sellers are. We have 10 different items, and if one is doing really poorly and getting no orders, that’s something I want to look into. Go back up to apocalypse sales, right-click, select New Measure, and call this one “Sum of Products Sold.” Start with SUM — and if this seems familiar from Excel, you’re 100% correct; these are both Microsoft products, and DAX has a lot of similarities to exactly how Excel does it. Open the parenthesis and choose units sold — we want to sum up all the units sold and see how many we’re actually selling — so type “units sold,” hit Tab to autocomplete, close the parenthesis, and click the check mark. The measure is created, and adding it to the table shows we have 3,000 total products sold. We can go through and see what the big sellers are — probably the biggest one I see right off the bat is the multi-tool survival knife. So these DAX functions can be very simple and still lead to really good insights for your visualizations later on.

    Now I want to look at the difference between something like SUM, which is an aggregator function, and SUMX, which is an iterator function. If you add X to some of these aggregator functions — SUM and SUMX, AVERAGE and AVERAGEX — you turn them into iterator functions. Let’s take a look and see how that actually works.
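
    For reference, the two measures built so far look roughly like this in DAX — a reconstruction from the narration, assuming the table is named 'Apocalypse Sales' with Order ID and Units Sold columns:

        // One row per order, so counting order IDs counts sales
        Count of Sales = COUNT ( 'Apocalypse Sales'[Order ID] )

        // Total units across all orders
        Sum of Products Sold = SUM ( 'Apocalypse Sales'[Units Sold] )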
    I’ll show you the difference first and talk through it at the end. Go back to the data and look at the apocalypse store: we have the price and the production cost, and we want to see how much profit we’re getting from each item, and, combined with the units sold, how much money we’re actually making. Right-click on apocalypse store and create a measure — in just a little bit we’ll create a new column too, which will show the difference really well. Name the measure “Profit.” We’ll take the sum of the price, close that parenthesis, and subtract the sum of the production cost; then wrap the whole expression in parentheses, because we’re about to multiply it by the sum of the units sold. All that says is: if we sold something for $20 and it only cost us $10 to make, that’s $10 in profit for that item, multiplied by how many units were actually sold. Click the check mark, then put Profit into a new table. What I really want to know is which customer is spending the most money at my store, so add customer to the table — and at a glance, Uncle Joe’s Prep Shop is spending the most money at the store.

    Now for the difference between SUM and SUMX. Copy the entire Profit expression. We just created a measure, and it broke down correctly by each customer. This time, go up to Home and create a New Column, call it “profit_column,” and paste the exact same expression. Hit Enter — and every row shows the exact same value. What it’s doing is going through the price column and adding all of it up, adding up all of the production costs, adding up all the units sold, performing the calculation on those totals, and putting that grand total on every single row. That’s not really what we wanted to show. We wanted the profit for each row: here’s the price for the rope, the production cost for the rope, how many units of rope we actually sold — calculate that and give us the profit for just that row. We cannot do that by just using SUM; we need to use something called SUMX. So add another column — back to Home, New Column — and call it “profit_column_sumx.”
    Now use SUMX and hit Tab. First we need to choose the table — apocalypse sales, because that’s the table we’re looking at — then a comma, and then an expression: as the tooltip says, SUMX returns the sum of an expression evaluated for each row in a table. Plain SUM looks at all the rows combined; SUMX takes it row by row. So paste in basically the same calculation as before, minus the SUMs — give me just a second to strip those out — and click the check mark. This looks a lot better: at a row level, the nylon rope made us almost $52,000, the waterproof matches made us $155,000, and we can go down each item and see how much it actually made us, versus the one repeated total in the plain profit column. That is the biggest difference between SUM and SUMX. I know the difference between an aggregator function and an iterator function can be a little confusing, especially if you’ve never done it before, but hopefully that was a good example for understanding the concept.

    Next, in apocalypse sales we have the date purchased, and DAX has some ways we can interact with dates. Create a new column — we’ll just leave the name as “Column” for now — and type DAY. A few options come up: DAY, DATESYTD, NEXTDAY, PREVIOUSDAY, and WEEKDAY, and they’re all pretty self-explanatory. Click WEEKDAY: it returns a number from 1 to 7 identifying the day of the week of a date. Pass in date purchased, hit Tab, then a comma, and it offers three return-type options — 1, 2, and 3; you can hit Read More to read up on them. Option 1 makes Sunday equal to 1 and Saturday equal to 7; I personally like option 2, where Monday equals 1 — in my brain it just makes more sense — so click 2, close the parenthesis, and name the column “day of week.” Click the check mark: now Saturdays are equal to 6 and Mondays are equal to 1. This allows us to see which day of the week people are buying the most products on, or which day of the week somebody is submitting their orders on.
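
    Putting this section’s formulas side by side — these are loose reconstructions from the narration, not copied code, and I’ve added RELATEDTABLE where a single-direction relationship would require it to restrict the iteration to each product’s own sales rows:

        // Measure: each SUM aggregates over the whole filter context
        Profit =
            ( SUM ( 'Apocalypse Store'[Price] ) - SUM ( 'Apocalypse Store'[Production Cost] ) )
                * SUM ( 'Apocalypse Sales'[Units Sold] )

        // Calculated column on the store table: SUMX iterates that product's sales rows
        Profit_Column_SumX =
            SUMX (
                RELATEDTABLE ( 'Apocalypse Sales' ),
                ( 'Apocalypse Store'[Price] - 'Apocalypse Store'[Production Cost] )
                    * 'Apocalypse Sales'[Units Sold]
            )

        // Calculated column: WEEKDAY return type 2, so Monday = 1 ... Sunday = 7
        Day of Week = WEEKDAY ( 'Apocalypse Sales'[Date Purchased], 2 )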
    Let’s go over to the report — let me move some things out of the way; oh jeez, I hate moving stuff sometimes — and I’ll show you the difference between what we just created and what we already had. Take date purchased and make a bar graph of the units sold. We don’t want the year level — we only have one quarter of 2022 — so remove it, and we can see January, February, March; January has the most units sold. If we drill down to the day level, we do have some information, but we don’t know what day of the week each date is — it could change from month to month, and it’s really hard to tell whether there’s any pattern there at all. That’s where the column we just created comes in handy. Recreate the same visual using day of week and units sold, with day of week on the x-axis, and turn the data labels on. Now it’s really easy to check for a pattern. There really isn’t one, at least not in this fake data: Monday has the most, it dips a little mid-week — our Wednesdays and Thursdays are a bit lower — and it picks back up, so the beginning and end of the week tend to be the highest. Not a huge pattern, but it’s much easier to see whether there is one now that we’ve used the WEEKDAY function, and that can be really, really useful.

    Back to the data for our last DAX function in this video: the IF statement. If you’ve ever used Excel, I’m sure you’ve heard of it, and you can do the exact same thing here in Power BI. Create a new column and name it “order_size.” With IF, we perform a logical test, then give a value if it’s true and a value if it’s false. We’re looking at order size, so: if units sold is greater than 25, it’s a big order; if not, it’s a small order. Super simple — close the parenthesis, click OK, and now we can quickly see whether each order is a big order or a small order.

    That’s all I have for you today. There are a lot of other DAX functions, but the ones we looked at are very common ones you’ll see the most. There can be a lot of really complex and intricate DAX functions, and in our project at the end of this series I’ll be sure to include some more complex ones, but hopefully this gave you a good introduction to DAX so you know how to use it a little bit better.
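
    The last column, reconstructed the same way (table and column names assumed from the walkthrough):

        // Calculated column: label each sales row by order size
        Order_Size = IF ( 'Apocalypse Sales'[Units Sold] > 25, "Big Order", "Small Order" )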
And that's all I have for you today. There are a lot of other DAX functions, but the ones we looked at are very common, the ones you'll see the most, and you can write some really complex and intricate DAX beyond them; in our project at the end of this series I'll be sure to include some more complex DAX functions. But hopefully this gave you a good introduction so you know how to use it a little better.

Alright, before we get started, a reminder that you can find the data we're working with in this tutorial in the description; you can go download it from my GitHub. The two tables we're looking at are Apocalypse Sales and Purchase Tracker. If you've ever created a visualization, you've probably seen something like this, where you have the store and the price; these are the things we actually bought, the total amount of apocalypse-prepping equipment, with the store in the legend right here. And if you're anything like me, you're going to be in a meeting presenting this, and some higher-up is going to say, "Hey Alex, great, but I want to see what things we actually bought at Target and how much they cost; can you create a visualization for that?" And you're going to say, "Well, I could, or I could use drill down," which you should have done in the first place.

So all we're going to do is take the product, which is the actual things we bought, and put it right under Store. You can't see it yet, but there is now a hierarchy here; once we added this, these drill options became available. Take it out and they all disappear; add it back and they come back. So you can click to turn on drill down, you can go to the next level in the hierarchy, or you can expand all down one level in the hierarchy. Let's look at each of those really quickly. Click the first one: it just turns on drill-down mode, so now if I click on Target, it drills down into those products, and if I want I can put Product under the legend and see all of them. Of course, if we go back up, it's all broken out into this clustered column chart, which isn't exactly what we were going for, but it works. Let me get rid of that; I actually want Store in the legend. If we turn drill mode off and click, it doesn't drill anymore; it just highlights Walmart, highlights Costco, highlights Target. So we'll keep it on. We can also go down to the next level in the hierarchy: click that, and it goes down to the Product level, because that's the next level, and shows each product broken out by store. It's a completely different visualization, but all within the same realm of the data we're looking at and actually care about. Let's go back up in the hierarchy and use the last button, expand all down one level in the hierarchy. This one is extremely similar, it just visualizes things differently: now it shows Walmart rice, Target dried beans, Costco rice, so instead of stacking everything on top of each other it breaks each value out individually, and one stacked column becomes three separate columns. I'm going to minimize this and go back up in the hierarchy just for visual purposes.

Now one more example, using Apocalypse Sales up here, and this is one I actually use all the time. The chart you've seen so far is the kind of thing you'll get if you're working with sales data, but I work in operations, so I deal with a lot of order IDs, product IDs, things like that, and this one I genuinely use quite often. Let's pull in Customer and Units Sold and use Customer as the legend, and make this quite a bit larger. I'll have something like this, and someone will say, "Okay, we want to see the order IDs that go with it, because we want to know what orders are actually happening for each of these people." Obviously I'm not using this exact data at work, but something very, very similar. All you have to do is take the Order IDs and slide them right under Customer. This visualization is something I've done a thousand times.
What happens is some stakeholder in our company says, "Hey Alex, we want to drill down on this IP address, or on this certain database, or on something else, and we want to see the order IDs within them." So you turn on drill-down mode, click on it, and you can see every single order ID in there, and then they can go look those up in their system and resolve them, or whatever they're trying to do. It helps a ton and is very, very useful; this one is extremely applicable. And that's really all drill down is. You do get other hierarchies as well, like this date hierarchy, which for some data isn't as useful; it just depends on the data you're using and how you want to apply the drill-down effect. But I promise you drill down gets used all the time, especially in presentations where people want to know more than just the visualization you're presenting.

Alright, before we get started, you can download the data for this tutorial from the description below; it's on my GitHub. We're looking at bins and lists today, so let's go over to Apocalypse Sales and open up our data. I feel like more people would know what a bin is, so we'll start with a list and go a little backwards from how we normally would. We're going to use this Customer column for a list, and you can do that in two ways: you can right-click on Customer up here and go to New Group, or you can come over to the Fields section on the far right, right-click Customer, and click New Group. Let's do that now. Right now it's only giving us the List type, not bins, because bins have to be numeric, so we can't do that at the moment. We'll name this one List so it's easy to recognize once it's created. All we're going to do is group these customers, but the grouping is called a list. So select a couple of stores and click this Group button, and it creates a group with Alex The Analyst Apocalypse Preppers and the Prep For Anything Prepping Store; it named it for us, but if we double-click on it we can rename it, so let's call it the Best Prepping Stores. Then we have these last two: click one, hold Ctrl, and click the other so we get both, click Group, and double-click to rename this one the Worst Prepping Stores. And that's it; if you want to undo it and switch things around you can, but we're going to click OK. Here is the column it created, and it basically tells us which list each customer is in: if it's Uncle Joe's Prep Shop, that's in the Worst Prepping Stores list, and if it's Alex The Analyst Apocalypse Preppers, that's in the Best Prepping Stores. So it's kind of like an IF statement.
You could even create a calculated column on this Customer field with an IF statement, and it would do basically the same thing; the group is just a lot faster and easier (I'll sketch what those calculated columns could look like after we cover bins below). Now, you can use lists on numeric fields as well. Say we take Order ID and go to New Group: it's going to default to Bin, because that's what you'll typically use on numbers, but you can choose List too. Let's select a batch and group them, and since we're looking at order IDs we'll call them the First Orders; then go back to the top on the left side, hold Shift, group all the rest, and call them the Latest Orders. You absolutely can do this, and again, it's kind of like an IF statement: if the ID falls between this range, it's called the First Orders, and if it's between this other range, it's the Latest Orders. It's just a much simpler version of an IF statement, where you don't have to write it all out and the user interface does it for you, and it's really, really useful.

So now let's talk about bins, and by far the easiest way to demonstrate this (I'll show you one other way too) is by using age. For absolutely no reason whatsoever, the customers in this Customer Information table decided to hand over some information about the buyers actually purchasing products on their websites or in their stores, along with some simple demographic information; I don't know why, but they did. What we'll use bins for is grouping these ages into brackets. You might want to know whether the core population buying your products falls within a certain range, and you don't want to plot every single age, because your visualizations won't look right; you want to group them and make them easier to read. So we're going to go by tens—10, 20, 30, 40, 50, 60—and see which age bracket each person falls in. Right-click Age, select New Group, go to Bin, and we'll leave the default name for the age bins. You can do two things here: set the size of the bins, which splits by the number you give it, or set the number of bins, and it calculates the size for you—if you only want five bins, each one comes out 12.2 wide; if you want ten bins, each is 6.1. It's completely up to you, but we'll set the size to 10, which is what we want, and click OK. It creates those bins for us: if somebody is 78, they're in the 70 bin; if somebody's 41, they're in the 40 bin; if somebody is 29, they're in the 20 bin; and so on and so forth. So when we visualize this, we won't have 71, 72, 73, 74 all cluttering the chart; it'll just show the 70 bin or the 20 bin.
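As promised, here's what those groupings might look like if you built them yourself as calculated columns instead of using the New Group dialog. This is a hypothetical sketch with assumed table, column, and store names; SWITCH with TRUE() is just a tidier way of chaining IFs.

```dax
-- The customer list as an explicit calculated column:
Customer Group =
SWITCH (
    TRUE (),
    'Apocalypse Sales'[Customer] = "Alex The Analyst Apocalypse Preppers"
        || 'Apocalypse Sales'[Customer] = "Prep For Anything Prepping Store",
        "Best Prepping Stores",
    "Worst Prepping Stores"  -- everything else, e.g. Uncle Joe's Prep Shop
)

-- Ten-year age bins: FLOOR rounds each age down to the nearest multiple
-- of 10, so 78 lands in the 70 bin and 29 lands in the 20 bin.
Age Bin = FLOOR ( 'Customer Information'[Age], 10 )
```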
We can also use bins on dates. Let's go back to Apocalypse Sales; we have this Date Purchased, so let's right-click it and go to New Group. You could create a list here too, and that's totally fine; it would look kind of like this, where you go through, select dates, group them, and say this group is January. But for this one we're doing bins, because it's a little easier: we can specify the unit we want—seconds, minutes, hours, days, months, or years. Our data runs January, February, March, so we'll choose months with a bin size of one month; each month gets its own bin, three bins total. Select OK, and as you can see on the right side we have January of 2022, which correlates to the January dates over here, then it goes down to February, then March. So when we visualize this, we don't have to mess with the date hierarchy and filter it down to months; we can just use this as our months column.

Now let's go over to our visualizations and see how these look; we're not going to look at all of them, but we'll take a look at a few. The first one is age: let's pull in Buyer ID and the age bins, spread this out, and we can see the distribution of our buyers. It looks like we have very few in the 10 bin, thank goodness. We can even put the raw Age right under the age bins, and now we kind of have a drill down: if we drill in right there, it gives us the breakdown, which is roughly what the visualization would have looked like if we'd just kept raw age. It looks like we have one 18-year-old, and maybe a 20-year-old as well; let's go back up. Yes, there's only one buyer who's 18, so just of legal age to start buying all this prepping equipment, probably buying online and so on, which makes sense. This gives you a quick breakdown with the bins rather than doing it the alternative way. Now let's look at the customer list against Units Sold, and it looks like the Best Prepping Stores group is, surprisingly, actually performing much worse than the Worst Prepping Stores.

Alright, before we get started: if you want to use the data in this video, you can find it in the description, on my GitHub. Conditional formatting is super simple; you've most likely used it in Excel before, but you can also use it in Power BI, and let me show you how. The first thing we'll do is come over to our Apocalypse Store table and pull up Product Name and Price. Now come over to Price—and it has to be under Columns; you can't do this from the fields list over here—right-click, and go to Conditional Formatting. We have Background Color, Font Color, Icons, and Web URL. Let's look at Background Color first, since it's most likely the one you'll use the most.
We get this pop-up; let me slide it over. There are a lot of things we can customize in here, and the first I want to look at is the Format Style. We have Gradient, which says the lowest value will be this color and the highest value will be this color, giving us a gradient color scale; we'll use that in just a little bit. We can also create Rules, kind of like an IF statement: if a value is between this range and this range, we give it a color, and if it's between a different range, we give it a different color; we'll try that one too. And then there's Field Value, which honestly I don't use much—I've used it maybe once—where you can select a field, even a text field like Customer, and apply summarizations like First and Last to it, and that's about it.

So we're going to look at Gradient specifically, back on the Apocalypse Store table, on Price. I'm going to keep the summarization as Count for now, because that's the default, and we'll go back and fix it in a moment. We want our lowest value to be this bright green, showing a cheap product that's easy to purchase, and the high values to be this shade of red, more expensive, and we'll do it on the count. Now remember, the count here is one per row—we're not counting how many were sold, we're counting each product—so every row should come out the same color. Let's take a look: yes, it's all the same color. But what we really want to show is the actual price, not the count of the price, so go back to Conditional Formatting, click Background Color again, and this time change the summarization. You can pick Sum, Average, Minimum, Maximum; it really doesn't matter for this example, because with one value per row the number is the same whichever we choose. So we'll just select Minimum, which takes the minimum of each row (the price itself), click OK, and it corrects accordingly: bright green is the lowest, running all the way up to red for the highest.

Now let's go over to Apocalypse Sales and add in Units Sold, and let's move it out a little; I'm doing that on purpose because of something we're about to see in the conditional formatting. Go to Units Sold and open its conditional formatting: notice we now have a new option called Data Bars. We can use data bars on Units Sold but not on Price, because Units Sold is aggregated—a sum, an average, something like that. Let's look at data bars first, and then we'll go back to the background color. For data bars we're again going from the lowest to the highest value, bright green to that exact red, left to right, and since all of these are positive numbers, each row gets a green bar representing its value along that line. Let's click OK.
We can now see the highest numbers; let me scooch this over quite a bit so you get a better look, and sort from highest to lowest. We sold the most multi-tool survival knives, at 477, so that row's bar is entirely (or almost entirely) filled, whereas as the values get lower—we only sold 182 solar battery flashlights—the bar shrinks to represent that.

Now I'm about to completely mess up this visualization, on purpose, because I want to show you that you can do a little too much with this; it is possible. We're going to go to Background Color on Units Sold, and instead of Gradient, let's look at Rules. With Price we just did a gradient scale, but here we can define groups: if a number is greater than or equal to one value and less than another, it gets a certain color, and a different range gets a different color. So we'll say: if it's greater than or equal to 0 (as a Number, not a Percent) and less than 266—because we have a 265 right here—make it a nice gold, a beautiful, lovely mustard gold; just great. Then, if it's greater than or equal to 266 (since the first rule covers everything below 266) and less than, say, 500, again as a Number, we'll give it a peach. Click OK, and now we have another layer of conditional formatting on top of the data bars that can give us more information. Again: you should not actually do this; it's just too much.

Now let's go one step further and make it even more ridiculous, and show you one more thing, before I show you how you may actually want to use this. Go back to Units Sold, right-click, Conditional Formatting, and this time Icons. (Font Color, by the way, is the exact same idea as Background Color except it changes the font, so I won't go into it.) Icons are very simple, extremely similar to what you've seen in Excel, and the rules you can apply are basically the same IF-style ranges we saw before. It automatically gives us this default: 0 to 33 percent, 33 to 67, and 67 to 100, so the bottom third gets a red icon, the middle is yellow, and the top is green. We can go through and change all of this, but honestly the defaults look pretty good, so let's apply it: the weakest sellers get the red icons down here, and the top sellers are up here. Now, this is all driven by Units Sold, and the result looks absolutely terrible.
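One more option worth knowing before we fix this: instead of clicking rules together in the dialog, the Field Value format style can read a color straight out of a measure, so the same gold/peach rule could be driven by DAX. A sketch, assuming the same ranges as above; the measure name and hex codes are placeholders, not what the video used:

```dax
Units Sold Colour =
VAR Units = SUM ( 'Apocalypse Sales'[Units Sold] )
RETURN
    SWITCH (
        TRUE (),
        Units >= 0   && Units < 266, "#D4A017",  -- mustard gold
        Units >= 266 && Units < 500, "#FFDAB9",  -- peach
        BLANK ()                                 -- no formatting otherwise
    )
```

You'd then set the background color's Format Style to Field Value and point it at this measure, so the rules live in code instead of in the dialog.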
So let's take this same information and make it a little better. We'll create a new table: Product Name, then Price, Units Sold, and Revenue. What I think makes the most sense for looking at revenue is those data bars, but there's only one problem: I can't use them, because Revenue isn't summarized the way Units Sold was. What I can do is come down here and, instead of Don't Summarize, summarize it as a Sum. It's the exact same number, but now if I right-click on Sum of Revenue and go to Conditional Formatting, I can use those data bars. So let's set the lowest and highest value colors; maybe a darker green... no, that's hideous; let's make it this nice dark green right here. There are no negatives, so that setting doesn't really matter; we'll go left to right, and you can show the bar only, but I'm keeping the numbers because I want to see them. Then let's sort it, and this is pretty telling: honestly, I did not think the weatherproof jackets were performing so well, but they are by far our number one seller. Our weatherproof jackets, multi-tool survival knives, and nylon rope are outperforming all our other products, so those might be the ones I focus on the most, while the duct tape, the N95 masks, and the waterproof matches are, frankly, garbage; I might look to replace those in the near future with items that might sell a little better.

So that's how you use conditional formatting, and it's actually pretty useful. There have been plenty of times I've done something like this in a real visualization for work; it depends on what you're visualizing, but it's a simple thing you can do to add a little more information and some actual visuals to a chart or table. Sometimes it's just better to have these simple visual cues on a table rather than the numbers alone; it makes things easier to read and understand.

Alright, before we jump into it, there's a link in the description where you can get the data we're going to be using for these visualizations, if you want to practice them yourself. Before we actually get into it we do need to combine the tables, and if you download that Excel file and see this, you'll have to do the same thing: all we have to say is that this Product ID is the same as this Product ID Purchased, and now we're good to go. Make it one-to-many, and it's okay if it's one way.
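A quick side note on that relationship: the visuals will follow it automatically, but if you ever need to physically pull a column across it into a calculated column, RELATED does that from the many side of the relationship. A hypothetical example, with assumed table and column names:

```dax
-- On the many side (e.g. Purchase Tracker), fetch the matching product
-- name from the one side via the Product ID relationship:
Product Name = RELATED ( 'Apocalypse Sales'[Product Name] )
```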
So, right over here under this Visualizations pane there are lots of different options, and it can be a little overwhelming; you don't really know which one to choose. There are some in here I have almost never used in my job, and I'll point those out as we go, but the main focus is the ones I do use, showing you how to actually create each visualization and maybe spice it up a little. We have a lot to get through, so let's jump right in.

The very first one, probably the easiest and the one you'll recognize the most, is the stacked bar chart. Go over to Product Name and click it, and it goes straight into the y-axis for us; then click Units Sold, and it goes into the x-axis automatically. It intuitively knows where things go, though sometimes it will make a mistake, and then you just fix it or flip it. Let me make this much larger. We want it to be a little more color-coded, and that's what the Legend is for, so drag Product Name down to the legend, and now each product is its own color. In previous videos we've gone through some of the visual and general options you have when creating these, but we'll do some here as well: go down to Data Labels, turn them on, and shrink them—note that the larger you make them, the fewer you see, so if you want all of them showing, all the way down to the green, keep them right about there and make them smaller. Now click anywhere outside the visualization before creating a new one; if we stayed on this visual and clicked a different chart type, it would change this visualization completely, which we don't want. (If that happens, just hit Ctrl+Z and click out of it.)

Next, the 100% stacked column chart. Click on it, make it much larger, then come over to Customer Information and click Customer, then Units Sold. What this does is break out the total units sold by each shop, but we want to see exactly what products make up each 100% column, so drag Product Name down to the legend. As you can see, each product now appears as a band in each column: here's the backpack here, here, and here, and we can see what percentage of each customer's purchases each product makes up. For the Prep For Anything Prepping Store, a very large percentage—40%—is duct tape; they're buying a lot of duct tape. So really quickly we can see which clients are purchasing which products the most; the Alex The Analyst Apocalypse Preppers, for example, buy a lot of water purifiers—we like drinking clean water; that's just what my audience likes. Again, I tend to like putting the data labels on; that's just my preference. Something like this looks nice and clean. We can always go back and change the names, which we'll do for this one: go over to Title, down to Text, and type Customer Purchase Breakdown (pretend I'm really good at spelling). Now we have Customer Purchase Breakdown, and that looks really nice; let's bring it over here. We'll have a lot on the screen, so I may have to resize things to fit everything.

On to the next one. Another really common visualization is the line chart, which is great especially when you're working with dates; I've found it the best for that, and a lot of people use it that way too. Click Date Purchased and then Units Sold, and on the x-axis you can see it's broken up by year, quarter, month, and day. We don't want it at that high a level, since we only have three months of data, so remove the year and the quarter. Right now we're looking at all units sold combined, so drag Product Name down to the legend to break it out by product, and for each month—January, February, March—you can follow each product and see how it did.
If we want, we can go to the filter on Product Name and filter to, say, the top three: the multi-tool survival knife, the nylon rope, and the duct tape. You can do this for any products you want; we're just using those three as an example. That still doesn't give us a ton of information, so we could even go down to the day level, which gives a little more, and we'll keep it like that. Let's change this title as well—we won't do it for every chart, since we're just surveying the visualization types I think are really good to know—to Products Purchased by Date. Nothing fancy; let's put this one down here.

Click out of there. There are other chart types in here that are definitely useful and that you absolutely can use: this one is a stacked bar chart and this one a stacked column chart, which are basically the same thing in different orientations, and likewise the clustered bar chart and clustered column chart are just horizontal versus vertical. Then there are things like the area chart and stacked area chart, which I haven't used much in previous positions. One I have used, though, is the line and clustered column chart, which combines bar charts and line charts into one visualization. Let's build it, because I've used this one several times in my actual job: for our x-axis we'll use Product Name, then we'll add Price as the columns—let me make this a lot larger so you can actually see it—and then Production Cost as the line y-axis. Now we're looking at the price, what someone actually pays, against how much it's costing us to produce that product, and at a glance you can see the cost sits around the halfway to two-thirds point on most of these; the production cost is always lower than the price, because of course we're out here to make a profit on these products. Let's minimize this one and put it right down here, even smaller.

The next one we'll take a look at is the scatter chart. Click on it, make it much larger, and let's use Price and Production Cost again: the x-axis is Price, the y-axis is Production Cost, and now we need to fill in Values, so drag Product Name into Values. Now we have our points; we just can't tell what they are yet, so drag Product Name down to the Legend as well, and it breaks them out into this scatter plot. For the fake data we're using it doesn't show a lot, but with real data you can definitely find outliers, trends, and patterns using this type of visualization. Let's make that one small as well and tuck it right into the corner.
Next we have the dreaded pie chart and donut chart. Look, it's kind of a running joke in the data analyst community about pie charts and donut charts, but at the same time people use them and request them, so sometimes you're going to build one whether you like it or not. Click the donut chart, make it a lot larger, then click State and Total Purchased, and that's really all you have to do; these are pretty straightforward. You can change a few things, like where the labels sit—inside the ring would look totally fine too. Again, I'm just not a super huge fan, but you will get this one requested; people like it and want to see it. The reason a lot of analysts don't like it is that when you glance at a pie, it's really hard to compare slice sizes. With a bar chart like this one, you can easily see the multi-tool survival knife is obviously the longest bar and it gets progressively shorter, but in a donut I would not be able to tell the difference between 5.63, 5.78, and 7.72 very easily, and that's why a lot of people avoid them in general. I wanted to show it because I think it's worth knowing how to use, but I don't really push people toward it, because I don't think it's the best visualization available most of the time.

Alright, the next two are super easy but used all the time, maybe more than some of the others; they're just so easy that I've kind of saved them for last. This one is the card, and all a card does is display one number (or multiple numbers, if you use a multi-row card, but we'll just look at the card for now). All we're going to show is Total Purchased, and it just displays it like this; you can make it as large or small as you'd like. Normally cards go along the top—a card here, a card here—so I'll show you how that might look. At the top you'll usually have high-level, overarching information, and this is super common; if you've looked at other people's dashboards, you've seen something like it. It's usually totals or averages, something that's super easy to read at a glance. So right here this one is Total Purchased, this one we'll set to the Minimum, and this one can be a Count, and together they give us a lot of information at a really quick glance, with all our more in-depth, colorful visualizations below carrying the detail that a single card can't.

And the very last one I'll show you is the table, which is obviously extremely popular; it's like a little Excel table. We can go in here and grab Customer, wherever that is, and Units Sold, and this is what it looks like; it's super easy, and oftentimes you'll have it along the side, with all the other visualizations over here.
So if we take all these visualizations and pretend they were a real dashboard, we can quickly arrange something like this: make this larger, make this wider, and we have a lot of information in here. This is not a project, so don't go put this on your portfolio—I just threw a ton of random visualizations onto one canvas—but you can already see that a lot of these are things you've most likely seen in other people's work on LinkedIn or YouTube; they're very common and very popular. And again, we did not go through everything in the pane: there are maps, but I've never used maps in my job; there are gauges, decomposition trees, waterfall charts, tree maps, and all these other things, but I've really never used those in my actual job, and I don't see them much in other people's work either; otherwise I'd be telling you to learn and use them.

What's going on, everybody? Welcome back to the Power BI tutorial series. Today we're going to be working on our final project. This is the final project of the Power BI tutorial series, so if you haven't watched all the videos leading up to it, I recommend going back and watching those first, so you're sure you know everything we'll be using in today's project. I'm really excited to work on this project with you, because I think it's a really good one, and it uses real data we collected about a month ago, when I ran a survey of data professionals; this is the raw data we'll be looking at, and I think it's just really interesting that we collected our own data and are now using it for a project. We're going to transform the data using Power Query, then create the visualizations and finalize the dashboard, as well as build a theme and a different color scheme to make it a little more unique. Without further ado, let's jump on my screen and get started with the project.

Alright, before we jump into it, I wanted to let you know you can get the data below; it's on my GitHub, and you can download the exact file we'll be looking at. In the past several projects we used that fake apocalypse data set—it was fun, it was whatever—but this data set is real: it's a survey I ran of data professionals, posted on LinkedIn and Twitter and a few other places, and we had about 600 to 700 people respond. Before we actually get into cleaning the data in Power BI, I just want to show you the data itself. This is the CSV I downloaded from the survey website I used, and it's completely raw; I haven't done anything to it at all. Let's go through it really quickly and see what we have, and note that we are not going to make any changes in Excel: we'll do all our transformations (or at least a few of them) in Power BI, because this is a Power BI tutorial and project and I want you to learn how to use that; you can go through my Excel tutorial if you want the Excel side. So we'll just look at it in Excel, then move it over to Power BI and actually start transforming. We have a Unique ID for everyone who took it, and an Email column, though this was completely anonymous; I didn't collect any user data.
Then we have the Date Taken, and then the actual good information: all of the questions. Question 1 is "Which title fits you best?"—let's add a filter really quickly so we can look at it. There were pre-selected answers like data analyst, data architect, and data engineer, but there was also an option to choose Other and specify your own, so in here we have all these different "Other (please specify)" entries with different titles, and there were a lot of them. Typically you'd really clean this up; we're not going to do a ton of data cleaning, though we will do some in Power BI and none in here. With this amount of data, formatted the way it is, there's honestly a lot of work to be done. This current yearly salary column, for instance, is one I would absolutely clean up, because it's ranges, and it has a dash and a "k" mixed in with the numbers; it's something I'd normally clean before using, but we're not going to fully clean it right now.

Anyway, let's see what questions we asked. We have the yearly salary; what industry do you work in; favorite programming language, where one question let them pick from multiple options. Then: "How happy are you in your current position with the following?"—salary, work-life balance, coworkers, management, upward mobility, learning new things—each ranked from 0 to 10, so some people ranked upward mobility a 10 and some ranked it a 0 or a 1; they could answer however they wanted. Then: how difficult was it to break into data, from very difficult to very easy; if you're looking for a new job, what would you be looking for—remote work, better salary, etc.; male or female; which country are you from; and then more demographics, like your age, which was captured with a sliding bar so you could slide to your exact age. There are some people who are apparently 92, which, if that's true, good for you. Actually, just while we're here, let me check whether that one is male or female: a female, from India; very cool. So we have all this information, and it is a lot; with something like this there is so much data cleaning that could be done—I can already see twenty-plus different things I'd need to do to make it a lot better. We also have Date Taken and the time taken, like the time spent on the survey; really interesting data. But again, this is a beginner tutorial series and this is the beginner project, so we're not going to do anything too crazy. I will be using this exact data set in a future video, doing a lot more data cleaning and creating a much more advanced visualization with what we're looking at right here, but for this video we're doing a pretty simple visualization and dashboard that you can use to practice with, or put on your portfolio if that's where you're at right now.

So let's get out of here and put this into Power BI: exit out, come over to Import Data from Excel, click on the Power BI Final Project file, and hit Open; give that a second.
I'm doing this all in real time. We only have the one table, so we won't be practicing any joins or anything, but we're not going to load it straight in; we're going to transform this data, so let's open it in the Power Query Editor. Now we have all our data in here, and it should look extremely familiar. When I start looking at this information, I need to know beforehand what I want to get out of it: do I need to clean every single column, just a few of them, do I need to get rid of columns? That's kind of where my head's at, and right off the bat I can already tell you there are columns we can just delete to get them out of our way, so let's do that at the beginning rather than later. I'm going to click on the Browser column, hold Shift, select across to the Referrer column, and then go up to Remove Columns. Everything we do is recorded over here in Applied Steps; if you've been following this series, you know that anything we remove or add shows up there, so we can track it and go back if we need to.

Now, one column I know for sure I'll be using quite a bit is "Which title fits you best in your current role?", because I specifically wanted to do a breakdown of people's roles, how much they make, and so on. But as we saw before, the issue is it's not very clean: it has data analyst, data architect, data engineer, data scientist, database developer, and then something like a hundred different free-text options, plus "student / looking / none of these." For the purposes of this video we're not going to handle every single one of those options, because that involves a lot more data cleaning. Let me give you an example: one row says software engineer, and another is also essentially software engineer with extra wording; those two would typically be combined and standardized to "software engineer," but that's not very easy to do in Power BI—we could do it in Excel, or even SQL if we were pulling from a SQL database—and you can find lots of cases like that. We have "data manager" twice with different formatting, and if we separated those out they'd show as different options in our visualizations, which we don't want. So what we're going to do—and this is kind of an easy way out to make sure the column is reasonably clean and doesn't have a thousand different options—is collapse all of those into "Other." That simplifies it a lot, so we'll have maybe six or seven options instead of the fifty or so we'd have if we did the harder work of breaking it out, standardizing, and cleaning it up properly.

So click on this column and go up to Split Column in the ribbon, and we want to split by a delimiter. Notice that we have "Other" followed by an open parenthesis, and no other option contains a parenthesis, so use a custom delimiter and enter the open parenthesis. That will separate on the parenthesis, leave the "Other," and create just one separate column for everything after it.
We can split at each occurrence or at the left-most delimiter, and we really only need the left-most, because there's only one open parenthesis (or whatever this bracket is called) in each value. Click OK, and it creates the split columns, suffixed .1 and .2. If we now filter the first one, we only have these options: analyst, architect, engineer, data scientist, database developer, other, and student/looking or none. That's what we want; it makes things so much simpler. It's not perfect, but again, I'm trying to show you what we're able to do in Power BI. Now remove that extra column, and let's do the exact same thing to the favorite programming language column, because I know we want to use that one too. I asked "What is your favorite programming language?", with pre-selected answers like JavaScript, Java, C++, Python, and R, plus an Other option with free text, where people filled in whatever they wanted—there are four, five, six different ways people wrote SQL in there. That's something I would normally standardize, and that's how I'd clean it, but that's not what we're doing here, so we'll keep the "Other": split this column by delimiter again, this time on a colon, left-most only, and click OK. Then we have our options, and it's much simpler. I really would have rather kept all of those, since SQL shows up quite a bit, but then again a lot of people don't even think SQL is a programming language. Delete that extra column.

Now, one I skipped and want to go back to is the current yearly salary, because I really want to use it; let's see if we can. Here's what I want to do with it, and it's not perfect, but for this video I want to try it: break up the two numbers in each range—the 106 and the 125—into separate columns, then create a third column with the average of those two numbers, so 106 plus 125, divided by two. That's not perfect, but it at least gives us a rough, roundabout number: the respondent said their salary is between 106 and 125 thousand, so if we say it's about 112,000, the field becomes usable as a numeric value instead of this text, which we really can't work with. And we're going to keep the original column—I'll create a copy and show you the difference between using the raw text and using the average. So first let's create a duplicate: duplicate the column, and now we have this copy at the very end, and we can work on it instead of the original way back at the start. Leave the original how it is; on the copy, click the column header, click Split Column, and we'll split it by digit to non-digit.
it right here it’s broken it out kind of um in the fact that now in this one we just have numeric values and in this one we have k-h numeric or just Dash numeric and now this can be easily cleaned whereas this one we can just completely get rid of because it’s only K so we’ll just remove that column and then in this one we’re going to rightclick we’re going to click on replace values and so if it just has we’re just do a k a we’ll replace with nothing do okay and then for the last one we’ll go to replace values and we’ll do it the dash or the minus sign and we’ll place that with nothing and so now we have our values as well oh we also have a plus let me get rid of because that’s when some people had 250 or 225,000 plus so for that one the average is just going to be 225 we’ll have to specify that in our decks I forgot but actually if somebody has 225 let me find this plus really quick uh let me filter by it because that’s a lot faster what we actually want to do for the purpose of this one is we want to put 225 here so that when we do 225 plus 225 divide by two it comes out to 225 that’s just what we’re going to put it as and there’s only two people so uh I’m actually going to replace this I’m going to do replace values I’m going to say plus with 225 and we’ll click okay on awesome we can unfilter these select all so we’re going to go right up here to add column we’re going to say custom column and we’re going to go right over here actually let’s make it uh average salary so we get average salary so we’re going to insert this we going to say parentheses and we’re going to say plus this insert and close the parenthesis divided by two and it says no syntax errors have been detected let’s click on okay and it’s giving us an error so it’s saying we cannot apply operator plus to types text and text which makes perfect sense these aren’t uh numbers so let’s make it a whole number and let’s make it a whole number and then let’s see if this will actually work now or maybe we just need to try a whole another one so let’s try transform or add column custom column let’s try this all again see if uh I can make it work insert this one plus this one and we’ll do divided by two and let’s try this one and there we go so now let’s get rid of this column columns and we can actually remove these ones as well because now we have this um average salary column which when we look at this or when we use this uh we can let me see if I can just move this way way way over all right I might cut because this taking forever so if you take the average of these two numbers you’ll get 53 if you take the average of 0o and 40 you’ll get 20 so now we have this average salary and again when we get to the actual visualization part I’ll show you why this isn’t as useful as having this average salary and just a reminder this is not perfect uh I wouldn’t typically do this especially if I had it in Excel or if I was you know creating this survey in a different way I would probably have a very specific value where they can do it on a slider but this is how it is so we’ve at least made it usable or more usable in my mind and we have a few other things that we can change like what industry do you work in where we can break this one out so I’m going to go ahead and break this one out as well as this one right here which country do you live in I’m going to breako both of those out to where it’s the country or other I’m not going to have these other values although there are a lot of them because there’s a lot of people who live in 
We have a few other things we can change, like "What industry do you work in?", which I'm going to break out the same way, and "Which country do you live in?"; I'll break both of those out so each value is either the named country or Other. I'm not going to keep all the free-text values, even though there are a lot of them, because there are a lot of people living in those different countries and we can't handle that really well in here—the same issue keeps happening: Argentina spelled a few different ways, then Australia—and we can't normalize those values unless we spend a copious amount of time doing it. So I'm going to do these two splits and fast-forward through them; I'll go silent and let this happen really quickly, and then we'll start building our visualizations. Alright, they're split, and as you can see we have all the named options plus Other. Let me tell you, there is so much more we could do with this—just so many other things—but this is the bare minimum of what we need for this project. Let's go ahead and Close & Apply; if we need to come back at any point and fix or change anything, we can, so it's not permanent. Now we have everything over here, all our data as transformed, and we can start building out our visualization, so let's go back to our report and start building something out.

Alright, first let's add a title to our dashboard: make a text box right at the top, call it Data Professional Survey Breakdown, make it quite a bit larger, make it bold—why not—and center it. Now let's add some effects and change that background; that's too dark, something like this, and I don't like the bold after all, so take that off. There we go: a quick title for what we're about to build. We're going to start with the simplest visualizations and work our way toward the harder ones, and the first is a card. Cards are obviously super easy; they usually just display one piece of information. Go to Unique ID at the very bottom, select it, and set it to a count distinct (or a count; it doesn't matter here), and it says 630, Count of Unique ID. We're not keeping that label: go over here, choose Rename for this visual, and change "Count of Unique ID" to "Count of Survey Takers." You can call it whatever you want—maybe "Total Number of Survey Takers"—but in general that's what it is: how many people took this survey. Click out, add another card, make it about the same size, and drag it up next to the first (we'll match the sizes exactly in a little bit). For this one we'll look at Current Age: click it and take the average, so our average survey taker is almost 30 years old. Then Rename for this visual again: "Average Age of Survey Taker"—this might be too long, but again, name it whatever you'd like. These cards are meant to be high-level numbers: somebody glancing at your dashboard can read them instantly and get a really quick sense of the data.
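The two cards use implicit aggregations straight off the fields, but if you prefer explicit measures, the equivalents would look roughly like this (table and column names assumed):

```dax
Count of Survey Takers = DISTINCTCOUNT ( 'Survey'[Unique ID] )

Average Age of Survey Taker = AVERAGE ( 'Survey'[Current Age] )
```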
they don’t really have to dig into it look at the x- axis the y axis the the different uh Legend colors and whatnot they can just see these high numbers and get a really quick glance of the data now let’s create our first visualization and what we’re going to do for that one is a clustered bar chart so let’s go ahead and click on the clustered bar chart and create as small or as large as we’d like and for this one we’re going to be looking at the job titles now remember we kind of change the job titles or you know uh transform those if you want to say that so we’re going to look at Job titles and then we’re going to look at their average salary and if you remember we transformed that one as well we have average salary now this one is it looks like a text right now so it may not work properly and what we’re actually going to do is go over here I want to see the average salary so let’s click on average salary and see if we can change this data type from a text to a decimal number let’s click yes I forgot to do that when we were transforming it and there we go this is perfect um so now we can go back and we can select our average salary and as you can see it has this um this function symbol so now we can click on it and it’ll look a lot better and although this says average salary as the title it’s actually doing a count or the sum so we can click average right here and what we want to do is actually break this down by the job title and so now we can see data scientists are making the most by far they’re making average of 93,000 at least from the survey takers that took it then we have our data Engineers making 65,000 data Architects are making 63 and then we the data analysts data analysts are right here making 55 so again we had 630 people take this survey and so the vast majority of them were data analyst so this one’s probably the most accurate out of all of them and I actually don’t like how this looks as the clustered bar chart let’s try the Stacked bar chart and put this as the legend that’s more more what I was going for I don’t know I didn’t want as skinny because when you’re doing this one it typically they have multiple options per um uh x axis and so I think that’s why it was that little skinny line but this one is more what I was looking for but let’s make that smaller and let’s definitely change that title because good night um this is like incredibly long let’s go over here to this format visual we’ll go to the general the title and we’re just going to say average salary by job title just like that and this looks a lot better now we’re not going to kind of format all our whole dashboard yet we’re going to create our visualizations and then we’re going to kind of organize everything and kind of play Tetris with it to make it look the best so we’re just going to minimize this and put it right up here for now um but we will go back and kind of make everything look better at the end and actually while we’re here I also want to change this as well so rename for this we’re GNA say job title Oops why did I do that job title and for this one we’re just going to say name average salary there we go looks much better much cleaner uh took away a lot of the anxiety that I was feeling about two minutes ago when we first put that up there so let’s go on to our second visualization the next one that I’m interested in is actually what programming language people were using the most so we have salary there’s a thousand different things we can look at in here but I want to know you know what is 
On to the second visualization. There are a thousand different things we could look at, salary included, but next I want to know people's favorite programming language. Find the favorite programming language field, pair it with the count of unique IDs, and since plain columns aren't what we want here, switch the visual to a clustered column chart; that's more like it. Rename "Count of unique ID" to "Count of voters", tidy up the "Favorite programming language" label, and set the title to "Favorite programming languages". Make the chart a lot bigger and, at a glance, Python is by far the most popular, followed by R, Other, C++, JavaScript, and Java.

Right now we only see the raw count, all in the same blue. To break it out the way we did with job titles, drag job title down to the legend. A fully split legend isn't quite what I wanted; I prefer the stacked view, where you still see the whole count but can also see who is voting for each language. I'm not a huge fan of the preselected colors or the overall theme, but at the very end we'll completely revamp this, change the colors and the background, and make it look much nicer than the plain white we have now. For now, shrink it and tuck it into the corner; nothing will stay where it is, we just need room and a cleaner space for the next visualizations.

Next, I want a way to break the results down by country, because something like salary depends heavily on where you live: the average salary for a data analyst in the United States may be around $60,000, while in another country it could be $20,000, which drags the overall average down quite a bit. A filled map would work, and you could absolutely stick one in the corner, but for what we're building it probably isn't the best fit. Instead I'll use a tree map, which I don't get to use a lot, but here readers can see the distinct values (Other, United Kingdom, India, United States) and click one to cross-filter everything, with no manual filtering or geography lookup required. For example, click United States and the numbers change quite a bit: the average salary for a data scientist jumps to about $139,000 and for a data analyst to about $80,000. Click India and a data scientist averages about $68,000 and a data analyst about $26,000. That doesn't mean people in India earn less in real terms; the cost of living there is probably lower, so they don't need as high a salary in US dollars, and this survey collected everything in US dollars.
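The same counting and cross-filtering logic can be sketched in pandas. Again, favorite_language, country, and the other names below are assumptions I'm making for illustration, not the dataset's real column names.

```python
import pandas as pd

df = pd.read_csv("survey_results.csv")  # hypothetical file and column names
df["average_salary"] = pd.to_numeric(df["average_salary"], errors="coerce")

# Votes per favorite programming language (what the column chart shows).
print(df["favorite_language"].value_counts())

# The tree map's cross-filtering amounts to filtering on country before
# re-running the salary aggregation.
usa = df[df["country"] == "United States"]
print(usa.groupby("job_title")["average_salary"].mean().sort_values(ascending=False))
```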
Something to think about. Let's click out of that filter and keep this visual as well. Now for the next visualization, one I don't get to use enough in my actual job: the gauge. Add one, and while we're at it add a second, so they sit side by side; these are really good for survey questions like this. For the first, add the work-life balance score. By default it does a count, and with no minimum or maximum set it looks odd, so switch the value to Average, then drag the field over to set the minimum value and again to set the maximum, so the gauge runs from 0 to 10. It now shows that the average respondent rates their work-life balance about 5.74 out of 10. The auto-generated title is ridiculous, so change it to "Happiness with work-life balance", or whatever title you want; that's what I'm going with. Do the same for salary: add the score, set the minimum and maximum values so we know how that works, take the average, and title it "Happiness with salary". Not many people are happy with their salary, I'm just finding out; this is a real survey and real data, so that's pretty interesting. Some of this I planned out in advance and some, like these titles, I'm deciding on the fly, so we may tweak them a little later.

The very last visualization is male versus female; you kind of have to have that in there. I don't typically like pie charts and donut charts, but I'm feeling it, so let's try a donut and make it larger. We could measure almost anything across male versus female, but since we've only looked at salary in one visual so far, plus a bit of happiness, let's compare average salary. Add average salary (not current age, as I almost did), switch the aggregation to Average, and it turns out the figures are really close: about $55,000 for women versus $53,000 for men, so women are actually a little higher here. Congratulations. Now we need to start organizing all of this and cleaning it up; it doesn't look bad, but we can do a lot more with it.
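Here is a quick pandas sketch of the numbers behind the gauges and the donut, under the same assumed schema; work_life_balance, salary_happiness, and sex are placeholder column names, not ones confirmed by the video.

```python
import pandas as pd

df = pd.read_csv("survey_results.csv")  # hypothetical file and column names
df["average_salary"] = pd.to_numeric(df["average_salary"], errors="coerce")

# Gauge values: average rating on a fixed 0-to-10 scale.
print(df["work_life_balance"].mean())   # ~5.74 in the video
print(df["salary_happiness"].mean())

# Donut chart: average salary split by sex.
print(df.groupby("sex")["average_salary"].mean())
```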
We'll keep the cards over on the left-hand side, move this visual up top, and fix its title while we're at it; we'll change the theme as we go, but right now I just want the layout right. Let's retitle the tree map "Country of survey takers". I'm not married to that name; if you think of something better, go with it, but it certainly doesn't look bad. Where did my other visualization go? There it is. I want this one taller, so I'll stretch it vertically. Honestly, I hate having this many visualizations on one page; arranging them is genuinely annoying. So we'll step these two off to the side, resize a few so their edges line up without cutting anything off, and bring this one over and down a bit. I'm not sold on the arrangement yet; I added a few visualizations that weren't in my original plan, so I'm doing some of this on the fly, and I may fast-forward the parts where I'm just staring at it. I'll nudge this one down because it sits too close to the text above it, stretch this one across the bottom, and keep it roughly like this.

There's a lot going on here, and as we walk through it I'm noticing things I missed, like titles that still need changing. Let's retitle the donut "Average salary by sex". I don't like the data labels where they are: let's try the detail labels on the inside... oh, that looks terrible. Outside, then, except now you can barely read the information and the decimals are crazy long, so let's go into the value formatting and round them to a whole number or one decimal place. Honestly, this visual just isn't working out how I wanted, and you know I keep my mistakes in so you can see them: I hoped it would turn out better, but it didn't.

One visual I do want to add, because it makes for a nice breakdown, is the difficulty question: how difficult was it for you to break into data? Clear those fields, click the difficulty field, and it now shows the percentage for each answer: neither easy nor difficult, difficult, easy, very difficult, very easy. In that order the slices make no sense, so let's at least make the colors communicate the scale. Go to the Slices section, where each category has a color: very difficult should read as the most difficult, so make it red; difficult gets an orange; neither easy nor difficult is neutral, so try a yellow; and easy and very easy become our blues, a darker blue for easy and a really bright blue for very easy. That doesn't look bad. Look, I'm not a color person, and we'll reorganize this in a bit, but it reads better to me now. Fix the titles too: the visual becomes "Difficulty to break into data", and the field gets renamed to "Difficulty". Better. Not perfect, and there are a thousand ways you could have done it, but that's what we're going with.
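If you were reproducing this chart in code, the ordering-and-coloring step might look like the matplotlib sketch below. The category labels and hex colors are assumptions chosen to mirror the idea (blues for easy, yellow for neutral, orange and red for difficult), not values taken from the video.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("survey_results.csv")  # hypothetical file and column names

# Order the difficulty answers sensibly and give each an intuitive color,
# mirroring the manual slice-color edits made in Power BI.
order = ["Very Easy", "Easy", "Neither easy nor difficult",
         "Difficult", "Very Difficult"]
colors = ["#1f9ee8", "#1f5aa8", "#f2c744", "#f28e2b", "#d62728"]

counts = df["difficulty"].value_counts().reindex(order).fillna(0)
counts.plot.pie(colors=colors, autopct="%1.1f%%")
plt.ylabel("")
plt.title("Difficulty to break into data")
plt.show()
```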
Now let me sweep through and see what else needs changing. Right off the bat, rename this field to "Job title", just like we did on the other visual; "Count of voters" is fine; programming language, difficulty breaking in, happiness, happiness, average, count: all okay. What we have here is very close to a finished product. It's not 100% complete; I want it to look nicer than the typical plain white. Go up to the View ribbon, where the built-in themes live, and just play around until you find something you like. This one doesn't look too bad but isn't really my style; this Frontier one is pretty neat, I'm kind of digging the natural tones, and we might come back to it; this other one isn't bad, but I don't like how dark it is. You could also change the background colors of the individual visuals to match a theme; genuinely, customize this however you want. I kind of like this one, it's kind of groovy. It's not perfect by any means, but you can customize the current theme too: I personally don't love color five, which landed on the data analyst bar, but it fits the muted style, so you can come in here, swap colors, and really mess around with it. For me, I'm going to keep it as it is rather than risk breaking something, and just nudge this visual up a tiny bit. So this is it; this is the project, and I hope it was helpful.
I'm not joking when I say I'm going to do another, much more in-depth project; it will probably run about two hours, which is crazy long for a YouTube video, because I can see doing a thousand different things with this data: building a really great dashboard and really cleaning the data, which is a large part of this work and something we barely did here. So dig into this: see what you like and don't like, decide what you want to clean and what you don't, and try loading it into SQL or Excel to standardize the data and make it far more usable. Do whatever you want with it; I ran this survey for you so we could use it, so go out and build the best dashboard you possibly can. I hope this was helpful, and thank you so much for watching. If you liked this video, be sure to like and subscribe below, and I'll see you in the next one.


  • Iqbal: Faith, Nation, and the Modern Muslim by Maulana Maudoodi

    Iqbal: Faith, Nation, and the Modern Muslim by Maulana Maudoodi

    This text comprises excerpts from an interview and lecture discussing the life and legacy of Allama Iqbal, a prominent Muslim figure in early 20th-century India. The speaker analyzes Iqbal’s impact on Indian Muslims during a tumultuous period marked by political and religious upheaval, highlighting Iqbal’s efforts to combat Western influence and foster a strong sense of Muslim identity and self-reliance. The sources also address misinterpretations of Iqbal’s views, particularly claims that he was a socialist, and emphasize his unwavering commitment to Islam. Furthermore, the text explores Iqbal’s profound spirituality and personal piety, contrasting his public image with his private life of devotion and simplicity. Finally, the speaker urges listeners to uphold Iqbal’s vision of a strong, unified Muslim community.

    A Deep Dive into the Thought of Allama Iqbal: A Study Guide

    Quiz

    Instructions: Answer the following questions in 2-3 sentences each, based on the provided source material.

    1. According to the text, what was the state of Muslims in India between 1924 and 1938, and what caused this state?
    2. How did Muslims react to the failure of the Khilafat movement, according to the source?
    3. What is meant by “Maghribiyat” in the context of the text and why did Iqbal oppose it?
    4. What did Iqbal believe was the root cause of the Muslims’ problems?
    5. What did Iqbal mean when he said that the nation is made by faith?
    6. What was Iqbal’s view on the relationship between religion and politics?
    7. According to the source, what did Iqbal advocate as a solution to the problems faced by the Muslims of his time?
    8. Why does the text assert that Iqbal was not a socialist or believer in “Islamic socialism”?
    9. According to the text, what was the role of Allama Iqbal and Quaid-e-Azam in the creation of Pakistan?
    10. How did Iqbal’s understanding of Islam deepen over time, as described in the text?

    Quiz Answer Key

    1. The text describes a period of crisis for Muslims in India between 1924 and 1938. Muslims faced disappointment and defeat after the failure of the Khilafat movement. This led to a loss of faith in their leadership and a state of despair.
    2. The failure of the Khilafat movement led to severe disappointment among Muslims who had invested everything in it. Many lost their faith in the leadership that had promoted the movement, and were also left feeling disillusioned and betrayed.
    3. “Maghribiyat” refers to the influence of Western culture and philosophy. Iqbal opposed it because he believed it was causing Muslims to abandon their own traditions and culture.
    4. Iqbal believed the root cause of the Muslims' problems was their loss of self-recognition. Muslims had become ashamed of their own heritage, culture, religion, and morals, believing instead that the West was superior.
    5. Iqbal emphasized that a nation is made by faith, not by territory or language. He wanted Muslims to see themselves as a unified community with shared beliefs and culture, distinct from other communities.
    6. Iqbal believed that politics can only be good when guided by God. He stressed that separating politics from faith would lead to barbarism and cruelty.
    7. According to the text, Iqbal advocated for Muslims to follow the Quran and implement the principles of Islam in their lives. He believed that only through Islam could Muslims overcome their problems.
    8. The text emphasizes that Iqbal’s emphasis was on the implementation of Islam and not a hybrid of socialism and Islam. According to the source, while he may have used the term “Islamic Socialism” he didn’t preach it, and there’s no evidence that he believed it.
    9. The text indicates that Iqbal gave Muslims the vision for Pakistan through his emphasis on Islam and a separate identity. Quaid-e-Azam then brought the vision into reality by creating the actual state.
    10. The text asserts that Iqbal’s understanding of Islam deepened over time and became his sole focus. In the later phases of his life, he became immersed in the Quran. He would not keep any other book in front of him, using it as the basis for all of his thoughts and actions.

    Essay Questions

    Instructions: Answer the following essay questions based on the provided source material. Each essay should demonstrate a comprehensive understanding of the text and be 3-4 paragraphs in length.

    1. Analyze the complex relationship between the Khilafat movement, Hindu-Muslim relations, and the subsequent disillusionment of Muslims in India as described in the provided text. How did these events shape Allama Iqbal’s thinking?
    2. Discuss Allama Iqbal's critique of Western civilization and the concept of "Maghribiyat." How did his experiences and perspectives inform this critique, and what solutions did he propose to counteract it?
    3. Explore Iqbal’s concept of Muslim identity and his views on nationalism and faith. How did he advocate for a distinct Muslim identity, and why was it crucial, according to the text, to preserve that identity?
    4. Examine the text’s discussion of Iqbal’s philosophy, particularly his view on the relationship between politics and religion and what he saw as the failings of contemporary Muslim leadership.
    5. Evaluate the text’s portrayal of Allama Iqbal’s evolution as a thinker, from his exposure to Western education to his complete immersion in the Quran. How does this journey inform our understanding of his overall message?

    Glossary of Key Terms

    • Khilafat Movement: A movement in India (1919-1924), led by Indian Muslims, to support the Ottoman Caliphate, which they saw as a symbol of pan-Islamic unity.
    • Maghribiyat: The influence and adoption of Western culture, philosophy, and values. Iqbal saw this as a form of cultural imperialism that Muslims should reject.
    • Nazm: (Often in reference to Iqbal’s writing) Poetry or verse, often used in this text to describe the type of work he produced.
    • Tahrir: (Likely "Tehreek", meaning movement) In this context, the movement to restore the Caliphate and liberate Muslim holy places from foreign control.
    • Mokama: (Likely a mispronunciation, perhaps of Mecca) The Holy city of Islam.
    • Namazi: (Also spelled ‘Namaz’) The Islamic practice of prayer.
    • Roza: Islamic fasting, typically during the month of Ramadan.
    • Shariat: Islamic law, derived from the Quran and the teachings of the Prophet Muhammad.
    • Maulvis: (Also spelled ‘Maulvi’) A Muslim religious scholar, particularly one who is well-versed in Islamic law.
    • Ulema: (Also spelled ‘Ulama’) Muslim religious scholars.
    • Hadith: Sayings and actions of the Prophet Muhammad, used as a source of guidance in Islamic law and theology.
    • Agyaats: (Likely a mispronunciation, perhaps of "Agni", meaning "fire", i.e., fire worshippers) A derogatory reference to Hindu people.
    • Kalimi: (Also spelled ‘Kalima’ or ‘Kalime’) An Arabic term referring to Islamic declaration of faith.
    • Faqr: In this context, the state of being devoted to God and independent of worldly desires, in the way that a true fakir lives.
    • Quaid-e-Azam: An honorific title for Muhammad Ali Jinnah, the founder of Pakistan, meaning “Great Leader.”
    • Pakistan: In this context, meaning the creation of a separate and independent Muslim state in India, founded on the concept of distinct Muslim culture and community.
    • Madrasahs: Islamic religious schools.
    • Khatib hazrats: Islamic preachers or orators.
    • Amrit, Naaziyat, First Year: References to specific ideologies that are criticized in the text. They represent Western/European forms of governance that the text argues are not aligned with the principles of Islam.
    • Mustfair: In this context, a place for residence.
    • Akliat: A person’s intellectual ability.
    • Wala Jana: Devotion and affection to the Prophet Muhammad’s family.

    Iqbal: Islamic Revival and the Creation of Pakistan


    Briefing Document: Analysis of Iqbal and His Impact

    Introduction:

    This document analyzes a series of excerpts focusing on the life, works, and impact of Allama Muhammad Iqbal (Rahmatullah Alaih). The sources provide insights into the socio-political context of Iqbal’s era, his intellectual contributions, and his enduring legacy, particularly in relation to the identity and destiny of Muslims in India. The excerpts cover a variety of perspectives on Iqbal, exploring his views on Islam, nationalism, Western influence, and the importance of self-awareness.

    Key Themes and Ideas:

    1. The Critical Period of Muslim History in India (1924-1938):
    • The period was marked by the failure of the Khilafat Movement, which left Muslims disillusioned and vulnerable. Muslims had “invested all their wealth in the Khilafat” and “left no stone unturned in uniting…with those Hindus…only on the hope that somehow we will be able to save the institution of Khilafat.”
    • The Congress and Hindu leaders, with whom Muslims had allied, turned against them, leading to Hindu-Muslim riots and a "double defeat" for the Muslims. They had trusted Gandhi "the most," yet he never stood up for the Muslims against the oppression they faced.
    • This resulted in “severe disappointment” and a loss of faith in the existing leadership, leaving the Muslim community in a state of despair and questioning their future. “Muslims lost their faith in this leadership which had raised the issue of Tahrir and had joined hands with Congress.”
    2. The Rise of Anti-Islamic Trends:
    • The period saw a rise in anti-religious sentiment among Muslims, with open criticism of Islam and its teachings. There was a shift where people felt those who prayed “should be ashamed of his actions, and the one who is not doing so need not be ashamed.”
    • The influence of Communism and Western ideologies impacted Muslim education, promoting secular and anti-religious ideas.
    3. Iqbal as a Force for Islamic Revival:
    • Amidst the turmoil, Iqbal emerged as a powerful force for Islamic revival and the preservation of Islamic and religious values. He was seen as the "greatest power" for the Islamic Tehreek (movement) and the call of Islamic passion during the 14-year period from 1924 to 1938.
    • He attacked Western culture (“Maghribiyat”), including “female chauvinism”, effectively challenging its dominance over the Muslim mind, while addressing its appeal from the perspective of a man fully familiar with western culture. He “knew more about the west than them and was more aware of the philosophy of the west and the western life than them.”
    • He aimed to break the “mental slavery” of Muslims, encouraging them to recognize their own worth, heritage and the fact that “you are the most powerful person in the whole world.” They had become ashamed of their own traditions, religion, morals, thinking that “if there is anything worth praising in the world, then it has been presented only by the people of the Maghreb.”
    • He emphasized that Islam’s principles are relevant in every era and not an outdated system, stating that “Islam is ancient and the arrival of the prophet, Islam can never become old, its principles are worth implementing in every era.”
    4. Iqbal's Philosophy of Self-Recognition (Khudi):
    • Iqbal urged Muslims to recognize their own identity, culture, and religious values. He created the feeling that "you have lost yourself and have turned your reality around, understand your cosmic task, and implement your culture at your home for the sake of its height".
    • He challenged the notion that Muslims should be ashamed of their heritage, emphasizing the uniqueness and strength of Islamic culture. He taught that “nation is made by faith and our country” not “nation and language”.
    • He aimed to counter the feeling that "the work of the people of the world is to just chant Allah Allah or read the Quran and Hadith in mosques and madrasas" and instead asserted that there should be no separation of politics from deen (religion), because "the result of this is there can be no other explanation except barbarism and cruelty."
    5. Iqbal's Critique of Nationalism and Patriotism:
    • He critiqued the concept of nationalism, arguing that it could lead to the dissolution of Muslim identity, saying, "the nation too is a ghost and the condition of the nation is doubtful." He rejected the reassurance that one's nation posed no threat to one's faith.
    • He emphasized the importance of Islamic unity, countering communalism and the conflicts that divided Muslims.
    • He instilled a sense of “Islamic community” (Ummah) in Indian Muslims, laying the groundwork for the creation of Pakistan. “If this rigidity had not been done at the time…then this Pakistan would not have existed today.”
    6. Iqbal's Views on Politics and Religion:
    • He argued for the integration of religion and politics, suggesting that politics without a moral compass is destructive, “politics can be good only when God is present with it as a guide to keep it on the right path.”
    • He rejected the idea that Islam was a source of backwardness, stating that the problems of the era, the oppression, tyranny, and deceit for which humanity was crying out, arose from a flawed understanding and application of Islam, not from Islam itself.
    • He believed that the solution to the problems faced by Muslims lay in the implementation of Islamic principles: "If there is any solution to the problems of the Muslims, then it is only in the implementation of the Islamic principles."
    7. Iqbal's Stance Against Socialism:
    • The source addresses the claim that Iqbal was a socialist. It argues that such an interpretation is a misrepresentation of his work, which was consistently focused on Islamic principles; he was never convinced that mixing anything else with Islam could save the Muslims.
    • It explains that his use of the term "Islamic Socialism" was incidental and not an endorsement of the political system, but rather an assertion that Islam encompasses social justice: "There is no need to go towards any socialism for equality and justice, all this is present in Islam also, rather it would be more correct to say that it is present only in Islam."
    • The source argues that Iqbal’s poetry and writings were often interpreted incorrectly, specifically citing his couplet about burning fields as a metaphor for divine justice, not a call to action for humans. “The sequence of words was that Allah Taala is ordering his angels that the oppression and cruelty that is going on in the world is inviting our punishment.”
    8. Iqbal's Devotion to Islam and the Quran:
    • The document emphasizes Iqbal’s deep devotion to Islam, particularly during the final phase of his life. It notes his shift towards a more Quran-centric approach, that “in the last phase, Iqbal had separated all the books from the Quran and he would not keep any other book in front of him.”
    • He saw the Quran as the ultimate source of wisdom and guidance, and he approached life and philosophy through its lens. “Whatever he thought, whatever he saw, he saw it from the point of view of the Quran.”
    • His devotion to the Prophet Muhammad was profound and unquestioning.
    9. Iqbal's Legacy and Pakistan:
    • Iqbal's vision was instrumental in the creation of Pakistan, which was founded on the idea of a separate Islamic identity. It is said that "Iqbal (may Allah have mercy on him) gave you a country on the basis of this. He gave you thought and vision."
    • The document warns against deviating from the founding principles of Pakistan, emphasizing the importance of maintaining its Islamic foundation: "If the basic vision of this country, in other words the foundation on which it was envisioned, is removed, then this country cannot survive."
    • It calls on the Muslim community to unite and uphold the principles of Islam.

    Conclusion:

    These sources present a multifaceted view of Allama Iqbal, emphasizing his role as a catalyst for Islamic revival and self-awareness among Muslims in India. The text stresses that he fought Western cultural dominance, promoted the idea of a separate Muslim identity and community, and laid the intellectual foundation for the creation of Pakistan. The sources also highlight the importance of understanding Iqbal in his full complexity and not to reduce his message through simplistic interpretations. His deep love of the Quran and his devotion to Islam are emphasized, as well as his rejection of socialism as a separate doctrine from Islam. The enduring significance of his vision for Muslims globally is also emphasized.

    Allama Iqbal: Life, Thought, and Legacy

    Frequently Asked Questions about Allama Iqbal

    1. What were the key challenges faced by Muslims in India between 1924 and 1938, the period during which Allama Iqbal was particularly active?
    During this period, Indian Muslims experienced significant disillusionment and challenges. They had invested heavily in the Khilafat Movement, hoping to preserve the institution of the Caliphate and protect Muslim holy sites. However, their efforts were ultimately unsuccessful. Furthermore, they faced increasing hostility from Hindus and the Congress party, with whom they had previously cooperated, leading to a series of Hindu-Muslim riots. This resulted in a sense of betrayal and a loss of faith in their leadership, coupled with rising internal discord, a perceived threat of Hindu dominance, and the spread of Western and communist ideas which challenged traditional religious practices and beliefs.
    2. How did Allama Iqbal respond to the challenges faced by the Muslims of India?
    Allama Iqbal emerged as a powerful voice against the prevailing despair. He actively worked to revive Islamic fervor and self-respect among Muslims. He did this primarily through his poetry and philosophical writings, attacking Western culture and its influence on Muslims, which he saw as a form of mental slavery. He sought to reawaken a sense of Islamic identity, pride in their heritage, and the belief that Islam was a viable and relevant way of life for the modern era. He emphasized that a Muslim's strength lay in their own culture, religion, and morals, not in emulating the West. He stressed that Islam was not an outdated system, but a timeless truth relevant to any era.
    3. What was Allama Iqbal's view on nationalism and how did it relate to his concept of the Muslim community?
    Iqbal strongly critiqued the concept of territorial nationalism, arguing that it was a "ghost" and a "doubtful condition." He asserted that a nation is not defined by territory or language, but by faith and shared culture. He emphasized that Muslims, due to their shared beliefs and culture, formed a distinct community (or Ummah) separate from other communities, including Hindus. This viewpoint was meant to counter the idea of Muslims being absorbed into a larger Indian national identity and is often seen as a key step towards the eventual demand for a separate Muslim state.
    4. How did Allama Iqbal view the relationship between Islam and politics?
    Iqbal believed that politics divorced from religion was dangerous, leading to barbarism and cruelty. He argued that politics must be guided by God and that the contemporary problems plaguing humanity were a result of such separation of politics and faith. He rejected the notion that Muslims should confine themselves to religious practices alone, with no engagement in political matters, as he saw Islamic principles as applicable to all aspects of life, including governance. In essence, he advocated for a political order guided by Islamic principles and values.
    5. What was Allama Iqbal's view of Western thought and philosophy, and why did he criticize it?
    While deeply knowledgeable about Western philosophy and culture, Iqbal strongly critiqued it. He believed that its dominance over Muslims was leading to a loss of their own cultural identity and values, in turn causing them mental and spiritual enslavement. He specifically criticized Western materialism, secularism, and what he viewed as its corrupting influence on morality. He sought to expose the flaws of Western civilization and its incompatibility with Islamic values, motivating Muslims to return to their own heritage for solutions. He believed that a society based solely on secularism was doomed to fail.
    6. How did Allama Iqbal's view of Islam influence the idea of Pakistan?
    Allama Iqbal is considered a key intellectual figure behind the idea of Pakistan. He believed that Muslims could not preserve their culture and identity within a united India where the Hindu majority was increasingly dominant. His 1930 speech, while not explicitly using the word "Pakistan," laid out the foundation for a separate Muslim state where Islamic principles could guide society, providing Muslims with the space needed to safeguard their identity and culture.
    7. Was Allama Iqbal a socialist, and what does the source say about this claim?
    The sources strongly refute the idea that Allama Iqbal was a socialist, of either a Western or an Islamic variety. While he occasionally used terms like "Islamic Socialism," this was to make the point that the justice and social concern socialism claims to provide are already found within Islam, and in superior form, since God is their basis. The sources argue that attributing socialism to him is a misrepresentation of his lifelong commitment to promoting Islam. He did not develop or preach a systematic socialist ideology but rather emphasized Islamic principles and values as the solution to the issues of his time. His criticisms of injustice should not be confused with advocating socialism.
    8. What was the importance of the Quran in Allama Iqbal's life and thought?
    The sources depict the Quran as the absolute center of Iqbal's life and thought, especially towards the end of his life. It is described that he distanced himself from all other books, finding that the Quran contained all wisdom, and he interpreted everything from a Quranic perspective. His actions were seen as an attempt to live according to its principles, and he had deep, unwavering faith in the teachings of the Quran and the Prophet Muhammad, even when that went against the conventions of his era. His approach was to live and act in line with Quranic teaching and the example of the Prophet.

    Iqbal: Reviving Islamic Identity

    Allama Iqbal’s life was marked by his efforts to revitalize Islamic thought and identity in the face of various challenges, particularly during the period of British colonial rule in India.

    • Historical Context: From 1924 to 1938, Muslims in India experienced a critical period, marked by the failure of the Khilafat movement and increasing Hindu-Muslim tensions. Muslims faced a “double defeat” with the collapse of the Khilafat and attacks from those they had allied with. This period also saw a rise in Western cultural influence and criticism of Islam, leading to a sense of despair and a loss of faith among Muslims.
    • Iqbal’s Response to the Crisis: In response to this, Iqbal emerged as a powerful force for the revival of Islamic spirit and values. He aimed to combat the mental slavery and feelings of shame that had gripped the Muslim community, encouraging them to recognize their own worth and the value of their culture, religion, and morals. He emphasized the timeless relevance of Islam and its principles, and challenged the notion that it was outdated or incompatible with the modern world.
    • Iqbal’s Critique of Western Culture: Iqbal was critical of the influence of Western culture (“Maghribiyat”), which he saw as a threat to the Muslim identity. He attacked what he perceived as the negative aspects of Western civilization, including materialism and a focus on nationalism at the expense of religious identity. He also criticized Western politics.
    • Iqbal’s Focus on Islamic Identity: Iqbal emphasized the importance of a distinct Muslim identity based on faith and culture. He argued that Muslims were a unique community with their own beliefs and traditions, separate from other groups in India. He stressed the concept of Islamic unity, countering communalism and divisions within the Muslim world. He worked to instill a sense of Islamic pride and purpose in Muslims, particularly the youth.
    • Iqbal’s Philosophy and Vision:
    • Iqbal’s philosophy was centered on the idea of self-realization for Muslims, urging them to understand their true selves and their potential. He believed that Muslims had lost sight of their own heritage and had become overly influenced by Western thought.
    • He advocated for the implementation of Islamic principles in all aspects of life. He believed that the solution to the problems faced by Muslims was in adhering to the Quran and the teachings of Islam.
    • He emphasized that political freedom was not the ultimate goal, but rather the protection of Islam and the ability for Muslims to live according to its principles. He was a proponent of a separate and independent Muslim state, which ultimately led to the idea of Pakistan. He believed that Muslims could not maintain their culture while living with Hindus.
    • Iqbal’s Later Life: In his later years, Iqbal increasingly focused on the Quran, using it as his primary source of knowledge and guidance. He rejected any form of non-Islamic viewpoints. He also emphasized the importance of following the example of the Prophet Muhammad. He was critical of those who saw Islam as a source of sorrow and instead believed it to be a source of guidance and truth.
    • Iqbal’s Legacy:
    • Iqbal’s work was instrumental in shaping the intellectual and political landscape of the Muslim community in India. He is credited with inspiring the creation of Pakistan, with the vision of the country coming before the actual formation.
    • His poetry and writings are known for their depth and powerful articulation of Islamic ideals. He used his art to promote Islamic values and challenge the status quo.
    • He is considered a key figure in the revival of Islamic thought and the development of a modern Muslim identity. He believed in the importance of action and the implementation of Islamic principles in the world.

    Iqbal’s life can be seen as a struggle against cultural and political subjugation, and his lasting legacy lies in his passionate defense of Islamic values and his vision for a vibrant and self-aware Muslim community. He is seen as a figure who used his education, including his knowledge of Western thought, to advocate for the importance of Islam and Muslim identity.

    Muslim Disillusionment in India (1924-1938)

    The sources describe a period of significant disappointment for Muslims in India, particularly between 1924 and 1938. This disappointment stemmed from a combination of political setbacks, social challenges, and a perceived crisis of faith.

    • Failure of the Khilafat Movement: Muslims had invested considerable resources and effort in the Khilafat movement, aiming to protect the institution of the Caliphate and Muslim holy places. The ultimate failure of this movement was a major blow, leading to a sense of disillusionment. The Khilafat, which they had tried to save, was ruined, and the residents of the holy places became divided and engaged in conflict.
    • Betrayal by Allies: Muslims had allied with Hindus and the Congress party during the Khilafat movement. However, after the movement’s collapse, they faced attacks from their former allies, leading to Hindu-Muslim riots. This betrayal contributed to their disappointment, as they had trusted leaders like Gandhi, who did not stand up for the Muslims against oppression.
    • Double Defeat: Muslims experienced a “double defeat,” having failed to achieve their goals in the Khilafat movement and facing hostility from those with whom they had allied. This left them in a state of despair and broke their courage.
    • Loss of Faith in Leadership: The disappointment led to a loss of faith in the leadership that had advocated for the Khilafat and allied with Congress. Muslims felt that their leaders had failed them, contributing to a sense of being lost and without direction.
    • Fear for the Future: There was a widespread fear that non-Muslims were working to completely occupy India, while Muslims were ill-prepared to face the situation. This fear further intensified their sense of disappointment and helplessness.
    • Internal Crisis: In addition to the political and social challenges, Muslims also faced an internal crisis. There was a rise in open criticism of Islam and a decline in religious observance. People began to question the value of traditional practices like prayer and fasting, and some felt ashamed of their religious identity.
    • Influence of Western Culture: The rise of Western culture and communism influenced the education of Muslims, and religious texts began to be openly challenged. This further contributed to the sense of crisis and the weakening of traditional values and faith.
    • Political Disunity: Muslim leaders were also in disarray. Those who had previously defended Islam either became silent or became opponents of the Muslims, and some abandoned the path of inviting people to Islam for inviting them to community and religion. This lack of unified and effective leadership added to the community’s challenges.

    In the midst of this widespread disappointment and despair, Allama Iqbal emerged as a powerful figure, working to revive the Islamic spirit and address the root causes of Muslim disillusionment. He challenged the mental slavery imposed on Muslims and urged them to recognize their own value and potential, aiming to restore their faith in themselves and their religion.

    The Khilafat Movement: Failure and Disillusionment

    The Khilafat Movement was a significant effort by Muslims in India to protect the institution of the Caliphate and Muslim holy places, but it ultimately ended in disappointment. The movement’s failure, coupled with other factors, led to a period of disillusionment and crisis for the Muslim community.

    Here are the key aspects of the Khilafat Movement:

    • Goal: The primary goal of the Khilafat Movement was to save the institution of the Caliphate (Khilafat) and to liberate Muslim holy places from what they perceived as the clutches of the enemy. Muslims invested significant resources and efforts into this cause.
    • Muslim Investment: Muslims dedicated their wealth and lives to the Khilafat movement. They spared no effort in their attempt to save the Khilafat and free their holy places. They united with Hindus, despite historical differences, hoping that this alliance would help them achieve their goals.
    • Alliance with Hindus: Muslims, putting aside centuries of experience and feeling regarding Hindus and their relationship with Islam, united with them in the hope of saving the Khilafat and freeing their holy places. They even trusted leaders like Gandhi and made him their leader.
    • Failure and Disappointment: Despite their efforts, the Khilafat Movement ultimately failed. The institution of the Khilafat, which they had fought to protect, was ruined. The residents of the holy places became divided, engaging in conflict and animosity among themselves.
    • Double Defeat: The failure of the Khilafat Movement was a major blow to the Muslims, leading to what is described as a “double defeat”. Not only did they fail to achieve their goals, but they also faced attacks from the Hindus and the Congress party with whom they had allied.
    • Betrayal and Riots: After the collapse of the Khilafat movement, the Congress and Hindus, with whom the Muslims had allied and fought, turned against them, leading to a series of Hindu-Muslim riots beginning in 1924. The leaders of the Congress did not address the oppression faced by the Muslims.
    • Loss of Faith: The movement’s failure led to a significant loss of faith among Muslims, both in their leadership and in the alliances they had formed. They were disappointed by the outcome of their efforts and by the betrayal of their former allies. This left them in a state of despair and broke their courage.

    The Khilafat Movement’s failure was a major factor contributing to the disappointment and disillusionment experienced by Muslims in India during the 1924-1938 period. The collapse of the movement, along with the subsequent betrayal by former allies, created a crisis of faith and identity among Muslims, which Allama Iqbal sought to address through his work.

    Iqbal’s Islamic Revival in India

    The sources describe an Islamic revival led by Allama Iqbal in response to a period of significant disappointment and crisis for Muslims in India. This revival was marked by a renewed emphasis on Islamic identity, values, and principles, and a rejection of Western cultural and political dominance.

    Key aspects of this Islamic revival include:

    • Context of Crisis: The revival occurred in the context of the failure of the Khilafat Movement, which left Muslims disillusioned and facing attacks from former allies. There was a widespread sense of despair, a loss of faith in leadership, and a fear for the future. Additionally, Western cultural influence and criticism of Islam led to a questioning of traditional values and practices.
    • Iqbal’s Role: Allama Iqbal emerged as a key figure in this revival, working to counter the mental and spiritual decline of the Muslim community. He aimed to restore their sense of self-worth, pride in their heritage, and faith in Islam. He used his knowledge of both Islamic and Western thought to address the challenges faced by Muslims.
    • Emphasis on Self-Realization: Iqbal’s philosophy focused on the idea of self-realization for Muslims, encouraging them to recognize their true potential and identity. He argued that Muslims had lost sight of their own heritage and had become overly influenced by Western thought and culture.
    • Rejection of Western Culture: Iqbal was critical of Western culture (“Maghribiyat”), which he saw as a threat to Muslim identity. He attacked the materialism and perceived negative aspects of Western civilization, including Western politics. He also spoke out against what he saw as the negative influence of Western ideas on Muslim women.
    • Focus on Islamic Identity: Iqbal emphasized the importance of a distinct Muslim identity based on faith and culture. He argued that Muslims were a unique community with their own beliefs and traditions, separate from other groups in India. He stressed the concept of Islamic unity, countering communalism and divisions within the Muslim world. He worked to instill a sense of Islamic pride and purpose, particularly in the youth.
    • Timeless Relevance of Islam: Iqbal stressed the timeless relevance of Islam and its principles, challenging the idea that it was outdated. He argued that Islam’s principles were applicable in every era. He believed that the solution to the problems faced by Muslims lay in adhering to the Quran and the teachings of Islam.
    • Political Vision: Iqbal also had a political vision. He believed that Muslims could not maintain their culture while living with Hindus in India. This view led to his advocacy for a separate and independent Muslim state, which ultimately contributed to the idea of Pakistan. He saw the need for a country where Muslims could live according to the principles of Islam.
    • Critique of Nationalism: He challenged the concept of nationalism, arguing that it was a “ghost” that could dissolve Muslims into the larger Hindu community. He emphasized that the basis of a nation should be faith, not language or territory.
    • Return to the Quran: In his later life, Iqbal increasingly focused on the Quran, using it as his primary source of knowledge and guidance. He is described as having separated all other books from the Quran, dedicating himself to understanding and living by its teachings.
    • Legacy of Revival: Iqbal’s work was instrumental in shaping the intellectual and political landscape of the Muslim community in India. He is credited with inspiring the creation of Pakistan, and his work is viewed as essential to the formation and survival of the country. His legacy is viewed as a passionate defense of Islamic values and a call for a vibrant and self-aware Muslim community.

    Overall, the Islamic revival led by Iqbal was a comprehensive movement that sought to address the challenges faced by Muslims in India through a renewed focus on their faith, culture, and identity. His emphasis on self-realization, Islamic unity, and the timeless relevance of Islam had a profound impact on the Muslim community, and his ideas continue to be influential today.

    Iqbal’s Philosophy: Self-Realization and Islamic Revival

    Allama Iqbal’s philosophy was a comprehensive response to the challenges faced by Muslims in India during a period of significant crisis and disappointment. His philosophy aimed to revitalize the Muslim community by emphasizing self-realization, a return to Islamic principles, and a rejection of Western cultural dominance.

    Here are the key components of Iqbal’s philosophy:

    • Self-Realization (“Khudi”): A central theme in Iqbal’s philosophy is the idea of self-realization. He believed that Muslims had lost sight of their true potential and had become ashamed of their own culture, religion, and morals. He argued that Muslims had been subjected to a form of “mental slavery” by adopting Western ideas and values, and he called on them to recognize their own inherent worth and strength. He encouraged them to take pride in their Islamic heritage and to understand their unique role in the world. He stressed that a nation is made by faith and not by language or territory.
    • Rejection of Western Culture (“Maghribiyat”): Iqbal was a sharp critic of Western culture, which he saw as a major threat to Muslim identity and values. He attacked the materialism and moral decay that he associated with the West. He argued that Muslims should not blindly adopt Western ways but should instead draw strength from their own traditions and principles. He believed that the dominance of Western culture was a form of slavery that prevented Muslims from recognizing their own worth.
    • Timeless Relevance of Islam: Iqbal emphasized the timeless nature of Islam and its principles. He argued that Islam was not an outdated or irrelevant system but a source of guidance and strength that was applicable to all eras. He believed that the solution to the problems faced by Muslims lay in adhering to the Quran and the teachings of Islam. He saw the Islamic system as providing the framework for a just and prosperous society.
    • Emphasis on Islamic Identity and Unity: Iqbal stressed the importance of a distinct Muslim identity based on faith and culture. He argued that Muslims were a unique community with their own beliefs and traditions, and they should not be absorbed into other communities. He called for unity among Muslims worldwide, countering divisions and communalism. He also advocated for a political structure that would allow Muslims to live according to Islamic principles.
    • Critique of Nationalism: Iqbal was critical of the concept of nationalism, which he saw as a threat to Muslim unity. He believed that nationalism could lead to the dissolution of the Muslim community into the larger Hindu community. He argued that faith should be the basis of a nation, not language or territory.
    • Political Vision: Iqbal believed that Muslims could not maintain their culture while living as a minority in India. He advocated for a separate and independent Muslim state where Muslims could live according to Islamic principles. This vision ultimately led to the idea of Pakistan.
    • Return to the Quran: In his later life, Iqbal increasingly focused on the Quran as his primary source of knowledge and guidance. He is described as having separated himself from all other books, dedicating himself to understanding and living by its teachings. He believed the Quran contained all the answers for the problems of his time.
    • Concept of “Faqr”: Iqbal used the word “Faqr” extensively, which according to him does not mean poverty and puritanism, but having faith in Allah in all circumstances, being self-respecting in front of others, and being humble only before God.

    Iqbal’s philosophy was not just a theoretical framework but a call to action. He sought to inspire a sense of purpose and pride among Muslims, urging them to take control of their own destiny and to create a just and prosperous society based on Islamic principles. His work had a profound impact on the Muslim community in India, shaping both the intellectual and political landscape of the time. He is credited with inspiring the creation of Pakistan and is viewed as a key figure in the Islamic revival of the 20th century.

    Iqbaaliyaat Audiobook By Maulana Maududi || اقبالیات از مولانا مودودی
    Zindagi Baad A Maut book by Maulana Syed Abul-Ala Maududi – Audiobook
    Touhid o Risalat by Syed Abul Aala Maududi – توحید و رسالت – Audio Book in Urdu
    Deeniyat book by Maulana Syed Abul-Ala Maududi – Audiobook دینیات – سید ابو الاعلىٰ مودودی
    Al-Jihad Fil Islam by Abul Aala Maududi Chapter 1/7
    Al-Jihad Fil Islam by Abul Aala Maududi Chapter 2/7
    Al-Jihad Fil Islam by Abul Aala Maududi Chapter 3/7
    Al-Jihad Fil Islam by Abul Aala Maududi Chapter 4/7
    Al-Jihad Fil Islam by Abul Aala Maududi Chapter 5/7
    Al-Jihad Fil Islam by Abul Aala Maududi Chapter 6/7
    Al-Jihad Fil Islam by Abul Aala Maududi Last Chapter 7/7

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • Dilip Kumar: Tragedy King, Timeless Legend

    Dilip Kumar: Tragedy King, Timeless Legend

    This text is a passionate tribute to the Indian film actor Dilip Kumar. It recounts his life, highlighting his humble beginnings and rise to legendary status. The writing emphasizes his acting prowess, his impact on Indian cinema, and his enduring legacy as a symbol of secularism and national unity. The author expresses deep admiration and personal connection to Kumar, sharing anecdotes and reflections on his influence. Finally, the text mourns his passing and celebrates his lasting impact on generations of fans and the film industry.

    Dilip Kumar: A Study Guide

    Quiz

    Answer each question in 2-3 sentences.

    1. What is Dilip Kumar’s given name and date of birth according to the text?
    2. What are some of the major awards and recognitions Dilip Kumar received, as mentioned in the text?
    3. The text mentions Dilip Kumar’s visit to Pakistan in 1988. What did he do or say during that visit?
    4. How does the text describe Dilip Kumar’s acting style, particularly his delivery of dialogues?
    5. According to the text, what important message did Dilip Kumar convey through his films, beyond mere entertainment?
    6. In what ways did Dilip Kumar serve as a unifying figure in India, as described in the text?
    7. The text mentions Devdas as an example of a role. What makes Dilip Kumar’s portrayal of Devdas unique, according to the text?
    8. The author asserts that every film Dilip Kumar made is a masterpiece. Give one example from the list of films mentioned and explain what makes that film a masterpiece.
    9. How did Dilip Kumar influence other actors, according to the text?
    10. What is the author’s view about the physical state of Dilip Kumar in his old age, and why does it pain the author?

    Quiz Answer Key

    1. Dilip Kumar’s given name is Mohammad Yusuf Khan, and he was born on December 11, 1922.
    2. Dilip Kumar received the Dada Saheb Phalke Award from the Government of India and the Nishan-e-Pakistan Award from Pakistan.
    3. During his 1988 visit to Pakistan, Dilip Kumar wrote an article that later became part of a magazine; the trip demonstrated a love that crossed borders.
    4. The text suggests Dilip Kumar’s acting style was natural and authentic, with dialogues delivered as if they were heartfelt rather than rote. The text describes pearls falling from his body due to the heat of his liver, a translated Urdu idiom conveying the intensity of his passion.
    5. Beyond entertainment, Dilip Kumar’s films conveyed messages of humanity and love, acting as a voice for universal principles of goodness and the fight against evil.
    6. Dilip Kumar acted as a bridge between communities by erasing Hindu-Muslim differences, living as a truly secular human being, and playing the true role of a unifier throughout his life.
    7. The text emphasizes the unmatched quality of Dilip Kumar’s portrayal of Devdas, suggesting that the novel might as well have been written after seeing him.
    8. Mughal-e-Azam is a masterpiece because it demonstrates how Dilip Kumar infused his roles with so much life, thus leaving a lasting impact on art enthusiasts.
    9. Dilip Kumar influenced other actors by setting a high standard for acting and becoming a guru for many actors who followed him, including Shahrukh Khan and Big B (Amitabh Bachchan).
    10. The author is pained by pictures of Dilip Kumar in his old age, illness, and weakness, and wishes that Saira Banu would not share images of his suffering, because the author prefers to remember him as a hero.

    Essay Questions

    1. Analyze Dilip Kumar’s legacy as described in the text. How does the author portray him as not just a film star, but a cultural and historical figure?
    2. Explore the symbolism of light and shadow in the text. How does the author use these metaphors to describe Dilip Kumar’s character and impact?
    3. Discuss the theme of love and empathy in the text. How does the author use the story of Dilip Kumar to make a point about the importance of these qualities?
    4. How does the text use language to convey Dilip Kumar’s profound influence on Indian culture and film? Pay particular attention to the author’s descriptions of his roles and his personal qualities.
    5. Reflect on the author’s call for museums to be created in the actor’s honor, especially given that he is now deceased. What would be the function of such a museum, as presented by the author?

    Glossary of Key Terms

    Aftab-e-Fan: Literally, “sun of art.” A metaphor used in the text to describe Dilip Kumar’s immense talent and influence, like the sun shining in the world of art.

    Darvesh: Refers to a person who has chosen a life of simplicity and spirituality. In this text, it seems to be a term the author uses for himself.

    Khuda Dad Likeness: A reference to Dilip Kumar’s God-given (“Khuda Dad”) appearance and personality, suggesting that he was uniquely made for his craft.

    Mayusi: An Urdu word for feelings of hopelessness or despair. The text mentions Dilip Kumar’s connection with these emotions, which he transformed into a unique strength in his acting.

    Mujhaat: A unique or special situation or person, someone with unique qualities and capabilities. Used to describe how Dilip Kumar does not panic.

    Nishan-e-Pakistan Award: The highest civilian award given by the government of Pakistan. Dilip Kumar received this award, symbolizing his cross-border recognition.

    Raj: A term meaning rule or kingdom; the text refers to the Raj of India, meaning the rule of India.

    Shariat Taaba: An event or achievement of great magnitude. Used to describe Dilip Kumar’s impact, saying he proved to be such an event.

    Shivling: A symbolic representation of a Hindu deity, usually an oval-shaped, phallic icon. In this text, it signifies a powerful award.

    Tahzeebom: An Urdu word of Arabic origin referring to culture or civilization, specifically the refinement and sophistication of a society. The term is used to describe Dilip Kumar’s character, which embodies a mixture of culture, fear, and wealth.

    Dilip Kumar: A Legacy of Art and Virtue


    Briefing Document: Analysis of Text on Dilip Kumar

    Date: October 26, 2023

    Subject: In-depth Review of Text Detailing the Life and Impact of Dilip Kumar

    Source Material: Excerpts from “Pasted Text”

    Overview:

    This document analyzes a passionate and somewhat unstructured tribute to the legendary Indian actor Dilip Kumar. The text, written in a highly emotive style, goes beyond a simple biographical account to celebrate Kumar as a cultural icon, a moral compass, and an embodiment of true artistry. The author uses a very personal lens, interweaving personal experiences and opinions with a reverence for Kumar’s talent and influence. The piece explores his acting prowess, his impact on society, and his lasting legacy.

    Main Themes and Key Ideas:

    1. Dilip Kumar: More Than Just an Actor:
    • The text consistently emphasizes that Dilip Kumar was not just a film star, but a profound figure who transcended his profession. He is described as a “hero of the race,” a “priceless masterpiece of nature”, and “the god of love.”
    • Quote:The tragedy king Dilip Kumar shone like the sun, he was called the hero of generations because of his true identity.”
    • His dedication, hard work, and “true identity” are praised. The author sees him as an ideal and an inspiration.
    • He is portrayed as a “Dervish” and a “lover of borders,” suggesting a spiritual and universal appeal.
    2. A Paragon of Virtue and Truth:
    • Kumar is lauded as a man of truth, goodness, and integrity. He is presented as a fighter against evil and an inspiration for those fighting for their rights.
    • Quote:He is the hero of goodness and the villain of all evil. He is the hero of every youth and every person who fights for the truth and for their rights in society.”
    • The author highlights Kumar’s ability to deliver dialogues with such sincerity that they felt like “pearls falling from his body,” suggesting deep emotional truth and authenticity in his performances.
    • His stories seem to be “a collaboration of truth.”
    • The author contrasts Dilip Kumar’s honesty with the perceived lack thereof in other actors, calling them “liars without a mother”.
    • He’s the “doctor of love” who “applies the balm of love to the hearts shattered by pain and sorrow.”
    3. Artistic Mastery and Influence:
    • The text highlights Dilip Kumar’s exceptional acting talent. He is called one of the “few emperors of acting” and “the god of love” in the field of arts.
    • His roles are not seen as mere performances, but as embodiments of the characters, suggesting a deep level of immersion and authenticity.
    • The author believes Kumar influenced the way actors express emotion, with his ability to make words feel natural and heartfelt rather than rote.
    • He’s described as bringing such depth to his roles that they remain “famous forever”.
    • He is described as a “tower of light” and a “lighthouse for the lost caravan”.
    4. Personal Connection and Impact:
    • The author repeatedly states the personal impact Dilip Kumar’s work had on their life, providing courage and determination during times of personal hardship.
    • Quote:The push-up winds and time of life were falling on me without any gloves. At that time it gave me new courage, passion and bat. Even in the storms of the cruel world, I was determined to stay updated with it.”
    • The author mentions specific films, like “Devdas,” and reflects on how Kumar’s acting led him to reject the “sacrifice yourself for love” interpretation of the story.
    • The author was encouraged by Kumar to learn from his experiences.
    • The text suggests that Kumar’s influence was immense not just for the author, but also on millions of lives across the world.
    5. Legacy and Immortality:
    • The text posits that Dilip Kumar is a legend whose fame will last for centuries, making him a truly immortal figure.
    • Quote:a star who will keep shining for centuries, a lotus whose fan is amazing, who is immortal and will remain immortal”.
    • He’s compared to a “lotus” and a “bud that sprouted” to become something great.
    • The author believes his impact is so deep that a film academy or university would benefit from using his work as a guide.
    • Despite his passing, the author suggests that Kumar’s legacy lives on and his films will remain in the memories and hearts of his fans.
    6. Dilip Kumar’s Secularism and Humanitarianism:
    • The text highlights that Kumar was a secular and unifying figure, bridging religious and cultural differences. He is described as a “bridge throughout his life with full knowledge, erasing the Hindu-Muslim differences.”
    • He was a unifying figure who “erased Hindu-Muslim differences”.
    • His songs and films promoted “Insaaniyat and Love.”
    • The text portrays him as a “truly secular human being.”
    • The author also notes that he had relationships with people from diverse backgrounds.
    • He is praised for being a voice of humanity and bringing people together.
    7. Contrasting Interpretations of Roles:
    • The text compares the three actors who played Devdas, namely K.L. Saigal, Dilip Kumar, and Shahrukh Khan, and concludes that Dilip Kumar’s interpretation was unique and more resonant.
    • The author specifically mentions the character of Devdas to highlight Kumar’s impact on how people viewed love and life.
    8. Personal Encounters and Reflections:
    • The author includes anecdotes of personal encounters with Dilip Kumar, such as visiting his house in 1988 and working near his residence in 1993, adding an intimate dimension to the narrative.
    • The text reflects on the author’s personal experiences related to Dilip Kumar, highlighting the author’s strong sense of admiration and connection.
    • The author expresses sadness about Dilip Kumar’s declining health in old age, and wishes he could continue to be seen as the hero he always was.

    Key Quotes:

    • Perhaps the biggest award of Dilip Kumar’s life is the deep love of millions of people for him which fascinates many races.
    • He is the hero of goodness and the villain of all evil. He is the hero of every youth and every person who fights for the truth and for their rights in society.
    • When he delivers dialogues, the jokes that come from his tongue do not seem to be rote. It seems that pearls are falling from his body due to the heat of his liver.
    • a star who will keep shining for centuries, a lotus whose fan is amazing, who is immortal and will remain immortal.
    • The real thing is that, more than how much of a hit they were at the box office, he has hit hearts; he has lit Diwali lamps in the dimagon (minds).

    Conclusion:

    The text presents a highly personal and emotional tribute to Dilip Kumar, portraying him not merely as a talented actor, but as a beacon of truth, morality, and artistic excellence. The author’s deep admiration and connection to Kumar’s work are evident, emphasizing the lasting legacy and impact he left on the Indian film industry and beyond. The unstructured and passionate tone highlights the profound emotional response Dilip Kumar’s presence and work evoked. The piece also reveals the author’s personal journey of growth and resilience inspired by the actor.

    Dilip Kumar: The Tragedy King and Beyond

    FAQ: Understanding the Legacy of Dilip Kumar

    1. Who was Dilip Kumar and what made him a significant figure in Indian cinema? Dilip Kumar, born Mohammad Yusuf Khan, was a highly acclaimed and influential actor in the Indian film industry. He was known as the “Tragedy King” for his powerful portrayals of emotionally complex characters. His dedication, hard work, and unique acting style established him as a hero of generations, admired not just in India but worldwide. He was not just an actor; he was considered an “Aftab-e-Fan” (sun of art), whose influence was profound and lasting.
    2. Beyond acting, what other aspects of Dilip Kumar’s personality were emphasized? The source emphasizes that Dilip Kumar was more than an actor. He was a “hero of goodness,” a fighter against evil, and a champion for truth and justice. His dialogues were not mere rote recitations but seemed to come from deep within him, filled with emotion and authenticity. He was described as a spiritual entity, a healer of broken hearts, and a secular figure who bridged divides between Hindu and Muslim communities. He was a Darvesh (ascetic) at heart.
    3. How did Dilip Kumar’s work impact his audience and society? Dilip Kumar’s performances were deeply impactful, inspiring audiences with courage and passion. He taught viewers to listen to the truth, to stand up for themselves, and to strive for success through hard work. His work not only entertained but also encouraged critical thinking, urging people to distinguish between reality and illusion. He emphasized living in the real world rather than becoming lost in dreams. He made an impact by lighting ‘Diwali lamps’ in the minds of his viewers, and brought true characters to the screen with deep dedication.
    4. What accolades did Dilip Kumar receive and how did they compare to the love he received from the public? Dilip Kumar received numerous prestigious awards, including the Dada Saheb Phalke Award from India and the Nishan-e-Pakistan Award. While these accolades were significant, the text suggests that the biggest award of his life was the immense love and admiration he received from millions of fans across different backgrounds. This deep affection was a more profound measure of his impact.
    5. How was Dilip Kumar perceived by his fellow artists and the younger generation of actors? Dilip Kumar was revered by fellow artists, from senior actors like Sairabdhi Ji and Dada Muni Ji to younger generations of actors like Shahrukh Khan and Amitabh Bachchan. He was seen as an “emperor of the Indian film industry,” and many actors considered him their guru and friend. His influence was so profound that he set the bar for excellence, and even his biggest fans always paid him respect, for instance not offering gifts better than shawls and books.
    6. How does the source compare Dilip Kumar’s portrayal of Devdas to other actors who have played the same role? The source compares Dilip Kumar’s Devdas with that of K.L. Saigal and Shahrukh Khan. While acknowledging the technical aspects and performances of the other actors, the author asserts that Dilip Kumar’s portrayal was unmatched. It was as if the character of Devdas was written with Dilip Kumar in mind, highlighting his unique ability to embody the depth and pathos of the character.
    7. What does the text emphasize about Dilip Kumar’s film choices and the lasting impact of his roles? The text notes that Dilip Kumar did not appear in a large number of films, but that each film he did demonstrated his skill and dedication. Rather than focusing on his box office success, it underscores his ability to touch the hearts and minds of his audience through unforgettable and powerful roles. It suggests that what has kept his art and legacy alive is his ability to bring his characters to life with such authenticity, not his box-office hits.
    8. How did Dilip Kumar’s secularism and humanism come through in his life and work? Dilip Kumar is portrayed as a deeply secular individual who strived to bridge the gap between Hindu and Muslim communities. He was depicted as someone who maintained harmony in life and did not discriminate. His films, his public life, his dialogues, and his actions emphasized the messages of humanity and love. He carried an attitude of respect, love and grace throughout his whole life, and he used his art to bring all together.

    Dilip Kumar: A Life in Cinema


    Timeline of Main Events

    • December 11, 1922: Mohammad Yusuf Khan (later known as Dilip Kumar) is born in Oman Gali of Malik Mohalla Khuda Dad, in the Kissa Khani Bazaar area.
    • Early Life/Career: Dilip Kumar works his way up from humble beginnings to become a celebrated actor known as “The Tragedy King”.
    • 1988: Dilip Kumar visits Pakistan, and an article is written about his visit which becomes part of an ADV Magazine.
    • 1993: The narrator of the text lives near Dilip Kumar’s childhood home while attending a training course at the Peetu University.
    • Throughout his career: Dilip Kumar acts in roughly 60 films, receiving immense popularity and becoming a figure of great cultural significance.
    • Later Career: Dilip Kumar is regarded as the “emperor” of the Indian film industry, with younger actors and fans considering him a guru-like figure. He is known for his dedication and the emotional depth he brings to his roles.
    • Later Life/Illness: Dilip Kumar’s old age, illness, and weakness become a subject of concern and sadness for fans. There is discussion of his house becoming a museum.
    • Death (July 7, 2021): Dilip Kumar passes away due to his illness and breathlessness, leaving behind a lasting legacy. The text notes that his film roles are now like “moving fast on the screen”.
    • Posthumous: Dilip Kumar’s legacy is assured with his roles continuing to live on in the hearts and minds of fans. It is mentioned that there is a plan to convert his and Raj Kapoor’s homes into museums.

    Cast of Characters (Principal People Mentioned)

    • Dilip Kumar (Mohammad Yusuf Khan): The central figure of the text. Born in 1922, he is described as an unparalleled actor, “The Tragedy King” of Indian cinema, known for his dedication, emotional depth, and impact on Indian culture. He is presented as not just an actor but also a secular, moral force.
    • Saira Banu: Dilip Kumar’s wife, mentioned in connection with the narrator’s wish that she would help fans continue to see him as the hero, despite his old age and illness.
    • Shah Rukh Khan: A younger contemporary actor who is seen as a fan of Dilip Kumar, he also played the role of Devdas and is compared to Dilip Kumar’s version.
    • Devika Rani: Film actress whose eyes are said to have been caught by Dilip Kumar’s acting abilities.
    • Jawaharlal Nehru: Mentioned as one of those who admired Dilip Kumar’s art, illustrating his wide recognition from the common man to prominent leaders.
    • Raj Kapoor: A contemporary actor of Dilip Kumar’s, whose house is also being planned to be converted into a museum.
    • Madhubala: Another legendary actor who Dilip Kumar worked with, and the author mentions a song sung for her that now seems meaningful after the passing of Dilip Kumar.
    • Sehgal (K.L. Saigal): Actor who, along with Shah Rukh Khan and Dilip Kumar, played Devdas; his performance is compared with those of the other actors.
    • Prithvi Rajput Ji: Actor who is mentioned in the text.
    • Dada Muni Ji: Actor who is mentioned in the text.
    • Big B: Actor who is mentioned in the text.

    Note:

    • The text is written in a highly metaphorical and passionate style, making the distinction between literal and figurative language necessary for interpretation.
    • The author of the text considers themselves a lifelong fan of Dilip Kumar.
    • There are a few names mentioned without further explanation, suggesting they are part of Dilip Kumar’s larger artistic circle but lacking specific context in this text.

    Dilip Kumar: Tragedy King, Cultural Icon

    Dilip Kumar, born Mohammad Yusuf Khan on December 11, 1922, was a significant figure in the Indian film industry, known as the “tragedy king” and “hero of generations” [1]. Here’s a summary of his life based on the provided sources:

    • Early Life and Identity: Born in Oman Gali of Malik Mohalla Khuda Dad [1], Dilip Kumar’s true identity and “Khuda Dad likeness” were recognized not just in India but worldwide [1].
    • Film Career:
    • He was known for his “fanaticism, true dedication and hard work” which led to his success in the Indian film industry [1].
    • He did not do many films, not going beyond 60, but the ones he did were done with full dedication and were very impactful [2].
    • He is known as an “emperor of Indian film industry” [2].
    • He was known for bringing a unique depth to his roles and delivering dialogues that felt natural and authentic [3].
    • He made a lasting impact on the hearts of his fans and lit “Diwali lamps” in their minds [2].
    • Impact and Recognition:
    • He received the Dada Saheb Phalke award from the Indian government and the Nishan-e-Pakistan Award from Pakistan [1].
    • His biggest award was the “deep love of millions of people” [1].
    • He is considered a “god of love” and an “emperor of acting” and a “lighthouse for the lost caravan” [1].
    • He is seen as a “hero of goodness” and a fighter against evil, and for the truth and rights in society [3].
    • He inspired many with his courage and determination [3].
    • He taught people to listen to the truth and to persevere [2].
    • He is seen as a unifier, bridging Hindu-Muslim differences [4].
    • He was a “truly secular human being” [4].
    • Legacy:
    • Dilip Kumar’s work is seen as a “priceless masterpiece of nature” [1].
    • His films are so highly regarded that they could be the basis of an “academy and a university” for aspiring fans [4].
    • He has a special connection to the character of Devdas, and his performance is considered unmatched [4].
    • He is considered immortal and his characters will always be alive in the hearts and minds of his fans [5].
    • He maintained his dominance and was considered a guru and friend to his fans and successors [2].
    • Even in his old age, he continued to be a source of inspiration and admiration, though his illness caused concern among his fans [5].
    • His house may be turned into a museum [5].
    • Personal Life: He had a close connection with his fans and made efforts to meet them [5].
    • He had interactions with people of all backgrounds and was a confluence of fear, wealth, and tahzeebom [5].
    • He was a lover of borders [1].
    • He kept smiling while meeting with Unnas [5].
    • He went to Pakistan in 1988 and wrote an article about it [1].

    In summary, Dilip Kumar was more than just an actor; he was a cultural icon who embodied goodness, truth, and dedication, leaving an indelible mark on the Indian film industry and the hearts of millions [1-5].

    Dilip Kumar: Emperor of Indian Cinema

    Dilip Kumar had a remarkable film career that cemented his place as a legend in the Indian film industry [1]. Here’s a detailed look at his career, based on the sources:

    • Dedication and Impact: Dilip Kumar was known for his “fanaticism, true dedication, and hard work,” which were crucial to his success [1]. He did not act in many films, not going beyond 60, but he put his full dedication into the roles he did take [2]. He brought a unique depth to his characters and delivered dialogues with a natural, authentic feel [3]. His performances had a lasting impact on the hearts of his fans and he is said to have lit “Diwali lamps” in their minds [2].
    • Recognition and Titles: Kumar is known as the “emperor of Indian film industry” [2]. He was called the “tragedy king” [1]. The Government of India gave him its biggest film award, the Dada Saheb Phalke, and Pakistan gave him its biggest award, the Nishan-e-Pakistan Award [1]. However, the “deep love of millions of people” is considered his biggest award [1].
    • Unique Qualities: He is considered a “god of love” and an “emperor of acting” [1]. He is also seen as a “hero of goodness” who fought against evil, and for the truth and rights in society [3].
    • Influence on Others: His beautiful acting not only gave passion but also ignited the ability to think and understand [3]. He was a source of inspiration, providing courage and determination to many [3]. He also taught people to listen to the truth and to persevere [2]. He maintained his dominance and was considered a guru and friend to his fans and successors [2].
    • Roles and Films: He is seen to have a special connection to the character of Devdas, and his performance in that role is considered unmatched [4]. Some of his notable films include Aur Bata Milan, Jugnu, Mughal-e-Azam, Deedar, Andaaz, Jogan, Mela, Sangdil, Daag, Naya Daur, Tarana, Madhumati, Ram Aur Shyam, Dil Diya Dard Liya, Mazdoor (“Laborer”), Leader, Azad, Yahudi (“Jew”), Kohinoor, Ganga Jamuna, Musafir (“Traveler”), Gopi, Amar Das, Udaan Khatola, Dastan, Kranti (“Revolution”), Karma, Shakti, and Vidhaata. His movies are considered to be masterpieces [4].
    • Impact on the Industry: Kumar’s films are so highly regarded that they could be the basis of an “academy and a university” for aspiring fans [4].

    In summary, Dilip Kumar’s film career was marked by his dedication, unique acting style, and the profound impact he had on the hearts of his fans [2]. He was not just an actor but a cultural icon whose work is seen as a “priceless masterpiece of nature,” and he is considered immortal and his characters will always remain alive in the hearts and minds of his fans [1, 5]. His influence can be seen in the work of many actors who followed him, including Shahrukh Khan and Big B [2].

    Dilip Kumar: A Legacy of Acting

    Dilip Kumar’s acting legacy is profound and multifaceted, marked by his unique approach to character portrayal and his lasting influence on the Indian film industry. Here’s an overview of his acting legacy, based on the sources:

    • Unique Style and Depth: Dilip Kumar was known for his “fanaticism, true dedication and hard work” [1]. He brought a unique depth to his roles, delivering dialogues that felt natural and authentic [2]. His performances had a lasting impact on the hearts of his fans [3]. He is described as having the ability to make his characters seem as if they were born into the roles, connecting with their emotions and experiences [2]. His acting style was so powerful that it inspired people on “countless wakes in life” [2].
    • Impact and Influence: Kumar’s acting was not just about entertainment; it ignited the ability to think and understand [2]. He is considered an “emperor of acting” [1]. He is also considered a “god of love” and a “hero of goodness” who fought against evil and for truth and rights in society [1, 2]. He inspired many with his courage and determination, and taught people to listen to the truth and persevere [2, 3]. He maintained his dominance and was considered a guru and friend to his fans and successors [3].
    • Lasting Legacy:
    • Immortal Characters: Dilip Kumar’s characters are considered immortal and will always remain alive in the hearts and minds of his fans [3, 4]. His work is described as a “priceless masterpiece of nature” [1].
    • Academy and University Potential: His films are so highly regarded that they could form the basis of an “academy and a university” for aspiring fans [5].
    • Unmatched Performance as Devdas: He has a special connection to the character of Devdas, and his performance in that role is considered unmatched [5]. The source suggests that the Bengali novel Devdas may have been written after seeing Kumar’s acting [5]. Other actors who have portrayed Devdas, such as Sehgal and Shahrukh Khan, are compared to his performance, highlighting the singularity of Kumar’s acting [5].
    • Inspiration to Future Generations: He has inspired many generations of actors, including Shahrukh Khan and Big B [3]. Even the biggest fans who came after him never accepted anything less than a shawl or a book to bow down before his greatness [3].
    • Continued Admiration: Even in his old age, he continued to be a source of inspiration and admiration, though his illness caused concern among his fans [4].

    In summary, Dilip Kumar’s acting legacy is marked by his profound dedication, his ability to connect with his characters on a deep emotional level, his impact on the Indian film industry, and his influence on future generations of actors. His performances are not just remembered, they are celebrated as timeless masterpieces. He is considered a true legend whose work has left an indelible mark on Indian cinema [3].

    Dilip Kumar: Cultural Icon of India

    Dilip Kumar’s cultural impact is far-reaching and profound, extending beyond his acting career to influence society, inspire generations, and bridge cultural divides. Here’s a detailed look at his cultural impact, based on the sources:

    • Inspiration and Role Model:
    • Dilip Kumar was seen as a “hero of goodness” and a fighter against evil, and for the truth and rights in society [1]. He inspired many with his courage, determination, and his ability to stay updated even amidst the storms of a cruel world [1].
    • He taught people to listen to the truth, to persevere, and to not pass by crying [2].
    • His beautiful acting not only gave passion but also ignited the ability to think and understand [1].
    • He inspired people on “countless wakes in life” [1].
    • Secularism and Unity:
    • Dilip Kumar played the true role of a bridge throughout his life, erasing Hindu-Muslim differences [3]. He was a “truly secular human being” [3].
    • He was a confluence of fear, wealth, and tahzeebom, embracing both Urdu and Hindi cultures [4].
    • Influence on the Film Industry:
    • He is considered the “emperor of Indian film industry” [2, 5]. His acting was so profound that it has been called a “priceless masterpiece of nature” [5].
    • He maintained his dominance and was considered a guru and friend to his fans and successors [2].
    • He has inspired many generations of actors, including Shahrukh Khan and Big B [2].
    • His films are so highly regarded that they could form the basis of an “academy and a university” for aspiring fans [2, 3].
    • Emotional Connection with Fans:
    • Dilip Kumar had a deep emotional connection with his fans. He is said to have “hit hearts” and lit “Diwali lamps” in the minds of his audience [2].
    • The “deep love of millions of people” is considered his biggest award [1, 5].
    • He made efforts to meet his fans [4].
    • His fans viewed him as a “god of love” [5].
    • His fans would not accept anything less than a shawl or a book to bow before his greatness [2].
    • His old age and illness caused concern among his fans [4].
    • Timeless Legacy:
    • Dilip Kumar’s characters are considered immortal and will always remain alive in the hearts and minds of his fans [2, 4, 5].
    • He is a star who will “keep shining for centuries” [5]. He is described as a “lotus whose fan is amazing” and as a “bud that sprouted… which proved to be such a big Shariat Taaba” [5].

    In summary, Dilip Kumar’s cultural impact is immense, encompassing his role as an inspiration and role model, his efforts to bridge cultural and religious divides, his significant influence on the Indian film industry, his deep emotional connection with fans, and his lasting legacy as an immortal figure in Indian culture. He was not just an actor, but a cultural icon whose influence extends beyond the realm of cinema to inspire unity and goodness.

    Dilip Kumar: A Legacy of Humanitarian Action

    Dilip Kumar’s humanitarianism is evident through his actions and the values he embodied, which significantly impacted society and his fans. Here’s an overview of his humanitarian contributions, based on the sources:

    • Fighting for Truth and Rights: Dilip Kumar was not just an actor; he was a “hero of goodness” who actively fought against evil and stood up for truth and the rights of people in society [1]. This commitment to justice and righteousness highlights a key aspect of his humanitarianism.
    • Secularism and Unity: He played the role of a bridge throughout his life, working to erase the differences between Hindus and Muslims [2]. He was a “truly secular human being,” which reflects his inclusive and humanitarian approach [2]. This effort to foster unity is a significant aspect of his impact on society.
    • Inspiration and Guidance: He inspired people with his courage, determination, and his ability to stay updated even amidst the storms of a cruel world [1, 3]. He taught people to listen to the truth, to persevere, and to not pass by crying [1, 3]. This guidance and inspiration served as a way of empowering people and helping them navigate their lives. His acting ignited the ability to think and understand [1].
    • Emotional Connection and Compassion: He was known as the “doctor of love,” who applied the balm of love to hearts shattered by pain and sorrow [4]. His words were like a rosary of love, suggesting his compassionate nature [4]. This indicates a deep emotional connection with his audience and a commitment to alleviating their suffering. The “deep love of millions of people” is considered his biggest award [4].
    • Beyond Entertainment: His work wasn’t limited to entertainment, but was geared toward teaching people about what is real and what is not [1]. He showed people the true form of life and explained to them how to decide for themselves what is real [1]. By using his art to instill these values, he took on a humanitarian role [1].
    • Reaching Out to People: He was a “lighthouse for the lost caravan in this dark city” [4]. This metaphor highlights his role as a beacon of hope and guidance for those who were lost or struggling, signifying a significant aspect of his humanitarianism.
    • Role Model: Even the biggest fans, who came after him, never accepted anything less than a shawl or book to bow before his greatness [3]. This shows how they looked at him not only as a role model for acting, but also for life.

    In summary, Dilip Kumar’s humanitarianism is characterized by his fight for truth and rights, his promotion of secularism and unity, his role as an inspiration, and his compassionate nature. He used his position to promote good and make a positive impact on society. He is not just remembered as an actor, but as someone who embodied values of kindness, empathy, and unity, which left a lasting legacy of humanitarianism.

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • Al Riyadh Newspaper, March 03, 2025: Global Economy and Ramadan Features

    Al Riyadh Newspaper, March 03, 2025: Global Economy and Ramadan Features

    A diverse array of topics is covered in the provided sources. These include development projects in Saudi Arabia, an overview of Saudi culture, and updates from the world of soccer. The sources also touch on international relations and conflict, including a potential Israeli operation in Gaza. Additionally, there is discussion of global economics, such as the relationship between the US and China. The sources also include announcements and updates about local Saudi events and the services offered during Ramadan, ranging from infrastructure improvements and the distribution of aid to religious observances and more.

    Revitalizing Heritage: A Study Guide to Saudi Arabia’s Cultural Landscape

    I. Quiz: Short Answer Questions

    Answer each question in 2-3 sentences.

    1. What are the primary goals of the Prince Mohammed bin Salman Project for the Development of Historical Mosques?
    2. According to the article, how is the Kingdom of Saudi Arabia working to improve and update modern mosques?
    3. What sector is seeing growth in Saudi Arabia that represents about 25% of all managed assets in the financial market?
    4. How is the Kingdom of Saudi Arabia working to address security concerns in the country?
    5. What is the “Jood Al-Manatiq 2” (“Good Areas 2”) campaign and what goals does it seek to achieve?
    6. What steps are being taken to improve the operation and preparedness of the Prophet’s Mosque in Saudi Arabia?
    7. How has the Kingdom of Saudi Arabia updated the Islamic teaching industry to align with its 2030 vision?
    8. What challenges was the Kingdom of Saudi Arabia facing concerning trade with China and the United States of America?
    9. How was Saudi Arabia working to improve the sustainability of residential programs for citizens in 2024?
    10. What is the Saudi Arabia Minister of Culture doing to preserve and promote the cultural heritage of the Kingdom?

    II. Answer Key

    1. The primary goals of the Prince Mohammed bin Salman Project are to rehabilitate and restore historical mosques, revive their original character, and highlight their architectural and religious significance. The project aims to showcase the Kingdom’s cultural heritage while aligning with the Kingdom’s Vision 2030 by preserving original architectural features.
    2. The Kingdom is focusing on modernizing the design of mosques while retaining elements of local culture. The Kingdom is also using AI to improve instruction in Quran recitation.
    3. The real estate sector is seeing growth, representing about 25% of all managed assets in the Saudi financial market. This growth reflects the increasing role of the real estate sector in boosting the national economy and aligns with the Kingdom’s Vision 2030.
    4. The sources reference ongoing operations to “eliminate the last terrorist,” including actions against groups like the Kurdistan Workers’ Party (PKK). The Kingdom is also actively investing in the security sector by allocating billions of dollars to modernize its military.
    5. “Jood Al-Manatiq 2” (“Good Areas 2”) is a campaign launched to promote social awareness and community participation in supporting deserving families. It aims to empower society to contribute to providing sustainable housing solutions in various regions of the Kingdom, reinforcing a culture of giving and social solidarity.
    6. The Kingdom is expediting procedures in accordance with judicial regulations to ensure fairness and efficiency. This was done to prepare for welcoming visitors during the month of Ramadan.
    7. The Kingdom has focused on remote teaching, allowing students to receive lessons in recitation and intonation from anywhere. Some smart applications are used to provide automated feedback for the student.
    8. The Kingdom was facing growing tensions in trade relations between China and the US, including additional tariffs imposed on Chinese goods.
    9. The Kingdom of Saudi Arabia signed an agreement to provide 5,000 sustainable housing units to eligible families in various regions of the Kingdom. This agreement aims to enhance housing empowerment efforts and support the Vision 2030 goals of increasing homeownership among citizens.
    10. The Minister of Culture is focused on preserving and showcasing Saudi culture and heritage through various initiatives. These include supporting heritage preservation efforts, promoting cultural exchanges, and nurturing Saudi talent in diverse cultural fields.

    III. Essay Questions

    1. Analyze the Prince Mohammed bin Salman Project for the Development of Historical Mosques, addressing its goals, implementation strategies, and potential impact on Saudi cultural identity and tourism.
    2. Examine the growth of the real estate sector in Saudi Arabia in relation to the Kingdom’s Vision 2030.
    3. Discuss the challenges and strategies involved in the modernization of Saudi society, with a focus on cultural and social change.
    4. Assess Saudi Arabia’s role in regional and international affairs, focusing on its efforts to combat terrorism and promote stability.
    5. Evaluate the potential of the cultural sector in Saudi Arabia to contribute to economic diversification and job creation, in line with the Kingdom’s Vision 2030.

    IV. Glossary of Key Terms

    • Vision 2030: Saudi Arabia’s strategic framework to reduce the country’s dependence on oil, diversify its economy, and develop public service sectors such as health, education, infrastructure, recreation, and tourism.
    • Historical Mosques: Mosques in Saudi Arabia that hold significant historical and cultural value, often reflecting unique architectural styles and historical events.
    • Real Estate Funds: Investment vehicles that pool capital to purchase, manage, and develop real estate properties, contributing to the growth of the real estate sector.
    • Cultural Heritage: The legacy of physical artifacts and intangible attributes of a group or society that are inherited from past generations, maintained in the present, and bestowed for the benefit of future generations.
    • Social Solidarity: The degree to which members of a group or society feel united, bound together, and committed to supporting one another.
    • Sustainability: The ability to meet the needs of the present without compromising the ability of future generations to meet their own needs. In the context of development, it refers to balancing economic growth, environmental protection, and social well-being.
    • Modernization: The process of social change that involves the transformation of a society from a traditional state to a more advanced and technologically sophisticated one.
    • Terrorism: The unlawful use of violence and intimidation, especially against civilians, in the pursuit of political aims.
    • Economic Diversification: The process of shifting an economy away from a single income source toward multiple sources from a growing range of sectors and markets.
    • Cultural Exchange: The interchange of ideas, information, and cultural values among different societies or groups, promoting understanding and appreciation of diverse cultures.

    Al Riyadh, March 3, 2025: Saudi Arabia in Brief


    Briefing Document: Analysis of “20704.pdf”

    Overview:

    This document synthesizes information from excerpts of the Saudi Arabian newspaper “Al Riyadh,” issue number 20704 dated Monday, March 3, 2025. The excerpts cover a diverse range of topics, including:

    • Restoration and development of historical mosques in Saudi Arabia.
    • Economic developments, including growth in commercial registrations and the real estate sector.
    • International relations, focusing on tensions between China and the United States.
    • Regional conflicts, particularly the Israeli-Palestinian conflict and Turkish-Kurdish relations.
    • Cultural preservation and development within Saudi Arabia.
    • Sports news and events.

    Key Themes and Ideas:

    1. Preservation and Development of Historical Mosques:
    • Theme: A significant project is underway to restore and rehabilitate historical mosques across Saudi Arabia.
    • Details: The “Prince Mohammed bin Salman Project for the Development of Historical Mosques” is in its second phase, encompassing 30 mosques across 10 regions. Some of these mosques date back to the time of the Prophet’s companions, while the newest are about 60 years old.
    • Quote: “Some of them date back to the era of the Companions, may God be pleased with them, while the newest of them is 60 years old…”
    • Quote: “… ensuring their restoration, and restoring the spirit of the special architectural details of each mosque since its construction in accordance with the place and its identity.”
    • Significance: The project balances modern construction standards with the preservation of historical architectural elements. It aligns with Vision 2030, which emphasizes preserving the Kingdom’s cultural heritage and utilizing original architectural features in modern mosque designs.
    2. Economic Growth and Diversification:
    • Theme: Saudi Arabia is experiencing growth in various economic sectors, particularly in commercial registrations and real estate.
    • Details: There’s a significant increase in commercial registrations, with Riyadh having the highest concentration. The real estate sector is also growing, supported by government projects under Vision 2030. Real estate funds now represent about 25% of the total assets managed in the financial market.
    • Quote: “The Kingdom of Saudi Arabia has become witnessing a noticeable growth in real estate funds, representing about 25% of the total assets managed in the financial market.”
    • Significance: The growth reflects the government’s ongoing efforts to diversify the Saudi economy and promote investment in various sectors beyond oil.
    3. International Relations and Geopolitical Tensions:
    • Theme: The excerpts highlight ongoing tensions between China and the United States, as well as regional conflicts in the Middle East.
    • Details: The Trump administration’s announcement of additional tariffs on Chinese goods has strained trade relations. The Israeli-Palestinian conflict remains a significant issue, with reports of Israel preparing a large-scale military operation in Gaza. There’s also mention of the Turkish-Kurdish conflict, with the PKK announcing a ceasefire.
    • Quote: (Regarding China-US relations) “Deputy Prime Minister He Lifeng affirmed, during his hosting of the American Chamber of Commerce in China, that his country seeks to enhance cooperation with the United States despite the increasing economic differences.”
    • Quote: (Regarding the Israeli-Palestinian conflict) “Israeli media reported yesterday that the Israeli occupation army is preparing to bring in more than 50,000 soldiers for a joint attack in various parts of the Gaza Strip.”
    • Significance: These excerpts underscore the complex geopolitical landscape and the challenges facing various regions and international relations.
    4. Cultural Development and Preservation within Saudi Arabia:
    • Theme: Saudi Arabia is actively working to develop and preserve its cultural heritage.
    • Details: The Ministry of Culture is focused on supporting cultural heritage, encouraging artistic production, and preserving Saudi traditions. There is a focus on balancing tradition with modernity.
    • Quote: “Saudi culture is characterized by its richness in elements of tangible and intangible heritage, human customs and traditions, and historical urban development, in line with the continuous growth of visual, musical, performing arts and literary creativity.”
    • Significance: This reflects a broader effort to strengthen national identity and promote Saudi Arabia’s cultural contributions on the world stage.
    5. Philanthropic Initiatives and Community Support:
    • Theme: There’s a visible emphasis on social responsibility and community support through various campaigns and initiatives.
    • Details: The launch of “Jood Al-Manatiq 2” campaign highlights the efforts to provide sustainable housing solutions for those in need across different regions of the Kingdom. The campaign is supported by both public and private sectors.
    • Quote: “Princes of the regions of the Kingdom launched yesterday their participation in the ‘Jood Al-Manatiq 2’ campaign launched by the ‘Jood Housing’ platform, one of the initiatives of the ‘Sakan’ Developmental Housing Foundation.”
    • Significance: This reflects a growing emphasis on social welfare and creating a more inclusive society in line with Vision 2030.

    Noteworthy Quotes:

    • Prince Mohammed bin Salman (2016): “Our ambition will swallow housing, unemployment and other problems.”
    • US Secretary of State Marco Rubio: (Regarding aid to Israel) “signed an order to activate emergency authorities to accelerate the delivery of weapons to Israel.”

    Overall Impression:

    The excerpts provide a snapshot of Saudi Arabia in early 2025, highlighting its efforts to modernize and diversify its economy, preserve its cultural heritage, and play a more prominent role on the global stage. However, they also acknowledge the challenges facing the Kingdom and the broader region, including geopolitical tensions and regional conflicts. The newspaper reflects a generally positive view of the Kingdom’s progress while addressing complex issues in a rapidly changing world.

    Prince Mohammed bin Salman Project: Restoring Saudi Arabia’s Historical Mosques

    What is the Prince Mohammed bin Salman Project for Historical Mosques?

    The Prince Mohammed bin Salman Project for the Development of Historical Mosques is an initiative focused on the rehabilitation and restoration of historical mosques throughout Saudi Arabia. It aims to preserve the architectural identity of these mosques while incorporating modern construction standards, ensuring sustainability and appropriate utilization of space. This project reflects the Kingdom’s vision to highlight its cultural heritage and architectural characteristics. The project includes mosques dating back as far as the era of the Companions (may Allah be pleased with them), with the most recent ones being 60 years old. Some mosques served as beacons of learning in different historical periods.

    What are the key objectives of this mosque restoration project?

    The project has several strategic objectives:

    • Revitalizing and restoring the authenticity and spiritual atmosphere of historical mosques.
    • Highlighting the architectural significance of these mosques.
    • Emphasizing the cultural and historical dimension of the Kingdom of Saudi Arabia.
    • Contributing to the Kingdom’s cultural presence and identity, in line with Vision 2030, by preserving original architectural features.

    How many mosques are included in this project, and where are they located?

    The project’s second phase includes 30 mosques across 10 regions of the Kingdom. The first phase, launched in 2018, also included the rehabilitation and restoration of 30 historical mosques in 10 regions. The distribution includes 6 mosques in the Riyadh region, 5 in the Makkah region, 4 in the Madinah region, and varied numbers in other regions like Asir, the Eastern Province, Al-Jouf, Jazan, the Northern Borders, Tabuk, Al-Baha, Hail, and Al-Qassim.

    Who is involved in the implementation of the historical mosques restoration project?

    The implementation relies on Saudi companies with expertise in rehabilitating historical buildings. This ensures the preservation of Saudi identity and the architectural authenticity of each mosque since its establishment. The project ensures meticulous attention to detail and careful consideration of each place and its identity, in keeping with the spirit of the mosque since its establishment.

    How does the project balance modern needs with historical preservation?

    The project seeks to strike a balance between modern and traditional building standards to incorporate sustainability while maintaining the historical and architectural integrity of the mosques. Suitable materials are used, and modern development influences are integrated into the existing structure in a way that respects its heritage. The components of the mosques are designed to blend the old with the new.

    What role does Vision 2030 play in the context of this historical preservation?

    The restoration of historical mosques is directly linked to the Saudi Vision 2030, which focuses on preserving the Kingdom’s heritage, benefiting from its authentic architectural features, and developing modern mosque designs that reflect local culture and identity.

    Beyond the mosques restoration project, what other projects is Saudi Arabia undertaking related to the real estate and construction sectors?

    Saudi Arabia’s real estate sector is experiencing significant growth, representing approximately 25% of total assets managed in the financial market. This growth supports the national economy by implementing major projects within Saudi Vision 2030, contributing to urbanization, increasing real estate projects, and developing the real estate fund sector. The government continues to implement mega-projects to boost the national economy, aligning with Vision 2030. There’s an expectation for continued urban expansion alongside an increase in real estate projects.

    What does the article say about the growth of Saudi Arabia’s real estate sector?

    The Saudi Arabian real estate market is witnessing notable growth, representing about 25% of the total assets managed in the financial market. This growth reflects the increasing role of the real estate sector in boosting the national economy, especially with the government continuing to implement major projects under Saudi Vision 2030. The sector is expected to continue to grow due to urban expansion and the increasing number of real estate projects. The article also highlights the important role of real estate funds in attracting local and international investments and fostering urban development, contributing to the realization of Saudi Vision 2030 goals.

    Mosque Restoration Projects

    Several sources discuss mosque restoration projects, with a particular focus on historical mosques:

    • The Prince Mohammed bin Salman Project for the Development of Historical Mosques aims to highlight the unique architectural heritage and achieve originality and integration through reconstruction.
    • The project was launched in 2018 to develop and restore 130 historical mosques.
    • A specific example is the reconstruction of the Al-Qibli Mosque in the Riyadh region.
    • The Al-Qibli Mosque is located in the Al-Manfouha neighborhood and is one of the oldest mosques targeted in the project’s second phase.
    • Its history dates back more than 300 years to 1100 AH. It was rebuilt in 1364 AH by King Abdulaziz Al Saud.
    • The restoration of Al-Qibli Mosque uses the Najdi architectural style, which retains the natural elements of the original design.
    • The mosque’s ceiling is made of three natural elements: mud, Athel wood, and palm fronds.
    • This style is characterized by its ability to adapt to the local environment and hot, desert climate.
    • Another example is the restoration of the Al-Ruwaiba Mosque in Buraidah, which is part of the Prince Muhammad bin Salman project.
    • The restoration maintains the characteristics of its ceiling, which consists of three natural elements: mud, Athel wood, and palm fronds.
    • The development of modern mosque designs is also mentioned.

    Saudi Arabia: Economic Diversification & Vision 2030

    Several sources touch on the topic of economic diversification in Saudi Arabia, primarily in the context of achieving the goals of Saudi Vision 2030:

    • Saudi Vision 2030 emphasizes diversifying the Kingdom’s economy.
    • Development of the real estate sector and Real Estate Funds contribute to economic diversification and the realization of Vision 2030.
    • Privatization is seen as a strategic economic transformation to achieve a balance between financial sustainability, quality services and development. It is a means to diversify the economy, increase the efficiency of services, and promote innovation.
    • The Kingdom’s privatization program aims to achieve non-oil revenues of 143 billion riyals and attract investments worth 62 billion riyals by 2025.
    • Developing the cultural sector is considered a component of economic growth within the Kingdom’s Vision 2030.
    • The sources also mention the development of the tourism sector as part of economic diversification.

    These sources suggest that Saudi Arabia is actively pursuing economic diversification through various initiatives, including developing specific sectors, privatization, and promoting cultural and environmental tourism.

    Israeli Military Actions & Regional Tensions

    The sources discuss several aspects of Israeli military actions and related issues:

    • Israeli-Palestinian Conflict: The Kingdom of Saudi Arabia condemned the Israeli government’s decision to halt the entry of humanitarian aid into the Gaza Strip.
    • Saudi Arabia views this action as a form of collective punishment and blackmail, which is a violation of international law.
    • The Kingdom called on the international community to ensure sustained access for aid and to activate international accountability mechanisms.
    • Potential Israeli intervention in Syria: The Israeli Defense Minister threatened military intervention in Syria if the Syrian regime harmed the Druze population.
    • This threat was made against the backdrop of clashes in Jaramana, a suburb southeast of Damascus.
    • Israeli military preparedness on the Gaza border: The Israeli army raised its alert level along the Gaza border, with increased readiness in the area.
    • Additional battalions are preparing for a potential ground operation, which could last for several weeks.
    • The operation would focus on countering Hamas’ infrastructure, including tunnels, explosive device locations, and booby-trapped streets.

    Ramadan Initiatives: Kingdom’s Charitable, Religious, and Cultural Activities

    The sources highlight various initiatives undertaken during Ramadan, spanning from charitable campaigns to religious and cultural activities, as well as facilitation for visitors to the holy sites:

    • “Jood Eskan” Campaign: Aims to raise awareness and encourage community participation in providing sustainable housing solutions for deserving families throughout the Kingdom during Ramadan.
    • The campaign is carried out in different regions of the Kingdom.
    • This initiative is an extension of previous successful campaigns that contributed to the stability and quality of life for beneficiary families in different regions.
    • The General Presidency for the Affairs of the Two Holy Mosques’ initiatives:
    • Focus on enriching the experience of visitors and worshipers in the Two Holy Mosques.
    • These initiatives are based on strategic pillars that emphasize caring for time, respecting the place, and enhancing its position.
    • The initiatives aim to highlight the Kingdom’s efforts in developing the Two Holy Mosques and to emphasize the religious and moderate values of Islam.
    • Facilitating Access and Services for Pilgrims and Visitors:
    • Expanded guidance services are provided in multiple languages to assist visitors.
    • Coordination with relevant authorities to organize crowd management and guidance in the Two Holy Mosques.
    • Provision of directional signage and designated routes to ensure smooth movement.
    • Preparation of the Two Holy Mosques to accommodate a large number of worshipers.
    • Health Services:
    • Dr. Sulaiman Al Habib Hospital and medical centers provide comprehensive diagnostic and treatment services during Ramadan.
    • Maintaining a state of readiness to receive emergency cases with specialized teams.
    • Activation of services such as “Habib Care” for remote consultation with doctors and prescription refills via an app.
    • Municipality Services in Mecca:
    • The Municipality of Mecca intensifies its field and monitoring efforts to maintain public cleanliness, control pests, and monitor commercial establishments.
    • Ensuring the safety of food products and water, and monitoring and maintaining municipal facilities.
    • Cultural and Social Traditions:
    • Highlighting the customs and traditions in the villages, such as community participation, preparing traditional dishes, and celebrating the first day of fasting for children.
    • Traffic Management:
    • Increased efforts by traffic police to organize traffic flow, ensure safety, and promote adherence to traffic rules.
    • Distribution of Iftar meals:
    • Distributing Iftar meals to those in need is a common practice during Ramadan.

    Al-Nasr Soccer Team: Elite League Round of 16 Preview

    The sources provide information about the Al-Nasr soccer team:

    • Al-Nasr is preparing to meet the Esteghlal team in Tehran in the first leg of the Round of 16 of the Elite League.
    • The team’s coach, Stefano Pioli, decided to exclude a number of players from the delegation to Tehran.
    • Cristiano Ronaldo, Sultan Al-Ghannam, Abdullah Al-Khaibari, and Portuguese Otavio are among the injured players.
    • The Al-Nasr team includes foreign players such as a Brazilian goalkeeper, the Frenchman Seko Fofana, the Croatian Marcelo Brozovic, the Brazilian Anderson Talisca, Ghislain Konan, and Sadio Mane.
    • Al-Nasr looks forward to returning with a positive result in the second leg, which will be held at the Al-Awwal Park Stadium in Riyadh.
    • Al-Nasr has presented good performances during the league stage, achieving five wins, two draws, and one loss.
    • A source argues that Al-Nasr is overlooking Esteghlal’s tactical discipline, claiming that Al-Nasr lost to the Al-Orouba team because of its own lack of tactical discipline.
    • The source suggests a former player, Fuad Anwar, deserves to be honored by the Al-Nasr club.
    • Another source claims that Al-Hilal won the Asian Cup in the same season and participated in the Club World Cup in the Emirates, which had a great impact on Al-Ittihad, since Al-Ittihad played its matches and then stopped for forty days; it adds that this was not the case when Al-Nasr achieved another league title.

    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog

  • Linux System Administration: Security, Networking, and Virtualization

    Linux System Administration: Security, Networking, and Virtualization

    This comprehensive guide explores essential Linux system administration tasks, focusing on security, resource management, and cloud technologies. It covers network configuration, firewall management using ufw and iptables, and secure communication via SSH and GPG. User authentication methods, including password-based and key-based authentication, are examined. Furthermore, the guide details file system security, including file permissions, Access Control Lists (ACLs), and the use of chroot jails for isolating processes. Disk usage analysis, cleanup procedures, and system performance monitoring tools like top, free, and vmstat are explained. Finally, it provides an introduction to virtualization and cloud computing concepts, Docker, and container orchestration using Kubernetes and Docker Swarm.

    Network Fundamentals and Security: A Comprehensive Study Guide

    Study Guide Outline

    I. Basic Networking Concepts
    • IP Addressing: IPv4 vs. IPv6
    • Subnets and Subnet Masks: Calculation, Network vs. Host Bits
    • Domain Name System (DNS): Resolution Process, Hierarchy (Root Servers, TLD Servers, Authoritative Servers)

    II. Linux Network Configuration
    • Interface Configuration: ifconfig (Legacy) vs. ip (Modern)
    • NetworkManager Command-Line Interface (nmcli): Connection Management, Wi-Fi Management

    III. Network Troubleshooting
    • ping: Testing Reachability, Packet Loss
    • traceroute: Path Analysis, Hop Count
    • netstat & ss: Monitoring Network Connections, Listening Ports

    IV. Network Security Fundamentals
    • Firewall Management: Uncomplicated Firewall (ufw), iptables
    • AppArmor: Application Security Policies
    • Password Management: Best Practices, Multi-Factor Authentication (MFA)

    V. Encryption and Key Management
    • GPG (GNU Privacy Guard): Public Key Cryptography, Encryption/Decryption, Key Management (Import/Export)

    VI. System Monitoring and Logging
    • System Logging: syslog, Authentication Logs, Kernel Logs
    • Disk Usage Analysis: df, du
    • Process Monitoring: top, htop
    • Memory Monitoring: free, vmstat

    VII. Virtualization and Cloud Computing
    • Virtualization Concepts: Virtual Machines (VMs), Hypervisors (Type 1 vs. Type 2), KVM
    • Containerization: Docker, Docker Commands

    VIII. VM/Container Management Tools
    • libvirt: virsh, virt-install
    • Docker: Docker CLI

    Quiz: Short Answer Questions

    1. What is the primary difference between IPv4 and IPv6 addresses?
    2. Explain the purpose of a subnet mask.
    3. Describe the steps in the DNS resolution process.
    4. What are the key differences between using ifconfig and ip commands in Linux?
    5. How does the ping command help in network troubleshooting?
    6. What information does the traceroute command provide about a network route?
    7. What is the role of the Uncomplicated Firewall (UFW) in Linux systems?
    8. Explain the purpose of Multi-Factor Authentication (MFA).
    9. Describe the difference between Type 1 and Type 2 hypervisors.
    10. What is the purpose of Docker containers?

    Answer Key: Short Answer Questions

    1. IPv4 uses a 32-bit numerical label while IPv6 uses a 128-bit alphanumeric label. IPv6 was developed to overcome the address limitations of IPv4.
    2. A subnet mask is used to divide an IP address into network and host portions, determining how many addresses are available within a network. It also defines which part of the IP address identifies the network and which part identifies the host.
    3. The DNS resolution process begins with a query from a client to a DNS resolver, which may recursively query root servers, TLD servers, and authoritative servers until the IP address corresponding to the domain name is found. The resolver then returns the IP address to the client.
    4. ifconfig is a legacy tool for network interface configuration, while ip is its modern replacement, though ifconfig remains in use on some systems. ip is part of the iproute2 package and offers more comprehensive functionality and features than ifconfig.
    5. ping tests the reachability of a host by sending ICMP packets and measuring the round trip time for those packets. This helps identify network connectivity issues and packet loss.
    6. traceroute identifies the path a packet takes to reach a destination, including each hop (router) along the way. It also measures the time it takes to reach each hop, helping pinpoint delays or failures.
    7. UFW is a user-friendly interface for managing iptables firewall rules in Linux. It simplifies the process of configuring firewall rules to allow or deny network traffic based on specific criteria.
    8. MFA enhances password-based authentication by requiring users to provide multiple verification factors such as passwords and one-time codes sent to a phone. This reduces the risk of unauthorized access even if the password is stolen.
    9. A Type 1 hypervisor (bare metal) runs directly on the hardware, offering better performance, while a Type 2 hypervisor runs on top of an existing operating system. Type 2 hypervisors tend to be easier to install.
    10. Docker containers package applications and their dependencies into portable units that can run consistently across different environments. This ensures that the application behaves the same regardless of the host system.

    Essay Format Questions

    1. Discuss the evolution of network configuration tools in Linux, comparing and contrasting ifconfig and ip. Explain the advantages of using ip over ifconfig in modern network management.
    2. Explain the significance of the Domain Name System (DNS) in the context of network communication. Describe the hierarchy of DNS servers and the steps involved in resolving a domain name to an IP address. What security vulnerabilities are associated with DNS?
    3. Analyze the role of firewalls in network security and discuss the advantages and disadvantages of using UFW and IP Tables for managing firewall rules. In what scenarios might an administrator prefer one over the other?
    4. Compare and contrast Type 1 and Type 2 hypervisors. Discuss the advantages and disadvantages of each type, providing specific examples of virtualization technologies that fall under each category. In what scenarios would you recommend each type of hypervisor?
    5. Explain the benefits of containerization using Docker. Discuss the key Docker commands and concepts, such as Docker images, containers, and Dockerfiles. How do Docker containers improve application deployment and scalability?

    Glossary of Key Terms

    • IP Address: A unique numerical identifier assigned to each device connected to a network, enabling communication.
    • Subnet Mask: A mechanism for dividing an IP address into network and host portions, defining network size.
    • DNS (Domain Name System): A hierarchical system that translates domain names into IP addresses.
    • Resolver: A DNS server that performs recursive queries to resolve domain names.
    • TLD (Top-Level Domain) Server: DNS servers for top-level domains like .com, .org, and .net.
    • Authoritative DNS Server: A DNS server that holds the definitive answer for a domain’s DNS records.
    • ifconfig: A legacy command-line tool for configuring network interfaces on Linux.
    • ip: A modern command-line tool for configuring network interfaces on Linux, part of the iproute2 package.
    • NMCLI (Network Manager Command Line Interface): A command-line tool for managing network connections in Linux.
    • Ping: A network utility used to test the reachability of a host.
    • Traceroute: A network utility used to trace the path a packet takes to a destination.
    • Netstat: A command-line tool for displaying network connections, routing tables, and interface statistics.
    • SS (Socket Statistics): A modern command-line tool that provides similar functionality to netstat.
    • UFW (Uncomplicated Firewall): A user-friendly interface for managing firewall rules in Linux.
    • iptables: A powerful firewall utility in Linux for configuring packet filtering rules.
    • AppArmor: A Linux kernel security module that allows administrators to restrict application capabilities.
    • MFA (Multi-Factor Authentication): A security measure that requires users to provide multiple verification factors.
    • GPG (GNU Privacy Guard): A tool for encrypting and decrypting data using public key cryptography.
    • Hypervisor: Software that creates and runs virtual machines (VMs).
    • Virtual Machine (VM): A software-based emulation of a physical computer.
    • Type 1 Hypervisor: A bare-metal hypervisor that runs directly on the hardware.
    • Type 2 Hypervisor: A hosted hypervisor that runs on top of an existing operating system.
    • KVM (Kernel-based Virtual Machine): A type 1 hypervisor integrated into the Linux kernel.
    • virsh: A command-line tool for managing KVM virtual machines.
    • VirtualBox: A popular type 2 hypervisor for running virtual machines.
    • Containerization: A virtualization method that isolates applications and their dependencies into portable containers.
    • Docker: A popular containerization platform for building, shipping, and running applications in containers.
    • Image (Docker): An immutable, packaged snapshot of an application and its dependencies.
    • Container (Docker): A running instance of a Docker image.
    • libvirt: A toolkit providing APIs and management tools for virtualization environments.
    • df: Displays disk space usage for file systems.
    • du: Displays disk space usage for files and directories.
    • Top: Displays a dynamic real-time view of running processes.
    • Htop: Displays a dynamic real-time view of running processes with a user-friendly, colorful interface.
    • Free: Displays the amount of free and used memory in the system.
    • Vmstat: Displays information about virtual memory, system processes, and CPU activity.
    • Syslog: A standard protocol for logging system events and messages.
    • Chroot: An operation that changes the apparent root directory for the current running process and its children.
    • ACL (Access Control List): A list of permissions attached to an object. It specifies which users or groups have access to the object and what operations they are allowed to perform.

    Linux System Administration and Networking Fundamentals


    Briefing Document: Networking and System Administration Fundamentals

    This document summarizes core concepts and tools related to networking, security, and system administration within a Linux environment. The information is derived from a training series focusing on fundamental principles and practical commands.

    I. Networking Fundamentals

    • IP Addressing: IP addresses are unique identifiers for devices on a network, enabling communication. As the lectures put it, “IP addresses are unique identifiers assigned to devices that are connected to a network; they allow them to communicate with each other and are very important for network management and communication.”
    • IPv4: The original IP addressing scheme, using 32-bit numerical labels written as four dot-separated octets (e.g., 192.168.1.1), each ranging from 0 to 255. Limited to approximately 4.3 billion unique addresses.
    • Subnet Masks: Define the network and host portions of an IP address. For example, take 192.168.1.1 with a subnet mask of 255.255.255.0: the first three octets (192.168.1) identify the network, and the last octet identifies the host.
    • Reserved Addresses: Two addresses within a subnet are always reserved: the network address (often .0) and the broadcast address (often .255).
    • DNS (Domain Name System): Translates domain names (e.g., google.com) into IP addresses. This process involves a hierarchy of DNS servers.
    • The user’s computer sends a DNS query to a DNS resolver, which then contacts one of the root DNS servers at the top of the DNS hierarchy.
    • The root server directs the query to the appropriate Top-Level Domain (TLD) server (e.g., .com, .org), which in turn points to the authoritative DNS server that holds the domain’s records; the authoritative server’s answer is sent back to the resolver.
    • The resolver returns the IP address to the user’s computer, and the website loads (a dig-based trace of this chain is sketched after this list).
    • DHCP (Dynamic Host Configuration Protocol): Automatically assigns IP addresses to devices on a network.
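
    A quick way to watch this hierarchy in action, assuming the dig utility (from the dnsutils/bind-utils package) is available, is to trace a lookup from the root servers down; example.com is a placeholder domain:

        dig +trace example.com   # walks the chain: root servers -> TLD servers -> authoritative server
        dig +short example.com   # asks the configured resolver and prints only the resulting IP address

    The +trace output mirrors the resolution steps described above, one delegation per block.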

    II. Network Interface Configuration

    • ifconfig (Interface Configuration): A command-line utility used to configure network interfaces on Unix-based systems (Linux, macOS). Allows viewing and assigning IP addresses, controlling interface states (up/down).
    • Despite being officially deprecated, ifconfig remains in use on some systems. The command ifconfig without arguments lists all network interfaces and their configurations.
    • In the lectures’ words: “The simplest version of the command is to just type ifconfig and press Enter, and it lists all the network interfaces on your system along with their current configurations, meaning the IP addresses assigned to them, any network masks or broadcast addresses, and everything else appropriate for that particular configuration.”
    • ip (from the iproute2 package): The modern replacement for ifconfig, providing similar functionality for managing network interfaces. ip a (short for ip address) displays network interfaces and their details, producing output very similar to ifconfig’s.
    • nmcli (NetworkManager Command-Line Interface): A command-line tool for managing network connections on Linux.
    • nmcli connection up <connection>/nmcli connection down <connection>: Activates or deactivates a connection profile (a static-address example follows this list).
    • nmcli device status: Displays the status of network devices (connected, disconnected, unavailable).
    • nmcli device wifi list: Lists available Wi-Fi networks, including SSIDs, signal strength, and security type.
    • nmcli device wifi connect <SSID> password <password>: Connects to a Wi-Fi network.
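
    As a concrete sketch of both approaches, the commands below assign a static address; eth0 and “Wired connection 1” are hypothetical interface and profile names on your system:

        # One-off assignment with the modern ip command (lost on reboot)
        sudo ip addr add 192.168.1.50/24 dev eth0
        sudo ip link set eth0 up

        # Persistent assignment through NetworkManager
        sudo nmcli connection modify "Wired connection 1" ipv4.method manual \
            ipv4.addresses 192.168.1.50/24 ipv4.gateway 192.168.1.1
        sudo nmcli connection up "Wired connection 1"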

    III. Network Troubleshooting Tools

    • ping: Tests the reachability of a host (computer or server) by sending ICMP packets. Measures round-trip time.
    • In the lectures’ words: “You basically ping the IP address or the website, and you can also measure the round-trip time for the messages sent to that host, to establish how strong the connection is or how quickly that particular host responds to you.”
    • The -c option specifies the number of packets to send.
    • The -i option sets the interval between packets.
    • The -f option floods the target with packets.
    • traceroute: Tracks the route a packet takes to reach a destination by incrementing the “time to live” (TTL) value. Helps identify delays or failures along the route.
    • The -m option specifies the maximum number of hops.
    • The -p option sets the destination port used for the probes.
    • netstat (Network Statistics): Displays network-related information, including connections, routing tables, and interface statistics.
    • Options include: -t (TCP ports), -u (UDP ports), -l (listening ports), and -n (numerical addresses).
    • ss (Socket Statistics): A modern alternative to netstat, offering better performance and more detailed output. Part of the iproute2 suite.
    • Its options closely mirror netstat’s, such as displaying TCP, UDP, and listening ports (a triage sequence using these tools is sketched after this list).
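
    A typical triage sequence combining these tools might look like the following (example.com is a placeholder):

        ping -c 4 example.com          # is the host reachable? what is the round-trip time and loss?
        traceroute -m 20 example.com   # where along the path do delays or drops appear?
        ss -tuln                       # which TCP/UDP ports is this machine listening on?
        ss -tn state established       # which TCP connections are currently open?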

    IV. Firewall Management

    • ufw (Uncomplicated Firewall): A user-friendly command-line interface for managing iptables firewall rules.
    • sudo ufw enable: Activates the firewall.
    • sudo ufw disable: Deactivates the firewall.
    • sudo ufw allow <service>: Allows traffic for a specific service (e.g., SSH).
    • sudo ufw deny <port>: Blocks traffic on a specific port.
    • sudo ufw status: Shows the current firewall status and active rules.
    • sudo ufw allow from <IP address> to any port <port>: Allows traffic from a specific IP address to a specific port.
    • sudo ufw logging on/off: Enables or disables firewall logging.
    • sudo ufw default allow incoming / sudo ufw default deny incoming: Sets the default policy for all incoming traffic.
    • sudo ufw default allow outgoing / sudo ufw default deny outgoing: Sets the default policy for all outgoing traffic (a deny-by-default baseline is sketched after the iptables notes below).
    • iptables: A more complex, low-level firewall management tool.
    • Uses chains (INPUT, OUTPUT, FORWARD) to define packet filtering rules.
    • sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE: Enables NAT masquerading, hiding internal IP addresses; the source IP address is rewritten to the masqueraded (disguised) address as packets are sent out to the world.
    • The mangle table allows for packet alteration, such as changing the type of service.
    • sudo iptables -L: Lists the current rules for the filter table.
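
    Putting the ufw commands above together, one common baseline is deny-by-default with explicit exceptions; this sketch assumes SSH is the only service that should stay reachable:

        sudo ufw default deny incoming    # drop unsolicited inbound traffic
        sudo ufw default allow outgoing   # let the host initiate connections
        sudo ufw allow 22/tcp             # keep SSH open before enabling the firewall
        sudo ufw enable
        sudo ufw status verbose           # confirm the default policy and rules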

    V. Security Enhancements

    • Chroot Jails: Creates an isolated environment for a process, limiting its access to the file system.
    • “effectively you’re isolating a subset of the file system and you create what’s known as the chroot jail”.
    • Enhances security by restricting the damage caused by untrusted programs.
    • Useful for testing, development, and system recovery.
    • Steps include creating a directory, populating it with the necessary binaries and libraries, and using the chroot command (a minimal sketch follows at the end of this section).
    • File Permissions and Ownership: Controls access to files and directories based on user, group, and others.
    • Permissions: Read (r), Write (w), Execute (x). Numerical values: r=4, w=2, x=1.
    • chmod: Command to change file permissions. Can use symbolic notation (e.g., chmod u+rwx file.txt) or numerical notation.
    • chown: Command to change file ownership.
    • Access Control Lists (ACLs): Provides fine-grained control over file and directory permissions, allowing specific access levels for multiple users and groups.
    • “Access Control Lists are a way to provide more fine-grained control over file and directory permissions.”
    • setfacl: Sets ACL entries. Options include -m (modify), -x (remove a specific entry), -b (remove all entries), and -d (operate on the default ACL of a directory).
    • getfacl: Views ACL entries.
    • AppArmor: A security module that confines programs to a limited set of resources.
    • “AppArmor is a security module installed natively in Ubuntu. It enhances the security of an application or a set of applications by creating profiles that confine the actions of the application, or group of applications, being protected.”
    • Modes: Enforce (blocks unauthorized access) and Complain (allows access but logs it).
    • aa-status: Displays the current AppArmor status.
    • aa-enforce: Sets a profile to enforcing mode.
    • aa-complain: Sets a profile to complain mode.
    • Password Security: Strong passwords are crucial. Multi-factor authentication (MFA) enhances password security.
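
    Tying the chroot steps above together, here is a minimal sketch of a jail for bash; the paths are illustrative, and the library list comes from ldd, so it varies by system:

        mkdir -p /srv/jail/bin                     # skeleton for the jail
        cp /bin/bash /srv/jail/bin/                # the binary to confine

        # Copy every shared library bash depends on, preserving directory structure
        for lib in $(ldd /bin/bash | grep -o '/[^ ]*'); do
            cp --parents "$lib" /srv/jail/
        done

        sudo chroot /srv/jail /bin/bash            # bash now sees /srv/jail as its root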

    VI. Encryption

    • GPG (GNU Privacy Guard): A versatile tool for securing files and communications using public and private key pairs.
    • “It’s a very versatile tool for securing files and communications using public and private key pairs.”
    • Commands include:
    • gpg --gen-key: Generates a new key pair.
    • gpg -e -r <recipient> <filename>: Encrypts a file for a specific recipient.
    • gpg -d <filename.gpg>: Decrypts a file.
    • gpg --import <public_key_file>: Imports a public key into the key ring.
    • gpg --export -a <user_id> > <public_key_file>: Exports a public key to a file.
    • gpg --list-keys: Lists the keys in the key ring (an encrypt, copy, and decrypt round trip is sketched after the SCP notes below).
    • SCP (Secure Copy Protocol): Securely copies files between systems. Uses SSH for encryption.
    • “securely copies files between a local and a remote machine or between two remote machines”
    • scp <source> <destination>: Copies files.
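
    A round trip with these commands, assuming a recipient key for the hypothetical ID alice@example.com is already in the keyring:

        gpg -e -r alice@example.com report.txt         # produces encrypted report.txt.gpg
        scp report.txt.gpg alice@server:/home/alice/   # copy it over SSH

        # On the receiving side, the holder of the matching private key decrypts it
        gpg -d report.txt.gpg > report.txt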

    VII. System Monitoring and Troubleshooting

    • Log Files: Crucial for system administration and troubleshooting. Located in the /var/log directory.
    • syslog (Debian-based): General system log.
    • messages (Red Hat-based): General system log.
    • auth.log: Authentication events.
    • secure (Red Hat-based): Security-related events.
    • dmesg: Kernel-related messages.
    • Use tail -f <logfile> to monitor logs in real-time.
    • Disk Usage Analysis and Cleanup:
    • df: Displays information about available and used disk space. The -h option provides human-readable output.
    • du: Estimates and displays disk space used by files and directories. The -sh option provides a summary in human-readable format (a worked example follows this list).
    • Process Monitoring:
    • top: Displays a dynamic real-time view of running processes. Allows sorting by CPU usage or memory usage.
    • htop: An enhanced version of top with a more user-friendly interface.
    • Memory Management:
    • free: Displays the amount of free and used memory in the system; its -h flag prints the figures in the most readable units.
    • watch -n 1 free -h: Monitors memory usage in real-time.
    • System Statistics:
    • vmstat: Reports virtual memory statistics, including memory usage, CPU performance, and I/O operations.
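
    For example, a quick disk-usage investigation with df and du together might proceed like this:

        df -h                          # which filesystem is filling up?
        sudo du -sh /var/* | sort -h   # which directory under /var is the culprit?
        sudo du -sh /var/log/*         # drill down one more level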

    VIII. Virtualization and Cloud Computing

    • Virtualization: Enables running multiple virtual machines on a single physical machine.
    • “Virtual machines are basically simulations of physical computers”.
    • Hypervisors: Software or firmware that creates and manages virtual machines.
    • Type 1 (Bare-Metal): Runs directly on the hardware. Examples: VMware ESXi, Microsoft Hyper-V, Xen.
    • Type 2 (Hosted): Runs on top of an existing operating system. Examples: VirtualBox, VMware Workstation.
    • KVM (Kernel-based Virtual Machine): A type 1 hypervisor integrated into the Linux kernel.
    • virsh: Command-line tool for managing KVM virtual machines.
    • virsh start <VM name>: Starts a virtual machine.
    • virsh list --all: Lists all virtual machines.
    • virsh shutdown <VM name>: Shuts down a virtual machine.
    • VirtualBox: A popular type 2 hypervisor.
    • “commonly used for testing and deploying environments”.
    • vboxmanage: Command-line interface for managing VirtualBox VMs.
    • vboxmanage startvm <VM name>: Starts a virtual machine.
    • vboxmanage list vms: Lists all virtual machines.
    • vboxmanage controlvm <VM name> poweroff: Powers off a virtual machine.
    • Containers (Docker): Package applications and their dependencies into portable containers.
    • docker: Containerization tool.
    • docker run <image>: Runs a container.
    • docker ps: Lists running containers.
    • docker stop <container_id>: Stops a container.
    • docker rm <container_id>: Removes a container.
    • Cloud Computing: Provides on-demand access to computing resources (servers, storage, databases, etc.) over the internet. Types: IaaS, PaaS, SaaS.
    • “IaaS is one version of what they would provide for you, which is access to the infrastructure that you would otherwise maintain yourself if you weren’t using the cloud.”
    • “PaaS would be the service used to develop and host platforms.”
    • “SaaS, which is software as a service: providing software, such as email or anything like that, on demand.”
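
    To make the container side concrete, a typical Docker lifecycle looks like this (nginx is simply a convenient public image):

        docker pull nginx                           # fetch the image from Docker Hub
        docker run -d --name web -p 8080:80 nginx   # run detached, mapping host port 8080
        docker ps                                   # verify the container is running
        curl http://localhost:8080                  # the containerized nginx responds
        docker stop web && docker rm web            # clean up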


    Networking Fundamentals and Security: FAQ

    FAQ on Networking Fundamentals and Security

    1. What is an IP address, and why is it important?

    An IP (Internet Protocol) address is a unique numerical identifier assigned to every device connected to a network. It enables devices to communicate with each other and is crucial for network management and communication. IPv4, the original version, uses a 32-bit numerical label format (e.g., 192.168.1.1), while IPv6 was developed to address the limitations of IPv4’s address space.

    2. What is a subnet mask, and how does it relate to IP addressing?

    A subnet mask is used to divide an IP address into network and host portions. For example, a subnet mask of 255.255.255.0 indicates that the first three octets of the IP address represent the network, while the last octet identifies the host within that network. Different subnet masks allow for varying numbers of hosts within a network. Two addresses are reserved in each subnet for the network address (usually the first address) and the broadcast address (usually the last address).

    3. What is DNS, and how does it work to resolve domain names to IP addresses?

    DNS (Domain Name System) is a hierarchical system that translates human-readable domain names (like google.com) into IP addresses that computers use to communicate. When you type a domain name into your browser, your computer sends a query to a DNS resolver, which may then contact root DNS servers, top-level domain (TLD) servers (like .com or .org), and authoritative DNS servers to find the corresponding IP address. This process, although complex, happens very quickly in the background.

    4. What are ifconfig and ip, and how are they used to manage network interfaces?

    ifconfig (interface configuration) is a command-line utility used to configure network interfaces on Unix-based operating systems. It allows you to view interface configurations, assign IP addresses, and control the state of interfaces. The ip command, part of the iproute2 package, is intended as a modern replacement for ifconfig, offering similar functionalities with a different command syntax. Examples of using the ip command are ip a or ip addr.

    5. How can nmcli be used to manage network connections in Linux?

    nmcli (NetworkManager Command Line Interface) provides a powerful command-line interface for managing network connections on Linux systems. It allows you to view and modify connections, assign static IP addresses, control connection states (up/down), and manage Wi-Fi networks. For instance, you can use nmcli device wifi list to see available Wi-Fi networks and nmcli connection up <connection_name> to activate a connection.

    6. How do the ping and traceroute commands help in troubleshooting network connectivity issues?

    • ping tests the reachability of a host by sending ICMP packets and measuring the round-trip time. It can help determine if a host is online and how reliable the connection is.
    • traceroute tracks the route packets take to reach a destination, identifying the intermediate routers and delays along the path. This helps pinpoint where connectivity issues or delays occur.

    7. What are firewalls, and how do tools like ufw and iptables contribute to network security?

    Firewalls act as a barrier between a network and the outside world, controlling incoming and outgoing traffic based on configured rules.

    • ufw (Uncomplicated Firewall) is a user-friendly front-end for managing iptables rules, making it easier to set up basic firewall configurations. Examples include sudo ufw allow ssh and sudo ufw deny 80.
    • iptables is a more complex command-line tool that provides direct control over the Linux kernel’s packet filtering capabilities. It allows for highly customized firewall rules.

    8. What is a chroot jail, and how does it enhance system security?

    A chroot jail is an isolated environment created by changing the root directory for a process and its children. This limits the access of that process to a specific subset of the file system, enhancing security by preventing compromised programs from accessing or modifying files outside the jail. It’s useful for testing software in a controlled environment or repairing a system from a rescue environment.

    Network Security: UFW, IP Tables, SELinux, and Best Practices

    Network security is crucial, requiring firewalls to act as barriers between internal and external networks by monitoring and controlling traffic based on established rules. Important tools for network security include Uncomplicated Firewall (UFW) and IP tables.

    Uncomplicated Firewall (UFW)

    • It is a simple but powerful firewall with an easy syntax.
    • To activate, use the command sudo ufw enable.
    • Traffic can be allowed or denied by direction and port. For example, sudo ufw allow in to any port 22 allows incoming traffic on port 22 (SSH), while sudo ufw deny out to any port 80 denies outgoing HTTP traffic on port 80.
    • To check the status and active rules, use sudo ufw status.
    • Traffic can be allowed from specific IP addresses using sudo ufw allow from <IP address> to any port 22.

    IP Tables

    • It is a more complex tool that allows detailed control over the network and enables creation of complex rules for packet filtering and network address translation.
    • To view the current rules, use sudo iptables -L. The default table is the filter table, displaying the INPUT, FORWARD, and OUTPUT chains.
    • To add a rule, use sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT to allow TCP traffic on destination port 22. To block traffic, use sudo iptables -A INPUT -p tcp --dport 80 -j DROP (these rules are combined into a fuller sketch after this list).
    • To save rules, use iptables-save > /etc/iptables/rules.v4. To restore rules, use iptables-restore < /etc/iptables/rules.v4.
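
    Combining the rules above into a minimal default-deny policy (a sketch; a mechanism such as the iptables-persistent package is needed for rules.v4 to be reloaded at boot):

        sudo iptables -A INPUT -i lo -j ACCEPT                                        # allow loopback
        sudo iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT   # allow replies
        sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT                            # keep SSH open
        sudo iptables -P INPUT DROP                                                   # drop everything else
        sudo iptables-save | sudo tee /etc/iptables/rules.v4                          # persist the rules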

    SELinux (Security-Enhanced Linux) is a security module in the kernel that provides access control policies. SELinux defines rules for processes and users accessing resources, enforcing strict policies. Its modes of operation include enforcing (blocks violations), permissive (logs violations), and disabled. Common commands include sestatus to view the status, setenforce 1 to enable enforcing mode, and setenforce 0 for permissive mode.

    AppArmor is another security mechanism that uses application-specific profiles for access control. Commands include aa-status to get the status of AppArmor and aa-enforce to enforce a profile for a specific application.

    Additional points on network security:

    • Changing the default SSH port (port 22) can reduce the risk of automated brute-force attacks. This is done in the sshd configuration file (/etc/ssh/sshd_config).
    • Disabling root login forces attackers to log in as standard users and escalate privileges. This is configured in the sshd configuration file by setting PermitRootLogin no.
    • Limiting SSH users involves whitelisting specific users who can log in via SSH using the AllowUsers directive. The SSH service must be restarted to apply configuration changes (see the sshd_config sketch after this list).
    • GPG (GNU Privacy Guard) is used for encrypting data. It uses asymmetric encryption with public and private key pairs.
    • Secure file transfer can be achieved with SCP (secure file copy) or SFTP (secure file transfer protocol). SCP securely copies files between hosts.
    • Analyzing authentication logs can reveal unauthorized access attempts. Key log files include auth.log (Debian-based) and secure (Red Hat-based).
    • rsync can be used to back up data, including syncing over SSH for secure transfers.
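
    The SSH hardening settings described above live in the sshd configuration file; a sketch of the relevant lines (the port number and usernames are examples):

        # /etc/ssh/sshd_config
        Port 2222              # non-default port to cut down automated scans
        PermitRootLogin no     # force attackers to compromise a normal account first
        AllowUsers alice bob   # whitelist of accounts permitted to log in via SSH

        # Apply the changes (the service may be named "ssh" on Debian/Ubuntu)
        sudo systemctl restart sshd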

    Linux File Permissions and Access Control

    File permissions are essential for system security, dictating who can access and modify files and directories. Understanding and managing these permissions ensures that sensitive data remains protected and that only authorized users can make changes.

    Levels of File Permissions

    • Categories: Permissions are assigned based on three categories: the owner (a specific user), the group, and others.
    • Permissions: Each category has three types of permissions: read (r), write (w), and execute (x). Read permission allows users to view the file’s contents, write permission allows modification, and execute permission allows running a file or entering a directory.
    • Numerical Values: Each permission has a numerical value: read is 4, write is 2, and execute is 1. These values are combined to represent the total permissions for each category. For example, read and execute (4+1) would be 5.

    Commands to Change Permissions

    • chmod (Change Mode): This command is used to change the permissions of a file or directory. It can be used in two ways:
    • Symbolic Mode: Uses symbols like r, w, and x to add or remove permissions. For example, chmod u+rwx,g+rx,o+rx file.txt gives the owner read, write, and execute permissions, and the group and others read and execute permissions.
    • Numerical Mode: Uses numerical values to set permissions. For example, chmod 755 file.txt gives the owner read, write, and execute permissions (7), and the group and others read and execute permissions (5 each).
    • chown (Change Owner): This command changes the ownership of a file or directory. For example, chown user:group file.txt changes the owner to “user” and the group to “group”.
    • chgrp (Change Group): This command changes the group ownership of a file or directory. For example, chgrp group file.txt changes the group owner to “group”.
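
    A short demonstration of the two chmod modes side by side (file.txt is a placeholder):

        touch file.txt
        chmod 640 file.txt            # numeric: owner rw (4+2), group r (4), others none
        stat -c '%a %A %n' file.txt   # prints: 640 -rw-r----- file.txt
        chmod o+r file.txt            # symbolic: add read for others, giving 644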

    Access Control Lists (ACLs)

    • ACLs provide a more fine-grained control over file and directory permissions, allowing definition of permissions for multiple users and groups on a single file or directory.
    • Entries: Each ACL entry specifies permissions for a user or group, consisting of the type (user or group), an identifier (username or group name), and the permissions.
    • Types of ACLs:
    • User ACL: Specifies permissions for a specific user.
    • Group ACL: Specifies permissions for a specific group.
    • Mask ACL: Defines the maximum effective permissions for users and groups other than the owner.
    • Default ACL: Specifies the default permissions inherited by new files and directories created within a directory.
    • Commands:
    • setfacl (Set File ACL): Sets the ACL for a file or directory. For example, setfacl -m u:user:rwx file.txt adds read, write, and execute permissions for the user “user” on “file.txt”.
    • getfacl (Get File ACL): Displays the ACL entries for a specified file, showing all users and groups with their defined permissions.

    Removing ACL Entries

    • -x option: Removes a specific user or group entry from the ACL. For example, setfacl -x u:user file.txt removes the ACL entry for the user “user”.
    • -b option: Removes all ACL entries from a file or directory. For directories, the -d option is used in conjunction with -b to remove default ACL entries.
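
    A full setfacl/getfacl round trip, using a hypothetical user named dave and file shared.txt:

        setfacl -m u:dave:rw- shared.txt   # grant dave read/write through an ACL entry
        getfacl shared.txt                 # shows the owner, group, and the new user:dave:rw- entry
        setfacl -x u:dave shared.txt       # remove only dave's entry
        setfacl -b shared.txt              # or strip every ACL entry at once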

    By understanding and utilizing these commands, file permissions and access control lists (ACLs) can be effectively managed to maintain a secure and well-organized Linux system.

    User Authentication Methods: Password, MFA, and Public Key

    User authentication involves methods to verify the identity of a user trying to access a system or application. Common methods include password-based authentication, multi-factor authentication (MFA), and public key authentication.

    Password-Based Authentication

    • This is the default method where users enter a username and password to gain access.
    • To improve security, password-based authentication can be enhanced with Multi-Factor Authentication (MFA).

    Multi-Factor Authentication (MFA)

    • MFA adds an extra layer of security by requiring users to provide multiple verification factors.
    • This often includes sending a code to a user’s phone or email, or using biometric methods like fingerprint or face scans.
    • MFA reduces the risk of unauthorized access, even if an attacker obtains the user’s password.

    Public Key Authentication

    • This method uses a key pair consisting of a private key and a public key.
    • The private key is kept secret by the user, while the public key is placed on the server.
    • Public key authentication is more secure than password-based authentication and is not subject to brute force attacks.
    • It allows for automated, passwordless logins, which are useful for scripts and applications.
    • To generate a key pair, the ssh-keygen command is used.
    • After running ssh-keygen, a file path to save the key is requested, and a passphrase can be set for additional security.

    Key Transfer and Authentication

    • To enable passwordless access, the public key must be transferred to the authorized_keys file on the server (a sketch of the full workflow follows this list).
    • The user must authenticate themselves with a password at some point before transferring the key.
    • Without initial password authentication, the system will not trust the user to transfer the key.
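
    A minimal sketch of the full workflow (the hostname and username are placeholders):

        ssh-keygen -t ed25519                  # generate the key pair locally
        ssh-copy-id user@server.example.com    # appends the public key to ~/.ssh/authorized_keys,
                                               # prompting for the password this one time
        ssh user@server.example.com            # subsequent logins authenticate with the key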

    System Monitoring with top, htop, free, and vmstat

    System monitoring is crucial for maintaining system performance and troubleshooting issues. Key tools for this purpose include top, htop, free, and vmstat.

    top

    • Provides a dynamic, real-time view of running processes and their resource usage.
    • Displays CPU usage, memory usage, and process IDs (PIDs).
    • To sort by CPU usage, press P while top is running.
    • To sort by memory usage, press M.
    • To quit, press Q.

    htop

    • It is a user-friendly alternative to top with enhanced features and an intuitive interface.
    • Offers interactive process management and color-coded output.
    • Can use function keys (F1-F12) or keyboard shortcuts for navigation.
    • F3 key can be used to search for processes.
    • F9 key can be used to kill a process.
    • To quit, press Q or F10.

    free

    • Displays information about the system’s memory usage, including physical memory and swap space.
    • The command free -h formats the output in a human-readable format (KB, MB, GB).
    • Shows the total, used, free, shared, buffer, and cached memory.
    • To monitor memory usage in real-time, use watch -n 1 free -h.
    • Detailed memory information can be obtained from the /proc/meminfo file.

    vmstat

    • Virtual memory statistics (vmstat) monitors system performance, providing statistics on CPU, memory, and I/O operations.
    • The basic command is vmstat 1 5, where the first number is the update interval in seconds, and the second is the number of iterations.
    • Key fields in the output include processes (runnable and blocked), memory (swap, free, buffer, cache), swap (in and out), I/O (blocks received and sent), system (interrupts and context switches), and CPU usage (user, system, idle, wait, stolen).
    • The st field refers to the CPU steal time, which is the percentage of time a virtual CPU is waiting for resources because the hypervisor is allocating resources to another VM.
    • Running vmstat 1 updates data every second until interrupted, while vmstat provides a single snapshot.

    These tools provide different perspectives and can be used together to get a comprehensive understanding of system performance.
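
    As an example of using them together non-interactively, a small snapshot script might capture one reading from each (a sketch, not a production monitor):

        #!/bin/bash
        # One-shot system snapshot: memory, CPU/IO trends, and the busiest processes
        free -h
        vmstat 1 5                # five one-second samples of CPU, memory, and I/O
        top -b -n 1 | head -n 12  # batch mode: summary header plus the top processes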

    Virtualization, Cloud Computing, and Containerization Technologies Overview

    Virtualization is a technology that allows multiple virtual machines to run on a single physical machine, improving resource use and providing isolated environments. Key concepts include virtual machines and hypervisors.

    Virtual Machines (VMs)

    • VMs are software-based simulations of physical computers, each running its own operating system and applications independently of others on the same physical host.
    • VMs offer isolation, so a failure in one VM does not affect others.

    Hypervisors

    • A hypervisor is software or firmware that creates, manages, and deploys virtual machines, allocating resources to each.
    • There are two types of hypervisors:
    • Type 1 (Bare Metal): Runs directly on the physical hardware without needing a host operating system, common in enterprise environments for high performance. Examples include VMware ESXi, Microsoft Hyper-V, and Xen.
    • Type 2 (Hosted): Runs on top of an existing operating system. It uses the host’s resources and is suited for desktop virtualization and smaller environments. Examples include VirtualBox, VMware Workstation, and Parallels Desktop.

    Advantages of Virtualization:

    • Resource Efficiency and Scalability: Virtualization allows efficient use of physical resources and easy scaling up or down based on needs.
    • Isolation and Security: Each VM operates independently, isolating it from other VMs on the network. A compromised VM does not affect the rest of the network.
    • Flexibility and Agility: Enables easy testing, deployment, and development in isolated environments. New virtual machines can be quickly deployed.
    • Disaster Recovery: Simplifies backups and recovery by storing entire virtualized environments that can be easily accessed and restored, especially with redundancies in place.

    Kernel-Based Virtual Machine (KVM)

    • KVM is a type 1 hypervisor integrated into the Linux kernel, transforming the OS into a virtualization host.
    • It leverages Linux features for memory management, process scheduling, and I/O handling.
    • KVM supports hardware-assisted virtualization via Intel VT or AMD-V technology.
    • The virsh command-line tool manages KVM-based VMs. Common virsh commands include virsh start to start a VM, virsh list to list running VMs, and virsh shutdown to shut down a VM.

    VirtualBox

    • VirtualBox is a type 2 hypervisor developed by Oracle, compatible with various operating systems like Linux, Windows, and macOS.
    • It offers an easy-to-use GUI and command-line interface for managing VMs.
    • Key features include snapshot functionality for backups and guest additions to enhance performance.
    • VBoxManage is the command-line interface for VirtualBox, with commands like VBoxManage startvm to start a VM, VBoxManage list vms to list VMs, and VBoxManage controlvm to control VMs.

    Cloud Computing

    Cloud computing provides on-demand access to computing resources over the Internet, including servers, storage, databases, and software. It allows users to provision and manage these resources easily.

    Cloud Service Models:

    • Infrastructure as a Service (IaaS): Provides virtualized hardware resources like virtual machines, storage, and networks. Users deploy and manage operating systems, applications, and development environments. Examples include AWS EC2, Microsoft Azure Virtual Machines, and Google Compute Engine.
    • Platform as a Service (PaaS): Offers a development and deployment environment in the cloud, including tools and services to build, test, deploy, and manage applications without managing the underlying infrastructure. Examples include AWS Elastic Beanstalk, Google App Engine, and Microsoft Azure App Service.
    • Software as a Service (SaaS): Delivers applications over the Internet on a subscription basis. Users access these applications via a web browser without needing to install or maintain anything. Examples include Microsoft Office 365, Google Workspace, and Salesforce.

    Advantages of Cloud Computing:

    • Scalability: Easily scale resources up or down based on demand.
    • Cost Efficiency: Reduces upfront costs by eliminating the need for physical hardware.
    • Flexibility and Accessibility: Access services from anywhere with an internet connection.
    • Reliability and Availability: Redundant locations ensure high availability and reliability.
    • Disaster Recovery: Scheduled backups prevent data loss.
    • Automatic Updates: Services are automatically updated without user intervention.

    Major Cloud Providers:

    • Amazon Web Services (AWS): Offers a wide range of services, including computing power, storage, and networking.
    • Microsoft Azure: Provides seamless integration with Microsoft products and a variety of cloud services.
    • Google Cloud: Known for capabilities in data analytics and machine learning, with a robust set of cloud services.

    Containerization

    Containerization involves packaging applications and their dependencies into portable containers that run consistently across different environments. Docker is a popular containerization tool.

    Containers vs. Virtual Machines:

    • Containers share the host operating system’s kernel, making them lightweight and fast to start.
    • They include everything needed to run the application (code, runtime, etc.) but do not include an OS.
    • Virtual machines, in contrast, run on a hypervisor and include a full operating system.
    • Docker containers run on anything supporting Docker, ensuring consistency across development, testing, and production environments.

    Benefits of Containers:

    • Efficiency: Lightweight and use fewer resources.
    • Scalability: Easily scale up or down based on demand.
    • Portability: Can be transferred and run on various operating systems.
    • Isolation: Multiple applications can run on the same host without interfering with each other.

    Basic Docker Commands:

    • docker run -it <image_name>: Runs a container image interactively.
    • docker ps: Lists running containers.
    • docker stop <container_id>: Stops a running container.
    • docker pull <image_name>: Downloads a Docker image from Docker Hub.
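
    Containers are launched from images, and images are built from a Dockerfile; a minimal sketch, assuming a trivial app.py exists in the current directory (myapp is a hypothetical tag):

        # Write a four-line Dockerfile and build an image from it
        cat > Dockerfile <<'EOF'
        FROM python:3.12-slim
        WORKDIR /app
        COPY app.py .
        CMD ["python", "app.py"]
        EOF
        docker build -t myapp .    # package app.py and its base image into an image
        docker run --rm myapp      # a container is a running instance of that image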

    Container Orchestration

    Container orchestration tools automate the deployment, scaling, and management of containerized applications.

    • Kubernetes (K8s): An open-source platform for automating deployment, scaling, and management of containerized applications. Key features include automated deployment and scaling, load balancing, self-healing, and secure management of sensitive information.
    • Example commands: kubectl create deployment nginx --image=nginx to create a deployment and kubectl scale deployment nginx --replicas=3 to scale it.
    • Docker Swarm: Docker’s native clustering and orchestration tool, simpler than Kubernetes. It offers simplified setup, scaling, load balancing, and secure communication between nodes.
    • Example commands: docker swarm init to initialize the swarm, docker service create --name web --replicas=3 -p 80:80 nginx to create a service, and docker service ls to list services.

    Virtual Machine Management (libvirt)

    • Libvirt is a toolkit with an API for interacting with VMs across different virtualization platforms like KVM, Xen, and VMware.
    • It provides a unified API for managing VMs across different hypervisors, simplifying VM management.
    • Key features include virsh for management and virt-install for creating new VMs.

    Common libvirt Commands:

    • virt-install: Installs a new virtual machine.
    • Example: virt-install --name myubuntuvm --memory 2048 --vcpus 2 --disk path=/var/lib/libvirt/images/myubuntuvm.qcow2,size=20 --os-variant ubuntu20.04.
    • virsh destroy: Forcibly stops a specified VM.
    • Example: virsh destroy myubuntuvm.
    • virsh list --all: Lists all VMs managed by libvirt.
    Full Linux+ (XK0-005 – 2024) Course Pt.2 | Linux+ Training

    The Original Text

    this training series is sponsored by hackaholic Anonymous to get the supporting materials for this series like the 900 page slideshow the 200 Page notes document and all of the pre-made shell scripts consider joining the agent tier of hackolo anonymous you’ll also get monthly python automations exclusive content and direct access to me via Discord join hack alic Anonymous today okay now it is time to talk about networking and the fundamentals of networking again this is not going to be a replacement for Network Plus or anything like that but it will be fairly comprehensive and we’re going to go through a lot of the fundamentals as well as some of the commands and tools that you will need to uh navigate the network and the network connections and the interfaces of a Linux environment so first and foremost let’s go into some basic networking Concepts IP addressing so IP addresses are unique identifiers assigned to devices that are connected to a network they allow you communicate with each other uh and are very important for Network management and communication uh so anything when you hear something like a network or anytime that you hear the word network uh think IP addresses and uh IP addresses are very much the main uh the I mean address the main identifiers that uh are assigned through various devices so your TV has an IP address your phone will have an IP address address your computer obviously will have an IP address um anything that’s connected to a network anything that’s connected to the internet will have an IP address ipv4 is the original IP address and it’s a 32bit numerical label uh what you see right here in the green right here that’s traditionally what it looks like it’s separated by three dots and it has three digits or it can have up to three digits on each portion of this thing and it can go from uh one actually Zer it can go from zero and uh go all the way to 254 I want to say um we’ll verify that in a couple of slides uh so what it does is it provides approximately 4.3 billion unique addresses but then what happened is that a lot of devices were developed so you know the average household has or the average person even has multiple uh devices that are connected to the internet and quickly uh way faster than I think people anticipated uh the ipv four addresses ran out um but what happens is that each a uh ISP each internet service provider assigns a series of private IP addresses to each individual person and uh for the most part you will not run across a a duplicate IP address although being that there’s only 4.3 billion unique variations uh it can run across so it can actually have duplicates and that’s one of the issues that resulted in them developing IPv6 so uh ipv4 is the most commonly used version of Ip but because of the fact that there were so many devices they developed a new uh IPv6 format and the IPv6 format looks very different from what we saw previously it is 128bit whereas ipv4 is 32bit so when you have a 128bit identifier it obviously looks a little bit different so in this particular case this is a s Le of an IPv6 address and instead of offering the billions this actually offers 340 unilan addresses and I don’t I I had not even heard of this word prior to looking at IPv6 IP addresses um clearly it is way more than what is available with ipv4 so it’s designed to replace ipv4 eventually but your current computer my current computer uh they have both of these so they’ll actually have the ipv4 as well as the IPv6 But ultimately at some point not exactly 
To understand networking and IP addresses, you also need to understand subnetting. Subnetting is a method used to divide a larger network into smaller, easier-to-manage chunks called subnets. It improves the organization, efficiency, and security of the network, and it helps reduce network congestion, meaning things won't get blocked off or clogged, so to speak, with too much happening at once.

A subnet mask, like the example in green here, determines the network and host portions of an IP address. A common IPv4 mask is 255.255.255.0: the network portion is the first series of 255s, and the host portion is that last part represented by the zero. An octet is 8 bits, and altogether there are four octets, so 4 × 8 gives the total of 32 bits. When the first three octets represent the network, every device on that network shares those same three numbers, and only the last octet changes to assign a unique identifier to each device. For example, take 192.168.1.1 with a subnet mask of 255.255.255.0: the first three octets, 192.168.1, represent the network, and the last piece is the host. If there are three hosts, they would presumably be 192.168.1.1, 192.168.1.2, 192.168.1.3, and so on. That is the subnet represented by this mask.

Expanded into binary, the mask is eight ones, eight ones, eight ones, and then eight zeros at the end representing the portion that can change. The network bits here are 24 (8 × 3), and the host bits are the final 8. The calculation looks a little complicated, but it really isn't: the number of hosts per subnet is 2 raised to the power of the number of host bits, minus 2 (I'll explain the minus 2 in a moment). With 8 host bits, 2^8 is 256, and 256 − 2 is 254, so this subnet can hold 254 individual host IP addresses; 254 hosts can reside on that subnet. Now let's look at the next example: a subnet mask with the first two octets reserved for the network and the last two octets reserved for the host.
Here you have 16 bits (2 × 8) reserved for the network and 16 bits reserved for the host. The calculation is 2^16 = 65,536 potential hosts, minus 2, which leaves 65,534. That is very different from the 254 you get with a single host octet: free up two octets for the host portion and you now have 65,534 potential host IP addresses to assign inside that particular subnet, that particular network.

Now, why do we subtract two? This is a very important question. There are two addresses reserved in any given network. The first is the network address, the first address in the subnet, reserved for the network itself, typically the .0. The second is the broadcast address, the last address in the network, the .255. With the 0 and the 255 reserved, the usable hosts run from 1 to 254. When you set up your gateways and build out your network, subnet, and mask, you assign the network address and the broadcast address; those take up two of the variations, and the remaining 254 are available for the rest of the devices, the rest of the hosts, that can reside on that subnet.

As a summary: one subnet allows 254 potential hosts, and the other allows 65,534. The smaller one would suit a home, or a company with fewer than 254 devices. Keep in mind each employee might have a work cell phone, a personal phone, a computer, maybe an iPad or a tablet, plus any smart TVs sprinkled around the office, so for the most part around 50 employees can fit on a network that looks like that. Anything more and they'd need a larger subnet, because with multiple devices per person you simply need more than 254 potential addresses. That is how subnet masks and the host calculations work.

When a device connects to a network, it is assigned an IP address, either IPv4 or IPv6 depending on the network's configuration. Subnetting helps organize the network by breaking it into smaller segments, making it easier to manage and enhancing security by isolating different parts of the network from each other. That is the whole purpose: number one, easier management, and number two, better security through isolation. When you have multiple subnets, it's easier to find out which host was exploited, which host was hacked into, and because hosts reside in their own little segment, a compromise won't affect the rest of your network; it can be contained and isolated within that specific segment.
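A quick way to sanity-check both host counts from a shell (a minimal sketch; the `ipcalc` utility on the last line is a common but optional package, so it may not be installed by default):

```bash
# Usable hosts in a /24: 2^8 host bits, minus network and broadcast addresses
echo $(( 2**8 - 2 ))     # 254

# Usable hosts in a /16: 2^16, minus the same two reserved addresses
echo $(( 2**16 - 2 ))    # 65534

# If ipcalc is installed, it prints the network, broadcast, and host range
ipcalc 192.168.1.0/24
```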
The Domain Name System (DNS) is the next level up when it comes to addressing: it's basically the phone book of the internet. Domain names typically point to websites, but each website also has an IP address. Most people aren't going to remember the IP address for a website; the IP address for Google is some series of numbers you won't remember, but you will remember google.com. So example.com, in this case, actually points to a real IP address. Websites, web servers, web applications, all of these things have IP addresses if they're connected to the internet, but most people aren't that good with numbers and are fairly good with names: everybody remembers facebook.com, nobody remembers Facebook's IP address; the domain name is the human-friendly version of it. The domain name goes through the DNS port and the Domain Name System protocol, and this is how domain names get translated into IP addresses.

Here's how DNS works. When you type something into your browser, the computer needs to find the IP address connected to that domain name. Names are easier for us to remember, but the computer works with IP addresses, so it needs to resolve the domain name to one. You, the user, enter a domain name, example.com in this case, into the browser and press Enter. The computer then sends a query to the DNS resolver, which is typically provided by the ISP, your internet service provider. If the answer isn't stored in your computer's cache, or in the ISP's cache, the resolver performs an actual lookup: it queries multiple DNS servers to find the correct IP address, resolves it, connects that IP address to the domain name, and stores it, so the next time you enter that name it loads quickly. If you clear your cache, the whole process runs again on the next lookup. So with AT&T as your ISP, for example, the query goes to AT&T's DNS resolver, which goes through its database of IP addresses connected to domain names; if it has the answer, it sends you to the website you're trying to reach, and if it doesn't, it does a recursive lookup to find what that domain name is connected to, then loads that IP address in your browser. All of this happens in a second or two, depending on how fast your internet service is; it all happens very, very quickly, and you don't really see any of it in the foreground.
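You can trigger that resolution step yourself with a standard lookup tool (a minimal sketch; `dig` ships with most Linux distributions, and `nslookup` is a common alternative):

```bash
# Ask the configured resolver for the IPv4 address behind a domain name
dig example.com A +short

# nslookup performs the same lookup with slightly chattier output
nslookup example.com
```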
All you see is that you typed in the domain name, pressed Enter, and a second or two later a website loads; but that is what's happening in the background.

The DNS server hierarchy is connected via the resolver. The resolver contacts one of the root DNS servers, which sit at the top of the DNS hierarchy, and those root servers direct your DNS query to the appropriate top-level domain (TLD) server: the .com server, the .org server, the .net server, and so on. If it's a .com website, the query goes to the .com TLD server, which from its list of domain names finds the IP address; if it's a .org, it goes to that server and pulls the answer from that list. These are separate servers, mainly because there are probably billions of domain names at this point, and a bunch of different TLDs now: .com, .org, .net, plus .co, .us, even .coffee. Because there are so many variations, each one has its own server, and your query goes to the appropriate one. This is mainly to make the process faster: if every .com, .org, .net, and every other top-level domain were housed on one server at AT&T, it would take much longer for a domain name to be looked up and its IP address found.

Once the TLD DNS server is contacted, if you're looking for example.com, the .com TLD server directs the query to the specific authoritative DNS server for example.com. The authoritative DNS server, the final step of this whole thing, is the one that hosts the actual domain name and holds the DNS records mapping it to the IP address; that is where the data gets pulled. So it goes from the root, to the TLD server, to the authoritative DNS server actually hosting that domain name, which could be at AWS, at GoDaddy, or at a variety of other hosting providers. Once the address is found and the data has been pulled, it's returned to your computer and populates in your browser, so you can look at whatever you want, watch the video, watch Netflix, all of that. All of this happens so you can get access to the data you want.

To give you a visual representation of everything we just talked about, these things are usually easier to follow as a flow. On your computer, you look up self-repair.apple.com. The first check is local: the cache and hosts file on your machine, which you can see on the left, or your ISP's cache. If the answer doesn't exist there, this local DNS server sends the question out.
It sends that query out, asking: where is this place, what's the IP address of this location? It goes to the root server, and the root server says, "I have no idea; try the authoritative server for all the .com domain names, it's at this particular location." That goes back to the local DNS server, which says, all right, fine, you don't know, I'll go to this guy, and heads to the .com top-level domain authoritative DNS server: "Hey, where is this thing?" And that server says, "I have no idea; why don't you try the actual Apple server, so you can find out what the IP address for self-repair.apple.com is?" That is the TLD authoritative server's response, and it comes back, and the local DNS server says, okay, fine, let's go to Apple. Every single time, one server says, "I don't know, but look at the server hosted at this IP address," and the local DNS server says, "oh, okay," and goes to that next address. So it asks Apple, "where is this particular thing?" and Apple says, "I don't know; why don't you try the authoritative server for repair.apple.com," the extra piece that comes in on this name, "that is housed over here." Finally the query lands there and asks, "okay, fine, hey, where is this thing?" and that server answers, "there is no self-repair record here." That gets sent back, the lookup comes up empty, and this is where you'd probably get an error on the screen and the page doesn't load. Now, if the record does exist, then instead it says, "oh, this is the IP address for it," sends it to the DNS server, the DNS server sends it to your computer, and the page loads for you. That's the process right here. I just think it's so funny that it gets sent on this wild goose chase, back and forth, and then: "oh, there is nothing here." And presumably, if the location we wanted was not self-repair.apple.com but, I don't know, repair.apple.com, it would stop one step earlier and say, "yeah, okay, here's the IP address for repair.apple.com."
If we were just trying to go to apple.com, you'd send out "hey, where's apple.com?", get told to try a .com server, come to the .com TLD authoritative server, ask "hey, where's apple.com?", and it would say, "here's the IP address for apple.com." So it depends on how deep the inquiry goes and where we're trying to land, but the path is always: your local DNS server, then the root server, which sends you to the .com TLD server, which sends you to the authoritative server for the actual domain name, the actual website. That is the process behind everything we just talked about in this whole DNS part.

So, just to give you the explanation: DNS servers are specialized servers responsible for handling the translation of a name into an IP address, and there are different types. The DNS resolver we just talked about, the recursive resolver, receives the original query from your computer (that was the box in the bottom-left corner); it handles the process of contacting everyone else to find the answer for you, and if it finds it, it sends you to the page you were looking for. From the resolver, the query goes to the root DNS server (the thing in the top right), the first stop, which says, "this is what we're looking for; go to the appropriate TLD server," which might be .com or .org or whatever. That top-level domain server redirects you onward to the authoritative DNS server that will help you find the answer. The authoritative DNS servers are the ones that actually store the records for domain names and provide the final IP address in response to what you're looking for. In our case we were going for self-repair.apple.com, which went one level further than usual.
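You can watch this exact root-to-TLD-to-authoritative chain with dig's trace mode (a minimal sketch; the output is long, and the domain is just the document's running example):

```bash
# Walk the delegation chain yourself: root servers, then the .com TLD
# servers, then the zone's authoritative servers, then the final A record
dig +trace self-repair.apple.com
```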
Typically, though, this is where it stops: you go from the TLD server, it sees it's a .org website, sends you to that particular authoritative server, and that .org server says, "oh, okay, this is the address for the website you're looking for." It makes things a lot easier: instead of having to remember IP addresses, you just remember a name. That's it. That is one of the biggest benefits of DNS.

Once you get past the user-friendliness, the next benefit is scalability. If you wanted to look at all the devices on the internet, all the domain names on the internet, and all the IP addresses connected to those domain names, and keep expanding that, you're looking at scalability: DNS supports that massive, constantly growing number, the devices people keep buying (they'll add another TV to the house), the domain names people keep registering. Then consider the number of queries being made: imagine how many billions of queries are happening right now, as you watch this, how many millions and billions of people are trying to access millions and billions of locations across the internet. With that massive amount of traffic going back and forth, these systems can't be allowed to crash; there needs to be some kind of redundancy, meaning a backup server, say for Microsoft: if there's an earthquake at the first server's location and that building has a power outage, a backup server kicks in immediately so that microsoft.com doesn't crash and you can still access the website. Something like that actually happened relatively recently, where Microsoft services, and all the computers and devices that rely on Microsoft, couldn't work; this was a couple of months ago, and I think Microsoft lost over $150 million, some crazy number, during that one day of outage. This is where redundancy and reliability come in. In the DNS system, it's not just one TLD server hosting all the domains; there are literally dozens, probably hundreds, across the globe in different locations, so just in case one crashes you can go to another and still get reliable connections. Extrapolate that to the .orgs, the .nets, all the different countries and cities, and so on, and you see why this hierarchical structure is a big deal: it provides the reliability people actually need, because what would you do without the internet?

Once you understand DNS, the next step is DHCP, the Dynamic Host Configuration Protocol. DHCP is a network management protocol used to automate the process of configuring devices on IP networks; it is essentially what assigns an IP address to any new device that connects to your router, to your internet.
When somebody comes along and connects to your Wi-Fi, DHCP is the protocol that assigns the IP address, and any other configuration parameters, to that device. The way it works: a device, a computer, whatever, connects to the network; at that point it doesn't have an IP address, so it sends out a discovery broadcast message to a DHCP server. The DHCP server receives that discovery message and responds with an offer message: "hey, here's an available IP address on our network," along with the network configuration information you need, like the subnet mask, the default gateway, and any DNS server addresses. Most people really don't care about any of this; they just want Wi-Fi, and of course the devices don't display all of this to the person connecting their cell phone; they just see "I have a Wi-Fi connection." But what happens is that your phone sends out the request, the DHCP server responds with an offer, "this is the IP address I've got for you, and these are the rules and configuration details for our specific network," and the device receives that offer and responds with a request message: "I accept the offer you've made; thank you for this IP address and for the connection to this network." Finally, the server acknowledges that request: "all right, cool, this is your IP address from now on, and every time you connect to me, this is the address that will be assigned to your device." Now the device has its official address and can actually use the internet, communicating with the vast, wide network of the worldwide web.

There are a lot of benefits to this, obviously. It simplifies network setup: imagine if you had to be the one assigning an IP address to every single device that connected to your network. Believe it or not, at a certain point in the history of the internet this was actually the case, and people manually assigned IP addresses to the computers and devices on the network. On a massive network, which is what Microsoft or Amazon has, with literally tens of thousands of employees, configuring each device by hand would not be possible; imagine somebody sitting there assigning IP addresses, a full-time 24/7 type of job. It also avoids address conflicts, the part that is very prone to human error: somebody might forget, "oh shoot, I already assigned this address to this other device, now I can't reuse it, now I have to go find a new address," and so on. DHCP keeps track of all of these things and avoids any conflict, assigning a genuinely unique address to each device that connects to the network. And then there is the management of the IP addresses themselves.
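If you want to watch that discover/offer/request/acknowledge exchange on a Linux machine, one way is to release and re-request a lease verbosely (a minimal sketch; `dhclient` is common on Debian-family systems but not universal, and the interface name eth0 is an assumption):

```bash
# Release the current DHCP lease on eth0, then request a new one;
# -v prints the DHCPDISCOVER / DHCPOFFER / DHCPREQUEST / DHCPACK steps
sudo dhclient -r eth0
sudo dhclient -v eth0
```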
People want their IP addresses used efficiently: if there's any kind of security issue or a disconnection from the network, the IP address can be reassigned to another device. It also ties into firewalls, which we're going to get into a little later, where access to the network is denied or allowed by IP address; all of that falls under IP address management. And it matters a lot for networks with many temporary, or what they call transient, devices. Take a coffee shop Wi-Fi hotspot: that specific Wi-Fi has probably seen tens of thousands of devices that just roll through, there for a single day because they need Wi-Fi, and then they leave; the person came into town for a trip, left, and will never visit that Wi-Fi spot again. The IP address temporarily assigned to that person now needs to be freed up so it can be assigned to somebody else. That is how transient devices get managed through DHCP, and it's very important to understand, especially once you move into a large enterprise environment. So when you're thinking about the management of IP addresses: what assigns the IP address inside a network? The DHCP protocol. What attaches a name to an IP address? The DNS protocol. What types of IP addresses are there? IPv4 and IPv6. Those are the key things to remember about the networking fundamentals.

Going further along with this: a DHCP lease is basically the temporary amount of time that an IP address is assigned to a given device. In a public Wi-Fi environment, the DHCP lease time is obviously much shorter than on your personal home internet. When the lease is about to expire, the device has to renew it by sending a new request to the DHCP server, and the DHCP server sends the offer. If the device stays connected to the network, the server typically just renews the lease, extending the time the address is assigned, so your laptop doesn't have to get a new IP address every single time it connects to your home Wi-Fi. The laptop might get put in your backpack, leave, and come back, but the desktop you never take out of your house stays perpetually connected; in that case the lease just keeps renewing, keeping the same IP address granted. But when a friend comes over and you don't see them again for three or six months, their lease lapses, and they'll need to reapply, submitting a new request to the DHCP server to be assigned an IP address the next time they come back to your house.

As a summary: DHCP streamlines network management. It assigns IP addresses to devices, making sure the addresses, any conflicts, and everything else are handled without you ever having to worry about it, and it simplifies the process of connecting devices to a network, keeping overall network operation running smoothly.
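To inspect the lease a machine currently holds, you can ask NetworkManager (a minimal sketch; the connection name "Wired connection 1" is a hypothetical placeholder, and the lease-file path on the last line varies by distribution):

```bash
# Show the DHCP options NetworkManager received for a connection,
# including the lease lifetime the server handed out
nmcli -f DHCP4 connection show "Wired connection 1"

# On dhclient-based systems, the raw leases (with renew/expire times)
# are usually recorded in a file like this one
cat /var/lib/dhcp/dhclient.leases
```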
All of these things happen behind the scenes; you don't even think about them, and if you didn't know anything about networking you've probably never heard of this and are thinking, "oh wow, I didn't know all this was happening." But yes, there is something that assigns an IP address to whoever gets access to your Wi-Fi, and it's called DHCP.

All right, now we need to address what a network interface actually is, starting with the legacy interface configuration tool. "Legacy" meaning the old-school version, but this is actually what's running on my current MacBook, so it's not legacy in the sense of no longer being used; there are still a lot of computers that use ifconfig. ifconfig, short for interface configuration, is a command-line utility used to configure network interfaces on Unix-based operating systems, for example Linux or macOS; it views the interfaces and configures them with their IP addresses. It has supposedly been deprecated, but it's still very much in use: when I tested this on my MacBook and ran ip, it didn't work, it said ip doesn't exist, but when I ran ifconfig, ifconfig worked. So it's not as deprecated as they make it sound. On Windows, the equivalent is ipconfig, which serves the same exact purpose.

The simplest version of the command is to just type ifconfig and press Enter; it lists all the network interfaces on your system along with their current configurations, meaning the IP addresses assigned to them, any network masks or broadcast addresses, and everything else appropriate for that configuration. In the example output here, eth0 is the interface in question; next come the flags attached to it, so it's UP and RUNNING, it has BROADCAST, MULTICAST, and so on; the inet field is its IP address; the netmask is the subnet mask we were talking about; and then there's the broadcast IP address assigned to that particular network. That's just a sample of the output you'd get when you run ifconfig.

Now, if you want to configure the IP address yourself, you run sudo ifconfig with the specific interface we were just talking about, the IP address you want for it, and the netmask you want, followed by up, which brings the interface up, meaning it actually activates it (if you wanted to take it down, you'd type down instead). This is a configuration command: you're saying, assign this specific IP address to this particular interface, put it on this subnet mask, and start it running. You don't necessarily have to do this, because typically DHCP will do it for you, but in case you need to do it manually, this is what it looks like to configure an IP address for a given interface.
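Put together as commands, that looks something like this (a minimal sketch of the usage just described; the interface name eth0 matches the slides, while the address and mask are hypothetical example values, not something to copy blindly onto a live machine):

```bash
# List every interface and its current configuration
ifconfig

# Manually assign an address and mask to eth0 and activate the interface
sudo ifconfig eth0 192.168.1.10 netmask 255.255.255.0 up
```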
Here is the detailed breakdown of everything I just said (maybe I should just click forward and see if I actually have these notes in the future slides). sudo runs the command as the superuser, which is the administrator privilege. eth0 is the network interface you want to configure; it could be eth1 or another interface, but typically not lo, the loopback/localhost interface, which you typically would not modify and which ends up having the same exact IP address on every single machine: 127.0.0.1. So eth0 is the particular interface we're configuring in this case, followed by the IP address we're assigning to it, the network mask, the subnet mask, we're assigning to it, and then up to activate the interface and make sure it's running. And just as you can bring an interface up, you can shut it down, deactivate it essentially: without configuring the IP address or anything else, ifconfig eth0 up activates the interface and ifconfig eth0 down deactivates it. These control the state of the interface itself; bring it up to activate it, take it down to deactivate it, exactly as in the usage examples here: ifconfig eth0 up or down. In conclusion and summary: ifconfig is supposedly deprecated, but it is not dead, because it is active and running on my MacBook right now; it remains a widely recognized tool for managing network interfaces on Unix-based systems, and it lets you view interface configurations, assign IP addresses, and control the state of interfaces. That is what ifconfig does.

ip is supposed to be the modern replacement for ifconfig. It's part of the iproute2 package, and it basically does everything we just covered with ifconfig, except the command in this case is ip: you just type ip a, or the longer ip addr, and press Enter, and it displays all the network interfaces, very similar to what ifconfig would do, including the IP addresses, MAC addresses, and any other relevant detail. The example output looks much like what we saw previously. The loopback interface lo is at the very top, and as I mentioned, on every single device I've scanned or pen-tested, the localhost IP address is the same; it's universal across every machine I've ever messed with: 127.0.0.1. Then there's eth0, the actual interface assigned an IP address on your Wi-Fi, your local network, and in this particular example that is the machine's actual IP address, along with the broadcast address. As for the netmask, I don't think the mask appears in this particular example, but you do see the MAC address.
That is the MAC address physically tied to the Ethernet hardware. MAC addresses can be spoofed, which is a whole other conversation, but MAC addresses are not permanent, and neither are IP addresses; you can spoof an IP address too. I don't know why I even said that; anytime I think "MAC address," I think "MAC address spoofing." Anyway, this output is very, very similar to ifconfig's, and honestly, in a lot of cases when I run ifconfig I also see the MAC addresses of my devices in its output, so that's not limited to the ip command.

Assigning an IP address is very similar to what we did with ifconfig: sudo ip addr add, then the particular IP address with a /24 suffix, then dev eth0. What we're doing is assigning that IP address with a 24-bit subnet mask, meaning the first three octets are assigned to the network and the last octet to the host, onto the eth0 interface. Notice that instead of showing 255.255.255.0, the /24 is the piece that tells us what the subnet mask is: in this case it says the network is assigned 24 bits, which is three octets. Breaking it down: sudo runs it as administrator; ip addr add indicates adding an IP address; this is the address we're adding, with a subnet mask of three octets, 24 bits; and we're adding it to the eth0 interface, which, from what I've typically seen, is also the interface name given to the very first IP address assigned on your computer.

The overall concept of bringing an interface up or down, basically activating or deactivating it, also applies here; the command is just a little different: ip link set eth0 up, or ip link set eth0 down. up obviously activates and down deactivates; you're talking to the interface a little differently, through link, saying "set this particular interface up" or "set it down." Those are the examples of what it looks like to bring an interface up or deactivate it and take it down.
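As a consolidated sketch of those ip equivalents (again with hypothetical example addresses; run this against a test interface, not a production one):

```bash
# Show all interfaces with their IPv4/IPv6 addresses and MAC addresses
ip addr            # or the short form: ip a

# Add an address with a 24-bit mask (255.255.255.0) to eth0
sudo ip addr add 192.168.1.10/24 dev eth0

# Activate or deactivate the interface
sudo ip link set eth0 up
sudo ip link set eth0 down
```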
Displaying the routing information is mostly done so that you know the paths network traffic takes to reach various destinations. It includes information about the default route, the specific routes, and the interfaces being used. It's used to troubleshoot connectivity issues, to see the individual connections made along the path the traffic takes to reach its destination and whether any of them are glitching; this information can also be used for security analysis and pen testing, but it's mostly used for troubleshooting network connections. The ip route command displays the routing table, which shows the paths network traffic will take, and the sample output here is what that looks like. I'll break it down in a little more detail so you understand what's going on; I don't think this is actually inside the scope of the Linux+ studies and exam, but it'll give you a good idea of what you're looking at, so your networking knowledge is a little bit stronger.

So let's look at what each piece means and what all of these elements represent, segment by segment. The first portion is: default via 192.168.1.1 dev eth0. default indicates the default gateway, which is used when no specific route for a destination is found in the routing table. via 192.168.1.1 specifies the next-hop address: the IP address of the default gateway router that the traffic will be sent through. And dev eth0 indicates the network interface the traffic will be routed through. Together, the line tells the system: send any traffic that doesn't match a specific route in the routing table to the default gateway at this IP address, using this particular interface.

The next line is the specific route. The first portion represents a route for an IP address range, from .0 through .255, where the /24 is the subnet mask, which you should already recognize by now. dev eth0 associates it with the eth0 interface. proto kernel signifies that the route was added by the kernel (and we should know what the kernel is by now), usually as a result of configuring the network interface. scope link indicates that the route is valid only for directly connected hosts on the same link, the local network; what we're seeing in this line applies only to devices connected to the same Wi-Fi, the same network, that this machine is on. And src gives the source IP address: the address used when sending packets to this particular subnet, which is the address of this device itself.
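Reconstructed as a runnable check, with the two lines annotated (a minimal sketch; the addresses mirror the slide's 192.168.1.0/24 example and will differ on your machine):

```bash
ip route
# Typical output:
#   default via 192.168.1.1 dev eth0
#     -> anything without a more specific route goes to the gateway router
#   192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.10
#     -> the local subnet is reached directly on eth0; the kernel added
#        this route, and 192.168.1.10 is used as the source address
```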
Combining both of these lines: traffic for any unspecified destination, the default route, is sent through the gateway at that address via that interface, while traffic specifically destined for this subnet is routed through that interface using the IP address assigned to it. Don't confuse the two addresses: the first one was for the default gateway, and the src address is for this particular device, the one communicating with the network. Again, this might be a little confusing, a little overwhelming; we're not doing Network+ here, at least not in this depth. I just wanted you to see what the output of the ip route command represents. So, in summary: even though ifconfig is supposedly deprecated, it remains a widely recognized tool for managing network interfaces on Unix-based systems, letting you view and manage the interfaces, and so on, very similar to what's done with ip.

Next on our list of tools is the NetworkManager command-line interface, nmcli. It's a command-line tool, as the name implies, and it helps you manage network connections on Linux systems. It interacts with NetworkManager, a system service for managing interfaces and the connections to them. It's commonly used in desktop environments and provides a convenient way to configure and control network settings without a graphical interface, because, as the name implies, this is a command-line interface. So nmcli interacts with your NetworkManager and helps you manage the interfaces, the connections, and so on.

Here are some of the commands. To look at the active connections on your network, type nmcli connection show: it shows all the active network connections on your system, with the connection name, the UUID, the type of connection, and the device associated with each connection. In the example output, you have the name, so the wired connection for the machine itself and the Wi-Fi connection for the router; the UUID for each of them; the type, ethernet for one and wifi for the other; and the device name or device ID, eth0 for one and wlan0 for the other, for this particular set of network connections.

To configure a static IP address using nmcli (static IP addresses are addresses that don't change, so this is a permanent IP address), you run sudo nmcli connection modify on that particular connection, setting the ipv4.addresses property to the IP address to assign, with a /24 subnet mask, 255.255.255.0, three octets of ones. The command assigns a static IP with a 24-bit mask to the network connection eth0.
Here is the breakdown: sudo runs it as administrator; nmcli connection modify indicates we're modifying a network connection; then comes the particular connection we're configuring; then the IP address and subnet mask we're assigning to it, using the IPv4 form of the address; and then the /24 subnet, where 24 is the number of network bits, so 24 divided by 8 is 3 octets, which means 255.255.255.0. I'm just repeating a bunch of things you should know by now.

Enabling or disabling connections is very similar to everything else we've done, using the up and down keywords: sudo nmcli connection up for a particular connection, or connection down, brings the connection for that interface up or down, activates or deactivates it. That looks like nmcli connection up eth0 and nmcli connection down eth0, and so on.

To view the status of your devices, you look at nmcli device status: it displays the status of all your network devices, whether they're connected, disconnected, or unavailable. In the example output you can see the devices: the Ethernet device is connected, the Wi-Fi device is connected, and the loopback (the localhost) is not currently being managed.

To list all the available Wi-Fi networks around your device, and this will probably be a really big list depending on how big your building is or who's around you, you run nmcli device wifi list: it lists the available Wi-Fi networks with their SSIDs, signal strength, security type, and so on, and the sample output shows what that potentially looks like. The SSID is essentially the name of the Wi-Fi, so you have MyWiFi, AnotherWiFi, and then there'll be AT&T-something, Spectrum-something; then the mode; the channel it's on; the rate, its speed; the signal, where a signal of 70 means a stronger signal and a stronger connection; bars, as another way of looking at signal; and then the security, WPA2 in this particular case, one of the more common and more secure modern Wi-Fi encryption schemes compared to all the legacy, outdated versions of Wi-Fi security.

If you want to connect to one of those networks, you run sudo nmcli device wifi connect with the SSID, which would be the name in this case, followed by password and the password itself, because more often than not you actually need the password to connect to a Wi-Fi network; it will then connect to that Wi-Fi using the SSID and password you provided. In usage, you replace SSID with the actual name of the Wi-Fi, MyWiFi in this example, and you replace password, inside the quotation marks, with the actual password for that Wi-Fi network.
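Collected as commands, that whole nmcli tour looks roughly like this (a minimal sketch; the connection name, SSID, and password are hypothetical placeholders, and the ipv4.method line is an extra step that tells the profile to stop using DHCP so the static address actually sticks):

```bash
# List active connections, then every device's status
nmcli connection show
nmcli device status

# Pin a static address (24-bit mask) onto an existing connection profile
sudo nmcli connection modify eth0 ipv4.addresses 192.168.1.10/24 ipv4.method manual

# Deactivate and reactivate the profile so the change takes effect
sudo nmcli connection down eth0
sudo nmcli connection up eth0

# Scan for nearby Wi-Fi networks, then join one
nmcli device wifi list
sudo nmcli device wifi connect "MyWiFi" password "SuperSecret"
```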
In summary, nmcli provides a powerful command-line interface for managing network connections on Linux systems. It allows you to view and modify connections, assign static IP addresses, control connection states, and, of course, manage Wi-Fi networks. That is the power of nmcli.

Now, on to troubleshooting. ping is a very useful utility used to test the reachability of a host, a computer or a server: you ping the IP address, or you ping the website, and you can also measure the round-trip time for the messages sent to it, just to establish how strong the connection is and how quickly that host responds to you. When you send it, you literally just type ping followed by the IP address, and it sends a series of Internet Control Message Protocol (ICMP) packets. Anytime you see ICMP, think ping. There's also something called an ICMP flood, which is actually a way to carry out a DoS attack, a denial-of-service attack (or a distributed version, a DDoS), by flooding a particular host with so many ping requests that it may go down and out of service. This is something you'll see regularly: ICMP will be associated with ping, so just keep in mind that if you see ICMP, it means they're trying to ping something.

ping sends the request to the specified destination, either the hostname or the IP address of that host, and the response received tells you whether that thing is up. Typically, if it's not up, it responds with some kind of "host is down" or similar; if it is up, it reports that the connection was established and the amount of time that took. The round-trip time, the RTT, is the measurement of the time it takes for the echo packet to go to the host and for the reply to come back to you, so it's the round trip, very simple to understand. Packet loss is the number of packets that were sent but got lost somewhere in the process, due to the network connection or other issues: a packet went out and never came back, or was sent and never received, lost in the transport, so to speak. If that happens, ping reports it too, and this helps you understand the strength of the signal to that particular host and how reliable it is.

The basic usage looks like this: ping followed by the hostname or IP address, so ping google.com, for example, or ping followed by some IP address directly. It pings the host just to tell you whether or not it's up; for the most part, that is why we use ping. When the host is up, the result looks like the sample here: 64 bytes were sent back to you from that host's IP address, along with the response time, which was 12.3 ms, then 11.8, then 12.1. That is exactly what it looks like when a host is up; when it's not up, you won't see anything like this.
Instead it will typically say the connection was not available, or the host is down, something along those lines. When the host is up, you'll see a new reply line every second or two, and it just keeps coming until you stop it, which you do with Ctrl+C (that's actually, I think, on the next slide). Here is the breakdown of what we just saw: "64 bytes from" the IP address is the reply we received from that address, which was Google in this case; the sequence number of the packet starts at zero and increments by one; the TTL, the time-to-live value, indicates the maximum number of hops the packet can take before being discarded; and the time is the round-trip time for that packet. So if you want to stop it, Ctrl+C cancels the request; alternatively, you can say up front that you want to send, say, five or ten packets just to confirm the host is up, which I believe is done with -c for count, and then it runs and stops after that many packets. But if you just run ping google.com by itself, it keeps running until you stop it with Ctrl+C.

Additional options: oh, there we go, the count is right here, -c is the actual count (I just keep getting ahead of myself), specifying the number of packets to send, so after five packets it stops pinging. You can also set the interval between packets with the -i option, giving it the number of seconds to wait. If you don't want to show up on someone's intrusion detection system as massively pinging them, you can have ping wait five or ten seconds between packets, enough to verify you still have a live connection without appearing on their IDS, their network intrusion detection system, as somebody hammering them to see whether they're up. Then there's flooding, which we talked about: the -f option floods, meaning it sends as many packets as fast as it possibly can, so the milliseconds you saw get much, much smaller and a massive flood of packets goes to google.com (in this particular case, I don't think Google would even care). This is typically done either to test a host's ability to handle a flood of traffic, or to actually carry out a denial-of-service attack and stop it from operating.

In summary, the ping command is a vital network troubleshooting tool: it tests the connection between devices and confirms that a device is up; it measures the time it takes for packets to go back and forth, to see how reliable the connection is; and if there's any kind of network issue, it reports the packets that were lost in the process, the length of time packets take to make the trip, and, of course, the status of the connection: whether it is actually up and whether you can send it a ping and receive something back.
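The options just described, collected in one place (a minimal sketch; these flags are from the standard Linux iputils ping, and flood mode generally requires root):

```bash
# Send exactly 5 echo requests, then stop and print statistics
ping -c 5 google.com

# Wait 5 seconds between packets: slow, quiet probing
ping -i 5 google.com

# Flood mode: send packets as fast as replies come back (root required)
sudo ping -f google.com
```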
Traceroute is the type of tool that helps you track the route a piece of traffic, a packet, takes to get to a particular location. When you run it, it sends a series of packets to the destination IP address, gradually increasing the time-to-live values, which determine the maximum number of hops, the number of internet routers, the packet can traverse. When you send something out, it goes through multiple routers before it lands at its destination, and it's rarely ever the same number of hops: it could be one hop, it could be ten, it just depends. The TTL value starts at one, so the first packet travels only to the first hop before it gets discarded; with each subsequent packet, the TTL is incremented by one, meaning the second run survives two hops before being discarded, the third run three hops, and so on. The hops are the connections to the intermediary routers along the way before the packet reaches its final destination. When a packet is discarded because its TTL expired, the router that dropped it sends back an ICMP "time exceeded" message to you as the source. That message includes information about the router in question, allowing traceroute to identify the hop where the packet was discarded along the way. The process continues until a packet actually reaches the destination and completes the path, at which point traceroute tells you the number of hops it took to get there. Using the information from each hop, traceroute constructs the route the packets took. Again, this is about measuring connectivity: the strength of a network, of your host's connection to another host, of your internet connection and routers, and how long it takes for something to get through. If too many packets are discarded along the way, that's a red flag: okay, we need to troubleshoot this connection, because for whatever reason we're dropping a lot of packets. When paths do complete, traceroute records that information and shows you how many of the packets sent out actually reached the destination. Running it is very similar to running the ping command: you just do traceroute google.com for this particular example, or traceroute followed by an IP address, and it starts sending packets, tracking the number of hops, and recording whether packets are being discarded along the way or whether they reach the final destination you want them to reach.
When it completes, it shows you the route it took and how many hops it had to take to get to that particular destination. When you run it, this is the kind of output you're going to see. The first line is the first hop, TTL of one: the IP address at the start of the line is the first-hop router, and the three round-trip times on that line, one per probe packet, are the shortest of the trace, because those packets only traveled a single hop. The lines after that are the IP addresses and round-trip times for each subsequent hop along the path, and the response times generally grow as the probes travel further. If we let it keep running, it continues until the destination is reached; in this particular case it took three hops and stopped. More often than not you'll see quite a few more, maybe 20-something hops, with data points for each of the routers it crossed before finally landing at the location you wanted, which in this case is the 142.250.74… address, Google. It went from us to them in only three hops, which is very short, to be brutally honest; it usually takes more than that. So it simply traces the route your gateway took to get to the IP address of Google in this particular case. If you want to specify the maximum number of hops the trace should attempt, that's the -m option, for max: you give it the number of hops to try before it gives up on reaching google.com. You can also set the size of the probe packets; note that in the common Linux implementation the packet length is given as a trailing argument after the host name rather than via the -p flag shown on the slide (-p actually sets the base destination port). In summary, the traceroute command is a powerful tool for diagnosing network connectivity issues by identifying the path packets take to reach a destination. It helps pinpoint where the delays or failures are along the route by looking at the timestamps and which packets were discarded, and that makes it very valuable for troubleshooting your network connections: you can see how many packets were dropped, what was taking the longest to reach a location, and the specific routers responsible for those connection drops or delays.
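A minimal sketch of those invocations (standard Linux traceroute, as noted above):

```bash
traceroute google.com        # trace the route to google.com
traceroute -m 15 google.com  # give up after at most 15 hops
traceroute google.com 100    # use 100-byte probe packets (length is positional)
```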
netstat, short for network statistics, is another command-line tool that displays network-related information: the active connections you have, routing tables, interface statistics, masquerade connections (a really fun phrase), and multicast memberships. It's used very regularly for monitoring and troubleshooting network issues, and the basic usage is to run it with a variety of flags. If you want to view the active listening ports with netstat, you run it with the flags you see here: t for TCP connections, u for UDP connections, l for only the ports that are in listening mode, and n for numerical addresses instead of host names, so you get IP addresses rather than names. If you used only t, it would show only TCP connections; only u, only UDP; and if you didn't include l, it would show connections whether or not they're listening. So this combination asks for all TCP and UDP connections in listening mode, with the IP addresses for those connections. This is what the output could look like: you have the protocol column, tcp, tcp6, udp, udp6, so whether it's TCP or UDP; the receive and send queues; the local addresses and the ports assigned to them, so port 80 here is the HTTP server and port 22 is the secure shell server (the two UDP ports, I honestly haven't memorized what services they stand for); then the foreign addresses, if any; and the state, which here is LISTEN. The TCP ports on our local addresses are in listening mode, while the UDP ports don't show a listen state. ss, the socket statistics tool, is the modern alternative to netstat. It provides essentially similar information and functionality, but supposedly with better performance and more detailed output. It's part of the iproute2 suite, the same family as the ip command we dealt with earlier, and it's preferred in a lot of Linux distributions. You run it very much like what we just did: instead of netstat, you type ss and give it the same flags, exactly as before, the TCP, UDP, listening ports, and numerical addresses. The output looks similar to what you saw previously, except that the LISTEN or UNCONN state that was previously at the very end is now at the very beginning; it gives you the send and receive queues, the addresses and ports that are connected or listening, and then the peer address and port on the other side. So that's the example output for the ss command, the socket statistics command. For some additional options, we can look at established connections: ss -t shows active established TCP connections, and adding -a shows all TCP connections, listening and established.
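A minimal sketch of that listening-ports view with both tools (both commands ship with most Linux distributions):

```bash
netstat -tuln   # TCP + UDP, listening only, numeric addresses
ss -tuln        # the same view with the modern ss tool
```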
Again, you can pass flags singly or combine them. You can also look at process information: -t and -u for TCP and UDP, -n to deny name resolution so you just see the IP addresses, -l for listening sockets, and then -p to show the PID, the process ID, and the program name as well. So you get the process information for whatever is connected or running in this particular case: everything we had before, with that one last flag added so we also see the process behind each socket. In summary, both netstat and ss are very powerful tools for monitoring and troubleshooting network connections, and they give you a lot of great detail: whether a socket is connected or listening, the IP addresses, and so on. netstat is technically older and very well known; ss is the modern version, but they both offer essentially the same information you're looking for. I did like the output of ss a little better because it seemed a little friendlier to the eye, but for the most part they give you the same information you would need. And finally we've arrived at our network fundamentals conclusion, which is going to be firewall settings. UFW, the uncomplicated firewall, is the most common firewall management tool, about as simple as command-line firewall management can possibly be. It's straightforward: it creates and manages iptables firewall rules under the hood, and it's particularly popular on Ubuntu and all of its derivatives because it's simple and very easy to use. Here's an example: if you want to enable the firewall, you just do sudo ufw enable. That activates the firewall and enforces any and all of the configured rules you have; once it's been enabled, it starts running and filters incoming and outgoing traffic based on whatever rules have been set up. To disable it, you do the same thing, sudo ufw disable, and it deactivates; traffic that was previously being filtered is no longer filtered. If you want to allow a service, SSH in this particular case, which represents port 22, you do sudo ufw allow ssh, and it allows traffic for that specific service. SSH stands for secure shell, and it lets somebody connect remotely to the device UFW is running on. If you want to deny, you can do it via the port number as well as the service name: you could say deny 80, which is HTTP traffic, and in this particular case it blocks HTTP traffic on port 80, preventing access to web services running on that port. Then you have the status of the firewall: if you want to see whether it's actually running and what all the active rules are, any deny rules, any allow rules, you can see all of that by running a simple status command, and it shows you which of the rules you tried to apply took effect. So if you applied deny 80 or deny ssh and you run status, you can check whether those rules are actually active and currently running.
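A minimal sketch of that session (standard ufw syntax on Ubuntu-family systems):

```bash
sudo ufw enable      # activate the firewall and enforce configured rules
sudo ufw allow ssh   # allow SSH (port 22)
sudo ufw deny 80     # block HTTP on port 80
sudo ufw status      # show whether it's running and list active rules
sudo ufw disable     # deactivate the firewall
```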
You can also do a port allowance: similar to what we did with the service, we can do the same thing with the port number associated with it. You can say allow 443/tcp, which is HTTPS. Port 80 is HTTP, and HTTPS is the secure version, meaning it carries TLS or SSL traffic so that all communication with your browser is encrypted, and it's a TCP port. So this allows all TCP traffic on port 443, which is typically browser communication, secure web connections. UDP traffic is usually for things like viewing video, so if you don't allow UDP, or if you actually deny 443/udp, you may break video viewing in the browser; but in this particular case we're allowing port 443 for secure web connections and the TCP protocol associated with it. If you wanted to deny UDP traffic on port 25, which is used for SMTP, the simple mail transfer protocol, typically used for mail, it's done with a simple deny command on 25/udp, and it denies all UDP traffic on that port. Port 25 is basically the emailing port typically used on a network. Then you can delete rules: if you previously did sudo ufw allow ssh and now want to remove that rule, you just say delete allow ssh. It's very simple and very intuitive; the syntax is actually quite easy to use. You delete a deny rule in a very similar way: if we denied port 80 traffic, we'd say delete deny 80, and with that rule deleted, port 80 traffic is allowed again. Then you have logging: if you want to enable the logging done by the firewall, which I highly recommend, you just turn logging on and it logs all of the firewall events that happen. I can't imagine running a firewall without actually collecting firewall logs. It's very important to have logs of the traffic coming through: if people keep trying to access your port 22, your secure shell port, after you've done a deny on that port, or if you've allowed port 22 traffic and somebody keeps trying to log in with the incorrect password, the logs are how you discover you're the victim of a brute-force attack against port 22, your port for remote connections and remote control. So it's very important to turn logging on, and you can of course disable it by doing logging off. Then there's a kind of all-encompassing, blanket rule: allow all incoming. The keyword is actually just "incoming," not "all incoming," but what it does is allow all of the incoming traffic to your particular device, your computer, meaning anybody at all could try to communicate with your machine on any port you have open, including port 22, your DNS port, and all the other 65,535 ports that might be available.
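A minimal sketch of those rules (same ufw syntax; port/protocol pairs use the port/proto form):

```bash
sudo ufw allow 443/tcp     # allow HTTPS (TCP on port 443)
sudo ufw deny 25/udp       # block SMTP over UDP on port 25
sudo ufw delete allow ssh  # remove a previously added allow rule
sudo ufw delete deny 80    # remove a deny rule, re-allowing port 80
sudo ufw logging on        # log firewall events
sudo ufw logging off       # stop logging
```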
If you want to deny all of the incoming traffic, you do the same thing with deny instead of allow, and it refuses any connection that tries to come to your device. This is kind of difficult to live with, because then you won't get responses back from the internet: replies on port 80 or port 443 count as incoming, so you won't receive anything back from Google or from YouTube if you're denying all incoming traffic. Keep that in mind; it sounds like it might be a good idea, but the usual pattern is to deny all and then go and allow specific ports, because you want access to specific services: you dedicate the port 80 allowance and the port 443 allowance and deny everybody else. In the same way, you can allow all outgoing traffic, so any request you make out to the world is permitted, or deny all of the outgoing requests. Denying outbound is for when you don't want people to connect to the internet at all, because all they need to do is work on their local computer. Schools sometimes do this: they don't want people connecting to the internet, they just want the computer used for schoolwork, so they deny all outgoing traffic from that particular machine. So in summary, UFW is a very simplified way of managing firewall rules for traffic, with very intuitive commands like we just saw: enable, disable, allow, deny. And you can work by service name, like ssh, or by port number, like port 22, which is the same port as SSH; you could allow port 22, allow port 443, and so on.
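On current ufw versions those blanket allow/deny rules are expressed as default policies; a minimal sketch of the deny-all-then-allow-specific-ports pattern described above:

```bash
sudo ufw default deny incoming    # refuse all inbound connections by default
sudo ufw default allow outgoing   # permit all outbound traffic by default
sudo ufw allow 80/tcp             # then allow just the services you need
sudo ufw allow 443/tcp
```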
iptables is another command-line firewall utility that allows admins to configure network packet filtering, address translation, and a lot more. It's the more complex counterpart to the uncomplicated firewall we just went through, and it operates with the Linux kernel to provide detailed control over how packets are routed. The core tables in iptables: first there's filter, one of the main ones, which filters packets. You handle all incoming packets with the INPUT chain, packets routed through your device with the FORWARD chain, and all outgoing packets with the OUTPUT chain; those three chains belong to the filter table. Then you have nat, network address translation, which is for masquerading and port forwarding: it doesn't show your actual IP address, it substitutes a different one so the world doesn't see yours, which is very useful for masquerading, essentially disguising your IP address. Its chains are PREROUTING, altering packets before they're routed; POSTROUTING, altering packets after routing; and OUTPUT, altering packets generated by your device itself as they're sent out to the world. What nat does, in effect, is change the source IP address on traffic headed out to the world to whatever the masqueraded, disguised address is, so that if your traffic is ever intercepted or sniffed by somebody, they don't see your actual source IP; they see some other address, and they can't target you directly. Then you have mangle, which is used for specialized packet alterations, like changing the type of service requested or marking packets. Again you can alter the incoming packets before they're routed with PREROUTING, alter the outgoing packets with the OUTPUT chain, and the FORWARD, INPUT, and POSTROUTING chains are available for other modifications as well; that's what mangle does. If you want to see the current rules, you run sudo iptables with a capital -L flag, which lists all the current rules for the filter table, showing the rules for the INPUT, FORWARD, and OUTPUT chains. Here's what the example output looks like; it's one of the more shrunken displays on my screen, so feel free to zoom in a little. We have the INPUT chain with a policy of ACCEPT, the FORWARD chain with a policy of ACCEPT, and the OUTPUT chain with a policy of ACCEPT, plus the destinations if there were any; here everything is accepted from anywhere to anywhere. This listing doesn't mean much yet, because with the exception of that one entry there really aren't any rules: "accept everything from anywhere to anywhere" is basically the whole ruleset. Now the basic commands. To allow traffic, we run iptables -A INPUT -p tcp --dport 22 -j ACCEPT, which adds a rule to the INPUT chain allowing TCP traffic on port 22, used for SSH. Breaking that command down: -A INPUT appends the rule to the INPUT chain; -p tcp specifies the protocol as TCP; --dport sets the destination port to 22 (and remember, this is an INPUT rule, so it applies to our incoming traffic); and -j ACCEPT jumps to the ACCEPT target, allowing the traffic. If you want to block something instead, you just use DROP as the target and everything else stays exactly the same. In this particular case they're doing it on port 80: again INPUT, so incoming traffic, TCP port 80, and the very last piece is the rule target itself, DROP, which blocks the traffic coming through with port 80 as the destination port.
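A minimal sketch of those commands (standard iptables syntax; requires root):

```bash
sudo iptables -L                                    # list current filter-table rules
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT  # allow incoming SSH
sudo iptables -A INPUT -p tcp --dport 80 -j DROP    # drop incoming HTTP
```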
If you want to delete something, you use -D, which deletes the rule from the INPUT chain. In this particular case it's the TCP protocol on port 80 with the DROP target, the exact same rule we just added; instead of appending with -A, we're deleting it from the INPUT chain, and everything else stays exactly the same. The rule was to drop all port 80 traffic; now we delete that rule so traffic on port 80 is allowed again. If you want to save your rules, you save them to the configuration file stored inside the /etc directory: assuming you have iptables set up for this, that's /etc/iptables/rules.v4, the actual configuration file. You take everything you just did and run sudo iptables-save, then redirect the output; you should remember what we did with that operator, it sends the command's output to that specific path, the rules.v4 file for iptables. If you want to restore rules that were lost, you pull them back from the rules.v4 file: saving was iptables-save redirected into the configuration file, and this one (whoops, wrong direction) is iptables-restore, reading back in from the iptables rules configuration file. If you want to view a specific table, you use -t, which stands for table: for the nat table, our network address translation table, you run iptables -t nat and then -L to list all the rules for that particular table. So iptables is a powerful, flexible tool, obviously a bit more complex than what we just saw with UFW. It works at a low level, providing extensive control over how network traffic is handled, and you can configure rules for filtering packets, translating network addresses, modifying packets, and so on. Those are the features that go beyond UFW: UFW doesn't protect your IP address by changing it on the way out, which is network address translation, and UFW doesn't do the mangling that's available in iptables. So iptables is a little more complex to work with, but it also has a lot more functionality than UFW does.
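A minimal sketch of persisting and restoring rules, assuming the Debian/Ubuntu convention where the iptables-persistent package reads /etc/iptables/rules.v4 (the sh -c wrapper makes the redirect itself run as root):

```bash
sudo iptables -D INPUT -p tcp --dport 80 -j DROP      # delete the drop rule
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'   # save current rules
sudo iptables-restore < /etc/iptables/rules.v4        # restore saved rules
sudo iptables -t nat -L                               # list the nat table's rules
```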
All right, now on to a very important chapter, security and access management, and the very first section is file system security. The first portion of file system security is chroot: the concept of the isolated environment, and something called the chroot jail. chroot stands for "change root." It's a powerful Unix command that changes the root directory for the current running process and its children, the parent process and its child processes. When you change the root directory, you effectively isolate a subset of the file system and create what's known as a chroot jail. That isolated subsystem ensures that any process running within it can't access files outside that specific environment, which enhances the security and control of the system as well as of the directory or file system that has been isolated. Essentially, you're taking one specific file system, or say one directory and all of its contents, and isolating it so it's separate from the rest of the file system hierarchy. From there, you ensure not only that the rest of the file system is protected from what goes on inside the isolated environment, but that everything inside the isolated environment is also protected from what's going on outside of it in the rest of the file system. The isolation itself, just that specific isolation component, guarantees security for the isolated environment as well as for everything sitting outside of it. A better way to break this down is to look at the question itself: why would we use this? Because when you isolate an application in a chroot jail, that isolated directory and everything inside it, you limit the damage that any potentially untrusted or compromised program can do, and vice versa, because those untrusted programs can't see or interact with the broader system. I'm going to show you a visual of this real quick. Okay, so here's the example, the visual that we have. This is the standard hierarchy right here: our actual system root, as well as the binaries, home, system directories, and so forth. Inside home we have one particular user with all of their contents, and what's happened is we've done chroot: we've put all of their contents inside a chroot jail, so to speak, and imprisoned them. Now everything inside this red box is completely separate from the rest of the system, meaning if this particular user downloaded something they shouldn't have, or clicked on a link they shouldn't have, whatever they did in their environment stays isolated from the rest of it: our actual root, the binaries, and everything else inside the main system is not affected by what goes on inside the jailed portion. That's really the big significance here. Essentially we've created a sandbox-type environment: anything that happens to this user in their isolated environment will probably affect their own file system, they might lose the contents inside it, and the hacker, whoever got into their file system, may have access to everything inside the isolated area. But they won't be able to leave that area and get into the root portion, which is very important, because this user may not have elevated privileges, and we don't want a malicious actor who exploits this particular user to be able to get out of this environment and into the main environment with root privileges, where they could do some serious damage or get access to materials
that they otherwise would not have had access to. So that's the visual concept of what it means to create a chroot jail. There are also other reasons chroot is useful, apart from security. If developers want to create something and test it in a controlled environment before deploying it to production, or to the rest of the company, so to speak, they can do that safely inside a jailed, imprisoned type of environment. They can test applications and any configurations they want, and really test software that needs different dependencies or library versions, upgraded versions and so forth; it's actually very useful in the development context as well. And finally, there's accessing and repairing systems from a rescue environment: in incident response, if something has happened and the main system is unbootable, administrators can use chroot to access and repair those systems from the rescue environment, because there was a crash or something they need to recover from, essentially, so they can get back to what's known as business as usual. There's a recovery point, and usually those images are taken with every backup, so hopefully there are frequent backups of the most recent recovery point; administrators can use chroot to access the system itself from what's known as the rescue environment, which is essentially a chroot-jailed environment, and then hopefully recover the system. Using the command first requires that you create a directory that will serve as the chroot jail. We've already covered how to make a directory: you'd run mkdir, and you'd need to use sudo, because this directory is going to be used in an elevated-privilege environment where other people can't interact with it without sudo access. So you create the directory using sudo and whatever path you want, and that's what we'll use as the jail environment. Then you populate that jailed environment by copying or installing the necessary binaries, libraries, and files into it. You create the environment, which could be what we saw in the visual example, something belonging to a user, and then you populate the contents of that directory with the libraries, files, and binaries, everything necessary for that environment to run. This is the basic population that happens: here they're copying /bin/bash into the directory, recursively copying everything inside lib and lib64, and everything under usr; all of it goes into the newly made jail directory, the imprisoned directory (I like saying imprisoned for whatever reason). All the contents of everything that will be used in that jailed environment need to be copied into it; this part is very, very simple. The -r option makes the copy recursive, meaning it takes everything inside the library directory as well as all of the subdirectories and their contents; all of those things go
into the jailed environment, along with the other examples shown. What you want to do is ensure that the directory structure within the chroot jail mimics the standard Linux directory layout, essentially meaning you need the binaries, the system binaries, and all of those other pieces in place to make sure this actually works. Whatever you would need in a regular environment, in a standard Linux directory layout, for your particular task, you need to make sure it also exists inside the jailed environment, because it's being isolated from the rest of the system. Once you've transferred everything into the jail, you use the chroot command to change the root directory for the current process to the specified directory: you just run sudo chroot with the path to the directory we created, with all of its contents. What that does is change the root directory for the current session, for while you're logged into that Linux machine, to that particular directory. Once you've created this jailed environment and put in all the binaries and everything it needs to run, you change the root to that directory, and for the rest of the session you're logged into, it acts as if that directory is your actual root directory. If you want to run anything, test particular development upgrades, or open up a file attachment, for example, to see whether it runs properly or whether it's malicious, malware, anything like that, you do it inside this directory to protect the rest of the system, or to test whatever you need to test without affecting the rest of the system. And this is the full workflow from beginning to end. You create and populate the directory: mkdir with the name of the directory, then copy everything that directory needs into it, which in this case covers the various pieces shown. With a jail called mychroot, /bin/bash goes inside mychroot's bin directory, the libraries go inside its lib, and the 64-bit versions go inside lib64. Then you change into that environment: you do chroot mychroot and run it, and that's it. Once you're inside the chroot environment, you just want to make sure everything is running as it should, so you can run something like ls /, with the forward slash meaning you're listing the contents of the root directory. If you've done everything correctly, you should only see the contents of this new root environment instead of everything you'd normally see in the real root directory: technically it should just be the pieces we copied, so you should only see bin, lib, and lib64 once you run that ls command against the root directory.
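A minimal sketch of that workflow, assuming a jail named mychroot (the exact libraries bash needs vary by system; running ldd /bin/bash shows the real list):

```bash
# Create the jail and the directories it needs
sudo mkdir -p mychroot/bin mychroot/lib mychroot/lib64

# Copy in a shell and, recursively, the libraries it depends on
sudo cp /bin/bash mychroot/bin/
sudo cp -r /lib/*   mychroot/lib/
sudo cp -r /lib64/* mychroot/lib64/

# Enter the jail; mychroot now acts as / for this session
sudo chroot mychroot /bin/bash

# Inside the jail, this should list only bin, lib, and lib64
ls /
```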
Some considerations and best practices to keep in mind. You want to keep the chroot environment, the jail, as minimal as possible, making sure you only have the stuff you actually need, the necessary binaries, libraries, software, anything like that, to reduce the attack surface. If you copy everything from your normal environment into the jailed environment, you're kind of defeating the purpose of creating an isolated environment, so only include the things you actually need for that exercise. It's only for that session anyway; once you're done with it you can reuse it, it's not like you can't, but for a particular session you should only include what you actually need. Make sure all the file permissions within the jail are set correctly to prevent any privilege-escalation attempts. And for escape prevention, in case anybody does attack that jail directory, make sure they can't escape: avoid running services or granting access to tools that could allow processes to escape the chroot jail, which goes back to the very first point, only include the necessary binaries. You don't want to copy any binaries into the jail that would potentially grant an attacker an escape route; you don't want them to have a vector to get out of the jail and into the rest of your system. In summary, chroot is a very valuable tool for creating an isolated environment in Linux, essentially creating a sandbox for yourself, which enhances security by restricting programs to a specific part of the file system. It's commonly used for running potentially untrusted applications, as we said, for development and testing, and for system recovery. It's a very useful little strategy that comes embedded within Linux: you can essentially turn any new directory you've created into a little sandbox environment to protect the rest of your system from anything that might go wrong, or just to test something. You don't even necessarily need to worry about hacks or anything like that; a lot of the time you just want to test a new development, a new upgrade in the code or the software, and make sure it doesn't affect the rest of the system, wipe something, crash the system, or accidentally delete data. So it's a very useful tool for creating an isolated, sandbox-type environment. All right, now we're going to take another look at file permissions and ownership, and this time we're going to dive a little deeper into this particular concept. As you should already know, there are different levels of file permissions we can apply to something, and different parties that can have ownership of, as well as access to, any of the files or directories that exist on a particular system. The three categories are the owner, which is a user; the group; and
others, which are also users, anyone who falls outside the owner and the group. So we have these three levels of ownership or access, and these three levels each have three possible permissions: the read permission, the write permission, and the execute permission. And if we remember, the read permission has a numerical value of 4, the write permission has a numerical value of 2, and the execute permission has a numerical value of 1. If you had all of them turned on, you'd have a value of 7; if you only have read and write, you have a value of 6; if you have read and execute, you have a value of 5; and so on. Those are the permissions, and they can apply to any of the ownership categories. The first thing to look at is the breakdown of what these fields are. The very first character, represented here by a dash, tells you the type of the file: if it's a dash, it's a regular file; if there's a d there instead, the item is a directory; an l would be a symbolic link; and so forth. The next three characters represent the owner's permissions, and in this particular case the owner has read, write, and execute attached to this item. The following three characters represent the group's permissions: read, no write permission, only execute. And the last three are the others category: again read, no write permission, and execute. So the group as well as the others category both get read and execute but no write, and writing represents modification, writing to the file. You can read the file or execute it, read the binary or execute the binary, but you won't be able to write to it or modify it. That's what this particular example represents. Knowing this, we can now look at how to change the permissions of something, which we do with chmod, also known as change mode. In this particular example we're doing it with the symbolic version: instead of numerical values, we use the symbols for read, write, and execute. Here we've given the user read, write, and execute (u+rwx), the group read and execute (g+rx), and others read and execute (o+rx), followed by the name of the file. Usually this also requires sudo in front, so it would be sudo chmod and so on. So we've given the user those permissions, the group those permissions, and the others category those permissions as well. Then there's the numerical mode of changing permissions, which happens with the numbers themselves: the first number represents the user, the second number represents the group, and the third number represents the others category.
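A minimal sketch of the symbolic form (example.txt is a stand-in file name):

```bash
# Owner: read+write+execute; group and others: read+execute
sudo chmod u+rwx,g+rx,o+rx example.txt
```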
As we already established, read, write, and execute together total 7, and in this case the group and others only have read and execute: read is 4 and execute is 1, which totals 5, and read plus execute again totals 5 for the last digit. So this is what it looks like to change the permissions numerically for that particular file, and it represents exactly what we did in the previous case: that whole spiel we walked through turns into 755. Now, if we wanted to change the ownership of this particular file, we'd do it using the chown command. You can see they've used sudo in this particular example: sudo chown user:group followed by the file name. You just replace those two fields with the actual user and the actual group you want to dedicate to the file, so it might be user1 and the developers group for this particular file name. It changes the ownership of the file to the specified user and group: you replace "user" with the username and "group" with the group name, and ownership of the file passes to the entities you declared. This is what that looks like: we've assigned it to alice and the developers group, and that becomes the ownership of the example.txt file. Then we have changing of the group: you can change group ownership with the chgrp (change group) command. You first give the group you want to change to, let's say developers again, and then the file name. Again they're using sudo, to apply administrator-type permissions to this command, because you're changing the group ownership of a particular file, and that should require administrator privilege. In the actual example, we take example.txt and assign its ownership to the developers group using the change group command. So here's a sample workflow. The file is created with touch; then we run chmod to change its permissions, so the owner has a 6, representing read and write, and everybody else has just a 4, representing read, and that's the whole permission set here. If we then do ls -l on the file, it should show read and write for the owner and only a read permission for everyone else. If we wanted to change the ownership of that file, we'd use the chown command with the same file name; you just saw this in the chown portion of the presentation, so it's literally a duplicate of the command we just saw, but we're assigning the owner to alice, so alice is the user, and group ownership belongs to the developers group, all for the example.txt file.
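A minimal sketch of that workflow (alice, developers, and example.txt are the stand-in names from the slides):

```bash
touch example.txt                        # create the file
chmod 644 example.txt                    # owner: read+write; group/others: read
ls -l example.txt                        # verify the permissions
sudo chown alice:developers example.txt  # owner alice, group developers
sudo chgrp developers example.txt        # change only the group
ls -l example.txt                        # verify the owner and group columns
```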
Then again you just run ls with the -l option so you can see the file ownership: the first column right after the permissions shows the name of the owner, and the very next column shows the name of the group, which would be alice and developers in this particular example. And in this example we're changing the group, which you've already seen as a command as well, but we just wanted to reaffirm this particular series of commands: this is how you change the group, and if you wanted to see the new group you'd just do ls -l and it would show you that change. In summary, understanding and managing file permissions is very important to system security, because you don't want people who should not have access to a specific file or directory to actually have access to it. To guarantee, to confirm, that you've only assigned permissions on certain items to certain people or certain groups, you use these various options. You'd change the permissions for everybody according to what they should be, and some people should have no permissions at all: you can remove the read, write, and execute permissions from someone by using a minus instead of a plus. In the earlier example, let me quickly skip to it, where we did o+rx for the others category, we could just as well do o-rx and it would remove those permissions; same thing with the group, g-rx would remove the group's read and execute. It just depends on the instructions you've been given, because they'll tell you: these groups in the company should not have access to these files or directories, so you need to remove everyone's permissions on that particular directory and all of its contents. Or when you bring in a new employee with access to a certain group and all the assets belonging to that group, a particular subcategory of those assets should not be available to new employees until they've hit a level of seniority; when they hit that level, you can give them access, and until then they should not have it. So you remove access to that specific category of files or folders by using the owner, group, or others designation with a minus. And if you wanted to do it numerically, you'd change those digits to zeros, with zero representing no permission: 700 would mean the group as well as the others category have no permissions, they can't read, they can't write, and they can't execute that specific file. That's what's important about change ownership (chown) and change mode (chmod), which controls the permissions, and of course change group (chgrp), which changes the group that owns a particular asset. It's a very simple series of commands, but a very powerful one, especially when it comes down to access control and making sure that people who should not have access to something don't have it, and whoever needs access can actually get to those assets.
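A minimal sketch of removing permissions both ways (example.txt again as the stand-in):

```bash
chmod g-rx,o-rx example.txt  # strip read+execute from group and others
chmod 700 example.txt        # numeric: owner keeps rwx, everyone else gets nothing
```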
Which brings us to access control lists. Access control lists (ACLs) are a way to provide more fine-grained control over file and directory permissions than the standard Unix file permissions we just reviewed. With an ACL you can define permissions for multiple users and groups on a single file or directory, allowing specific access levels beyond the traditional owner, group, and others model. Here's how we go about it. First and foremost, there are the entries of the access control list: it's basically a list format, and each entry specifies the permissions for a single user or a single group. An entry consists of the type, which is user or group; the identifier, which is the user name or group name; and the permissions set for it, read, write, or execute. I'll show you examples of that, obviously. There's the user ACL entry, the group ACL entry, a mask ACL entry, and then the default ACL. The user entry specifies permissions for a specific user; the group entry specifies permissions for a specific group; a mask entry defines the maximum effective permissions for users other than the owner, and for groups, so that's the ceiling on their permissions, and everything beyond it they don't get. And a default ACL specifies the default permissions inherited by new files and directories created within a directory, so anything created inside that directory essentially inherits its defaults: if the directory's default grants level 7, read, write, and execute, everything created inside it would inherit read, write, and execute as well. The basic command here is setfacl, which sets the ACL; in this case the entry is for a user, with read, write, and execute on the file name. The u represents user, then comes the actual name of the user, and then the permissions being assigned to that user. Here's the breakdown of the command we just saw, and a correction: the -m is not mask, the m actually stands for modify, so my apologies for that. The explanation is that we're adding read, write, and execute permissions for a specific user on that file name, using -m to modify the ACL entries; the string that follows is the format for setting permissions for that user, u, then the username, which could be alice, then the permissions being assigned to her. Now, if you wanted to view the ACL for that specific file, you'd use getfacl: the first command was setfacl, and this one, getfacl, displays the ACL entries for the specified file, showing all the users and groups with their defined permissions. Whoever they are, and the level of permission they have on this particular file, is displayed by running getfacl; setfacl sets the list, getfacl shows it.
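A minimal sketch (setfacl and getfacl come from the standard acl package; alice and example.txt are stand-ins):

```bash
sudo setfacl -m u:alice:rwx example.txt  # grant alice read/write/execute
getfacl example.txt                      # display all ACL entries on the file
```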
Here's the example workflow. We use sudo, because this does require administrator privileges, and we're setting the ACL for this particular file by modifying it: the entry is for the user alice, and she gets read, write, and execute permissions on the file. So: setfacl, -m for modify, then the user alice with read, write, and execute, where the u is separated by a colon from the username, and the username and the permission levels are also separated by a colon. So sudo setfacl -m u:alice:rwx and the file name; that's how we add a user permission for a given file. If we wanted to add a group permission, it would be exactly the same kind of entry: still setfacl, still -m to modify; the only thing we change is g for group instead of u, then we assign the name of the group, developers, and they all get read and execute. They can't write to or modify the file, but they can read it and execute it, and it's the example.txt file again. That's how we set permissions for a group in this particular case. Then we have setfacl for directories: in order to do that, we add the -d flag to designate that we're modifying the default ACL of a directory. So sudo setfacl -d to designate the directory defaults, -m to modify, then again the same user, and now we give the path to the directory instead of the name of the file. Then we view it: if you just want to see the ACL permissions for all the groups and users assigned to a particular file, you just do getfacl and the name of the file, very simple. And this is what the example output of that command looks like: the name of the file is example.txt, the owner is root, the group is root, then this particular user has their permission, this user has read, write, and execute, the owning group has its permission, the group developers has read and execute, the mask is read, write, and execute, and everybody who falls into the others category only has read permission on this particular file. Removing ACL entries is done with the -x option: it's still a setfacl command, but now with -x to remove a particular user from the file. You don't have to specify permissions or anything like that, because you're literally removing that entire user from the access control list on this file; whatever permissions they had are removed because the entry itself is removed. To remove all of the access control entries for a given file, it's done with the -b option: again still setfacl, with -b, then the name of the file. If this were a directory, you'd add -d to designate the directory, then -b to remove all of the entries, then the name of the directory; it's still a setfacl command that is going to remove all of those ACL entries.
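A minimal sketch of those variants (same stand-in names; /srv/shared is a hypothetical directory path):

```bash
sudo setfacl -m g:developers:rx example.txt  # group entry: read+execute
sudo setfacl -d -m u:alice:rwx /srv/shared   # default ACL on a directory
sudo setfacl -x u:alice example.txt          # remove alice's entry entirely
sudo setfacl -b example.txt                  # remove every ACL entry
```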
To remove ACL entries we use the `-x` option. It is still a `setfacl` command, but now `-x` removes a particular user from the file's access control list. You do not have to specify permissions, because you are literally removing the entire entry for that user; whatever permissions they had are gone along with it. `-x` is for a single removal, and it can be a group as well: you can do `-x g:developers` and then the name of the file to remove the developers group entry.

To remove all of the ACL entries on a file, if you want to clear the entire set and remove everybody from the access control list, you use the `-b` option: still `setfacl`, then `-b`, then the name of the file. If it is a directory, you give the path to the directory, and every single entry in the access control list on that asset is removed.

To modify the mask, you also use `setfacl -m`, but now the entry you modify is the mask itself, for example `m::rx`. One correction on how the mask works: the mask is a ceiling, not a grant. The effective permissions for any named user or group are the intersection of their own entry and the mask. So if alice's entry is read-only, a mask of read, write, and execute does not give her more than read; and if the mask is reduced to read-only, then even an entry of read, write, and execute is capped at read.

In summary, the access control list is an advanced method for managing file permissions in Linux, allowing specific access levels for multiple users and groups. The commands `setfacl` and `getfacl` let you set and view these fine-grained permissions easily, and you already know the various options: `-m` modifies entries, `-x` removes a single entry, `-d` targets a directory's default entries, and `-b` removes every entry from the list. You prefix a username with `u:`, a group name with `g:`, and so on. `setfacl` and `getfacl` are the commands we use to set the access control list for any given asset and to see the access control list for that asset.
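To recap those removal and mask operations in one place, a short sketch (file and entry names are placeholders):

```bash
# Remove a single user's entry from the ACL
sudo setfacl -x u:alice example.txt

# Remove a single group's entry
sudo setfacl -x g:developers example.txt

# Remove every ACL entry on the file
sudo setfacl -b example.txt

# Modify the mask: effective permissions are capped at the mask
sudo setfacl -m m::rx example.txt
```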
All right, now we need to talk about network security, which is also a very important concept under the security umbrella, and the very first portion of it is management of the firewall. We have already done an intro to UFW, the uncomplicated firewall, as well as iptables, but we are going to review them a little more. You should already know that firewalls are very important in network security: they act as a barrier between your internal and external networks, and they monitor and control incoming and outgoing traffic. This is done with rules that we establish, and both UFW and iptables have sets of rules you can use to make sure whatever you are trying to do gets done.

The first one is UFW, the uncomplicated firewall. It is a very simple firewall but a very powerful one. It does not require complicated commands or an understanding of a variety of sophisticated binaries; it has easy syntax, and it gets the job done. To actually activate it you run `sudo ufw enable`, and essentially everything else we do will also be prefaced with `sudo`.

Next you want to allow or deny traffic. `sudo ufw allow 22` allows incoming traffic on port 22, which is SSH. Note that a plain allow or deny rule only deals with incoming traffic; anything you want to set up as an outbound rule needs the keyword `out` in it, as in `sudo ufw allow out 22`. Likewise, `sudo ufw deny 80` denies incoming HTTP traffic on port 80, and `sudo ufw deny out 80` denies outgoing HTTP traffic on port 80. If you want to check the status of all the rules you have, that is done with `sudo ufw status`: it shows whether the firewall is actually active and what all of the active rules are.

There is also a command for allowing traffic from a specific IP address: `sudo ufw allow from <address> to any port 22` allows SSH traffic from that particular address on port 22. You can have a deny rule on port 22 generally, and as long as you have this allow-from rule, you have whitelisted that address: even though everybody else is denied on port 22, this particular address is allowed in. And again, this is inbound, because we did not say `allow out`. That is how you allow a specific IP address.
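A quick sketch of those UFW commands together (the address 203.0.113.5 is a placeholder):

```bash
sudo ufw enable                                 # activate the firewall
sudo ufw allow 22                               # allow incoming SSH
sudo ufw allow out 22                           # allow outgoing SSH traffic
sudo ufw deny 80                                # deny incoming HTTP
sudo ufw deny out 80                            # deny outgoing HTTP
sudo ufw status                                 # show firewall state and rules
sudo ufw allow from 203.0.113.5 to any port 22  # whitelist one address for SSH
```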
iptables is the more complex, or we can call it the sophisticated, counterpart to UFW. It allows more detailed control over the network, letting administrators create complex rules for packet filtering and network address translation. NAT essentially masks your IP address so that people on the outside cannot see your actual address: when you send traffic out of your network, it appears to come from a different address, and that address is what the outside world sees, which is actually a very powerful capability. There is also the mangle table and so on and so forth, so it is very useful, but it does require a better understanding of networking concepts.

The very first thing is to learn what your current rules are, which you do with `sudo iptables -L`, with a capital L. It shows all of the current rules for your filter table. That is the other thing: a variety of tables exist inside iptables, and in this particular case, without designating which table we want to look at, the default is the filter table, so it shows the INPUT, FORWARD, and OUTPUT chains of the filter table.

If we wanted to add a rule, we would do it with `-A`, a capital A, which stands for append, and we append it to the INPUT chain; notice that the chain name is also in all caps. So `sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT` appends a rule to the INPUT chain for the TCP protocol on destination port 22, and the `-j`, which stands for jump, names the target, ACCEPT, so incoming traffic on destination port 22 is accepted. If you want to block traffic, you are again appending to the INPUT chain, the protocol is still TCP, the destination port in this case is 80, and the target is DROP. It is not "deny" here: if you want to block something with iptables, you drop the traffic. The previous one was ACCEPT, this one is DROP.

If you want the rules you just created to persist, you save them with the `iptables-save` command and write the output into the `rules.v4` file inside the `/etc/iptables` directory, so that when the computer reboots, all of the rules you created are still active. Notice the direction of the redirection symbols: saving uses the greater-than symbol to write into `rules.v4`, and restoring uses `iptables-restore` with the less-than symbol, reading from the same `rules.v4` path. That way, if somebody changed your rules, or you changed them yourself and want to go back to the rule set you saved, you pull all of those rules back from your `rules.v4` file with the restore command.
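Sketched out, under the assumption that the persistence file lives at `/etc/iptables/rules.v4` (the common location on Debian-family systems); note the `sh -c` wrapper so the redirection itself runs with root privileges:

```bash
# List current rules (defaults to the filter table)
sudo iptables -L

# Append a rule to the INPUT chain: accept TCP traffic to port 22
sudo iptables -A INPUT -p tcp --dport 22 -j ACCEPT

# Block incoming HTTP by dropping TCP traffic to port 80
sudo iptables -A INPUT -p tcp --dport 80 -j DROP

# Persist the rules across reboots
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'

# Restore rules from the saved file
sudo sh -c 'iptables-restore < /etc/iptables/rules.v4'
```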
In summary, we have simplified management with UFW, the uncomplicated firewall, and we have iptables, which gives advanced control for detailed network traffic management, including the network address translation and mangle tables that we covered in the first description of these tools. We needed to revisit firewall management because we are talking about network security under the security chapter, so it was important to re-review these tools. We are going to actually use them when we get to the practical section of this training series anyway: we will run a bunch of commands and create a bunch of rules that will be relevant to your labs and, ultimately, to your skill set as a Linux administrator when you want to manage and configure firewall rules for either UFW or iptables.

Another really important tool for security is SELinux, Security-Enhanced Linux. SELinux is a security module in the kernel that provides a mechanism for supporting access control policies, including mandatory access control (MAC; that is mandatory access control, not MAC addresses). It is commonly used in the Red Hat-based distributions like Fedora, CentOS, and Red Hat Enterprise Linux.

The key concept here is policies: we have policy-based security, and the policies inside SELinux define the rules for which processes and users can access which resources. These policies are strictly enforced, providing an additional layer of security beyond traditional discretionary access control, which is also known as DAC.

The modes of operation are enforcing, permissive, and disabled. In enforcing mode, the SELinux policy is enforced and access violations are blocked. In permissive mode, the policies are not enforced, but violations are logged for auditing purposes, so if there are malicious actions they are not blocked, but everything is recorded so it can be audited later. And disabled means it is simply turned off and not working.

Our very first basic command is to see the status of what is going on. To view the status of SELinux, including which mode it is running under and whatever policy is loaded, you just run `sestatus`, and it shows the current status as well as the policy being enforced.
If you want to set the policy to enforcing mode, you run `sudo setenforce 1`, which changes the SELinux mode to enforcing; this command ensures that all SELinux policies are strictly enforced. To correct what I said a moment ago: `setenforce 0` switches it to permissive mode, which allows violations but logs them. So 1 activates enforcing mode, where all of the rules are enforced, and 0 represents permissive mode, where everything is logged but nothing is blocked and no policy is enforced. Checking the status is done with `sestatus`, as we have already established. As a practical reminder of the rule set: `setenforce 1` for enforcing mode, `setenforce 0` for permissive mode. We will go into the practical commands when we actually get into the practical section of this training series, so that you can learn how to use this particular tool in depth.
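A small sketch of those status and mode commands (`getenforce` is an extra convenience command I am adding here; it was not on the slide, but it is a standard SELinux utility):

```bash
sestatus           # show SELinux status, mode, and loaded policy
sudo setenforce 1  # switch to enforcing mode (violations blocked)
sudo setenforce 0  # switch to permissive mode (violations logged only)
getenforce         # print just the current mode
```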
Another useful tool is AppArmor, short for Application Armor. It is another MAC system, and here MAC means mandatory access control, not Mac as in Apple. It provides an additional layer of security by confining programs according to a set of profiles, and it is commonly used in Debian-based distributions like Ubuntu, so anything that essentially runs on Debian or Ubuntu tends to use AppArmor, because it deals with applications specifically.

The key concept this time is profiles. Previously we had policies, which was SELinux; in AppArmor we have profiles, and the profiles define the access permissions for individual applications. They specify which files and capabilities an application can actually access, preventing it from performing unauthorized actions. Think of it like the popup that appears in Windows or macOS asking, "Do you want to allow Google Chrome to access your microphone?" or your Downloads folder, or your Documents, or your pictures: the application asks for permission to navigate across your computer, and you grant that access per application. That is what AppArmor is similar to. On an Ubuntu-based distribution you are dealing with the app itself, and the app needs to be given permission to perform a variety of tasks across your machine.

There are two modes here. In enforce mode, the AppArmor profile is enforced and unauthorized access attempts are blocked. In complain mode, unauthorized access attempts are allowed but logged for review. This is similar to the enforcing and permissive modes inside SELinux; they are just named differently.

The basic command to check the status is `sudo aa-status`, with aa standing for AppArmor; it displays the current status of AppArmor, including which profiles are loaded and their enforcement mode. To set the profile for a specific application to enforcing mode, you run `sudo aa-enforce` followed by the path to the profile, which is a fairly lengthy path, and the rules are then strictly applied to that application. One caveat: you need the application's name as it appears in your binaries. It is rarely "Google Chrome" with capitals and a space; it is typically all one word, all lowercase, so find the name as it stands inside `/usr/bin`, or `/opt`, or wherever the application is actually installed, and use that in the profile path. If you want to set the profile to complain mode instead, you just do `sudo aa-complain` with the same path; the only thing that changes is `aa-complain` instead of `aa-enforce`. A couple of practical examples: `aa-status` gets the status of AppArmor, and `aa-enforce /etc/apparmor.d/usr.bin.firefox` enforces the profile for Firefox.
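A minimal sketch, assuming the `apparmor-utils` package is installed and that a Firefox profile exists at the usual `/etc/apparmor.d/usr.bin.firefox` path:

```bash
sudo aa-status                                   # loaded profiles and their modes
sudo aa-enforce  /etc/apparmor.d/usr.bin.firefox # enforce the Firefox profile
sudo aa-complain /etc/apparmor.d/usr.bin.firefox # switch it to complain mode
```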
With that profile enforced, whatever the rules are, they are applied to Firefox. In summary, both SELinux and AppArmor are robust security mechanisms for Linux systems built on mandatory access control policies; that is what MAC stands for, alongside discretionary access control, which was DAC. While SELinux is typically used in Red Hat distributions and focuses on system-wide policies across the entire system, AppArmor is used in Debian distributions and emphasizes application-specific profiles. They are both very useful, and they both implement mandatory access control policies.

This is where concepts like MAC become concrete. When I went through the CompTIA CySA+ exam, which I had the privilege of taking, they give you a lot of scenarios, a breakdown of "this is mandatory access control and this is what it applies to," and so on, but you do not really get it until you actually go through tools that enforce those things. Then it clicks: when I deny Firefox access to my microphone, for example, that is technically a mandatory access control policy mandating that that specific application cannot access my microphone. And if I later want Firefox to have microphone access for a voice call on something like Discord, I need to go and grant that permission inside AppArmor if I am running Linux. It could be as simple as a popup on your screen asking whether to give the application that permission, or you can go into your settings or your terminal and apply the enforce profile for AppArmor against the Firefox application itself. This is where the rubber meets the road, and that is the difference between AppArmor and SELinux.

All right, now we are going to switch gears a little and go into user authentication and configuring Secure Shell, which actually do go hand in hand. First and foremost, let's talk about user authentication methods. There is password-based authentication, which is the default mode for authenticating any given user: everybody gets a password, and you have to enter your password correctly to authenticate yourself, to prove that you are who you say you are. Users are given a username and a password to gain access to a system or an application. This is not news; anything and everything in this world has a username and a password. On YouTube, where you are watching this, you most likely have a Google profile connected
to your YouTube account, and you provided your Gmail address as well as your password. If you are watching Netflix, you have a username and password. To access your phone there is a passphrase or a PIN you have to enter, and when you do a face scan or a fingerprint, that is still authentication, just moving into biometrics, which is getting slightly ahead of myself. Essentially you authenticate yourself in a variety of different ways, and the very first, most basic one is a password.

To improve the security of password-based authentication, you do something like multi-factor authentication, also known as MFA. That could be a one-time code sent to your phone or your email account: they send you a secret one-time code, you enter it, and then you can access the system. The fingerprint scan is the biometric version of a second factor that helps you multi-factor authenticate yourself. These things are done in addition to the password. If you are accessing your bank account on a new computer or a new browser, or the same browser after you reformatted the machine or wiped your browsing history so your cookies and caches are no longer saved, the site says, "You are logging in from a new browser, we are going to send you a one-time code." You enter your email and password, you enter the code, and then it asks whether you want to remember this browser for future reference, and that is how new cookies and caches get set. That is the whole process of multi-factor authentication. It enhances password authentication and is a very useful way to reduce the risk of unauthorized access, because even if somebody gets your password, when the site texts you a code to enter, most likely that person does not also have access to your phone or your email. It is scary to think about, and it is possible for people to compromise those too, but it is not as easy as running a dictionary attack and finding a weak password. So we can enhance password-based authentication with multi-factor authentication; sending a code to somebody is one of the simplest forms, and it is very useful.

The next level up is public key authentication, and it is more secure than a password because it involves the use of a key pair: a private key and a public key. The private key remains with the user, while the public key is placed on the server. This goes into the realm of symmetric and asymmetric encryption, the same idea you see with transactions done through your browser against a secure website:
you can’t see from the certificate authority of that website that they have to authenticate themselves and there’s the public key that is given to you the viewer so that you can verify yourself and you can go back and forth interacting with the conversations or the interaction the transactions within that specific uh browser within that specific website right so you have a public key Authentication that just supersedes it it is uh it’s an amplified version of the password-based authentication um it had enhances your security so it is no longer a uh subject of Brute Force attacks because you can’t brute force a key right and the the private and the private key or private and the public key excuse me um it doesn’t require the transmission of passwords over the network because you’re just dealing with those keys it allows for automated passwordless logins which are particular particularly useful for scripts and applications and this does include at least a one-time login though because there is that initial U authentication that needs to take place with that uh password but then you just you automate the rest of that process because you now have a public and private key so to speak that uh communicates with that website or that application so that you no longer need to do a a password entry and this is how when it remembers you when a browser remembers who you are and you don’t need to provide that password anymore the next time that you log on to Facebook it just logs you in right the next time that you log on to Gmail even if you’ve closed the tab even if you’ve closed your browser the next time that you log on then it’ll actually just log you back in without asking for your password because the key is in place um if you wanted to generate a key you would do it with s SSH key gen and this is a little bit complicated of a process so we’re not going to go too deep into to this uh we will when we get it to the Practical section of this training series and we start doing these things um later on but uh you do SSH key gen and it actually generates a new key pair and you’ll be prompted to enter a file to save the key2 which is typically the SSH ID RSA and then optionally you can set a passphrase for an additional layer of security on that actual key right so if somebody wanted to access that specific key file they would need to enter the password to be able to access that key file so now you have multiple layers of security and so you create a key using SSH key gen and you can even uh designate the algorithm the hashing algorithm of the key that you want to generate so do you want to Shaw one sh1 do you want to sha 256 so on and so forth right so you can designate the hashing algorithm that you want to be used when you’re creating this SSH ID RSA by default it does a Shaw 256 key which is a very very pass uh powerful hashing algorithm this is what the output looks like so when you actually run that SSH key gen this is what the screen will look like so these are individual commands that show up right so first it says is generating it enter the file uh that you want to save the key to and then you would say such and such and then you press enter and then you say it says enter the pass phrase so if you don’t want a pass phrase you just press enter and it moves on but I do recommend you actually have a pass phrase for your key and then you verify your pass phrase again and then it’s been saved and then it’s been saved and then the key fingerprint is a shot 256 as you can see right here as well as the 
This is what the output looks like when you actually run `ssh-keygen`. First it says it is generating the pair, then it asks you to enter the file you want to save the key to; you accept the default or type a path and press enter. Then it says enter a passphrase; if you do not want one, you just press enter and it moves on, but I do recommend you actually set a passphrase for your key. You verify the passphrase, the key is saved, and then it prints the key fingerprint, a SHA256 value as you can see, along with your username and the host, followed by a long series of characters. It seems like jargon, and for the most part it is: it very much looks like a long piece of encryption code, so you cannot make sense of it with the naked eye. You would need to feed it into some separate tool, some decoder, to even try to make sense of it, and for the most part you cannot, because it is not designed to be decoded or decrypted without its actual key. Whatever is generated, nobody can make any sense of it unless they have the key; they cannot unlock the lock without that key, which is why this concept is so powerful.

Now that we have generated the key, we want to copy it to the actual server. You do `ssh-copy-id user@server`, and this copies the public key to the server, placing it inside the `authorized_keys` file of the `.ssh` directory for the specified user. This step is what allows the server to authenticate the user based on the public key.

If `ssh-copy-id` is not available, you can copy the public key manually by going and finding where it lives. As the output said, it was saved inside the user's home folder, in the hidden `.ssh` directory; a regular `ls` without showing hidden files will not display it, and we will look at that when we go into the practical section. The manual version is a chain of commands: `cat`, as you should already know, displays the contents of the public key file, and you pipe that output into an `ssh` command that logs into the server as that user, makes the `.ssh` directory there, and concatenates the key onto the end of `authorized_keys`. You should already know what the double ampersands represent: we are combining a series of commands. You do not need to memorize this or know what every piece represents right now; I am just showing you what a manual version looks like so you get exposed to it, so that it is embedded in your head and will be easier to make sense of when we revisit it in the later portions of this series.
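Here is the sketch of both the one-step and the manual approaches (`user@server` is a placeholder):

```bash
# Preferred: install your public key on the server in one step
ssh-copy-id user@server

# Manual fallback if ssh-copy-id is unavailable:
# read the public key locally, pipe it over SSH, and append it
# to the server's authorized_keys file
cat ~/.ssh/id_rsa.pub | ssh user@server "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
```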
You do not need to know this off the top of your head right now, but you should recognize what the pieces represent: this is an SSH key file, we generated an RSA key pair, and it looks like the public half is being added to the list of authorized keys for this particular user. As long as you can make sense of it at that level, you do not need to know the exact details of everything going on; that is the main point I am trying to show you here.

Once the key has been copied into `authorized_keys`, you can actually SSH into that server as that user, and it is a very basic command: `ssh user@server`. You are just secure-shelling as that user into that particular server, and once you have set it up and copied the public key, you can log in without being prompted for a password. That is the whole idea. One caveat: this whole piece requires some kind of authentication at some point, because you cannot just add a key to `authorized_keys` without being authenticated. This assumes that you authenticated yourself with a password at some point before you started this process, and now the system trusts you, which is why it allows you to transfer the key. Once you have done that, you can access the server without entering your password. If you have never entered your password and you try to access the server as this user, it is going to ask you for one; I do not care how well you have done the rest of the steps we just talked about, if you never authenticated yourself, none of it works, and at that point you will be asked for a password.

The summary here: there is password-based authentication, the default mode of verifying who you are to any kind of system; multi-factor authentication, which enhances it with something like a one-time code, a fingerprint, or a face scan; and public key authentication, stronger security using a key pair, which still requires that at some point you entered a password to verify who you are, so that you could generate the key pair and transfer the public key from where it sits as your `id_rsa` into the server's authorized keys. After that you can enter the server without typing your password again. So at some point you do need to provide a password, or none of the other stuff is going to work; just keep that in mind.

Okay, now that we have talked about authentication, we need to talk about Secure Shell and configuring it. Secure Shell is a very powerful tool for remote access, and in the modern era, 2024 going into 2025, it is still one of the most powerful ways to access a Linux server specifically. It runs on port 22 by default, and for the most part the only thing required to access it is a password, unless you do other things to enhance the security, which is what we are going to talk about next.
When you enhance the security of something, you are hardening it, so we are going to harden the SSH configuration to mitigate any potential threats, and these are some of the key steps to do it.

Number one, you want to change the default port. By default SSH runs on port 22, and simply changing the port can help reduce the risk of automated brute-force attacks that target that default port, because everybody and their mother, even a brand-new hacker, knows that port 22 is Secure Shell, the port for remote login, so that is what gets attacked for the most part. The way to do it is to go into the `sshd` configuration file; you already know that `sshd` stands for the SSH daemon, and there is a configuration file for it at `/etc/ssh/sshd_config`. You will find the line that says `Port 22`, and it is typically commented out because it has not been modified, so you need to uncomment it, meaning remove the `#` at the beginning, and change it from 22 to 2222, for example, or to almost anything else. The low-numbered, well-known ports are usually assigned to something, so you do not want to use any of those; pick a higher, uncommon port out of the roughly 65,535 available, one that is no longer a common port, and that becomes the replacement for your port 22, your SSH port. Once you have done that using nano, you can just save the file and close it out, and you have reassigned your SSH port.

The next step is to disable root login. Not allowing the root user to log in via SSH is a very strong move, because if any hacker is allowed to log in as the root user, they get all the root permissions, and you can just imagine what the rest of the problems will be after that. Disabling root login forces whoever the attacker is, or anybody at all, to log in as a standard user account, and then they have to escalate privileges to get the rest of what they want done. Logging in as a standard user is already going to be a problem for them, especially if you have really strong password policies that you enforce in your company, and then they still have to find a way to escalate privileges and actually become an administrator or root. So that is another really simple move that is very powerful: you just disable root login. The way to do it is again inside the `sshd` configuration file, in the `PermitRootLogin` portion. You find the line that starts with `PermitRootLogin`; the default is `prohibit-password`, which essentially means root can get in as long as the login is not password-based. You uncomment it, remove the `prohibit-password` portion, and change it to `no`: nobody can log in as root, root login is not allowed. Then you just save the file and exit the editor.
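The two edited lines would look roughly like this inside `/etc/ssh/sshd_config` (2222 is just an example port):

```
# /etc/ssh/sshd_config  (open with: sudo nano /etc/ssh/sshd_config)

Port 2222            # uncommented and changed from the default 22
PermitRootLogin no   # changed from the default "prohibit-password"
```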
The next portion is to limit the SSH users: you designate which users can actually log in via SSH, and nobody else can unless they are on that whitelist. This is another very powerful control that is very simple to set up, and it goes miles as far as security is concerned. You have a handful of people who can log in through the SSH portal, and this is again done in the SSH configuration file. You find the portion that says `AllowUsers`; if it is not there, you just add it yourself, and it is case sensitive, capital A, capital U. Then you list the usernames, as in `AllowUsers username1 username2`, and those are the actual accounts that can log in via SSH. I would keep it to a small group of people; I would not go crazy with this. You just want whoever the admins are, the specific IT administrator, the CEO or CTO, whoever those important people are, to be able to access SSH, and then you close it off to everybody else. If other people want to access their file systems remotely, you give them a separate portal that is encrypted, a different mechanism that runs across a VPN with its own authentication methods, so they can access the file system remotely and do what they need to do. They should not be coming in via port 22, or whichever port has been designated for SSH; they should not be coming in via Secure Shell at all. That is the whole point here.

Once you have done all of those things, you need to restart the SSH service with a `systemctl restart` command. Restarting the service applies all of the configuration changes you just made, so this part is very important: if you do not restart it, none of the rules you just added to the configuration file will be enforced.

In summary: you change the default SSH port by editing the configuration file, finding the portion that has `Port 22`, and changing it to whatever your new port number is going to be; you disable root login in the same exact file by finding `PermitRootLogin` and changing it from the current setting to `no`, simply no, no root user is allowed to log in via SSH; you add the `AllowUsers` option so you can designate which limited number of users should be able to log in via SSH; and after the whole thing is done and you have saved the configuration file, you restart the SSH service so that all of the rules we just created are enforced. That is how you configure your SSH for secure access.
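To apply everything, after adding a line like `AllowUsers username1 username2` (placeholder names) to `/etc/ssh/sshd_config`:

```bash
# Restart the daemon so the new settings take effect
sudo systemctl restart ssh    # the service is named "sshd" on Red Hat-based systems
```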
Now we need to talk about encryption and the secure transfer of files, which is another very important concept. First, encrypting data with GPG. GPG is a tool for securing communications and data transfer. It uses asymmetric encryption, which involves a pair of keys, a public key and a private key, as we have already discussed with our key generation portion. The public key is used to encrypt the data, and the private key is used to decrypt it: public key locks it, private key unlocks it. This ensures that only the intended recipient, who possesses the private key, can read the encrypted message; only the person who has the private key is allowed to decrypt the message or file and get access to its contents. A very simple concept, but again a really powerful one.

To generate a key with GPG, you run `gpg --gen-key`. It generates a new key pair, and you will be prompted to provide a name, an email address, and an optional comment, and you can also set a passphrase for additional security, which I always recommend you do. Depending on your version, you may also select the key type and size; the default is usually fine, but if it gives you a really large option, I do recommend taking the largest key size you can, because the bigger the key, the harder it becomes to break. You can set an expiration date for the key if you want to, and you enter a passphrase to protect the private key, which again I recommend. Those are the steps to generate your key with GPG. In the output, the top portion says it needs to construct a user ID to identify your key: the real name would be Alice, the email address would be so-and-so, the comment would be so-and-so, "You selected this USER-ID," change name, comment, email, or is everything okay, and you say okay, press enter, and it continues to do what it does. That user ID matters for the rest of what we are talking about, because it is the ID assigned to the key being generated.

When we get to encrypting a file, you need to give the recipient to the command. You run `gpg --encrypt -r <recipient>` followed by the file name you want encrypted, where `-r` identifies the recipient by that user ID, for example `bob@example.com`. It encrypts the file so that the key belonging to that specific person is what can decrypt it. Once it has been encrypted, the encrypted file keeps the original name with a `.gpg` extension appended.
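A sketch of generating and encrypting (on GnuPG 2, the interactive `gpg --full-generate-key` is what exposes the key type, size, and expiry choices; `bob@example.com` and `document.txt` are placeholders):

```bash
# Interactively generate a key pair (name, email, optional comment, passphrase)
gpg --gen-key

# Encrypt a file so only the named recipient can decrypt it;
# this produces document.txt.gpg
gpg --encrypt -r bob@example.com document.txt
```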
So `document.txt` becomes `document.txt.gpg`, implying that it has now been encrypted, and this is the file that gets sent to Bob, who is the person holding the only key able to decrypt it. Bob then runs the decrypt command: instead of `-e` it is `-d` for decrypt, followed by the file name with the `.gpg` extension. For that to work, the key needs to be in his keyring, which we will do in a couple of slides, and if a passphrase was set, he will be prompted to enter it. So the full command is `gpg -d document.txt.gpg`, and by default it outputs the decrypted content to the console. Instead of printing to the console, you can write the output into a text file, or any kind of file, with the `-o` flag: run exactly the same command but add `-o` and assign a name, for example `decrypted_document.txt`, then give it the `.gpg` file, and it decrypts into that file for later use.

To import a key, we use the import command with GPG: `gpg --import` followed by the public key file imports the public key from that file into your GPG keyring; there you go, it is called a keyring, not a key log. So Bob's public key gets imported into Bob's keyring when he runs this, and then he can run the decrypt command. If you want to export a public key, you run the export command with the `-a` option (ASCII armor) and the user ID, redirected into a public key file: replace the user ID with the email or key ID for the person, and the public key file is the name of the actual output file. That is the file that would later be imported into a keyring using the import command. You might email it to them, though it would more likely be a secure-copy kind of situation, and once they have it, they import it into their keyring and use it to decrypt the file. If you want to list the keys you have generated, you just use `gpg --list-keys`, and it shows all the keys inside the GPG keyring, including the key IDs, the user IDs associated with them, and the types of keys they are.

In summary, GPG is a very versatile tool for securing files and communications using public and private key pairs. With the commands for generating keys, encrypting and decrypting files, and managing keys, it ensures that your data remains confidential and secure, and as you saw, it is not a complicated tool to run. The process is fairly simple: you generate the key, and you encrypt the file with the ID of the intended recipient.
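The decrypt, import, export, and list operations sketched together (file names and the user ID are placeholders):

```bash
gpg -d document.txt.gpg                                 # decrypt to the console
gpg -o decrypted_document.txt -d document.txt.gpg       # decrypt into a file
gpg --import bob_public_key.asc                         # add a public key to your keyring
gpg --export -a "bob@example.com" > bob_public_key.asc  # export an ASCII-armored public key
gpg --list-keys                                         # list keys in the keyring
```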
Then you export that person's key into a key file, you get them that key file, they import it into their keyring, and they use it to decrypt whatever file they are supposed to decrypt. And typically, if you have already added a passphrase to double up the security for that file, they also need the passphrase. So if somebody intercepts the individual key file you generated and then emailed or securely copied, but they do not have the passphrase protecting it, they still cannot decrypt the original document, which provides an extra layer of security. I would recommend sending the passphrase through a separate medium: you text them the passphrase and email them the key file, for example, so there are two different communications through two different channels. If somebody is intercepting their email, they will not get the passphrase you texted, or sent through WhatsApp, or some other method; you could even call them and say "this is what it is, write it down," so there is no digital trail for that transfer at all. There are a lot of different ways to secure this, but I would highly recommend that every single key file you generate, and every file that has been encrypted, also has a passphrase attached, so that decryption requires the passphrase as well as the key.

This is the perfect segue into secure file transfer, which we can do with SCP, secure copy, or SFTP, the Secure File Transfer Protocol. This is essentially how you would transfer those key files you just generated, as well as the document that was encrypted: both the encrypted file and the generated key can be moved using either SCP or SFTP. SFTP is the interactive protocol for file transfer. It is the secure successor to FTP, a very common protocol that was used for a very long time until people realized it was not secure, because most of it travels as clear text, and so the encrypted version, SFTP, was developed. It is more flexible and user friendly because it is interactive: once somebody has the login, they can navigate it very similarly to the way they would navigate any Linux file system, and a lot of the same commands that run in a terminal actually work inside an SFTP session once the person is logged in. You start a session with `sftp user@host`, which initiates the SFTP session as the specified user on the remote host, and from there you can list directories, change directories, and upload or download files.
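An example session might look roughly like this (the username, address, and file names are placeholders):

```bash
sftp alice@203.0.113.25    # prompts for alice's password first

# Inside the interactive session:
#   ls                 list the remote directory
#   cd /some/path      change the remote directory
#   get report.txt     download a single file
#   put project.zip    upload a single file
#   get -r remote_dir  recursively download a directory
#   put -r local_dir   recursively upload a directory
#   bye                end the session
```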
Inside the session you can download a file with `get` and upload one with `put`. So, for example, the file you just encrypted using the commands we ran through, plus the key you generated, can both be uploaded with `put`, and the person on the receiving side logs in and uses `get` to download them onto their local machine so they can decrypt them and get access. You can also do `get -r` on a remote directory to recursively download everything inside that directory onto your computer, instead of the individual file you would fetch with `get`, and you can do the same thing with `put -r` on a local directory: recursively put everything inside that directory onto the server, where the receiving side can get it after logging in.

Here is what an example looks like: the user alice on a particular IP address, so you run `sftp alice@<address>`. You will be prompted to enter the password for alice; it is not just going to immediately let you run `ls`, that is simply not how it works. Once you run the command and enter the right password, you are inside that server as alice, and you can list the contents of the home directory and so on. Say you want to transfer contents to somebody else: you first `put project.zip` onto the server, and now it exists inside that file hierarchy; from there somebody else can log in, or alice can log in from a different computer onto this exact server, and `get` that exact file onto the machine they are now logged into. So you download a file with `get` and you put a file into the file system with `put`; very simple.

Secure copy, SCP, is the quick, straightforward file transfer over SSH: essentially the streamlined version of what we just did with SFTP. SFTP is logging into a file system; secure copy just transfers a file from one host to another. The command is `scp`, and going from your computer to the remote host, you give the path to the actual file you want to transfer, then `username@remotehost:` and the path where it should land, which you can largely designate yourself. As soon as you press enter, you are asked for that person's password on that host: it is not just going to transfer the file willy-nilly, you still need the password, and then it transfers the file into that location. It is really as simple as that: no logging into a file system, no running `ls`, no `get` and `put` and all those extra commands; you are simply doing a secure copy.
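Both directions sketched (paths, user, and host are placeholders; each command prompts for the remote user's password):

```bash
# Push: copy a local file to the remote host
scp /path/to/local/file.txt user@remotehost:/home/user/

# Pull: copy a remote file down into the current directory
scp user@remotehost:/path/to/remote/file.txt .
```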
Secure copy, or SCP, is the quick, straightforward file transfer over SSH; it is essentially the streamlined version of the SFTP workflow. SFTP is logging into a file system; secure copy is just transferring a file from one host to another. Here is what an SCP example looks like when transferring from your computer to a remote host: the command is scp, then the path to the actual file you want to transfer, then username@remote-host, then a colon and the path to where it is going to land, and you can designate wherever you want it to land. What happens as soon as you press Enter is important: you are asked for that user's password on that host, so it is not going to transfer the file willy-nilly. You still need the password for that user on that host, and then it transfers the file into that location. It is really as simple as that: there is no logging into a file system, no ls, no getting and putting and all those extra commands. You are simply doing a secure copy, very similar to a copy command you would run locally, except from your location to theirs. The reverse direction, copying from their computer to yours, essentially just flips the order of the arguments: scp, then username@remote-host, a colon, the path to the remote file, and then the destination path on your current machine; and again, as soon as you press Enter you are prompted for that user's password so you can copy the file from that location. Note which path matters in each direction. When pushing a file out, the local source path is what you need to get right, and the remote destination can be essentially wherever you want, as long as you tell the other person where you put it. When pulling a file in, the remote source path is the one you need to know exactly, because that is where the file you want lives, and the local destination is less important because you can put it anywhere as long as you remember where. So that is secure copy and how you transfer a file securely: one simple command that does the job instead of logging into a file system and doing all the rest of what we did with SFTP. It runs across the secure shell port, port 22 by default, or whatever your SSH port is, and copies the file either from their location to yours or from yours to theirs. A very powerful tool for transferring files; you just need the password for the username on the host you want to pull the file from or send the file to. Both directions are sketched below, and that is basically it for secure copy.
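A minimal sketch of both directions, with placeholder paths and host details:

    # local -> remote: push a file to Alice's home directory
    scp ./backup.tar.gz alice@192.168.1.50:/home/alice/
    # remote -> local: pull a file into the current directory
    scp alice@192.168.1.50:/home/alice/backup.tar.gz .
    # -P selects a non-default SSH port, -r copies a directory recursively
    scp -P 2222 -r ./project alice@192.168.1.50:/home/alice/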
Okay, so now it is time to talk about troubleshooting and system maintenance, and the first part of this is log files: how to analyze and interpret them. The very first command we are going to go over is journalctl, for system logs. journalctl is a very powerful command-line utility for viewing and managing the logs generated by the systemd journal, so this applies to the more modern versions of Linux that run systemd as their init process. It is particularly useful for system administrators and developers troubleshooting and maintaining system health on Linux systems that use systemd. To view everything in the log, you just run journalctl and press Enter, and it displays all the logs recorded by the systemd journal, chronologically from the oldest entry to the newest, including system messages, kernel logs, and application logs. To filter by boot, if you just want to see logs from the current boot session, you run journalctl -b, which is particularly useful for diagnosing issues that occur during system startup. Then we have journalctl -b -1 for logs from the previous boot, and you can adjust the number to view earlier boots: -2 goes one boot further back, -3 before that, and so on. At a certain point the journal will most likely have stopped retaining older boots, so you can figure out how many are still stored and go as far back as the logs allow when you need to troubleshoot. You can filter by service name using the -u option followed by the unit name, and it displays all the logs related to that specific service. You might run one of the commands we went through earlier, top for example, to see what services are running, and then look at the logs relevant to a specific one with journalctl -u and the service name; with SSH as the example, journalctl -u ssh displays all the logs relevant to the SSH service. If you want real-time log updates, journalctl -f is similar to the tail -f command applied to a log file: tail shows you the most recent entries at the bottom of the log, and journalctl -f gives you real-time updates as entries are appended to the journal. You can combine it with any of the other filtering options to watch the most recent additions live, so it is very useful for looking at live system activity or diagnosing issues as they occur. Then you can filter by time, with journalctl --since followed by a timestamp in the format you see on the screen.
That format, year, month, day, then hour, minute, second, like 2024-11-15 08:00:00, is the ISO-style date format, and it tells journalctl to show every entry from that time onward. You do not need to give it the minute and the second unless you are really trying to narrow down on a specific incident to get the results you are looking for; in the example on the slide, we just said from 8:00 on November 15th, 2024, show me everything that has come through this log since that time, which is usually enough context for your investigation. If you want to filter by priority, you use -p and provide the level, from level 0, which is emergency, down to level 7, which is debug. journalctl shows messages at the given level and anything more severe, so -p 0 shows only emergencies, while -p 7 shows everything from debug on up: 7, 6, 5, 4, 3, 2, 1, 0. You can also give the priority by name: -p err shows all the logs with error priority and higher, and it continues on from there. You can also combine filters, for example filtering by unit and time together; all of these options can be combined, it is not one or the other. The example we got combines -u ssh with --since November 2024, and you can combine service and time and priority too, whatever filters get you the information you are looking for. So the full example, journalctl -u ssh --since "2024-11-15 08:00:00", shows all the log items for SSH since 8:00 a.m. on November 15th, 2024; a few of these combinations are sketched below.
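Here is a little cheat-sheet sketch of those filters; the unit name and timestamps are just examples, and note that the SSH unit may be named ssh on Debian-based systems and sshd on Red Hat-based ones:

    journalctl                                # everything, oldest to newest
    journalctl -b                             # current boot only
    journalctl -b -1                          # previous boot
    journalctl -u ssh                         # one service (unit)
    journalctl -f                             # follow new entries in real time
    journalctl --since "2024-11-15 08:00:00"  # from a point in time onward
    journalctl -p err                         # error priority and more severe
    journalctl -u ssh --since "2024-11-15 08:00:00" -p err   # combined filters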
So in summary, you can look at journalctl as a tool for system administrators using systemd-based Linux systems; it will not work under SysVinit, because the systemd journal does not exist there, so journalctl only works with systemd-based systems. From there you have a lot of different options for viewing and filtering logs, and of course we are going to go into all of those options and run a bunch of different forms of the journalctl command in the practical portion of this training series, so you get a good understanding of how to use it and all the filtering options available. Until then, this serves as your little cheat sheet: you can view the entire log, filter by boot, filter by service, follow real-time log updates, filter by time, filter by priority, or combine all of these to create a very specific viewing rule for a particular series of incidents or log items. That is just a sample of the common journalctl commands for looking at system logs. As we discussed in the File System Hierarchy Standard portion, where we looked at the main hierarchy of the Linux file system, logs are stored inside the /var/log directory. A lot of logs live there; it is the central location for log files in Linux, covering system events, service activity, application behavior, security incidents, and everything in between. By analyzing these logs, sysadmins can troubleshoot issues, monitor system performance, and enhance security, and in a lot of cases you do not even have to do it manually: you can use a security appliance to do it for you, or take all the logs in this location and feed them into Splunk, for instance, or connect Linux to Splunk so it gets live updates from your logs and helps you analyze the events going on. If you do not want to pay for something like Splunk, you can always use Wazuh, or Kibana from the Elastic Stack, which is another really good open-source tool for looking at log files. It is a very useful location because it holds the logs for everything that goes on in your system. Some of the key logs inside /var/log: first, the syslog or messages log, the general system log files that record a wide range of system events, including kernel messages and service activity. Whether you see syslog or messages comes down to distribution differences: syslog is on Debian-based systems like Ubuntu, and messages is on Red Hat-based systems like CentOS or Fedora. As for usage, keep in mind that journalctl reads the systemd journal, which is its own store of logs; since these are plain log files on disk, we view them with tail -f instead, which shows the bottom portion of the file.
Running tail on the log file shows the last 10 lines, for example, which are the most recent entries in the syslog file, and the same applies to the messages file. Another one is the authentication log, auth.log. This file contains information related to authentication and authorization: anybody trying to log in, login attempts whether successful or unsuccessful, user authentication processes, and privilege escalation attempts are all stored inside auth.log. Anything to do with authentication or authorization goes in there, and again the way to view it is tail -f, to watch the most recent additions at the bottom of that log file. This is very important for spotting unauthorized access attempts and anything else security related. The authentication file, or auth.log I should say, is one of the key files that security analysts constantly look at and keep an eye on, especially in a large environment, because you want to see whether there are failed login attempts, repeated failed login attempts, or successful logins during odd hours of the day, anything of the sort, just to figure out whether people who should not be logging in are actually logging in or trying to. Then we have dmesg; I think of it as "d-message." It records messages from the kernel ring buffer, which contains information about hardware components and their status: the initialization of a device, the drivers connected to your system for your printer or anything else physically connected to your computer, and any other hardware errors on your individual machine. Again you would use tail -f to watch the most recent entries going into this log file, so it is very useful for diagnosing any kind of hardware issue and understanding the state of kernel activity. If you remember, the kernel is the entity that connects the user, and anything we use as the user, to the actual hardware; it is the bridge between us and everything inside the physical computer so we can communicate with it. So for any kind of kernel issue or kernel-level hardware troubleshooting, the dmesg log is where you look. Then we have the secure log, and this is specific to Red Hat-based systems like Fedora or Red Hat Enterprise Linux. This file records security-related events, especially those related to secure shell as well as other secure services, like the SFTP and SCP we covered in the last chapter, so anything to do with the secure versions of a specific service gets logged in this file. Because it is Red Hat based, it does not apply to Ubuntu; it applies to CentOS, Fedora, and everything along those lines that is considered a Red Hat-based system. The viewing command, which by now you should have memorized, is sudo tail -f followed by the path to the log file.
That gives you the last 10 or 20 or however many lines you designate, which are the most recent incidents inside the secure file: failed login attempts, changes to user permissions, everything to do with security. Anything we covered in the security portion of this training series will, more often than not, show up in this particular log file on Red Hat-based systems. So here is our first set of examples. If you want to monitor a general system log, you look at syslog on a Debian-based system, or messages on a Red Hat-based system. If you want to look at recent authentication events, you look at the authentication log, auth.log, which covers anything to do with authentication or authorization, and again tail gives you the last 10 lines, or however many you ask for, which represent the most recent entries. If you just want to look at everything inside the file, you can open it with nano, still as sudo, but it will open the entire file, which is most likely massive and very overwhelming; you can also search through it with grep, one of the various tools we have covered, and we will do a lot of this when we get to the practical section, but typically you use tail so you can see the most recent authentication events. Another example is kernel messages: you can run dmesg to get all the kernel messages, or watch the most recent entries in real time with tail -f on the kernel log. And then you have the secure file for security-related events on a Red Hat system, which could be anything to do with secure shell, secure file transfer, or changes to permissions or ownership; all of that lives inside the secure log file. A few of these viewing commands are sketched below. So in summary, /var/log, however you want to pronounce it, is a treasure trove of information; it contains logs for everything. When we get to the practical portion I will run an ls in there for you so you can see the number of log files inside it; it is massive. The ones we went through are key log files everybody should know about, but there are literally dozens of log files in this directory, and you can get a lot of different information from them depending on what you are looking for. In a lot of cases, individual applications you install also get their own log entries inside this exact directory: there might be something for MySQL, something for Apache, and a variety of other software services installed on the machine that keep their own log files as well. So it is an important location to keep in mind: anything to do with security, security administration, or troubleshooting is done from the log files stored inside our /var/log directory.
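As a quick sketch of those viewing commands; the paths differ by distribution, so use whichever files exist on your system, and the grep pattern is just one example of an sshd failure message:

    sudo tail -f /var/log/syslog     # general system log (Debian-based)
    sudo tail -f /var/log/messages   # general system log (Red Hat-based)
    sudo tail -f /var/log/auth.log   # authentication events (Debian-based)
    sudo tail -f /var/log/secure     # security/auth events (Red Hat-based)
    dmesg | tail                     # most recent kernel ring buffer messages
    sudo grep "Failed password" /var/log/auth.log   # search for failed logins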
These are some specific logs to keep in mind, just as a screenshot; you have already seen all of them, so I am not going to go through them again, but screenshot this if you want. Okay, now we need to look at the usage of the disk itself and any cleanup that needs to be done. We have already looked at the df command, and df and du go hand in hand; together they help you look at everything to do with your disk. Disk usage analysis and cleanup is very important for maintaining system performance, because clutter tends to add up inside the system and you need to stay on top of your storage, especially if you are managing a bunch of different users who all have files and media and everything they are downloading and using. There need to be disk usage processes in place to make sure you are good on storage; in my opinion this matters for storage more than anything else, but it is a security issue as well. You want to stay ahead of it so there are no issues and the system does not crash because it runs out of storage, or slow down because there are just too many things for the system to take care of. The two primary tools we have are df and du. df is the disk file system command, as you should already know: a command-line utility that displays information about the available and used disk space on your file systems. One of the simple forms is df -h, where -h stands for human-readable; it formats the output in an understandable way, giving you sizes in megabytes and gigabytes rather than raw blocks, so you have a good understanding of what you are looking at in the terminal. So df -h gives you the disk usage per file system in a human-readable format. Here is what the output might look like: for our primary file system, partition /dev/sda1 has a size of 50 GB, 30 GB used, 20 GB available, so 60% of the total space is used, and it is mounted on the root directory. The second partition has 100 GB assigned to it, 70 GB used and 30 GB available, which means 70% usage, and it is mounted on the /home directory inside the root. The home directory is specifically for all of our users, so it makes sense that more has been used there; most likely multiple users have files and media stored inside that directory, using up a lot more space. That is what df helps us find. du stands for disk usage, and it is a command-line utility that estimates and displays the disk space used by files and directories. It is actually very similar to what we are doing with df; the intention is the same, just a different command with a somewhat different response. You run du -sh and then give it the path to the directory.
The -s stands for a summary of the total disk usage, and the -h stands for human-readable, same as with df. So the example command is du -sh /home/user, and it simply tells you that 5.2 GB has been used in the /home/user directory: disk usage, 5.2 GB, on that particular directory. If you want to look at a full directory, do a little sorting, and show the top 10 lines, here is what the whole pipeline does: we get a human-readable disk usage listing for the given path, pipe that result into the sort command, and then take the top 10 lines of all the output. du -h gives the disk usage for the specified path in human-readable format; sort -rh sorts the output in reverse order, which is what the r stands for, based on human-readable sizes, which is what the h stands for; and head -n 10 shows the top 10 largest directories (tail -n 10 would give you the bottom 10 instead). We sort in reverse because sort normally works in ascending order, alphabetical A to Z or numerical smallest to largest, and we want the largest first. Here is what the output might look like when pointed at the /var directory, which has all the logs in it: understandably, the log directory comes out on top, because there are so many log files taking up so much space, then the cache at 1.8 GB, the library at 1.2 GB, and then www, which typically has to do with the Apache server or whatever web server is running, holding 900 megabytes of data. So, as practical examples: df -h gives the human-readable breakdown by file system, showing how much is used and how much is free; du -sh on a path shows everything going on at that path, summarized and human-readable; and the full pipeline on the home directory sorts everything in reverse order, human-readable, and gives you the top 10 results. As a summary: df, the disk file system command, provides an overview of disk space usage by file system, line by line, making it easy to see which partitions are filling up, with df -h; and du, disk usage, gives detailed insight into actual usage by directories and files so you can identify which areas are consuming the most space, with -s for the summarized view of a given directory and -h for human-readable, and you can analyze further by piping into sort in reverse order and taking the top 10 results. The sketch below pulls these together.
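A small sketch of those disk-usage commands; the paths are just examples, and you may need sudo to read everything under /var:

    df -h                       # free/used space per mounted file system
    du -sh /home/user           # summarized, human-readable total for one path
    # ten largest entries under /var, biggest first
    sudo du -h /var | sort -rh | head -n 10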
Those are our summaries, and again we are going to run all of this as we go through the practical section, so you will get a lot of opportunities to actually use it; that does not mean you cannot run these while you are watching. This is technically the lecture format, as I say at the end of literally every summary section, so feel free to run these as we go through the lecture, but we will do a deep dive on all of these things when we get to the practical section. So let's talk about disk cleanup tips. Disk cleanup helps maintain system performance and ensures you have adequate storage for new data, and there are essential tips and commands we are going to go through to make sure you accomplish all of that. Disk cleanup is a very important concept, something you should be thinking about all the time as a Linux administrator. So why do we do it? We want to remove any unused packages, because they accumulate over time, take up a lot of space, and mess with your overall storage. They can also include dependencies that are no longer required by any installed software, so you want to get rid of those too; they are literally just taking up space, and nothing is actually using them because they are outdated, have been upgraded, or are otherwise no longer relevant. You want to do this routinely, in a scheduled manner, so it does not run away from you; you just want to stay on top of it. To remove unused packages on a Debian-based system, you use the APT package manager: sudo apt autoremove removes packages that were installed as dependencies and are no longer needed by any currently installed package. It is very useful because you do not have to go through the list yourself; you run sudo apt autoremove and it automatically removes anything irrelevant. On a Red Hat-based system it is the same exact thing, except you use dnf as your package manager, so sudo dnf autoremove removes unnecessary packages and dependencies that are not being used. The second reason to do this is to clear temporary files, which also accumulate inside the /tmp directory and take up disk space. For the most part it is safe to delete all of these, but you should be sure there is nothing you actually need and no application currently using any critical temporary files. My approach is that if a file were not meant to be temporary, it would not be inside the temporary directory; if it were important, it would have been put in one of the proper locations, like the application's library directory or /opt.
be used if they’ve been put inside of the attemp directory for the most part they’re good to go uh the command would be remove so we’re using the r M command and then you do the RF option which is recursive so you’re doing a recursive um to delete all of the files for the temp directory now notice that there’s an asterisk at the end of this particular path which means asterisk it stands for a wild card character so it’s going to remove literally everything that is inside of the temp directory because it can be anything it could the asterisk stands for anything so TMP for/ asterisk means that anything that comes after this path it can be removed recursively and all of the files and directories uh will be removed so uh this is a very powerful command again it’ll permanently delete everything you just got to make sure that none of these things are relevant but again this is just my process My Philosophy around it if it’s inside the temp folder most likely it was going to be deleted anyways cuz it t typically does get deleted on reboots so most likely it was going to be deleted anyways or at least delete it after a certain amount of time otherwise it would not be placed inside the temp folder so for the most part I think it’s good to go then we have removing older Journal logs so this happens with a lot of uh log entries that have to do with the system for example because it’ll log everything on the system uh even if it’s just informational even if it’s uh nothing that is critical or nothing that needs to be addressed it just keeps logging everything so uh over time it’ll be a lot of log files massive massive log files that take up a lot of space so it’s important to periodically clean these or set up some kind of a script to pre periodically clean these specific log items so that uh you know they’re only maintained for let’s say 6 months or they’re maintained for a year or however long is relevant to you but after that 6-month period is over if it’s older than 6 months it should be deleted and you should move on from it or maybe if it’s older than 6 months it should be transferred out of your system and put into an external drive or something like that and that’s the way that you would manage your log files so um in a lot of environments and a lot of Enterprise environments and based on regulatory environments or compliance issues based on Regulatory Compliance you may be required to keep logs for longer than 6 months but you can transfer them from hot storage which is what’s on the computer and it’s accessible all the time time to warm storage which could be an external drive that’s easily accessible you could just plug it into the computer and get access to it or if it’s super old you can put it into Cold Storage which means it’s now sitting inside of a warehouse somewhere and then somebody would have to go retrieve it in order to be able to access that information but it’s being placed in storage uh just depends on what the compliance environment that you’re in and what they require um but typically uh usually if it’s 6 months or older especially if it’s just you on your computer if it’s like older than 3 to 6 months you can just wipe it and move on from it um in an Enterprise environment is a little bit different but for your computer for a specific personal computer you really don’t need to hold on to log files that long the way that you could do this is by using the journal CTL command and then remove the uh the stuff that is older than two weeks for example by using the vacuum time so 
So sudo journalctl --vacuum-time=2weeks removes journal logs older than two weeks, and you can adjust the time frame as needed: two days, one month, whatever, and journalctl will remove entries based on their age. You can also clean up package caches with sudo apt clean or sudo dnf clean all, which clears the package cache, freeing up the space used by downloaded package files; that one is pretty self-explanatory. Then you can do a locale purge, which removes unnecessary localization files for languages you do not use on the local machine. The name localepurge is kind of funny to me, but you install it first if it is not already available, then run it, and it purges all the unused language files from your machine. We can also find and delete large files, say larger than 100 megabytes: in this case you use the find command, which we have already been introduced to, looking inside the root folder for entries of type f, meaning files, with a size of 100 megabytes or larger; once you find those files, you can delete the ones that are no longer of use to you. And then there is analyzing disk usage with GUI tools, graphical user interface tools like Baobab, the Disk Usage Analyzer on GNOME, or KDirStat on KDE. You have to install them, obviously, and once you do, you can sort through and browse the computer to find anything larger than a certain size or older than a certain period and remove those files and folders if they are no longer applicable, or at the very least transfer them to external storage. So in summary, you can use a variety of tools to clean up your disk, and doing this regularly, in a scheduled manner, will help you avoid storage issues. It can be as easy as using the APT package manager with autoremove, or the dnf manager with autoremove; you can recursively remove everything inside the temp folder; and you can use journalctl with a vacuum time of two weeks to delete anything older than that. The cleanup commands are sketched together below.
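A consolidated sketch of those cleanup commands; the size threshold and retention window are just example values, and the /tmp wipe is destructive, so double-check before running it:

    sudo apt autoremove                   # Debian-based: drop unused dependencies
    sudo dnf autoremove                   # Red Hat-based equivalent
    sudo apt clean                        # clear the downloaded package cache
    sudo rm -rf /tmp/*                    # wipe temporary files (destructive!)
    sudo journalctl --vacuum-time=2weeks  # keep only two weeks of journal entries
    # find regular files of 100 MB or more anywhere under /
    sudo find / -type f -size +100M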
we’ve attached to this so I’ll explain them in the next slide um and then you create the compressed file which would be this in this particular case and then this would be the path to the files that you want to compress so if we want to break this down the first portion of those options would be the C flag which is to create so it creates a new archive the Z flag compresses it into a gzip so a zip folder a gzip file V would be verbose meaning the files that are being processed are going to be displayed onto the screen so this is not necessarily uh it’s not necessary for the function of tar to work it’s just going to display onto the screen what’s actually being processed and how it’s going and then the F portion is specifying the output file name which in this particular case is backup. tar.gz so this is necessary for the function this is necessary for the function this is necessary for the function if you leave this out if you leave the- F portion out what it’s going to do is it’s going to create its own name and then you would have to rename it after the fact so if you just add the- F you can designate what you want the name of it to be and then the path to the file would be the actual file and directory that you want to be archived so in this particular case it’s just one path that’s been provided and then everything inside of this path is going to be archived into this backup. tar.gz uh compressed file right so you use- C to create – Z to turn it into a gzip and then- F to designate the name and then- V is just a verbos output so that it displays everything onto the screen as it’s being processed then you have the backup of these documents so for example again you’re creating another archive in this particular case but now it’s going to be a backup that is being created of the user documents so the exact pretty much the same exact command that we just ran in this particular case in this case right here except now we’ve actually given the path that we want in this particular case and it’s going to be it looks like this right so we’re creating a backup of the home user documents and it’s being placed into this particular backup file right here which is again same exact options that are being assigned to it and it’s creating a gzip file for us so if you want to extract the archive now you’re going to do uh a similar uh options here um the flags are a little bit different and the the very end right here where you actually create the the destination the path to destination also requires a flag for it as well so if we were to uh dis uh decipher that or if we were to what’s the word that I’m looking for not decipher um not split slice I’m drawing a blank but it it’ll come to me so if we were to break it down basically uh what we’re going to do is the first piece the instead of C we have X so instead of creating a file we’re now extracting a file so the first option the first flag is the X and then the Z Would to designate that we’re decompressing the archive using gzip so this would have to be a gz extension right here in order for this to work if it was a different type of an extension for this archive file you would use a different option inside of this so the Z represents gzip when meaning that we’re decompressing a gzip compressed file the V is still verbose to display everything that’s going on and then the F would specify the input file name which in this case would be the backup. 
If you want to extract the archive, the options are similar, but the flags change a little, and the very end, where you give the destination, also requires a flag of its own. If we break it down: instead of c we have x, so instead of creating a file we are now extracting one; the first flag is x. Then z designates that we are decompressing the archive using gzip, so the file would have a .gz extension for this to work; if the archive used a different compression type, you would use a different option, because z specifically means we are decompressing a gzip-compressed file. v is still verbose, displaying everything that is going on, and f specifies the input file name, in this case backup.tar.gz; and then we have the capital C flag for the directory, meaning the destination directory for the extracted files. So everything looks almost identical to creating an archive, except in this case we are extracting: x to extract, z still for gzip, v for verbose, f for the name of the file being extracted, and the added capital C flag giving the path to the destination where it will be extracted. As an actual example of extracting the archive, we take the backup we just created from the documents and designate the place it will go once extracted, /home/user/restored_documents; it is essentially the same command we just saw with the actual paths and the backup file name filled in. If you want to list the contents of an archive, you use tvf: v and f you should already be familiar with, v for verbose and f to designate which file you are looking at, but the t flag stands for list. It lists the contents of the archive without actually extracting them; we do not have a separate breakdown slide for this one, but it is about as close to self-explanatory as it gets. Now, if you want to exclude files from the archive, you use a very similar set of options, except for one last piece: the portion inside the directory that you want excluded. Notice the source path is still there, so we are still archiving that whole piece; the only difference is that the excluded portion will not be included in the archive file, so you can create an archive of a directory while leaving a certain part of it out. The option is --exclude, and you give it the full path of the directory or file you want left out of the archive being created with tar. You can also append a file to an existing archive, using the r option. Unfortunately the letter r does not obviously map to the word append in any way, so you kind of have to memorize this one, or save the command for future use; if you are not going to get the documents or the slide deck from Hackaholic Anonymous, I hope you are at least creating your own file with these commands in it, or at the very least you can come back to this portion of the video and look at the archiving functions of tar. Anyway, there are a lot of ways to refer back to this, so let us break the append command down.
We have v for verbose, f to designate the file, and in this case the r flag to append something to an existing backup file: it appends additional files to an existing archive, with the existing archive named first and then the path to the additional file you want appended into it. This is actually very neat and very useful, because when you create a zip file you cannot append into it, but you can with tar, which is very handy, since sometimes you just want to add things to an archive without creating new archive files; you keep the same archive and add to it. (One caveat worth knowing: tar can only append to an uncompressed .tar archive, not to a gzip-compressed one.) For example, for new versions of a log file inside the /var/log directory, instead of creating new archives you just keep adding the rotated log files to the same archive as they arrive: you could have a single archive file for the authentication log, and any time you do a backup, you append the newly created log file to that same authentication backup archive, which again is very, very handy. Here is the example we are going to walk through for creating a backup: very similar to what you have already seen, with all the same flags as before, and this backup file has the date attached to its name to give us an understanding of what the backup represents, so backup-2024-11-15, containing all the contents of the user's documents folder. A very handy little command. Then we have the extraction of that backup: the same exact archive file, except now we are extracting it, using the capital C flag to designate where we want it extracted to and the x flag to designate that we are extracting instead of creating; the rest is exactly the same, it is a gzip file, it is verbose, you designate the file name you are working with, and it gets extracted into the new location for those restored documents. And the third example is listing the contents of that backup file; this probably should have been example number two, listing the contents before extracting, but t represents list, so you list the contents of the tar archive without extracting them. These examples are sketched below. And that is it: the summary is that tar is obviously very versatile and very powerful, as you just saw from the examples, and by mastering just a few key commands you can efficiently back up and restore files, and honestly create a lot of really good scripts.
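Here is a sketch of the tar operations from this section; the file names, dates, and paths are placeholders, and note again that appending with -r only works on an uncompressed .tar archive:

    tar -czvf backup-2024-11-15.tar.gz /home/user/Documents    # create
    tar -tvf  backup-2024-11-15.tar.gz                         # list contents
    tar -xzvf backup-2024-11-15.tar.gz -C /home/user/restored  # extract to a directory
    # archive a directory while leaving one subdirectory out
    tar -czvf backup.tar.gz --exclude=/home/user/Documents/tmp /home/user/Documents
    # append a file to an existing uncompressed archive
    tar -rvf logs.tar /var/log/auth.log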
Now that we know we can append to an existing archive with the r flag, every time a scheduled task runs, something you schedule with crontab and cron jobs, the backup script can just append its contents into the same exact archive without creating a new archive file, which honestly I love; it is very useful, very handy, and as I am going through this I am thinking, okay, why have I not created a script for that? So you had better believe I am going to create a script that uses the tar command to append the contents of my own machines' logs into the same archive file, which again is just super handy. Incremental backups, which happen with rsync, are our next piece. rsync is a very useful, versatile, efficient utility for synchronizing files and directories between different locations. It is particularly well suited for backups because it transfers only modified files, reducing the time and bandwidth required for the operation; it essentially detects which files inside a given location have been modified, and if they have, it backs them up, which again is very useful. The key features: incremental transfer, so only modified portions of files are transferred. Think about what it means to sync something: if one new addition or one modification has been made, just that gets re-synced. It is very similar to iCloud, where your iCloud storage detects any new additions made to your phone's contents and backs those up to your cloud; it will not re-copy everything you previously had, only the new additions, which then get added to your iCloud backup. rsync only picks up things that have been modified. It is also very versatile: it can be used for local backups as well as remote backups over secure shell, which is awesome. And it preserves file attributes, maintaining permissions, timestamps, and the other attributes of the original files as it synchronizes them to the new backup location. A very useful little tool. The basic command structure is rsync with the -av flags, then the source directory, then the destination directory: a is for archive mode, which preserves permissions, symlinks, and all the other attributes of the source directory and its contents, and v is verbose, providing detailed output of the synchronization process on your screen; then the path to the source and the path to the destination. A very simple command, not complicated to understand at all: rsync with those two flags, the source directory, the destination directory, and it does exactly what it should. A basic sync example using actual directories is the same exact command with the archive and verbose flags, saying the documents from this particular user's home go into the backup documents directory, and that is pretty much it. Syncing over SSH, the one I am most interested in, keeps the same initial flags, the archive and verbose flags, but adds the -e flag, which specifies the remote shell to use, in this case ssh.
So it is going to use the SSH protocol to do the transfer, then the source directory, and then something very similar to what we did with secure copy, if you remember those instructions: user@remote-server, which will probably be an IP address, then a colon and the path to the destination where this backup is going to land. Once you run this, you will probably need to provide that user's password, unless you have generated a key that has been added to the remote user's authorized keys, the store of trusted public keys on the server, I was blanking on the term for a second, which saves you from entering a password every time. If you want to run this as part of a scheduled script inside your cron jobs, then you definitely need to take the steps we took when we generated keys, so you have passwordless, key-based authentication for SSH; then this runs every single time without a password prompt and backs up the directory into the destination on its own. So it is not complicated to do the backup, it is actually quite easy: you designate that you are using the SSH protocol, you have the credentials and the location for where it is going, and if you have set up key-based authentication you will not be prompted for that user's password, and from there it just runs like clockwork. Syncing over SSH is a very powerful tool, and here it is concretely: alice at a particular IP address, going into the backup documents for the alice user on that server. This assumes we either already have our key installed for passwordless authentication or we know Alice's password; as we run it, we enter Alice's password, and it backs up everything from this location into Alice's user profile on that particular server. A sketch follows below. So rsync is a very, very useful little command.
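A sketch of a local sync and a sync over SSH; the user, IP, and paths are placeholders, and the SSH variant assumes key-based authentication is set up if you want it to run unattended:

    # local incremental sync: archive mode, verbose
    # (a trailing slash on the source copies the directory's contents,
    #  not the directory itself)
    rsync -av /home/user/Documents/ /backup/documents/
    # remote sync over SSH
    rsync -av -e ssh /home/user/Documents/ alice@192.168.1.50:/home/alice/backup_documents/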
Backing up with deletion is next. The --delete option ensures the destination directory mirrors the source directory by deleting files from the destination that no longer exist in the source, which is very useful. Say you had 100 files, you updated 50 of them and deleted the other 50; the destination still has the original 100 because you already did one backup. When you run with this flag, it syncs the two locations, and all the files that were deleted from the source, because you did not need them anymore, are now also deleted from the destination, so they match exactly and only the files that exist in the source exist in the destination. This is of course something you can combine with the other options as well: run it together with the -e ssh option and just add the --delete portion, so anything that is duplicated or no longer exists in the source location is also deleted from the destination location, ensuring you are not holding on to old files that are no longer relevant. Here is what it actually looks like with the delete option and some real paths: the home user documents and the backup documents, and it synchronizes the contents of /home/user/Documents with the backup documents, deleting any files in the destination that are not present in the source. So, our examples: rsync with the -av flags, archiving with verbose output, taking the contents of one location into the projects backup folder; another example over the SSH protocol, where we add the -e flag to designate SSH, taking the contents of the home user's projects onto Alice's profile on a particular server in the backup projects location; and then the delete option added on top, which could also be combined with the SSH command, -av --delete -e ssh and so forth, ensuring the destination's contents match the source's by deleting anything in the destination that no longer exists in the source. It deletes the stuff in the destination that is no longer in the source: that is our destination, that is our source. A very useful series of commands. We have a couple of additional options here as well. There is --progress, which displays detailed progress information for each file during the transfer; it is similar to verbose except it shows you more of a percentage-style update. We will review what this looks like as we go through the practical portion, but it displays the progress of each file as the overall transfer process takes place. Then we have the preserve-hard-links option, -H, which preserves hard links found in the source directory. If you remember, soft links are essentially shortcuts that point to the original file or command, whereas a hard link is effectively another name for the very same file, like a duplicated directory entry. The reason you have to designate this is that by default rsync does not detect hard links, so two names pointing at one file would be copied as two separate, independent files; -H keeps that relationship intact in the destination. I would say include it any time you want the destination to faithfully keep everything that is inside the source directory; it feels like an option that should be run every single time.
Then we have compressing the data during transfer, which is another very useful option. The -z flag compresses the file data while it's in transit, to reduce bandwidth usage. One correction worth making here: the files are decompressed on arrival, so the destination ends up with normal, uncompressed copies; -z is about saving bandwidth during the transfer, not about saving space in the destination directory. That makes it especially worthwhile when you're backing up over a network, so for backup jobs over SSH I'd say add -z as a matter of course; on a fast local connection the compression overhead may not buy you much, but for remote backups it meaningfully reduces the amount of data that has to travel. In summary, rsync is very useful for incremental backups, because it updates the destination location only with the changes that have been made in the source location. It's flexible, with a range of options that make it suitable for a variety of backup scenarios, both locally and over the network using something like SSH. You have the basic sync command we've gone through, the SSH sync command we've gone through, and the --delete version, which removes anything in the destination that no longer exists in the source. A very useful tool, rsync. Which brings us to system performance monitoring: monitoring the CPU, the memory, and the processes that are running, using a tool called top. Now, top and htop both perform essentially the same function: they list the processes and services currently running on your computer and show you how much CPU each one is using, how much memory or RAM each one is using, plus the PIDs and certain details about each process. It's a dynamic list, meaning it updates in real time; if something starts taking more memory than the next thing, it moves up the list. It's a live environment type of monitoring, not something stagnant that you run once and get a fixed list with the line items staying the same. What it looks like is this: you literally run top, press Enter, and it provides a dynamic, real-time view of what's going on in the system, the processes running and the amount of resources each one is using. It's included in most Unix operating systems, Linux obviously, and it's actually on macOS as well; run top there and it will show you everything running in a live view and how much resource each one is using.
Top is used all the time for system administration as well as security. If a computer or a server slows down drastically and you don't know exactly what's going on, you run the top command to see what is running and how much resource each of those things is taking; more often than not you reverse-engineer the problem not from the name of the service but from how much resource it's using. If something is using a lot of the RAM or a lot of the CPU, you can say, okay, this seems kind of funky, what is this particular process, and then you start the rest of your investigation from there. The command itself is literally: type top and press Enter, and it starts displaying output that updates regularly. It typically refreshes once per second, but you can change the interval to 5 or 10 seconds if the one-second update is too much; in my opinion it is, because it's hard to read the entries as they change, so a 5-second refresh is easier to follow. It still updates continuously, just every 5 seconds instead of every second. Navigating it is actually very useful to understand, because you can use a lot of different keys to interact with top while it's running. When you type top and press Enter, the output stays on screen until you exit, and you interact with it as it runs. If you press P while top is running, it sorts everything by CPU usage, making it easy to identify what's consuming the most CPU; think P for processing, as in the central processing unit. If you press M, it sorts by memory, the RAM, random access memory, so you can see how much RAM each process is currently using. If you press k, you're ordering top to kill a process: it gives you the option to enter the PID of the process you want to kill. So let's say that using P and M you've determined there's one specific process taking up a lot of resources; you press k, provide the PID for that process, and your system kills it immediately, and hopefully it will not take any more resources from you. Sometimes you may need to force-kill, which is a different signal, but k by itself will kill a process. If you want to quit top, you simply press q and it exits the interface. So remember: you type top, press Enter, and you're now inside an interactive interface. Once you're done interacting with it, customizing the display, getting the information you need, killing a process, whatever you need to do, you exit by pressing q.
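As a compact recap of that workflow (the -d flag is top's standard option for setting the refresh delay in seconds):

```bash
# Launch top with a 5-second refresh instead of the default.
top -d 5

# Once inside the interface:
#   P  sort processes by CPU usage
#   M  sort processes by memory usage
#   k  kill a process (top prompts for its PID)
#   q  quit top
```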
The next one is htop, an enhanced version of top that offers a more user-friendly, colorful, and interactive interface, but essentially provides the same service and utility that top does; it's just a little more user-friendly. Instead of a bunch of black-and-white entries on the screen, you actually get color coding on the entries, which improves the usability. You'll need to install it, because unlike top it does not come preinstalled with Linux; you install it through your package manager with a sudo command, as shown in the sketch below, and then you run htop and press Enter to start the tool. The interaction is very similar to top, but with an enhanced user experience and easy navigation. Viewing processes is the main screen, which gives you a list of processes similar to top but with more detailed and accessible information.
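A minimal sketch of that install-and-launch sequence on the two package managers the course mentions:

```bash
sudo apt install htop   # Debian/Ubuntu family
sudo dnf install htop   # Fedora/RHEL family
htop                    # launch the interactive viewer
```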
If you want to sort by columns, you press F6. Now, not every keyboard has an F6 key, so I went and got the alternative for you, and thankfully it's fairly simple: you use the left and right arrow keys to move through the columns at the top of the interface, and once you've highlighted the column you want to sort by, you press Enter, and that's it. A lot of keyboards do have the F keys; my current Bluetooth keyboard does, but my MacBook doesn't expose them directly, though you can press the Fn key to bring up the F keys above the number row. Sometimes you don't even have that option, and then the arrow keys plus Enter are your workaround. If you want to kill a process in htop, you select the process using the arrow keys and press F9; the signal to send defaults to SIGTERM, the terminate signal, which lets you interactively terminate the process. If you don't have an F9 key, you can use the k key instead: use the arrow keys to move up and down the list and highlight the process you want to kill, then press the lowercase k key to open the signal menu, which is an alternative shortcut to F9. After that you'll see a list of signals you can send to end the process. The default is SIGTERM, signal number 15, and pressing Enter sends it, which will attempt to gracefully terminate the process. If that doesn't work, you can force-kill with SIGKILL, signal number 9, to forcefully kill the process. So: SIGTERM, signal 15, attempts a graceful termination; if the process isn't dying, so to speak, SIGKILL, signal 9, forcefully kills it, and boom, you're done. We also have the F3 option, the option to search for a process: it allows you to enter the name or part of the name of a process you want to find, and then if you need to kill it, force-kill it, or get other information about it, you can do it that way. The alternative if you don't have an F3 key is the forward slash: press / and it opens the search prompt at the bottom of the interface, enter the process name or part of it, press Enter, and use the n key to move to the next match if there are multiple instances of the search term. Next is quitting htop, which is done with the F10 key; the alternative works exactly the way top did: just press q while htop is open and it immediately exits the application, so if you don't have F10, you're still good to go. In summary, both top and htop are very useful tools for monitoring system performance, each with its own strengths. Top provides a basic yet powerful real-time view of processes and resource usage, with keys like P to sort by CPU, M to sort by memory, and q to quit. Htop is the user-friendly version of top, with enhanced features such as interactive process management and intuitive search navigation; you can use the F keys as we saw, F9 to kill a process, F3 to search, F10 to quit, or the alternatives we found: k to kill or force-kill something, / to search, and q to quit. It offers essentially the same usage as top, except it's more user-friendly because of the color coding in the output and the extra ways you can interact with the results.
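Those same two signals can also be sent from a plain shell; here's a hedged sketch with a hypothetical PID of 1234:

```bash
kill -15 1234   # SIGTERM (signal 15): ask the process to exit gracefully
kill -9  1234   # SIGKILL (signal 9): force-kill it if SIGTERM is ignored
```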
So that is it for top and htop, and now we can move on to free. Free is another simple but powerful command-line utility that displays information about the system's memory usage, including both physical memory and swap space, and it's vital for monitoring system performance and diagnosing memory-related issues. The command itself is free -h, where the -h option stands for human-readable: it formats the output using units like kilobytes, megabytes, or gigabytes so it's easier to read. Here's an example: we ran free -h and we see the total memory available in RAM as well as the swap space. In this case about 3 GB is currently being used, 8 GB is free, 239 MB is shared, the buffers and cache sit at around 4 GB, the total actually available after all of those things are considered is about 11 GB, and nothing is being used in swap, so that one is all good. We've actually reviewed this breakdown already, but we're going to review it one more time because we're now in the troubleshooting section of the training series. You've probably noticed repetitions of various concepts we've gone through, and that's mainly because they're relevant to multiple things, not just one use: free can be used for troubleshooting, and it can also be used for swap monitoring and memory monitoring in the context of partitioning and file systems, so the same tool serves multiple purposes, as we've established. The columns are: the total amount of memory or swap space; the used amount; the free amount; the shared amount, which is memory used by the temporary file system; buff/cache, the memory used by buffers and the cache; and the final available memory after all of those things are considered, not counting the swap space. If you want to check memory as another example, it's the same command with different output: in this case 8 GB total, 2 GB used, 500 MB shared, 1 GB dedicated to the buffers and cache, and about 6 GB available after everything is considered.
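A reconstruction of what that second example would look like on screen; the numbers are illustrative (the free column and the swap size are assumed values chosen to make the totals add up), so expect different figures on your system:

```
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           8.0Gi       2.0Gi       5.0Gi       500Mi       1.0Gi       6.0Gi
Swap:          2.0Gi          0B       2.0Gi
```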
If you want to monitor what's going on in real time, there's a very useful command we didn't cover before: watch. With watch, the -n 1 option means run the given command every second; if you did -n 5 it would be every 5 seconds, and so on. So you use the watch command to run free -h every single second. Note that watch is a separate command from free, not something included under the free tool, and you can apply it to a variety of different command-line tools. So watch -n 1 free -h says: I want you to run free -h every 1 second and show me the output, providing a real-time version of it. It's a bit like running top, except instead of top updating itself, we've kind of cheated the system and we're re-running free -h every second so we can watch its output live. If you want to look at the raw memory information, you can use the cat command against the /proc/meminfo path. This is not part of free, but it gives detailed information about memory usage directly from the proc file system; technically it shouldn't fall under free, but it was included in the course content, so this is how we'll look at it: you run cat on that particular file and it displays detailed memory statistics straight from the process file system, similar in spirit to the results you get from free. free -m displays memory in megabytes, -k displays it in kilobytes, and -g displays it in gigabytes. In my opinion you should just use free -h, because it determines the best measurement by itself and gives you the numbers in the associated unit: if a value is less than a gigabyte it gives you megabytes, if it's less than a megabyte it gives you kilobytes, and so on. You don't need to run the fixed-unit options, but if you wanted to, now you know: k for kilobytes, m for megabytes, and g for gigabytes, if you want the output specifically in those units. We also have free -b, the option that displays memory in plain bytes, not even kilobytes, so that's the smallest unit we can feed into free, and free -l, which includes statistics about low and high memory usage. As a summary, free is a very straightforward and essential tool for monitoring memory and swap usage on a system. The -h option provides human-readable output, giving you the measurements in what it determines to be the best units, and it shows the total, used, free, shared, buff/cache, and available memory after all things are considered; you'd run free -h for that. For real-time monitoring, you'd do watch -n 1 so it gives you a one-second repetition of the command, running free -h every single second so you see a live update. And you can run the cat command on the meminfo file to get detailed memory information, which is not strictly part of the free command but falls into the overall conversation we've had with top, htop, and now free.
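A short sketch of both commands just described:

```bash
# Re-run free -h every second for a live view (Ctrl+C to stop).
watch -n 1 free -h

# Read detailed memory statistics straight from the proc file system.
cat /proc/meminfo
```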
Vmstat is another tool; it stands for virtual memory statistics, and it helps you look at system statistics like memory usage, CPU performance, and input/output operations as well. It helps the administrator, which is you, monitor and troubleshoot system performance effectively. You'd run all of these commands to gather a variety of information, or to catch something that maybe wasn't caught by htop, for example, but that you can find with vmstat. The basic command would be vmstat 1 5: the first number means data is populated onto the screen every second, and the second means five iterations, so you get five entries printed to the screen, a snapshot of system activity over the specified interval. This is what it would look like: the output is structured in several column groups, the processes themselves, the memory, the swap, the input/output, the system, and the CPU being used, with all of the data for each one sitting under an overall heading. Under memory we have the swap memory in use, the free memory, the buffer, and the cache being used; under the swap space we have swap-in and swap-out, with nothing being used there in this example; under io, the basic input and output, there are 2 blocks coming in and 15 going out in this example; under system there are the in and cs counters, which we break down on the next slide; and then the CPU usage gives you how the processor's time is being spent. So here are the key fields. The procs column we saw at the very beginning: r represents the number of runnable processes, processes waiting for run time, and b represents the number of processes in uninterruptible sleep, i.e. blocked. That's what procs is and what r and b stand for. Under memory: swpd is the amount of virtual memory used, the swap space; free is the amount of idle memory; buff is the amount of memory used as buffers; and cache is the amount of memory used as the cache. Then you have the swap column with si and so: the memory swapped in from disk, in kilobytes per second, and the memory swapped out to disk, in kilobytes per second. So if your RAM can't handle anything more than 8 GB, for example, when usage goes to 8.1 GB, that extra 0.1 is pushed out of memory to the swap area on disk, and data moving back from disk into memory is the reverse of that. Then io is input/output: bi is blocks received from a block device, the input coming in, in blocks per second, and bo is blocks sent to a block device, the output going out, in blocks per second.
Then you have the system group: in is the number of interrupts per second, including the clock itself, and cs is the number of context switches per second, which means switching from the text editor to the internet browser, from the browser to a video player, and so on; the switches being made between different applications or processes, happening per second, alongside the interrupts going on per second. Finally we have the CPU portion, which gives us us, sy, id, wa, and st. us is the time spent running non-kernel code, the user time, the stuff being done by the user's own processes. sy is the time spent running kernel code, the stuff being run by the system, the kernel, in the background. id is idle time. wa is time spent waiting for input or output, which I'd say is a little different from idle: idle means there's absolutely nothing going on, whereas I/O wait means the computer is up, not asleep, not in hibernation or anything, but still waiting for something to happen. And st is time stolen from a virtual machine. Just to clarify that, because it was a little above my head as well: when you see stolen time in vmstat, it refers to CPU steal time, the percentage of time a virtual CPU within a virtual machine spends waiting for resources because the hypervisor is allocating those resources to another virtual machine on the same physical host. It's the time when your VM's vCPU is involuntarily idle because it can't get the CPU it needs from the physical machine. Say you have a physical host with two virtual machines on it, and each of those VMs requires a certain amount of CPU, a certain amount of processing power. If there isn't enough CPU to allocate to both machines and one of them is taking a lot of the processing power, then the second machine is getting its processing time stolen. Steal time means that processing time has been taken from one virtual machine because another virtual machine is demanding too much. It happens because the hypervisor manages multiple VMs and has to distribute the physical CPU resources among them; if there are more VMs, or higher CPU demand, than the physical host can handle, some VMs will experience CPU steal time. An example would be to just run vmstat 1: we've already established that the first number is the number of seconds it waits before refreshing the data, so this continually updates the system performance data every second until it's interrupted with Ctrl+C. If we ran vmstat 1 10, giving the second number as 10, it would repeat the iteration 10 times and then stop automatically by itself. If you run vmstat on its own, it's just a single snapshot: it won't repeat itself, it just gives you one snapshot of system performance at the moment you ran the command. And in the slide example here, vmstat 5 10 updates every 5 seconds for 10 iterations, displaying the performance data 10 times, once every 5 seconds.
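Putting those invocations together, with the standard column header vmstat prints (the data rows will of course vary per system):

```bash
vmstat 1 5    # one report per second, five reports, then stop
vmstat 1 10   # one report per second, ten reports, then stop
vmstat 5 10   # one report every 5 seconds, ten reports
vmstat 1      # continuous one-second reports until Ctrl+C
vmstat        # a single snapshot

# Header of the output:
# procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
#  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
```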
That's the word I was looking for, iterations, not increments. And this is our summary for vmstat: another essential tool for monitoring and analyzing system performance that provides detailed statistics on CPU, memory, and I/O, input/output, operations. When you understand and utilize it, you as an administrator can effectively identify and troubleshoot system performance issues. It's used in conjunction with top, htop, free, and so on; these belong in your suite of tools, your arsenal, when you're troubleshooting system resources: the amount of CPU being used, the amount of RAM being used, which processes are running and how much each one is taking. As you saw, you get different results from each of these tools, which gives you more context for what's going on within any given machine so you can figure out the overall picture. Vmstat, as the name implies, gives you virtual memory statistics; free gives you the free memory that's available; top gives you the processes that are running and the amount of memory or RAM they're using. Each serves a different purpose, and used together they give you a full picture, a great idea of what's going on inside your computer's environment or your network's environment. All right, now we need to talk about virtualization and cloud computing. This is not on the Linux+ examination, or at least not on the version we mentioned at the top of this training series, but they do upgrade and update their certification exams relatively frequently, so for whenever this does get included in the examination process, I want you to be aware of these concepts. Even if they're not covered in the exam, understanding what virtualization and cloud computing are makes you a much better administrator, and while this is in no way limited to Linux, I think it's a very important topic to talk about, especially when we're talking about deploying Linux. We actually touched on related ideas during the installation process in chapter 3, when I showed you how to boot from a USB drive, install onto a USB drive, and set up a dual boot so you can run multiple versions of an operating system, or different operating systems, on the same machine; that's adjacent territory, and cloud computing is a little different again, but it's all in the same ballpark. So, as an introduction to virtualization, the first thing we need to understand is this: think of your computer as a machine, the computer I'm recording this on, or even your cell phone; consider all of these things as machines. Then we can talk about the physical version of the machine or a virtual version of the machine. You could have a Windows machine on a physical computer, or you could have a Windows machine that is a virtual computer, a virtual machine, and the way we create the virtual version is through virtualization. Virtualization is simply the technology that allows you
to have multiple virtual machines running on a single physical machine. That physical machine could be something as simple as somebody's home computer, or it could be a massive server built at Amazon AWS, Google Cloud, Microsoft Azure, or any of those providers; it's the same overall concept, just different sizes of physical machines ending up hosting the variety of virtual machines. This approach, this concept of virtualization, improves the use of resources and gives you isolated environments for a variety of purposes: running different applications or operating systems, creating test environments, anything along those lines. It's all very powerful to use virtualization for. So that's essentially where virtualization sits, and there are a few key concepts you need to really wrap your head around. As I already mentioned, virtual machines are basically simulations of physical computers: a software-based version of a physical computer that could live on a USB drive and be booted from it, be booted from a cloud service provider, or be booted from the actual computer you're watching this video on. That's basically what it is, a simulation of an actual computer running an operating system, Windows, macOS, Linux, and so on. Each virtual machine runs its own OS and the applications inside that OS, and they're all independent from each other even on the same physical host; one physical host can have a hundred VMs running on it, or a thousand, in the case of a mega-server that some enterprises keep in their own building or rent from a cloud service provider. There is isolation, meaning the failure or compromise of one VM will not affect the others. And if you want to test something in an isolated environment, you can boot up a virtual machine, test on it, make sure everything's all good, and if it crashes, who cares: it's that one singular virtual machine, and you can take it down as quickly and easily as you put it up, and keep testing until the thing is actually ready to deploy to the rest of your enterprise environment. You also isolate each of these machines from the others for security purposes, so in case something happens, one machine crashing doesn't affect the rest of your environment and your network. So that's the overall understanding and concept of virtual machines. Now, there's something that helps you create, deploy, and manage virtual machines, and that is the hypervisor. The hypervisor is software, or firmware, so to speak, depending on the type of hypervisor you're using, but it serves the same purpose either way: it helps you create, manage, deploy, and take down virtual machines. That's basically what it does, and with the hypervisor you allocate the amount of resources designated to each of these virtual machines and keep them separate, isolated, from each other. That's what the hypervisor does: it helps you boot them up, deploy them, allocate how much of the resources each one should take on the physical machine they're running on, and keep them all separate from each other. The first
type of hypervisor is known as the bare-metal hypervisor, and I'll give you a visual shortly so you understand the difference between the two. Type 1, known as bare metal, runs directly on the actual physical hardware: the CPU, the motherboard, the RAM, the power supply, everything else physically required to build a computer. The hypervisor sits on top of that, and from the hypervisor you boot your various virtual machines. It doesn't require a host operating system; I'll explain what that means when we get to the second type, but it sits right on top of the physical hardware, and from there you boot all the operating systems, applications, and everything else you'd run inside your environment, through the use of this particular hypervisor. This is the higher-performing version of a hypervisor, because it sits directly on top of the physical hardware, and it's very common in enterprise environments, environments with hundreds if not thousands of employees who need to be provided with machines. The examples here are actual bare-metal hypervisors. There's VMware ESXi, which is very widely used in enterprise environments and supports a lot of different features for managing virtual machines. There's Microsoft Hyper-V, another powerful hypervisor that's included with Windows Server and provides comprehensive virtualization capabilities. And there's Xen, the open-source bare-metal hypervisor known for scalability; because it's open source, you can use it in an enterprise environment with a limited budget, and it's still very secure, so it's a notable mention: good enough, despite being open source, to be very useful in a large production environment. Then we have the type 2 hypervisor, known as the hosted hypervisor, which runs on top of an existing operating system on a computer. Imagine the Mac computer I'm recording this on: it would act as the host, we'd install one of the hypervisor applications on it, and with that hypervisor I'd deploy multiple virtual machines, all drawing on the same resources inside my MacBook. If my MacBook has 8 GB of RAM, all of the various virtual machines rely on that 8 GB of RAM in my actual computer. That's what it means to have a hosted hypervisor: the computer serves as the host, you're using that computer's resources, and the computer has its own operating system, Mac, Windows, anything else; that machine, already running an OS, serves as the host, and on top of that you have the hypervisor that helps you deploy the various virtual machines and applications. This is mostly used for desktop virtualization or smaller environments; it's not
something you'd do with a thousand employees; it's typically a much smaller-scale way of running a hypervisor, a virtual type of environment, and it depends on the resources of the main host computer. VirtualBox would be one of the most common ones; you've probably heard of it because it's an open-source hypervisor that's used a lot to deploy guest operating systems. If your computer is the host, everything running on the hypervisor is the guest OS, the guest VM, and it's very easy to deploy with VirtualBox: you just download it and start deploying, and as long as you have the ISO image, as we went over in the installation process in chapter 3, you can run as many virtual machines as your computer allows based on its resources, its hardware, and its processing power. VMware Workstation is the commercial hosted hypervisor, and it basically does the same thing: it runs and manages multiple VMs on a desktop computer. Of course, in a commercial environment you'd need a much stronger host than something running on 8 or 16 GB of RAM with a basic CPU; the host still needs to be powerful enough if it's going to virtualize, say, over a dozen VMs in a commercial environment. The last one is Parallels Desktop, which is designed specifically for macOS and allows you to run Windows and other operating systems on top of a Mac; macOS needs its own dedicated hypervisor, and Parallels Desktop serves as the hypervisor that works on top of a macOS computer. There are a lot of advantages to doing this. I'm going to go over three, maybe four, main categories, but depending on who you ask, there's an abundance of advantages to virtualization. The first is the efficient use of resources and how quickly you can scale up or scale down. Because you can reuse the same physical resources, it's efficient: you're not spending a lot of money buying 25 computers; you can launch 25 virtual machines from the same hardware and connect them to monitors, keyboards, and mice. Each of those thin machines, sometimes called dummy computers, is technically known as a terminal: you can use 25 terminals that don't have any real hardware of their own, just connected to the main physical processor, the powerful central computer with all the physical hardware, which serves 25 different employees. When you hire somebody, you don't have to buy a new computer; you just launch another VM from that same physical resource and provide a keyboard, a mouse, and a monitor. It's much more cost-effective; the cost of hardware is way less with virtualization than buying 25 individual computers. That's one piece, and the other piece is scaling up and down: if you let half of your people go, you don't
have to worry about getting rid of half of your computers; you can just delete half of the virtual machines currently deployed on your hypervisor, and it's as easy as selecting them and clicking delete. Scaling up or down with virtualization is not complicated. The next advantage is isolation, and the connection of isolation to security; these two pieces go hand in hand. Because each virtual machine operates independently, as its own computer, then as long as the usernames and passwords are strong, if one of them gets hacked into or fails, it doesn't take the rest of the network with it. If a VM fails, you can just reboot it or launch another one and you should be all good. If a user gets hacked, then as long as the passwords of the rest of the people on your network are strong, whatever ransomware or virus lands on that particular VM will not affect the rest of the computers, because that VM is literally isolated as its own machine. It's exactly like one physical computer getting hacked while the rest of the physical computers on the network stay unaffected, because each of these VMs is essentially a separate computer by itself, isolated in its own environment: if it gets hacked, if it crashes, if anything happens, it won't affect the rest of the computers on your network. Isolation and security go hand in hand; when you can isolate a compromised machine from the rest of your network, you're protecting the rest of your network. Then there's the flexibility and agility of testing, deploying, and developing. You can deploy quickly: if somebody new comes in, you literally go through the same installation process on your hypervisor using the same ISO, deploy a new virtual machine, connect that virtual machine to one of the terminals that already exists, and give that person their own username and password. And if the employee who was using that original terminal isn't working that day, or they're on separate schedules, different people can run their own VMs from that same exact terminal, as long as each has their own dedicated login credentials. Testing and development are another of these easy wins, because it's an isolated environment: if you want to test a new product launch or software launch and make sure it doesn't affect the rest of your network, you do it on a virtual machine, test it, make sure everything is all good, do any extra configuration or development you need, and once all your t's are crossed and your i's are dotted, you deploy it to the rest of your environment and make sure all the other computers have it. It's very flexible and agile: you can scale up or scale down as needed, very quickly, without having to buy new machines or sell the current ones, and if you add 10 new employees for your night shift, they can all use the same exact terminals, just with their own login credentials. It's
like it’s very very simple and easy to do and finally there’s the disaster recovery portion of it that um you know so let’s go back to the the concept of having 25 physical machines right so if you wanted to take a snapshot a backup of 25 physical machines then instead of having to plug each machine into an external hard drive and then run whatever software you would use to back up that uh computer’s contents on the the external hard drive you would just go to your hypervisor you would select all so select all of the machines and then run the backup within the hypervisor and then go to lunch and come back and all of the contents of all of those 25 virtual machines have been backed up on your external hard drive and if you can think about it so if we go back to the file system hierarchy that we reviewed at the very beginning of this training series and you you consider that you know every computer technically is just a massive file system and it has the root folder the root directory and inside of the root directory there’s a bunch of primary uh directories and then those primary directories are extended into a bunch of other directories and then those directories contain a bunch of files and folders and so on and so forth so when you think about it as a large file basically right so you have one root file and inside of that file there’s a bunch of other files folders when you think about it that way it’s basically as simple as copy pasting right so that’s what it’s like when you have a virtualized environment you click on one little line item inside of the hypervisor and that represents a computer and all of the contents of that hypervisor or all of that VM would be backed up using that hypervisor and it’s so simple to do so it’s it’s much much simpler process than having to back up 25 physical machines so this is one of the probably one of the biggest advant as far as convenience is concerned it’s one of the biggest advantages of using virtual machines and if there’s any disaster or you you lose your power or the building burns down or something like that it’s like all of these things are stored inside of this one virtualized environment that can easily be accessed especially if you’ve developed redundancies which is very important in security and disaster recovery and you have multiple locations that are connected to that same hypervisor and then if one Lo goes through an earthquake that means they’re all good that you can still launch all of those computers all of those virtual machines because you have these redundancies that are essentially connected to the same hypervisor the same virtual environment so very very powerful concept and uh as a summary really what we need to understand is that there are two types of hypervisors so if you know what a virtualization is and if you understand the concept of virtual machine you need to understand that there are two types of hypervisors we have the type one and then type two type one sits on top of the physical infrastructure type two sits on top of a already running computer which is would be the host computer and then from there you would deploy your virtual machines but I do want to show you this visual because I really believe that visuals actually help kind of uh embed Concepts into your brain and really drive the point home so uh let me just show you this real quick so this is a very simple visual representation of the types of hypervisor so we have the type 1 hypervisor on the left side here and this is the hardware so the CPU the 
motherboard, the RAM, the graphics card, the power supply, everything that represents the hardware required for a computer to run. On top of the hardware is the hypervisor: there is no operating system, just the hypervisor; that is type 1, the bare-metal hypervisor, and from this hypervisor you launch the various web applications, applications, and operating systems out to terminal computers and so on. It's fairly simple, and you can see how this is more efficient and more performance-driven: there's no operating system between the hardware and the hypervisor, so the hypervisor deploys those operating systems directly. Type 2 would be my laptop, for example, or your computer: it has the hardware, and on top of the hardware is a Windows operating system being used as a main computer, or a Mac operating system, and then you've downloaded the hypervisor as a piece of software, and that hypervisor helps you launch the various operating systems, the various virtual machines. And that's basically it; it doesn't go much deeper than this. The details from here on would be: which hypervisor are you using and how do you use it, or are you going to use a cloud service provider that technically acts as your hypervisor, where you're borrowing, renting, their infrastructure from their massive server rooms and data centers, using their interface to launch your virtual machines, and then giving your employees their logins so they access everything from their own computers? That's where the details come in, but for the overall concept, that's basically it. If you rent services from a cloud service provider, you're technically in a hosted arrangement: you log into your web browser, from the operating system on your current computer, you go into Google Cloud, and you deploy 100 virtual machines using Google Cloud's management layer. For each of those virtual machines you get an IP address or a login link, essentially, and for that link there's a username and password that's given to a person, and that person, from their computer, gets access to that particular virtual machine or web application. That's really as deep as you need to go to understand how cloud environments work and how virtual machines work; once we get that, we can go into the nitty-gritty, this cloud service provider does this, that one does that, but it's all essentially the same concept, just with a bit more nuance. So that's it; that's the difference between a type 1 and a type 2 hypervisor. Now let's look at specific type 1 hypervisors. KVM, the Kernel-based Virtual Machine, acts as a type 1 hypervisor and is integrated directly into the Linux kernel. It sits right on top of the physical hardware and transforms the Linux operating system into a very powerful and efficient virtualization host, capable of running multiple VMs with various guest
operating systems, and this is typically done in a server type of environment. KVM is integrated with the Linux kernel, and as you already know, the kernel connects the user to the actual physical infrastructure; that integration makes KVM highly efficient and able to leverage the existing Linux infrastructure. It allows KVM to take advantage of Linux features like memory management, process scheduling, and input/output handling, giving robust performance and scalability, all of which falls under using something like KVM, and it also makes allocating resources easy and efficient, so to speak. KVM uses hardware-assisted virtualization, which is what a type 1 hypervisor relies on, supported by processors with Intel virtualization technology or AMD-V. You should recognize AMD just by the name, because they make really good graphics cards, and Intel also makes graphics cards and computer chips; we're talking about physical computer processors here. So KVM uses hardware support from Intel or AMD processors, and that hardware support allows it to efficiently allocate resources like CPU, memory, and I/O to virtual machines, keeping performance and overhead balanced as they're supposed to be. Basically: the Kernel-based Virtual Machine, KVM, is a type 1 hypervisor that sits on top of the physical computer, and from there you launch a variety of virtual machines. It has a lot of support for various guest operating systems, meaning you can actually launch Windows, Linux, BSD, and other types of operating systems from KVM; each virtual machine runs its own OS and is configured with its own hardware specifications, like how much CPU it will use, how much RAM it gets, how much storage, and so on; the same things any type 1 hypervisor does. This specific one is of note for what we're talking about because it works with Linux: it is a Linux virtual machine manager, a Linux hypervisor, while being compatible with Windows and Linux and any other operating system you'd want to install for your virtual machines.
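Since KVM depends on that hardware support, here's a common quick check you can run; a count greater than 0 means the CPU advertises the relevant flag:

```bash
# Count CPU flags indicating hardware virtualization support:
# vmx = Intel VT-x, svm = AMD-V.
egrep -c '(vmx|svm)' /proc/cpuinfo
```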
virsh is the command-line tool, the command-line interface, that interacts with KVM; it's essentially the interface you use to manage KVM-based VMs (that's a bit of a tongue twister). It's part of the libvirt virtualization toolkit, which provides the API for interacting with hypervisors, including KVM as well as a variety of others. So virsh is the command-line interface you use with KVM to work with your virtual machines. If you want to start a virtual machine from the command line using virsh, this is basically it: you run virsh start and then the name of the VM, whatever VM name you designated previously, and it starts that virtual machine up. As you can see, it's very intuitive, a very easy-to-understand command line: you run the start command and give it the name of the VM so it actually starts the VM. For example, if the machine were named my-virtual-machine, you'd just say virsh start my-virtual-machine. You can list the running virtual machines using virsh list; very simple to understand: it lists all the currently running virtual machines, displaying their IDs, names, and the states they're currently in. In the example output there are two virtual machines, my-virtual-machine and another VM, and both are running; very easy to understand. If you want to shut something down, you use shutdown, the opposite of start: virsh shutdown and then the VM name, and it shuts down gracefully. I love that phrasing: it gracefully shuts down the virtual machine, making sure everything is in order; the example would be virsh shutdown my-virtual-machine. A couple more examples, just to ingrain this in your head: the start command starts up the virtual machine, in this case virsh start ubuntu-vm; virsh list displays all the currently running virtual machines on your screen; and virsh shutdown ubuntu-vm stops the ubuntu-vm machine from running. So KVM is the type 1 hypervisor integrated into the Linux kernel; it enables virtualization on Linux hosts, you can run multiple VMs with a variety of operating systems, and these are our basic commands: start a VM, view the running VMs, and stop a VM. Actually deploying virtual machines with KVM is going to be outside the scope of this particular discussion; I just want you to know that the tool used to interact with KVM, the kernel-based virtual machine, is the virsh command line, and from there the commands are as simple as starting something that has already been deployed under that hypervisor: KVM is what holds the name of the virtual machine, and from there you start it, stop it, or list what you have available, and so on.
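Putting those together, a minimal sketch of a virsh session, using the hypothetical VM name from the examples above:

```bash
virsh start ubuntu-vm      # boot a previously defined VM
virsh list                 # show running VMs: ID, name, and state
virsh shutdown ubuntu-vm   # ask the guest OS to shut down gracefully
```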
VirtualBox is another really commonly used, widely used type 2 hypervisor, so as we reviewed with our hypervisors, this one sits on top of a host machine. It was developed by Oracle, and it is very popular because it is compatible with different operating systems like Linux, Windows, and macOS, and it is commonly used for testing and development environments. It provides an easy-to-set-up, flexible platform; it is literally like running any other software, where you go through the prompts and the wizard for the installation, and it lets you run multiple operating systems on a single machine.

Some of its key features: it is compatible with a variety of platforms, so you can run it on a variety of host operating systems, which makes it versatile, and you can run a whole range of guest operating systems, meaning Windows, Linux, macOS, Solaris, and others. So it is cross-platform compatible: you can run it on Windows or Linux to launch Windows or Linux guests. It is very easy to use; as I mentioned, it is as simple as a couple of clicks to download the install files and install it, and once you have it installed you use the GUI, the graphical user interface, to go through the prompts and click the various buttons needed to start up a new virtual machine. There is a lot of documentation available for VirtualBox because it is one of the most commonly used virtualization tools, one of the most commonly used hypervisors. And if you really want to be nerdy about it, you can use the command-line interface for management, automation, scripting, and a variety of other tasks. If you really want to get good at VirtualBox and virtualization in general, which I recommend that you do, go look into it. I am not going to go deep into the VirtualBox command line and its use cases here, but there are a lot of tutorials available, and if enough people want to see it, I may create a future video using VirtualBox, because again, it is one of the most commonly used tools for running VMs.

More key features: there is snapshot functionality, which allows you to capture the current state of a VM, meaning you can take backups of the machine very easily. As I mentioned earlier, it is as easy as clicking one of the virtual machines in your list of VMs and taking a snapshot of it, backing up its contents, so it is very useful and very easy to create a virtual image, so to speak, a snapshot of a virtual machine, to keep as your backup. You also get Guest Additions, meaning tools you can install to enhance the performance and usability of the guest operating system: you can improve the graphics of a guest OS, have shared folders between your operating systems, or get mouse integration. The mouse one is kind of a gimme, but the shared folders part is very important, and being able to improve the graphics of a guest OS right from your virtualization hypervisor is a very cool little feature.

And for the nerds, VBoxManage is the command-line interface for managing VirtualBox VMs. This is the tool you use to manage machines via the command line or to create scripts for them, which is where the scalability comes in, in an easy, convenient format: once you learn how to script, you can launch a dozen machines with one script, which makes virtualization even easier. So VBoxManage is the CLI for launching virtual machines: VBoxManage startvm <vm-name>. This is a little more of a mouthful than virsh, but it is basically the same concept; you call the VBoxManage tool with startvm and give it the VM name, the name of the virtual machine. As an example, if ubuntu-vm were the name of the virtual machine, you would just say VBoxManage startvm ubuntu-vm. If you want to list them, it is the same idea: instead of virsh list, it is VBoxManage list vms.
VBoxManage list vms is about as simple as using virsh: it lists all the VMs that are registered with VirtualBox, displaying their names, their UUIDs, and so forth. In the example output you can see an Ubuntu VM and a Windows 10 VM, so a Linux VM as well as a Microsoft Windows VM, along with the UUIDs associated with those virtual machines we launched using VirtualBox. Then there is VBoxManage controlvm <vm-name> poweroff; again, more of a mouthful than the virsh command, but this forces the specified VM to power off. You replace <vm-name> with the name of the VM you want to turn off, so VBoxManage controlvm ubuntu-vm poweroff is what we would run to power off ubuntu-vm.

A couple of examples just to review these commands: VBoxManage startvm debian-vm would start the Debian VM; VBoxManage list vms lists all of the registered VMs with their names and UUIDs; and stopping a VM is done with controlvm, the name of the virtual machine, and the poweroff command. In summary, we have the VBoxManage commands for managing VMs, and we have VirtualBox itself, which I would say is probably one of the most popular, if not the most popular, type 2 hypervisor. It is super easy to download and install, it is very flexible because you can run multiple operating systems on a single computer, and it is used for testing, development, and a variety of other tasks because it is so compatible with various host and guest operating systems. That cross-compatibility between the host operating system and the guest operating systems is, again, what makes it one of the most popular hypervisors on the market.
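Here is the equivalent lifecycle as a small shell sketch, assuming a VirtualBox guest registered under the name ubuntu-vm:

    # list every registered VM with its name and UUID
    VBoxManage list vms

    # start the guest (add --type headless to run it without a window)
    VBoxManage startvm ubuntu-vm

    # force the guest off, like pulling the power cord
    VBoxManage controlvm ubuntu-vm poweroff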
Docker and containers are, in a sense, the replacement we have for carving systems into separate file systems and partitions. Docker is a popular containerization tool that enables us to package applications and their dependencies into portable containers. These containers can run consistently across different environments, ensuring that the application behaves the same regardless of where it is deployed. So you have a container, a compartment so to speak, that includes an application and all of the dependencies that application needs to run, and you can launch it on a Windows computer, a Linux computer, and so on, all through a virtualized environment; it is something a virtual machine would have access to.

Comparing a container to a virtual machine: containers are isolated environments that share the host operating system's kernel. They are lightweight, fast to start, and do not require a full operating system. A container includes everything needed to run the application, like the code, the runtime, and so on, but it is not an actual virtual machine; the container is not the OS. The virtual machine is the OS; the container is an isolated environment that uses that OS and holds the application and the dependencies it needs to run. By comparison, the virtual machine runs on the hypervisor and includes its own OS, so a container would essentially run on top of the operating system provided by a virtual machine, or on the host operating system of your own computer. Each VM operates independently with its own OS, which increases resource usage, as we have already established; containers do not run their own OS, so they are more lightweight and faster to start.

When you have a Docker container, it runs on anything that can support Docker, which means containers are consistent across development, testing, and production environments. They are isolated, so you can manage the file system and the processes inside them on whatever operating system they happen to be running on, and multiple applications can run on the same host without interfering with each other. This is really just a fancy way of saying you can compartmentalize a group of applications, a series of applications, separate them from each other and run them independently, while they still run on the same host. They are portable, meaning a container can essentially be transferred via email or a file share, and they are isolated, running by themselves without interfering with each other, even while sharing the same host operating system.

The benefits are many. They are efficient and lightweight, as we mentioned; they do not use as many resources from the hardware or the operating system, and they start up pretty much like you are starting a piece of software, because that is kind of what they are like. They are scalable, meaning they can easily be scaled up or down based on demand, they can be shared across a variety of different operating systems, and they are ideal for microservices and cloud-native applications. That is really where they come into play, the cloud environment, because cloud providers offer these containers, with the software and applications installed inside them, and you can download and install one into your virtual machine fairly easily, or into a hundred of your virtual machines just as easily, because they are very compatible and they scale up or down.

To run one, you use the docker command; docker is the tool that runs a container from one of the container images. The -it option allows you to interact with the container via the terminal, so you say docker run -it <image-name> and replace <image-name> with the image you want to run. As an example, you run docker run -it ubuntu: you are running Ubuntu as software, technically as an application, not as an operating system, which is why it is faster to start; it is using the resources of the host operating system to run.
So again, this is not running Ubuntu as an operating system; it is running it as a container that uses your host operating system, your computer. This is also part of why containers are so native to cloud environments: when you use a cloud environment, you are running on the provider's shared infrastructure, which supplies the resources needed to launch that Ubuntu environment for you.

Then you have docker ps to list the containers you have running; it lists all the currently running containers, displaying their IDs, names, statuses, and more. In the example output you can see the container ID; the image, meaning what is actually running; the command interface for it, a bash shell in this case; the fact that it was created 2 hours ago and has been up for 2 hours; that there are no ports mapped for it; and the name, awesome_wing, that was assigned to it. That is what it looks like. If you want to stop that exact container, you just run docker stop <container-id> and give it the container ID from that listing (you could also use the name, but what we are talking about here is the ID), and just like that it stops that container, stops that image from running.

If you want to pull an image, meaning download the specified Docker image onto your actual computer, your local machine, you use docker pull followed by the image name, for example docker pull ubuntu or docker pull nginx; the latter pulls the latest nginx image from Docker Hub onto your computer. Docker Hub is the registry where all of these images are listed, the various Ubuntu and nginx images and all the rest, and from there you can run them, download them onto your local computer, or install them on your virtual machines.

If you want to remove something, it is as simple as the remove command: docker rm removes a stopped container, so you actually have to stop the container first, and then from there you remove it from your list of containers. It is fairly simple and very intuitive; you need to give it the ID, and as we already mentioned, the same ID you used to stop it will remove it. And if you want to remove an image, you use docker rmi with the image name, because it is more convenient to use the image name than the ID itself, unless you are going to copy-paste; ubuntu is much easier to type than a long ID number, so docker rmi ubuntu removes the Ubuntu image from your local machine.
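As a quick recap in shell form, here is a minimal sketch of that interactive workflow, using the public ubuntu image as the example; replace <container-id> with whatever ID docker ps prints:

    # download the latest ubuntu image from Docker Hub
    docker pull ubuntu

    # start a container and attach an interactive terminal to it
    docker run -it ubuntu

    # in another terminal: list running containers and their IDs
    docker ps

    # stop the container, remove it, then remove the image
    docker stop <container-id>
    docker rm <container-id>
    docker rmi ubuntu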
In summary, Docker simplifies the creation, deployment, and running of applications in isolated containers; that is basically what Docker does. The application is stored inside a container, and you use Docker to deploy it, create it, download it onto your local machine, run it, stop it, and so on. Docker is the tool you use to interact with the various containers, and inside those containers can be a variety of different things: applications, or what you saw as that running Ubuntu image, which is technically not its own operating system because it still runs on top of your host operating system, your computer. Doing it this way is more lightweight, so containers are faster to run compared to traditional virtual machines; when you actually start a virtual machine, it takes more time and more resources, because it is a bit more heavy-duty than running a container holding just the specific set of applications you want. So again, you are not launching the Ubuntu operating system; you are launching the set of applications stored inside one container that resembles what launching Ubuntu would look like, while your host operating system acts as the OS. Likewise, a container can hold an nginx server plus the applications and dependencies connected to running an nginx server; you launch it, and now your nginx server is up. And this is your cheat sheet of the commands we just talked about; if you want to take a screenshot or jot it down, pause the screen. One, two, three, and moving on.

Now that we have talked about virtualization, virtual machines, Docker, containers, and all that good stuff, we can talk about cloud administration, because essentially this is what virtualization builds toward: you are dealing with the cloud, with virtual machines and virtual computers. Cloud computing is the technology that provides on-demand access to computing over the internet: on-demand access to virtual machines, containers, and so on. It includes servers, storage, databases, networking, software, and a variety of other things, and you can provision and manage them with the click of a button. Cloud computing allows businesses and individuals to leverage these powerful resources without the need for physical hardware or extensive IT infrastructure, which is what makes it so incredibly popular.

Infrastructure as a Service (IaaS) is the first model. As the name says, the infrastructure is provided as a service by a cloud service provider: virtualized hardware resources, such as virtual machines, storage, and networks, which allow the user to deploy and manage operating systems, applications, and development environments. The example is AWS EC2; AWS has a variety of different services under its umbrella, and EC2 is compute capacity in the cloud, enabling users to run applications on virtual servers. That is Amazon's version of IaaS.
The next one is Microsoft Azure Virtual Machines, which is essentially the equivalent of AWS's EC2; it provides a range of virtual machine sizes and configurations and supports both Windows and Linux operating systems. And then we have Google Compute Engine, which is the Google version of this whole thing, offering essentially the same: scalable, flexible compute resources for running large-scale workloads on Google's infrastructure.

Then you have Platform as a Service (PaaS), which is the next level up. If we look at infrastructure as a service as the base level, because the provider is supplying the infrastructure, platform as a service is a development and deployment environment in the cloud. It provides you with the tools and services to build, test, deploy, and manage applications without worrying about the underlying infrastructure, for example the virtual machines and the physical hardware that would otherwise be required; you are just using their platform to develop, deploy, and test your applications. So it is the next level up from buying, or rather renting, infrastructure. Elastic Beanstalk is the AWS version of this, which helps you develop, scale, and deploy web applications and services using popular languages. Google App Engine is the Google version, which helps you deploy applications on Google's infrastructure with automatic scaling and management. And then we have the Microsoft version, Azure App Service, which helps developers build, deploy, and scale web apps and APIs quickly, with integrated support for various development languages. Basically, these three companies offer essentially the same thing across the board; the platforms have different interfaces, and some are more user-friendly than others, but for the most part they offer similar services.

Then you have Software as a Service (SaaS). If infrastructure is the base level and platform is the next level up, software is the level above that: applications delivered over the internet on a subscription basis. You can access them via a web browser without installing or maintaining anything; you just log in. Microsoft Office 365 is one of the most common examples: access to Microsoft Office, meaning Word, Excel, PowerPoint, and so forth. It is very similar to Google Drive, which is another example of software as a service: you get access to Google Sheets and Google Docs, and you are using the software right there; the software is the word editor, the spreadsheet creator and editor. These connect with OneDrive and Teams, where Teams is the sharing and collaboration portion of Office 365 and OneDrive is the storage portion, which again is very similar to Google Drive. Google Workspace is the software as a service that kind of combines everything together: productivity and collaboration, since you are now working with your coworkers and everyone on your team, and inside of it are Gmail, Google Drive, Docs, Sheets, and Google Meet, their video conferencing tool. And then there is Salesforce, which is a really big one.
Salesforce is a customer relationship management tool, a CRM, and it helps businesses manage their customer relationships, streamline sales processes, and so on. Salesforce is software, it is actually available on most cloud service providers as well, and you can buy a license and have it installed on everybody's local computer, so that would be another form the software-as-a-service model can take.

Cloud computing, as we have discussed in multiple instances throughout this chapter, helps you scale up or scale down. Based on the demands of your company, you can get a bunch of different software deployed to all of your users, all of your employees; you can have a bunch of virtual machines launched using infrastructure as a service; and you can have a platform launched for people to develop code and test their cloud resources or the applications they have built. And you can do this in either direction, meaning as your company grows, or as it shrinks and downsizes, you can add things with the click of a button or remove things with the click of a button. It is very easily scalable in both directions.

It is also cost-efficient, and this is one of the main reasons companies go into cloud computing: for literally $20 or $30 a month you can start launching an environment with as much power and processing as you would need, instead of buying a $5,000 computer, depending of course on what you are looking for and what you need. The upfront cost is so much cheaper, and when you do not need it anymore, instead of worrying about what to do with that $5,000 computer, you just stop paying for the thing. Say you used it for six months and you are done; you stop paying for it and take it down. In the grand scheme of things it is so much more cost-efficient for a company, maybe not for an individual, depending on who you are, what you do, and what you need it for, but for companies it is kind of a no-brainer to move to cloud computing and use these services.

It is flexible and, obviously, accessible: cloud services are accessible from anywhere with an internet connection. You just need your laptop, and you can log into your cloud service provider, your CSP, and from there get access to whatever you were using from your home office, for example. And they are reliable and available because the providers have redundant locations. Amazon Web Services, for example, has warehouses all over the globe that house the servers connected to the AWS service, and if one of them goes down, it just fails over to the next available one in that region, so the person using the service never has to worry about losing access to what they need. This is very, very important: the redundancy portion of this, the redundant server warehouses that exist literally all over the world, is the main reason why these services are so reliable and available all the time.
As for disaster recovery: as long as you have enabled some kind of backup, and as long as you are paying your bill, obviously, that is the other part, you will never lose your data and you will never lose your service; your web server or application server will always be on and running. And these were just the three major heavy hitters we have been talking about; there are a lot of other cloud service providers that are also very legitimate and have a lot of great resources available, these are simply the big three.

Then, of course, there are automatic updates. You do not need to update your computer or update the service; it just gets updated on your behalf. You do not have to worry about making sure your IT team is on top of it, and if you are the sole proprietor, the administrator, and the CEO all in one, you do not have to worry about updating anything, because it will be done automatically for you, the security updates included. It is a very convenient process; there are a lot of different conveniences to using cloud computing, but this is another really big one, because with automatic updates a lot of security is enforced for you: as things get more streamlined, and as penetration testing gets done and the provider upgrades its security infrastructure, you inherit those benefits as well.

So cloud computing is a flexible, scalable, cost-efficient, and just plain awesome way of going into virtualization, and you have the three service models: IaaS, PaaS, and SaaS. Let me show you what the visual of that looks like; this is kind of the best image I was able to find, since the other ones all look like spreadsheets or pyramids that do not really tell you much. With IaaS, infrastructure as a service, you can see that this is basically what the web servers look like, the servers inside those massive warehouses; that is what you are renting. These things can help you deploy virtual machines and virtual computers for people to use, and those people can then install their own operating systems on top of them, install their own applications, develop new applications, and so on. This is the most basic level of service; on top of it you can essentially do anything you want, but you would need a team of people to be able to do it.

Then you have platform as a service: they give you the servers, obviously, but they also give you the operating systems. So you can just launch a Windows machine or a Linux machine, instead of launching only a bare server and then having to use your own hypervisor on top of it to install the virtual machines and the operating systems.
With PaaS, you just launch the operating system with a couple of clicks, and now you have access to a Linux computer or a Windows computer. And the last one, software as a service, assumes that all of these things are already configured for you; you do no configuration, and you just launch the application. Google Drive, for example: you just open Google Drive, and that is the most convenient version of this trio, because very little configuration is required on your behalf; you log into the application, you start using the software, and that is basically it. So to recap: IaaS requires the most configuration, because somebody needs to use a hypervisor, install the operating system, Linux or Windows or whatever it is, and launch the virtual machines, all of those things. SaaS requires the least configuration. And PaaS is the middle ground right here, where with a couple of clicks you are launching a Linux machine or a Windows machine, and then you use the applications inside of it. So this is the trio we have, all falling under "as a service": infrastructure, platform, and software.

It is probably also a good idea for us to talk about some of these cloud providers. These are, again, the big three we have here: Amazon AWS, Microsoft, and Google. They offer a comprehensive list of services, and they are very comparable to each other; I guess it kind of depends on personal taste and personal preference where somebody would choose one over the other, and maybe pricing would also play into that, but for the most part they all offer a similar range of services.

AWS is the most popular. Believe it or not, Amazon is not just a marketplace; its biggest profit comes from its web services, AWS, because for the most part they have the same infrastructure already set up and, on top of that, they are just selling access to their resources on a variety of different tiers. It includes computing power, of course, storage options, and networking capabilities. I actually host my websites through AWS, all of them, though I do not have one launched for this channel yet, and I did buy the Hackaholics Anonymous domain through AWS, so that is a really cool one. It offers computing power, storage options, and networking capabilities that make it great for everybody: individual users, enterprises, government entities, and so on. Its key compute services are the Elastic Compute Cloud, EC2; Lambda, which is serverless computing, meaning you do not need an actual server, you can just get access to computing power through their platform; and Elastic Beanstalk, which is the platform as a service. Storage would be the Simple Storage Service, S3, plus Elastic Block Store, and Glacier for long-term storage. Amazon Web Services databases run across a variety of different service levels as well: you have the Relational Database Service, you have DynamoDB, which is a NoSQL database, and you have Redshift for data warehousing.
And then you have networking: the Virtual Private Cloud, Route 53 for DNS services, and CloudFront for the content delivery network; that is AWS's list of key services. Then we have the management consoles and management tools: the AWS Management Console is the interface for managing AWS resources; the AWS CLI is the command-line interface for interacting with their services programmatically as well as in scripts; and then you have the software development kits, the SDKs, for integrating AWS services into applications using programming languages. Python SDKs are a very common thing to understand: when you are interacting with the API of any service, you look for a Python SDK and import it into your project file, and that SDK allows you to interact with the API of that service. AWS, for example, has an SDK; you can import it into your Python project, and as long as you have your token and your API key to confirm you actually have access to their services, you can interact with your AWS management environment through a variety of APIs and software development kits. A very cool part of it all, and just to let you know, all of these providers have SDKs.

Microsoft Azure is another one, the main competitor to AWS, with seamless integration with Microsoft products. Azure offers a wide range of cloud services covering compute, analytics, storage, and networking. On the compute and storage side you have Azure Virtual Machines; Azure Functions, which is serverless computing; Azure Kubernetes Service, which is a really big one that a lot of job descriptions actually ask for; and Azure Blob Storage, Azure Disk Storage, and Azure Files. The databases are the Azure SQL Database, Cosmos DB, which is the NoSQL database, and Azure Database for PostgreSQL and MySQL. Then we have Azure Virtual Network, Azure Load Balancer, and their Content Delivery Network for the CDN. Their management tools are the Azure Portal, the Azure CLI, and of course the PowerShell scripting language, and they also have an SDK you can work with.

And then we have Google Cloud, which is known for its capabilities in data analytics and machine learning. It has a robust set of cloud services that leverage Google's massive infrastructure: computing, storage, and application development. Google Compute Engine, Google Kubernetes Engine, and Cloud Functions all fall under their computing section. For storage you have Cloud Storage, Persistent Disk, Filestore, and Google Drive, for example. Then you have the databases: Cloud SQL, Cloud Spanner, which is their distributed relational database, and of course Firestore, the NoSQL document database. Networking covers the Virtual Private Cloud, VPC, Cloud Load Balancing, and Cloud CDN. And these are their management tools: the Cloud Console, the gcloud CLI, and the client libraries for integrating into applications using programming languages; the client libraries are the equivalent of an SDK. As you can tell, every single one of them has a CLI, and every decent cloud computing provider should, plus a console, the graphical user interface for managing all of your resources and assets.
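Since each provider's CLI works the same way in spirit, here is a small hedged sketch using the AWS CLI as the example. It assumes you have already run aws configure to store your access key, and the instance ID is just an illustrative placeholder:

    # verify which identity and credentials the CLI is using
    aws sts get-caller-identity

    # list EC2 instances in a region
    aws ec2 describe-instances --region us-east-1

    # start, and later stop, a specific instance (placeholder ID)
    aws ec2 start-instances --instance-ids i-0123456789abcdef0
    aws ec2 stop-instances --instance-ids i-0123456789abcdef0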
Google Cloud is obviously scalable and cost-efficient, with the flexibility and security we have talked about, plus the advanced services, the AI tools, the machine learning, and the big data analytics that enable you to innovate and stay competitive. This is becoming more and more required to stay competitive in the modern business world: some kind of AI assistant or AI integration with your infrastructure to make things more efficient and respond faster; machine learning to stay on top of your competition with the research being done and to improve your data sets and infrastructure, because machines learn faster, better, and more consistently than human beings do; and big data analytics, because it is staggering how much data actually exists in the world and how important it is to be able to ingest all of that data and make sense of it. These things are connected to each other: the big data analytics is connected to the machine learning, which is done by an AI; it is all interconnected, and at this stage of the modern technology world, with how tightly technology is connected to business, I feel like these things are almost mandatory. If you want to be really good as a cybersecurity person, an administrator, a tech person, you need to get comfortable with the concepts of machine learning, big data, and AI integrations, so that you can stay ahead of the curve against your competition in the IT, cybersecurity, pentesting, and system administration worlds.

In summary, we have these major cloud providers, but they are not the only ones; I would challenge you to Google what the major cloud providers are, see what you can find out there, and see how they rank in competitiveness against these big three. There are a lot of tools available, as we already discussed: infrastructure, platform, and software, all available as a service, along with a lot of great management tools, including the consoles, the command-line interfaces, and the SDKs, the software development kits that integrate well with programming languages to provide further automation capabilities. And those are the big three, as we talked about.

All right, so you have your VMs up, you have your containers and all the things provided to you by your virtual environment; how do you manage these things? That is what we are going to talk about. libvirt is the first command-line toolkit available to us; it is a toolkit, exposed via an API, used for interacting with VMs across a bunch of different platforms like KVM, QEMU, Xen, and VMware. It is consistent because it works across all of these different platforms, and it is a very popular choice; we have actually already mentioned it, hopefully you remember, back in the virtualization portion earlier. It offers a unified API for managing VMs across different hypervisors, which simplifies VM management as a result and means that a consistent set of commands and tools basically works everywhere, no matter where you are or which virtualization technology you are using. virsh is one of the key tools that comes with libvirt, and virt-install is the command-line tool for creating and installing a new virtual machine; these are the two command-line tools that ship with it.
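A quick, hedged illustration of that "one toolset, many hypervisors" idea: virsh accepts a connection URI, so the same command can target different backends. The URIs below are standard libvirt examples; which ones actually work depends on what is installed on your host:

    # talk to the system-wide KVM/QEMU instance (the usual default)
    virsh -c qemu:///system list --all

    # the same command against a per-user QEMU session
    virsh -c qemu:///session list --all

    # or against a Xen host, if the Xen driver is present
    virsh -c xen:///system list --all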
libvirt is compatible, as we already covered, with the various virtualization technologies, making it a versatile tool for all kinds of virtualization environments. And this is what a command looks like when using libvirt to create a virtual machine; there are a few elements here we need to go over. virt-install is the command-line tool that installs a new machine; --name is what you are going to call it; --ram is how much memory should be allocated to it; --vcpus is how many virtual CPUs to allocate; --disk is the path and size for this particular disk; and --os-variant is the variant of the OS you are installing, essentially telling the installer which OS image you are using to complete your installation. The full breakdown: virt-install creates and installs a new virtual machine under libvirt; --name is what you want the virtual machine to be called; --ram 2048 allocates 2048 megabytes of RAM to this particular VM, which is basically 2 gigabytes; --vcpus 2 is the processing power, the virtual CPUs assigned to this machine, two in this case; --disk gives the path of the disk image for the VM with a size of 20 gigabytes, and that 20 GB is storage, not RAM or CPU processing power, just the disk space this machine gets; and of course --os-variant is the operating system for this particular case, Ubuntu 20.04.

If we fill in actual values, the name would be my-ubuntu-vm, with the same RAM we are assigning to it, the same vCPUs, the full disk path with 20 gigs of storage allocated, the OS variant ubuntu, and, wow, look at that, a --cdrom option as well: in this particular case the installer is being pulled from a CD-ROM image, so the OS is installed from the ISO at that CD-ROM path.

And this is how you destroy one; I love that word. You destroy a virtual machine using virsh destroy, so virt-install is how you create one, and to destroy it you need virsh. Again, you just give it the virtual machine name, my-ubuntu-vm in this particular case, and it forcibly stops the specified VM; my-ubuntu-vm is being destroyed. If you want to list your virtual machines, you run virsh list --all, and it lists everything being managed by libvirt, showing the IDs, the names, and the current state, whether running, paused, or shut off. In the example output there are three: my-ubuntu-vm, a test virtual machine, and an old virtual machine. Because one of them is shut off and one is paused, those two have no active IDs; the only one that actually has an ID associated with it is the one currently running, and that is the output you would get from listing all the virtual machines with virsh.
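Assembled into one runnable command, the example looks roughly like this; the name, disk path, and ISO path are illustrative placeholders, and newer virt-install versions prefer --memory over the older --ram spelling:

    # create and install a new guest from an Ubuntu 20.04 ISO
    virt-install \
      --name my-ubuntu-vm \
      --ram 2048 \
      --vcpus 2 \
      --disk path=/var/lib/libvirt/images/my-ubuntu-vm.qcow2,size=20 \
      --os-variant ubuntu20.04 \
      --cdrom /var/lib/libvirt/images/ubuntu-20.04.iso

    # forcibly stop it, then list every guest libvirt knows about
    virsh destroy my-ubuntu-vm
    virsh list --all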
So here is the flow with another example. In this case we want to create a virtual machine called my-centos, CentOS being one of the Red Hat family of Linux distributions. There are 4 GB of RAM assigned to this particular one, four vCPUs assigned to it, the disk path pointing to where this VM will live, 40 GB of storage assigned, the OS variant centos, and the installer pulled from the CD-ROM path that has the CentOS ISO image on it. And that is it; that is how you install one with CentOS. If you wanted to destroy that same machine, you would use virsh; again, virt-install installs it, virsh destroy is what we use to destroy it, and it immediately stops it. And if you want to list everything, you run the list --all command. So libvirt is the toolkit for managing virtual machines across a variety of virtualization platforms; it is consistent, meaning you can use those same exact commands across a variety of different hypervisors and virtualization environments and they run exactly as they are; and you have virsh for management and virt-install for installation of your various virtual machines. This is our summary of commands, so three, two, one, moving on.

Okay, so now we have to manage our Dockers; or excuse me, the containers of Docker, my bad. Docker is the tool, and the containers are the stuff we manage with the Docker tool. Docker simplifies the deployment and management of applications, which are stored inside these containers, and the microservice architectures around them; it packages applications along with all of their dependencies inside isolated containers. So you have Docker, which is the tool, and the container, which houses the applications, the dependencies of those applications, and everything else used in something called a microservice. Basically it is just running the various applications; for example, that Ubuntu case we saw earlier would run the various applications and dependencies that are inherent to, ingrained in, an Ubuntu environment, without actually launching an Ubuntu operating system.

So first you pull the image, meaning you download it from Docker Hub, which may be hosted alongside a variety of cloud service providers, or really from any other container registry, onto your actual local machine, your computer. For example, we will pull nginx in this particular case and bring that image onto our local machine. Then you want to run it, so you start a new container in detached mode with docker run -d, meaning it is going to run in the background; you replace the argument with the name of the Docker image you want to use, nginx from the previous example, and it runs the nginx container in the background and returns the container ID for you. Once you have that, you can do a variety of different things with that container, like actually run and use it.
When you are done with a container and want to remove it, you need to provide the container ID; we got the container ID by running it in the background, and you can also run the list command, as we saw, to get the list of the containers you have along with their container IDs. Then you can use the docker rm command to remove a container by that ID, or the docker rmi command to remove the named image you downloaded; docker rm my-container would remove the container named my-container, which is its name, I would assume, rather than the container ID itself.

Then we have the logs of those containers. docker logs retrieves the log output of the specified container, meaning things like authentication and authorization messages, the logs for any errors that may have taken place, or just the regular interactions that were done with that particular container; you can retrieve all of those via the logs command, so the logs of my-container would be retrieved by running docker logs my-container. And you have docker ps, which lists all of the currently running containers; this is how you get the actual container names, their IDs, their statuses, and any other details, and this is how you get the ID you need in order to remove one, for example, or to pull the logs from one. In the example output, I know it is a little bit small, but this is where the container ID is listed; you have the image, which is nginx in this case; the command entry, the docker-entrypoint script, which runs much longer than what is shown; the fact that it was created 2 hours ago and has been running for 2 hours; that there are no ports associated with it because it is running on the local machine; and then the name, serene_bassi, that has been associated with this nginx server.

Then there is ps -a; not the public service announcement, but docker ps -a, which lists all of the containers, including the ones that are stopped. We had docker ps, which shows you the stuff that is currently running; ps -a will show you everything, including the containers that are stopped or paused, and you can see what their IDs and names are, and so on. And then you can stop one that is currently running: docker stop followed by the container ID. These are very easy commands to remember, but you have this video as a cheat sheet as well. In the example it stops the serene_bassi container, and note that serene_bassi was the name associated with it, while the container ID is the value in the first column; there is a little bit of a glitch in these instructions, so my bad on this particular one, but you get it: when the command asks for the container ID, give it that ID value right there, not the nickname that has been assigned to it, which is serene_bassi in this particular case.
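Here is the detached-mode workflow from this section as one shell sketch; the name my-container is a placeholder I am assigning myself rather than letting Docker pick a random one like serene_bassi:

    # download the image, then run it in the background (-d)
    docker pull nginx
    docker run -d --name my-container nginx

    # inspect it: running containers, all containers, and its logs
    docker ps
    docker ps -a
    docker logs my-container

    # stop it, remove the container, then remove the image
    docker stop my-container
    docker rm my-container
    docker rmi nginx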
If you want to remove an image by its image name, you use docker rmi and give it the image name, which in this particular case is nginx; that is the value we hand to rmi, with nginx as our example. And that is it: Docker is the tool that helps you deploy and manage applications stored inside containers, and containers house the applications along with all of their dependencies so they are able to run. We already went through all of the key commands, and this is just the cheat sheet for you: docker pull to download an image onto your local machine, docker run to run it, docker rm or docker rmi to remove a container or an image, docker ps to list the running containers, and docker ps -a to list all of them, including the ones that are not running. That is basically it as far as Docker itself is concerned.

You also need to learn how to orchestrate, to manage your containers: to make sure, number one, that they are not past their life cycle, and that when they are done being used you get rid of them, especially in a large-scale environment, because keeping them around takes up a lot of storage. Kubernetes is something that is used for the orchestration of the containers in that kind of environment; Docker Swarm is another one. Both can help you automate deployment, scaling up or scaling down, and management of containerized applications, making sure they are running when they need to be, that everything is efficient and in good shape, and that when they are done, you get rid of them.

Kubernetes, also abbreviated k8s, is a very popular open-source platform for automating the deployment, scaling, and management of containerized applications, and it is great for really anything, especially a large environment. It handles automatic deployment and scaling: it works with a variety of templates as well as rules and policies you can assign, so you can automatically deploy something and say, for this particular nginx image, I want you to create a hundred different replicas and deploy them, and it is very, very good at automatically doing exactly that. It can load balance and route traffic: based on the traffic volume of your environment and the company you are working with, it makes sure the physical servers do not crash, because load balancing is directly related to that; balancing the incoming load of traffic and routing it efficiently so the containers run smoothly while the physical infrastructure stays up and available without anything crashing is basically what load balancing and traffic routing are. Self-healing is an interesting one: if anything fails to start, it is replaced or automatically restarted, and anything that keeps failing repeatedly gets killed. I love that: it kills containers that do not respond to user-defined health checks, and it does not advertise them to clients until they are ready to serve. It does everything it needs to do in the background, restarting something, healing it if needed, killing and wiping a broken container and fetching a working copy to bring up in its place, and only once it is actually ready does it say, this is ready to serve.
Only then does it hand the container to the client or the user so they can actually use it. It can also help you manage storage, making sure persistent storage is deployed and mounted as needed, and this is all something Kubernetes does beautifully, automatically so to speak. Then there is the security portion: if you have sensitive information, which a lot of people do, like passwords and API keys, it helps you manage those things securely, which basically means it either will not display them, or will display them only in encoded or encrypted form, values that look like a bunch of randomized text nobody can make sense of and that cannot be decrypted without the key assigned to the administrator or the user; a very nice algorithm is used for the encryption, and without the attached key there is no way to decrypt or decode those contents.

Docker Swarm is Docker's native clustering and orchestration tool. It essentially does what Kubernetes does, just a little more simply, and it helps you orchestrate and manage your containers, especially in environments that already use Docker. It offers simplified setup and management and integrates seamlessly with the Docker tooling, so it works very well if you are already using Docker for your particular environment. There is the ability to scale your services up or down, very similar to Kubernetes, by adjusting the number of replicas, as I already mentioned; you can just say, I want you to create 50 copies of this, or 100, and so on. It does load balancing, the same idea of managing the network traffic so these services do not crash or overexert the physical infrastructure. And it is secure by default: it uses TLS encryption, the more advanced successor to SSL that typically secures HTTPS web traffic on port 443. TLS is a very powerful encryption standard for secure communication between nodes in the Swarm cluster, which is a fancy way of saying the variety of different containers and tools trying to communicate with each other inside this massive environment; it provides secure, encrypted conversation between all of these various nodes, and it does it seamlessly and securely.

To do all of these things with Kubernetes as the example, say you want to deploy an nginx environment: kubectl is the command that interacts with Kubernetes. You run kubectl create deployment nginx --image=nginx, which creates a deployment named nginx using the official nginx image, very simple to do. Scaling it is very interesting, and a lot of these things can also be embedded inside scripts, so you just run the script and it does this for you: kubectl scale deployment nginx --replicas=3 scales the nginx deployment to three replicas, so this deployment would be done in three copies. How easy is that? That is crazy. Then, if you want the pods, you list all the pods running in the cluster, which contain those three replicas, or however many replicas you have, with kubectl get pods.
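Put together, and assuming kubectl is already pointed at a working cluster, the sequence is a short sketch:

    # create a deployment named nginx from the official nginx image
    kubectl create deployment nginx --image=nginx

    # scale it out to three replicas
    kubectl scale deployment nginx --replicas=3

    # list the pods backing those replicas
    kubectl get pods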
The Docker Swarm version of this requires swarm initialization first: you use docker swarm init to initialize the swarm, and that cluster is then what gets managed; from there you interact with the swarmed cluster. For service creation you run docker service create --name web --replicas 3 -p 80:80 nginx: it creates a service named web with three replicas using the nginx image in this particular case, and it maps port 80 on the host to port 80 in the container, which is what is used for web traffic; port 80 is for HTTP traffic, so it is only appropriate that we use port 80 for the web service we have created with Docker Swarm. If you want to list all of the services, you use docker service ls, which is very intuitive coming from Linux, because ls is the command we use to list the contents of Linux directories.
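And here is the same thing as a minimal sketch, run on a Docker host that is allowed to become a single-node swarm manager:

    # turn this Docker host into a swarm manager
    docker swarm init

    # create a 3-replica nginx service, publishing host port 80
    docker service create --name web --replicas 3 -p 80:80 nginx

    # list the services running in the swarm
    docker service ls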
And that is it for container orchestration. These two tools obviously go much deeper, and there is a lot of documentation and a lot of tutorials available for both Docker Swarm and Kubernetes; I just want you to know about them, so if you want to do more homework and self-teaching, you know where to go and what you are supposed to look for. They are very powerful tools, because both automate the deployment of multiple replicas of, really, anything: a Linux container environment, an nginx web server, anything that can be deployed inside a container, a thousand times over if needed. It is as simple as saying, hey, create replicas equals 1000, and all of a sudden you have a thousand replicas of that same container. So Kubernetes and Docker Swarm are a very powerful pair of tools, and there are obviously other container orchestration tools as well; these are just the most popular and the most relevant to the conversations we have had. I do encourage you to look into orchestration of containers using Docker, Docker Swarm, Kubernetes, or anything similar, because it will make you much more functional as a Linux administrator and, overall, as a system administrator.

This training series is sponsored by Hackaholics Anonymous. To get the supporting materials for this series, like the 900-page slideshow, the 200-page notes document, and all of the pre-made shell scripts, consider joining the Agent tier of Hackaholics Anonymous. You'll also get monthly Python automations, exclusive content, and direct access to me via Discord. Join Hackaholics Anonymous today.
    By Amjad Izhar
    Contact: amjad.izhar@gmail.com
    https://amjadizhar.blog